Transcript
Page 1: CSE P501 – Compilers

CSE P501 – Compilers

Parsing

Context Free Grammars (CFG)

Ambiguous Grammars

Next

Spring 2014 Jim Hogg - UW - CSE - P501 C-1

Page 2: CSE P501 – Compilers

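[Figure: compiler pipeline from Source to Target, divided into Front End, 'Middle End' and Back End. Stages shown: Scan (chars to tokens), Parse (tokens to AST), Convert (AST to IR), Optimize, Select Instructions, Allocate Registers, Emit, Machine Code. IR is passed between the later stages. The Parse stage is highlighted: this lecture covers Parsing.]

AST = Abstract Syntax Tree
IR = Intermediate Representation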

Page 3: CSE P501 – Compilers

Valid Tokens != Valid Program

MiniJava includes the following tokens (among many others):

class int [ ( . true < this ) + * ; while = if id ilit ! / new {

So a MiniJava Scanner would happily accept the following program:

int ; = true { while ( x < true * if { or 123 ) goto count_99

We rely on a MiniJava Parser to reject this kind of gibberish

But how do we specify what makes a valid MiniJava program?

Page 4: CSE P501 – Compilers

What is Parsing?

Analogous to parsing an English sentence: analyze words into subject, verb, object, etc

How to parse a program? Analyze tokens into language constructs: assignment, if-clause, function-call, while-loop, etc

Page 5: CSE P501 – Compilers

Parsing – so what’s the problem?

1. The set of valid programs is infinite
2. The set of invalid programs is infinite

Q: How to specify all valid programs, succinctly?

A: Define a Grammar. More specifically, a Context-Free Grammar (CFG)

Page 6: CSE P501 – Compilers

Context-Free Grammars (CFG)

Grammar for the Hokum Language

1. Prog  ::= Stm ; Prog | Stm
2. Stm   ::= AsStm | IfStm
3. AsStm ::= Var = Exp
4. IfStm ::= if Exp then AsStm
5. VorC  ::= Var | Const
6. Exp   ::= VorC | VorC + VorC
7. Var   ::= [a-z]
8. Const ::= [0-9]

• Context-Free Grammar ~ CFG ~ Grammar ~ Backus-Naur Form ~ BNF
• Productions, or Rules
• Terminals & Non-Terminals; Start (Symbol)
• Multiple languages present in the description

Page 7: CSE P501 – Compilers

Example Hokum Programs

Hokum BNF Grammar

Prog  ::= Stm ; Prog | Stm
Stm   ::= AsStm | IfStm
AsStm ::= Var = Exp
IfStm ::= if Exp then AsStm
VorC  ::= Var | Const
Exp   ::= VorC | VorC + VorC
Var   ::= [a-z]
Const ::= [0-9]

But how do we know which programs are legal or illegal, in Hokum?

Legal

a = 1;
b = a + 4;
z = 1;
if b + 3 then z = 2

Illegal

a = x < 20
b = a + 4 + 5 ;
z = 1
if (a == 33) z < 2 ;
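One way to make "legal or illegal" concrete is to code the Hokum grammar as a recognizer, with one method per non-terminal. The sketch below is only an illustration and not part of the course material: the tokenizer, the class name `Recognizer` and the helper `is_legal` are assumptions introduced here.

```python
import re

def tokenize(text):
    # Runs of letters, runs of digits, or any other single non-space character.
    return re.findall(r"[a-z]+|[0-9]+|\S", text)

def is_var(tok):    # Var ::= [a-z]
    return len(tok) == 1 and "a" <= tok <= "z"

def is_const(tok):  # Const ::= [0-9]
    return len(tok) == 1 and "0" <= tok <= "9"

class Recognizer:
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def eat(self, accept):
        tok = self.peek()
        if tok is None or not accept(tok):
            raise SyntaxError(f"unexpected {tok!r} at token {self.pos}")
        self.pos += 1

    def prog(self):      # Prog ::= Stm ; Prog | Stm
        self.stm()
        if self.peek() == ";":
            self.pos += 1
            self.prog()

    def stm(self):       # Stm ::= AsStm | IfStm
        if self.peek() == "if":   # IfStm ::= if Exp then AsStm
            self.pos += 1
            self.exp()
            self.eat(lambda t: t == "then")
            self.as_stm()
        else:
            self.as_stm()

    def as_stm(self):    # AsStm ::= Var = Exp
        self.eat(is_var)
        self.eat(lambda t: t == "=")
        self.exp()

    def exp(self):       # Exp ::= VorC | VorC + VorC
        vorc = lambda t: is_var(t) or is_const(t)
        self.eat(vorc)
        if self.peek() == "+":
            self.pos += 1
            self.eat(vorc)

def is_legal(text):
    r = Recognizer(tokenize(text))
    try:
        r.prog()
    except SyntaxError:
        return False
    return r.pos == len(r.toks)   # every token must be consumed

print(is_legal("a = 1; b = a + 4; z = 1; if b + 3 then z = 2"))
print(is_legal("a = x < 20 b = a + 4 + 5 ; z = 1 if (a == 33) z < 2 ;"))
```

The first call checks the legal program from this page and the second checks the illegal one. Note that a trailing semicolon is rejected, because the grammar requires a Prog after every ;.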

Page 8: CSE P501 – Compilers

Derivation

Prog => Stm ; Prog
     => AsStm ; Prog
     => Var = Exp ; Prog
     => a = Exp ; Prog
     => a = VorC ; Prog
     => a = Const ; Prog
     => a = 1 ; Prog
     => a = 1 ; Stm
     => a = 1 ; IfStm
     => a = 1 ; if Exp then AsStm
     => a = 1 ; if VorC + VorC then AsStm
     => a = 1 ; if Var + VorC then AsStm
     => a = 1 ; if a + VorC then AsStm
     => a = 1 ; if a + Const then AsStm
     => a = 1 ; if a + 1 then AsStm
     => a = 1 ; if a + 1 then Var = Exp
     => a = 1 ; if a + 1 then b = Exp
     => a = 1 ; if a + 1 then b = VorC
     => a = 1 ; if a + 1 then b = Const
     => a = 1 ; if a + 1 then b = 2

• => (a derivation step) versus the production arrow
• Leftmost, rightmost, middlemost
• Sentential Form & Sentence
• What is a Context-Sensitive Grammar?

Page 9: CSE P501 – Compilers

Parse Tree

[Figure: parse tree for a = 1 ; if a + 1 then b = 2. The root Prog has children Stm, ; and Prog. The first Stm expands through AsStm to Var (a) = Exp, where Exp sits over VorC over Const (1). The second Prog expands through Stm to IfStm: if Exp then AsStm. That Exp is VorC + VorC (Var a, Const 1), and the AsStm is Var (b) = Exp, where Exp sits over VorC over Const (2).]

Page 10: CSE P501 – Compilers

Junk Nodes in the Parse Tree

[Figure: the parse tree for a = 1 ; if a + 1 then b = 2, repeated from the previous page to point out its junk nodes.]

Page 11: CSE P501 – Compilers

AST (Abstract Syntax Tree)

Prog
  =
    Var:a
    Const:1
  IfStm
    +
      Var:a
      Const:1
    =
      Var:b
      Const:2
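As a rough illustration of how such an AST might be held in memory, here is a sketch using Python dataclasses. The class and field names (`Assign`, `Add`, `cond`, `then` and so on) are assumptions made for this example; the slides do not prescribe a representation.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import List, Union

@dataclass
class Var:
    name: str

@dataclass
class Const:
    value: int

@dataclass
class Add:                      # the + node
    left: Union[Var, Const]
    right: Union[Var, Const]

@dataclass
class Assign:                   # the = node
    target: Var
    value: Union[Var, Const, Add]

@dataclass
class IfStm:
    cond: Union[Var, Const, Add]
    then: Assign

@dataclass
class Prog:
    stms: List[Union[Assign, IfStm]]

# AST for: a = 1 ; if a + 1 then b = 2
ast = Prog([
    Assign(Var("a"), Const(1)),
    IfStm(Add(Var("a"), Const(1)), Assign(Var("b"), Const(2))),
])

def count_nodes(node):
    """Count tree nodes; plain str/int payloads are not nodes."""
    if isinstance(node, list):
        return sum(count_nodes(n) for n in node)
    if not is_dataclass(node):
        return 0
    return 1 + sum(count_nodes(getattr(node, f.name)) for f in fields(node))

print(count_nodes(ast))
```

Only the nodes that carry meaning survive: there is no Stm, VorC or Exp wrapper, and no node for ; if or then.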

Page 12: CSE P501 – Compilers

Why Not Just RegEx?

Try to invent a regex description for arithmetic expressions: single-character variable names; operators + - * and /
[Note: red denotes terminals on the original slide]

v = [a-z]            // variable
o = + | - | * | /    // operator

v ( o v )*           // derives a + b * c, ok; but now add ( )

(? v ( o v )? )*     // derives (a + b) * c
                     // but also gibberish like: a + b) * c (

Almost every programming language includes such balanced pairs: ( ), { }, begin end. Conclusion: regex won't work.

More generally, regexes correspond to DFAs. They can only 'count' pairs up to a finite limit.
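The point can be tried out directly. The sketch below uses a pattern in the spirit of the slide (it is an assumed variant, not the slide's exact regex) that allows an optional parenthesis on either side of each operand. It accepts a sensible expression and an unbalanced one alike; checking balance needs a counter that can grow without limit, which a DFA does not have.

```python
import re

# operand with optional parens, then any number of (operator operand) pairs
LOOSE = re.compile(r"\(?[a-z]\)?(?:[+\-*/]\(?[a-z]\)?)*")

def balanced(text):
    """Check parenthesis nesting with an unbounded counter."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # a ')' with no matching '('
                return False
    return depth == 0

for expr in ["(a+b)*c", "a+b)*(c"]:
    print(expr, bool(LOOSE.fullmatch(expr)), balanced(expr))
```

Both strings match the regex, but only the first passes the counter-based check.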

Page 13: CSE P501 – Compilers

Context-Sensitive Grammar?

All compiler work uses Context-Free Grammars, or CFGs

Why so-called? Alternatively: What is a non-context-free grammar? (ie, a Context-Sensitive Grammar)

Suppose production B ::= β

CFG: we can replace B by β, no matter what
eg: α B γ => α β γ

CSG: we can replace B by β only in certain contexts. Ie, only when B is preceded and/or followed by certain strings
eg: c B d ::= c β d

Language Kind        Productions
Regular              B ::= a  and  B ::= a C
Context-Free         B ::= β
Context-Sensitive    α B γ ::= α β γ

Page 14: CSE P501 – Compilers

Example CSG

The following CSG generates the language a^n b^n c^n for n >= 1

1. S   ::= a b c
2.       | a S B c
3. c B ::= W B
4. W B ::= W X
5. W X ::= B X
6. B X ::= B c
7. b B ::= b b

Note: CSGs will not be discussed further, nor examined, as part of P501
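For the curious, the claim can be checked mechanically for small n. The sketch below treats the seven rules as string rewrites and does a breadth-first search over sentential forms up to a length bound. The search terminates because no rule shortens a string. The function name `sentences` and the bound are choices made for this illustration.

```python
from collections import deque

RULES = [
    ("S", "abc"), ("S", "aSBc"),
    ("cB", "WB"), ("WB", "WX"), ("WX", "BX"), ("BX", "Bc"),
    ("bB", "bb"),
]

def sentences(max_len):
    """All terminal strings derivable from S with length <= max_len."""
    seen, queue, found = {"S"}, deque(["S"]), set()
    while queue:
        form = queue.popleft()
        if form.islower():                 # only a, b, c remain: a sentence
            found.add(form)
            continue
        for lhs, rhs in RULES:
            start = form.find(lhs)
            while start != -1:             # rewrite every occurrence of lhs
                new = form[:start] + rhs + form[start + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                start = form.find(lhs, start + 1)
    return sorted(found, key=len)

print(sentences(9))
```

Rules 3 to 6 together move a B leftwards past a c, one context-sensitive step at a time, so that rule 7 can eventually turn each B into a b.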

Page 15: CSE P501 – Compilers


Parsing

The syntax of most programming languages can be specified by a Context-Free Grammar or CFG

Parsing = "How to fill the gap between Start Symbol and Sentence"

L(G) = the language generated by G = the set of sentences generated by G

Parsing: Given G and a sentence w in L(G), construct the derivation (parse tree) for w, in some order

As we parse, do something useful at each node in the tree

Page 16: CSE P501 – Compilers

"in some order"

Top-down
• Start with the root
• Traverse the parse tree depth-first; scan tokens Left-to-right; create a Leftmost derivation
• LL(k)

Bottom-up
• Start at leaves and build up to the root; scan tokens Left-to-right; create a Rightmost derivation (in reverse)
• LR(k) and subsets: LALR(k) and SLR(k)

Page 17: CSE P501 – Compilers

"do something useful at each node"

Perform some semantic action:
• Construct nodes of full parse tree (rare)
• Construct abstract syntax tree (common)
• Construct linear, lower-level representation like assembler code
• Generate target code on the fly: a 1-pass compiler; not common in production compilers: poor code quality

Page 18: CSE P501 – Compilers

Context-Free Grammars – Formal Description

A grammar G is a tuple <N, T, P, S> where

N: a finite set of Non-terminal symbols
T: a finite set of Terminal symbols
P: a finite set of Productions; a subset of N × (N ∪ T)*
S: the start symbol, a distinguished element of N. If not specified otherwise, this is taken as the Non-Terminal on the LHS of the first production
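The tuple definition maps fairly directly onto a data structure. The following is a minimal sketch, assuming a hypothetical `Grammar` type and `make_grammar` helper, that builds <N, T, P, S> for the Hokum grammar and checks that P really is a subset of N × (N ∪ T)*. The character classes [a-z] and [0-9] are treated as single terminal symbols here, which is a simplification.

```python
from typing import FrozenSet, NamedTuple, Tuple

class Grammar(NamedTuple):
    N: FrozenSet[str]                             # non-terminals
    T: FrozenSet[str]                             # terminals
    P: Tuple[Tuple[str, Tuple[str, ...]], ...]    # productions (A, alpha)
    S: str                                        # start symbol

def make_grammar(productions, terminals, start=None):
    N = frozenset(lhs for lhs, _ in productions)
    T = frozenset(terminals)
    # Default start symbol: LHS of the first production.
    S = productions[0][0] if start is None else start
    if N & T:
        raise ValueError(f"both terminal and non-terminal: {sorted(N & T)}")
    if S not in N:
        raise ValueError(f"start symbol {S!r} is not a non-terminal")
    for lhs, rhs in productions:
        for sym in rhs:                           # rhs must lie in (N ∪ T)*
            if sym not in N and sym not in T:
                raise ValueError(f"unknown symbol {sym!r} in production for {lhs}")
    return Grammar(N, T, tuple((l, tuple(r)) for l, r in productions), S)

hokum = make_grammar(
    productions=[
        ("Prog", ["Stm", ";", "Prog"]), ("Prog", ["Stm"]),
        ("Stm", ["AsStm"]), ("Stm", ["IfStm"]),
        ("AsStm", ["Var", "=", "Exp"]),
        ("IfStm", ["if", "Exp", "then", "AsStm"]),
        ("VorC", ["Var"]), ("VorC", ["Const"]),
        ("Exp", ["VorC"]), ("Exp", ["VorC", "+", "VorC"]),
        ("Var", ["[a-z]"]), ("Const", ["[0-9]"]),
    ],
    terminals=[";", "=", "if", "then", "+", "[a-z]", "[0-9]"],
)
print(hokum.S, len(hokum.N), len(hokum.P))
```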

Page 19: CSE P501 – Compilers

Standard Notations

a, b, c       elements of T
A, B, C       elements of N
w, x, y, z    elements of T*
X, Y, Z       elements of N ∪ T
α, β, γ       elements of (N ∪ T)*

(A, α) ∈ P is written as the production A ::= α

Page 20: CSE P501 – Compilers

Derivation Relations (1)

if B ::= β ∈ P then α B γ => α β γ
• simply affirms G is context-free

A =>* α denotes there is a chain of zero-or-more productions, starting with A, that generates α
• the reflexive, transitive closure of =>

Page 21: CSE P501 – Compilers

Derivation Relations (2)

if B ::= β ∈ P then w B γ =>lm w β γ
• derives leftmost
• the prefix before B, w, is all terminals (by construction)

if B ::= β ∈ P then α B w =>rm α β w
• derives rightmost
• the prefix before B, α, may include terminals and non-terminals

We will only be interested in leftmost and rightmost derivations – not random orderings

Page 22: CSE P501 – Compilers

Languages

All the sentences (strings of Terminals) I can generate from Non-Terminal A:

For A ∈ N, L(A) = { w | A =>* w }

All the sentences (strings of Terminals) I can generate from start symbol S:

If S is the start symbol of grammar G, define L(G) = L(S)

Page 23: CSE P501 – Compilers

Reduced Grammars

Grammar G is reduced iff for every production A ::= α in P there is some derivation

S =>* x A z => x α z =>* xyz

ie, no production is useless

Convention: we will use only reduced grammars
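Whether a grammar is reduced can be checked with two fixed-point passes: first find the non-terminals that can derive some string of terminals, then find which of the surviving productions are reachable from the start symbol. The sketch below is one common way to do it. The function name and the small example grammar are made up for illustration.

```python
def useless_productions(productions, start):
    """Return the productions that can never appear in a derivation
    S =>* x A z => x alpha z =>* xyz."""
    nonterminals = {lhs for lhs, _ in productions}

    def rhs_ok(rhs, productive):
        return all(s in productive or s not in nonterminals for s in rhs)

    # Pass 1: non-terminals that derive at least one terminal string.
    productive, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in productive and rhs_ok(rhs, productive):
                productive.add(lhs)
                changed = True
    live = [(l, r) for l, r in productions
            if l in productive and rhs_ok(r, productive)]

    # Pass 2: non-terminals reachable from the start symbol via live productions.
    reachable, work = {start}, [start]
    while work:
        current = work.pop()
        for lhs, rhs in live:
            if lhs == current:
                for s in rhs:
                    if s in nonterminals and s not in reachable:
                        reachable.add(s)
                        work.append(s)

    useful = [p for p in live if p[0] in reachable]
    return [p for p in productions if p not in useful]

example = [
    ("S", ("A", "b")), ("S", ("C",)),
    ("A", ("a",)),
    ("C", ("C", "c")),      # C never terminates
    ("D", ("d",)),          # D is never reached from S
]
print(useless_productions(example, "S"))
```

In the example, S ::= C and C ::= C c are useless because C derives no sentence, and D ::= d is useless because no derivation from S ever reaches D.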

Page 24: CSE P501 – Compilers

Ambiguous Grammars

Grammar G is unambiguous iff every w in L(G) has a unique leftmost (or rightmost) derivation

Fact: unique leftmost or unique rightmost implies the other

A grammar lacking this property is ambiguous
• Note: other grammars that generate the same language may be unambiguous
• So, "ambiguous" applies to a grammar, not a language

We need unambiguous grammars for parsing (well, mostly: see later)

Page 25: CSE P501 – Compilers

Example: Ambiguous Grammar

Exp ::= Exp Op Exp | Dig
Op  ::= + | - | * | /
Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Exercise: show that this is ambiguous

How? Show two different leftmost or rightmost derivations for the same string

Equivalently: show two different parse trees for the same string
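A quick way to convince yourself before doing the exercise by hand is to count parse trees. The sketch below assumes single-digit operands and the four operators above, and counts how many distinct trees the grammar Exp ::= Exp Op Exp | Dig admits for a string. Any count above 1 demonstrates ambiguity. The function name `count_parse_trees` is made up for this illustration.

```python
from functools import lru_cache

OPS = set("+-*/")
DIGITS = set("0123456789")

def count_parse_trees(text):
    toks = tuple(ch for ch in text if not ch.isspace())

    @lru_cache(maxsize=None)
    def count(lo, hi):
        """Number of ways Exp derives toks[lo:hi]."""
        if hi - lo == 1:                       # Exp ::= Dig
            return 1 if toks[lo] in DIGITS else 0
        total = 0
        for i in range(lo + 1, hi - 1):        # Exp ::= Exp Op Exp, Op at i
            if toks[i] in OPS:
                total += count(lo, i) * count(i + 1, hi)
        return total

    return count(0, len(toks)) if toks else 0

print(count_parse_trees("2+3*4"))
print(count_parse_trees("5-6-7"))
print(count_parse_trees("1+2+3+4"))
```

With k operators in a row of digits, the count is the k-th Catalan number, so the ambiguity grows quickly with expression length.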

Page 26: CSE P501 – Compilers

2+3*4 – part 1

Give a leftmost derivation of 2+3*4; show the parse tree

Exp ::= Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
Dig ::= [0-9]

Exp

[Figure: parse tree so far: the root Exp only]

Page 27: CSE P501 – Compilers

2+3*4 – part 2

Exp => Exp + Exp

[Figure: parse tree so far: root Exp with children Exp, + and Exp]

Page 28: CSE P501 – Compilers

2+3*4 – part 3

Exp => Exp + Exp
    => Dig + Exp

[Figure: parse tree so far: the left Exp now has child Dig]

Page 29: CSE P501 – Compilers

2+3*4 – part 4

Exp => Exp + Exp
    => Dig + Exp
    => 2 + Exp

[Figure: parse tree so far: the Dig now has child 2]

Page 30: CSE P501 – Compilers

2+3*4 – part 5

Exp => Exp + Exp
    => Dig + Exp
    => 2 + Exp
    => 2 + Exp * Exp

[Figure: parse tree so far: the right Exp now has children Exp, * and Exp]

Page 31: CSE P501 – Compilers

2+3*4 – part 6

Exp => Exp + Exp
    => Dig + Exp
    => 2 + Exp
    => 2 + Exp * Exp
    => 2 + Dig * Exp

[Figure: parse tree so far: the Exp to the left of * now has child Dig]

Page 32: CSE P501 – Compilers

2+3*4 – part 7

Exp => Exp + Exp
    => Dig + Exp
    => 2 + Exp
    => 2 + Exp * Exp
    => 2 + Dig * Exp
    => 2 + 3 * Exp

[Figure: parse tree so far: the Dig to the left of * now has child 3]

Page 33: CSE P501 – Compilers

2+3*4 – part 8

Exp => Exp + Exp
    => Dig + Exp
    => 2 + Exp
    => 2 + Exp * Exp
    => 2 + Dig * Exp
    => 2 + 3 * Exp
    => 2 + 3 * Dig

[Figure: parse tree so far: the Exp to the right of * now has child Dig]

Page 34: CSE P501 – Compilers

2+3*4 – part 9

Exp => Exp Op Exp
    => Dig Op Exp
    => 2 Op Exp
    => 2 + Exp
    => 2 + Exp Op Exp
    => 2 + Dig Op Exp
    => 2 + 3 Op Exp
    => 2 + 3 * Exp
    => 2 + 3 * Dig
    => 2 + 3 * 4

[Figure: completed parse tree, with + at the top level and 3 * 4 as its right subtree]

Page 35: CSE P501 – Compilers

2+3*4 – part 10

Give a different leftmost derivation of 2+3*4

Exp => Exp * Exp
    => Exp + Exp * Exp
    => 2 + Exp * Exp
    => 2 + 3 * Exp
    => 2 + 3 * 4

[Figure: completed parse tree, with * at the top level, 2 + 3 as its left subtree and 4 as its right subtree]

Page 36: CSE P501 – Compilers

Are derivations equivalent?

Tree 1: + at the root, with children 2 and * (3, 4). Result = 2 + (3 * 4) = 14

Tree 2: * at the root, with children + (2, 3) and 4. Result = (2 + 3) * 4 = 20
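To see that the two trees really are not equivalent, they can be written as nested tuples and evaluated. This is only a sketch: the tuple encoding (operator, left, right) and the `evaluate` helper are conventions chosen for this example.

```python
import operator

APPLY = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}

def evaluate(tree):
    """Evaluate a tree given as an int leaf or an (op, left, right) tuple."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return APPLY[op](evaluate(left), evaluate(right))

plus_at_root = ("+", 2, ("*", 3, 4))      # 2 + (3 * 4)
times_at_root = ("*", ("+", 2, 3), 4)     # (2 + 3) * 4

print(evaluate(plus_at_root))
print(evaluate(times_at_root))
```

The token sequence is identical in both cases; only the tree shape differs, and the tree shape decides the result.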


Page 37: CSE P501 – Compilers

Another example

Give two different derivations of 5 - 6 - 7

Exp ::= Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
Dig ::= [0-9]

Page 38: CSE P501 – Compilers

Another example

Give two different rightmost derivations of 5 - 6 - 7

Exp => Exp - Exp => Exp - Exp - Exp => Exp - Exp - 7 => Exp - 6 - 7 => 5 - 6 - 7
Tree: - at the root, with children 5 and - (6, 7); result = 5 - (6 - 7) = 6

Exp => Exp - Exp => Exp - 7 => Exp - Exp - 7 => Exp - 6 - 7 => 5 - 6 - 7
Tree: - at the root, with children - (5, 6) and 7; result = (5 - 6) - 7 = -8

Page 39: CSE P501 – Compilers

Another example

Give two different leftmost derivations of 5 - 6 - 7

Exp => Exp - Exp => 5 - Exp => 5 - Exp - Exp => 5 - 6 - Exp => 5 - 6 - 7
Tree: - at the root, with children 5 and - (6, 7); result = 5 - (6 - 7) = 6

Exp => Exp - Exp => Exp - Exp - Exp => 5 - Exp - Exp => 5 - 6 - Exp => 5 - 6 - 7
Tree: - at the root, with children - (5, 6) and 7; result = (5 - 6) - 7 = -8

Page 40: CSE P501 – Compilers

What went wrong?

Grammar did not capture precedence or associativity
• Eg: 2 + (3 * 4) = 14 versus (2 + 3) * 4 = 20
• Eg: 5 - (6 - 7) = 6 versus (5 - 6) - 7 = -8

Solution
• Create a non-terminal for each level of precedence
• Isolate the corresponding part of the grammar
• Force the parser to recognize higher precedence sub-expressions first

Page 41: CSE P501 – Compilers

Classic Expression Grammar

exp    ::= exp + term | exp - term | term
term   ::= term * factor | term / factor | factor
factor ::= int | ( exp )
int    ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7

E ::= E + T | E - T | T
T ::= T * F | T / F | F
F ::= ( E ) | D
D ::= [0-9]

Page 42: CSE P501 – Compilers

Derive 2 + 3 * 4

E ::= E + T | E - T | T
T ::= T * F | T / F | F
F ::= ( E ) | D
D ::= [0-9]

E => E + T
  => E + T * F
  => E + T * D
  => E + T * 4
  => E + F * 4
  => E + D * 4
  => E + 3 * 4
  => T + 3 * 4
  => F + 3 * 4
  => D + 3 * 4
  => 2 + 3 * 4

Tree: + at the root, with children 2 and * (3, 4)

Result = 2 + (3 * 4) = 14

This grammar yields the correct, expected (school algebra) result

Page 43: CSE P501 – Compilers

Derive 5 - 6 - 7

E ::= E + T | E - T | T
T ::= T * F | T / F | F
F ::= ( E ) | D
D ::= [0-9]

E => E - T
  => E - F
  => E - D
  => E - 7
  => E - T - 7
  => E - F - 7
  => E - D - 7
  => E - 6 - 7
  => T - 6 - 7
  => F - 6 - 7
  => D - 6 - 7
  => 5 - 6 - 7

Tree: - at the root, with children - (5, 6) and 7

result = (5 - 6) - 7 = -8

• This grammar yields the correct, expected (school algebra) result
• Note how left-recursive rules yield left-associativity
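The layering of E, T and F can be seen at work in a small hand-written parser. The sketch below is a recursive-descent parser for this grammar, assuming single-digit operands. A recursive-descent function cannot call itself first without looping forever, so the left-recursive rules E ::= E + T and T ::= T * F are coded as loops that fold each new operand in on the left, which preserves left-associativity. The function names and the tuple form of the tree are choices made for this illustration.

```python
def parse_expression(text):
    toks = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def advance():
        nonlocal pos
        pos += 1
        return toks[pos - 1]

    def expr():                 # E ::= E + T | E - T | T
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())      # fold leftwards
        return node

    def term():                 # T ::= T * F | T / F | F
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():               # F ::= ( E ) | D
        tok = peek()
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok is not None and tok in "0123456789":
            return int(advance())
        raise SyntaxError(f"unexpected {tok!r}")

    tree = expr()
    if pos != len(toks):
        raise SyntaxError(f"unexpected {toks[pos]!r}")
    return tree

print(parse_expression("2 + 3 * 4"))
print(parse_expression("5 - 6 - 7"))
```

Because term() is called from inside expr(), every * or / is grouped before the enclosing + or - is built, which is exactly the "higher precedence first" effect of giving each precedence level its own non-terminal.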

Page 44: CSE P501 – Compilers

Classic Example of Ambiguous Grammar

Grammar for conditional statements

stm ::= if ( cond ) stm
      | if ( cond ) stm else stm

Exercise: show that this is ambiguous. How?

"The Dangling Else" - a 'weakness' in C, Pascal, etc

Page 45: CSE P501 – Compilers

Two Derivations

if (cond) if (cond) stm else stm

stm ::= if ( cond ) stm | if ( cond ) stm else stm

[Figure: two parse trees for the statement above. In one tree the else belongs to the inner if, so the outer if has no else. In the other tree the else belongs to the outer if, so the inner if has no else.]

Page 46: CSE P501 – Compilers

Solving the Dangling Else

• Fix the grammar to separate if statements with else clause from those without
   • Done in Java reference grammar
   • Adds lots of non-terminals
• Use some ad-hoc rule in parser
   • "else matches closest unpaired if"
• Change the language
   • Only possible if you 'own' the language
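The ad-hoc rule is easy to picture in a hand-written top-down parser: after parsing the then-branch, grab an else greedily if one is next. The sketch below is an illustration under assumptions, not any particular compiler's code. It uses a token s to stand for a non-if statement, and returns tuples of the form ("if", then_branch, else_branch).

```python
def parse_stm(tokens):
    pos = 0

    def stm():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "s":                       # some non-if statement
            return "s"
        if tok != "if":
            raise SyntaxError(f"unexpected {tok!r}")
        for expected in ("(", "cond", ")"):
            if pos >= len(tokens) or tokens[pos] != expected:
                raise SyntaxError(f"expected {expected!r}")
            pos += 1
        then_branch = stm()
        # Greedy choice: an else seen here is attached to this, the
        # closest still-unpaired if.
        if pos < len(tokens) and tokens[pos] == "else":
            pos += 1
            return ("if", then_branch, stm())
        return ("if", then_branch, None)

    tree = stm()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree

print(parse_stm("if ( cond ) if ( cond ) s else s".split()))
```

The inner call to stm() sees the else first and consumes it, so the outer if ends up with no else branch. The grammar itself still allows the other tree; the parser simply never builds it.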

Page 47: CSE P501 – Compilers

Resolving Ambiguity with Grammar (1)

Stm      ::= IfElse | IfNoElse
IfElse   ::= if ( Exp ) IfElse else IfElse
IfNoElse ::= if ( Exp ) Stm
           | if ( Exp ) IfElse else IfNoElse

• formal, no additional rules beyond syntax
• sometimes obscures original grammar

Page 48: CSE P501 – Compilers

Resolving Ambiguity with Grammar (2)

If you can (re-)design the language, avoid the problem entirely

IfStm ::= if Exp then Stm end
        | if Exp then Stm else Stm end

• formal, clear, elegant
• allows sequence of Stms in then and else branches, no { } needed
• extra end required for every if

Page 49: CSE P501 – Compilers

Parser Tools and Operators

Most parser tools cope with ambiguous grammars
• Earlier productions chosen before later ones
• Longest match used if there is a choice
• Makes life simpler if used with discipline
• But be sure the tool does what you really want

Specify operator precedence & associativity
• Allows simpler, ambiguous grammar with fewer non-terminals
• Used in CUP
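To get a feel for how a precedence and associativity table can stand in for extra non-terminals, here is a sketch of precedence climbing over the flat, ambiguous grammar Exp ::= Exp Op Exp | Dig. This is a different technique from what an LR tool such as CUP does internally, and the table format is an assumption made for this example. The ^ operator is not in the slides; it is added here purely to show a right-associative entry.

```python
# operator -> (precedence level, associativity); higher binds tighter
PRECEDENCE = {
    "+": (1, "left"), "-": (1, "left"),
    "*": (2, "left"), "/": (2, "left"),
    "^": (3, "right"),          # hypothetical, for illustration only
}

def parse(text):
    toks = [ch for ch in text if not ch.isspace()]
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(toks) or toks[pos] not in "0123456789":
            raise SyntaxError("expected a digit")
        pos += 1
        return int(toks[pos - 1])

    def expr(min_prec):
        nonlocal pos
        left = primary()
        while (pos < len(toks) and toks[pos] in PRECEDENCE
               and PRECEDENCE[toks[pos]][0] >= min_prec):
            op = toks[pos]
            prec, assoc = PRECEDENCE[op]
            pos += 1
            # Left-assoc: the right operand may only contain tighter operators.
            # Right-assoc: it may also contain operators of the same level.
            right = expr(prec + 1 if assoc == "left" else prec)
            left = (op, left, right)
        return left

    tree = expr(1)
    if pos != len(toks):
        raise SyntaxError(f"unexpected {toks[pos]!r}")
    return tree

print(parse("2 + 3 * 4"))
print(parse("5 - 6 - 7"))
print(parse("2 ^ 3 ^ 2"))
```

The grammar stays a single rule; all of the disambiguation lives in the table, which is the trade-off the slide describes.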

Page 50: CSE P501 – Compilers

Next

Next: LR (bottom-up / shift-reduce) parsing

Reading: Continue Cooper & Torczon chapter 3

Note: LR parsing is the toughest session in P501