cse p501 – compilers

50
CSE P501 – Compilers Parsing Context Free Grammars (CFG) Ambiguous Grammars Next Spring 2014 Jim Hogg - UW - CSE - P501 C-1

Upload: nairi

Post on 24-Feb-2016

87 views

Category:

Documents


0 download

DESCRIPTION

CSE P501 – Compilers. Parsing Context Free Grammars (CFG) Ambiguous Grammars Next. Parsing. ‘Middle End’. Back End. Target. Source. Front End. chars. IR. IR. Scan. Select Instructions. Optimize. tokens. IR. Allocate Registers. Parse. IR. AST. Emit. Convert. IR. IR. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: CSE P501 – Compilers

CSE P501 – Compilers

Parsing

Context Free Grammars (CFG)

Ambiguous Grammars

Next

Spring 2014 Jim Hogg - UW - CSE - P501 C-1

Page 2: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-2

Source TargetFront End Back End

Scanchars

tokens

AST

IR

AST = Abstract Syntax TreeIR = Intermediate Representation

‘Middle End’

Optimize

Select Instructions

Parse

Convert

Allocate Registers

Emit

Machine Code

IR

IR

IR

IR

IR

Parsing

Page 3: CSE P501 – Compilers

Valid Tokens != Valid Program

Spring 2014 Jim Hogg - UW - CSE - P501 C-3

So a MiniJava Scanner would happily accept the following program:

int ; = true { while ( x < true * if { or 123 ) goto count_99

We rely on a MiniJava Parser to reject this kind of gibberish

MiniJava includes the following tokens (among many others):

class int [ ( . true < this ) + * ; while = if id ilit ! / new {

But how do we specify what makes a valid MiniJava program?

Page 4: CSE P501 – Compilers

What is Parsing?

Analogous to parsing an English sentence Analyze words into subject, verb, object, etc

How to parse a program? Analyze tokens into language constructs:

assignment, if-clause, function-call, while-loop, etc

Spring 2014 Jim Hogg - UW - CSE - P501 C-4

Page 5: CSE P501 – Compilers

Parsing – so what’s the problem?

1. The set of valid programs is infinite2. The set of invalid programs is infinite

Q: How to specify all valid programs, succinctly?

A: Define a Grammar More specifically, a Context-Free Grammar (CFG)

Spring 2014 Jim Hogg - UW - CSE - P501 C-5

Page 6: CSE P501 – Compilers

Grammar for the Hokum Language1. Prog Stm ; Prog | Stm2. Stm AsStm | IfStm 3. AsStm Var = Exp4. IfStm if Exp then AsStm5. VorC Var | Const6. Exp VorC | VorC + VorC7. Var [a-z]8. Const [0-9]

Spring 2014 Jim Hogg - UW - CSE - P501 C-6

• Context-Free Grammar ~ CFG ~ Grammar ~ Backus-Naur Form ~ BNF

• Productions, or Rules• Terminals & Non-Terminals; Start (Symbol)• Multiple languages present in the description

Context-Free Grammars (CFG)

Page 7: CSE P501 – Compilers

Example Hokum Programs

Prog Stm ; Prog | StmStm AsStm | IfStm AsStm Var = ExpIfStm if Exp then AsStmVorC Var | ConstExp VorC | VorC + VorCVar [a-z]Const [0-9]

Spring 2014 Jim Hogg - UW - CSE - P501 C-7

a = 1; b = a + 4;z = 1;if b + 3 then z = 2

a = x < 20b = a + 4 + 5 ;z = 1if (a == 33) z < 2 ;

Hokum BNF Grammar

But how do we know which programs are legal or illegal, in Hokum?

Legal

Illegal

Page 8: CSE P501 – Compilers

DerivationProg => Stm ; Prog=> AsStm ; Prog=> Var = Exp ; Prog=> a = Exp ; Prog=> a = VorC ; Prog=> a = Const ; Prog=> a = 1 ; Prog=> a = 1 ; Stm=> a = 1 ; IfStm=> a = 1 ; if Exp then AsStm=> a = 1 ; if VorC + VorC then AsStm=> a = 1 ; if Var + VorC then AsStm

Spring 2014 Jim Hogg - UW - CSE - P501 C-8

=> a = 1 ; if a + VorC then AsStm=> a = 1 ; if a + Const then AsStm=> a = 1 ; if a + 1 then AsStm=> a = 1 ; if a + 1 then Var = Exp=> a = 1 ; if a + 1 then b = Exp=> a = 1 ; if a + 1 then b = VorC=> a = 1 ; if a + 1 then b = Const=> a = 1 ; if a + 1 then b = 2

=> versus Leftmost, rightmost, middlemost Sentential Form & Sentence What is a Context-Sensitive

Grammar?

Prog Stm ; Prog | StmStm AsStm | IfStm AsStm Var = ExpIfStm if Exp then AsStmVorC Var | ConstExp VorC | VorC + VorCVar [a-z]Const [0-9]

Page 9: CSE P501 – Compilers

Parse TreeProg

Stm

1

Spring 2014

; Prog

AsStm

= Exp

Var

a VorC

Const

Stm

IfStm

thenExp

if AsStm

+ VorCVorC

Var

a 1

Const

2

= Exp

Var

b VorC

Const

Prog Stm ; Prog | StmStm AsStm | IfStm AsStm Var = ExpIfStm if Exp then AsStmVorC Var | ConstExp VorC | VorC + VorCVar [a-z]Const [0-9]

Jim Hogg - UW - CSE - P501 C-9

Page 10: CSE P501 – Compilers

Junk Nodes in the Parse TreeProg

Stm

1

Spring 2014

; Prog

AsStm

= Exp

Var

a VorC

Const

Stm

IfStm

thenExp

if AsStm

+ VorCVorC

Var

a 1

Const

2

= Exp

Var

b VorC

Const

Jim Hogg - UW - CSE - P501 C-10

Page 11: CSE P501 – Compilers

AST (Abstract Syntax Tree)

Prog

Spring 2014

=

Var:a Const:1

IfStm

+

Var:a Const:1

=

Var:b Const:2

Jim Hogg - UW - CSE - P501 C-11

Page 12: CSE P501 – Compilers

Why Not Just RegEx?

Spring 2014 Jim Hogg - UW - CSE - P501 C-12

v = [a-z] // variableo = + | - | | // operator

v ( o v )* // derives a + b c ok, but now add ( )

(? v (o v )? )* // derives (a + b) c// but also gibberish like: a + b) c (

Try to invent a regex description for arithmetic expressions: single-character variable names; operators + - and [Note: red

denotes terminals, below]

Almost every programming language includes such balanced pairs: ( ), { }, begin end. Conclusion: regex won’t work.

More generally, regex correspond to DFAs. They can only ‘count’ pairs up to a finite limit.

Page 13: CSE P501 – Compilers

Context-Sensitive Grammar? All compiler work uses Context-Free Grammars, or CFGs

Why so-called? Alternatively: What is a non-context-free grammar? (ie, a Context-Sensitive

Grammar)

Suppose production B CFG: we can replace B by , no matter what eg: B =>

CSG: we can replace B by only in certain contexts. Ie, only when B is preceded and/or followed by certain strings

eg: c B d

Spring 2014 Jim Hogg - UW - CSE - P501 C-13

Language Kind ProductionsRegular B and B CContext-Free B Context-Sensitive B ' '

Page 14: CSE P501 – Compilers

Example CSG

1. S a b c 2. | a S B c3. c B W B4. W B W X5. W X B X6. B X B c7. b B b b

Spring 2014 Jim Hogg - UW - CSE - P501 C-14

The following CSG generates the language an bn cn for n >= 1

Note: CSGs will not be discussed further, nor examined, as part of P501

Page 15: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-15

Parsing

The syntax of most programming languages can be specified by a Context-Free Grammar or CFG

Parsing = "How to fill the gap between Start Symbol and Sentence"

L(G) = the language generated by G = the set of sentences generated by G

Parsing: Given G and a sentence w in L(G ), construct the derivation, (parse tree) for w in some order

As we parse, do something useful at each node in the tree

Page 16: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-16

"in some order"

Top-down Start with the root Traverse the parse tree depth-first; scan tokens, Left-

to-right; create a Leftmost derivation LL(k)

Bottom-up Start at leaves and build up to the root; scan tokens,

Left-to-right; create a Rightmost derivation (in reverse) LR(k) and subsets: LALR(k) and SLR(k)

Page 17: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-17

"do something useful at each node"

Perform some semantic action:

Construct nodes of full parse tree (rare)

Construct abstract syntax tree (common)

Construct linear, lower-level representation like assembler code

Generate target code on the fly 1-pass compiler not common in production compilers: poor code quality

Page 18: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-18

Context-Free Grammars – Formal Description

A grammar G is a tuple <N, T, P, S> where

N a finite set of Non-terminal symbols

T a finite set of Terminal symbols

P a finite set of Productions A subset of N × (N T) *

S the start symbol, a distinguished element of N If not specified otherwise, this is taken as the Non-

Terminal on the LHS of the first production

Page 19: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-19

Standard Notations

a, b, c element of T A, B, C element of N w, x, y, z elements of T* X, Y, Z element of N T , , elements of (N T)*

(A, ) P => A

Page 20: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-20

Derivation Relations (1)

if B P then B => simply affirms G is context-free

A =>* denotes there is a chain of zero-or-more

productions, starting with A, that generates transitive closure

Page 21: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-21

Derivation Relations (2)

if B P then w B =>lm w derives leftmost prefix of A is all terminals (by construction)

if B P then B w =>rm w derives rightmost prefix of A may include terminals and non-terminals

We will only be interested in leftmost and rightmost derivations – not random orderings

Page 22: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-22

Languages

All the sentences (strings of Terminals) I can generate from NonTerminal A:

For A N, L(A) = { w | A =>* w }

All the sentences (strings of Terminal) I can generate from start symbol S:

If S is the start symbol of grammar G, define L(G ) = L(S)

Page 23: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-23

Reduced Grammars

Grammar G is reduced iff for every productionA in P there is some derivation

S =>* x A z => x z =>* xyz ie, no production is useless

Convention: we will use only reduced grammars

Page 24: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-24

Grammar G is unambiguous iff every w in L(G ) has a unique leftmost (or rightmost) derivation

Fact: unique leftmost or unique rightmost implies the other

A grammar lacking this property is ambiguous Note: other grammars that generate the same language

may be unambiguous So, "ambiguous" applies to a grammar – not a language

We need unambiguous grammars for parsing (well mostly: see later)

Ambiguous Grammars

Page 25: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-25

Example: Ambiguous Grammar

Exp Exp Op Exp | DigOp + | - | * | /Dig 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Exercise: show that this is ambiguous

How? Show two different leftmost or rightmost derivations for the same string

Equivalently: show two different parse trees for the same string

Page 26: CSE P501 – Compilers

Spring 2014

2+3*4 – part 1

Give a leftmost derivation of 2+3*4; show the parse tree

Exp Exp + Exp | Exp – Exp | Exp * Exp

| Exp / Exp | Dig Dig [0-9]

Exp

Exp

Jim Hogg - UW - CSE - P501 C-26

Page 27: CSE P501 – Compilers

Spring 2014

2+3*4 – part 2

Exp

Exp

Exp

+

Exp => Exp + Exp

Exp Exp + Exp | Exp – Exp | Exp * Exp

| Exp / Exp | Dig Dig [0-9]

Jim Hogg - UW - CSE - P501 C-27

Page 28: CSE P501 – Compilers

Spring 2014

2+3*4 – part 3

Exp

Exp

Exp

ExpExp => Exp + Exp => Dig + Exp

Dig

+

Exp Exp + Exp | Exp – Exp | Exp * Exp

| Exp / Exp | Dig Dig [0-9]

Jim Hogg - UW - CSE - P501 C-28

Page 29: CSE P501 – Compilers

Spring 2014

2+3*4 – part 4

Exp

Exp

Exp

ExpExp => Exp + Exp => Dig + Exp => 2 + Exp

Dig

2 +

Exp Exp + Exp | Exp – Exp | Exp * Exp

| Exp / Exp | Dig Dig [0-9]

C-29Jim Hogg - UW - CSE - P501

Page 30: CSE P501 – Compilers

Spring 2014

2+3*4 – part 5

Exp

Exp

Exp

ExpExp => Exp + Exp => Dig + Exp => 2 + Exp => 2 + Exp * Exp

Dig

2 +

Exp ::= Exp + Exp | Exp – Exp | Exp * Exp

| Exp / Exp | Dig Dig ::= [0-9]

Exp Exp

*C-30Jim Hogg - UW - CSE - P501

Page 31: CSE P501 – Compilers

Spring 2014

2+3*4 – part 6

Exp

Exp

Exp

Dig

2 +

ExpExp => Exp + Exp => Dig + Exp => 2 + Exp => 2 + Exp * Exp => 2 + Dig * Exp Exp Ex

p

*

Dig

Exp Exp + Exp | Exp – Exp | Exp * Exp

| Exp / Exp | Dig Dig [0-9]

C-31Jim Hogg - UW - CSE - P501

Page 32: CSE P501 – Compilers

Spring 2014

2+3*4 – part 7

ExpExp => Exp + Exp => Dig + Exp => 2 + Exp => 2 + Exp * Exp => 2 + Dig * Exp => 2 + 3 * Exp

Exp

Exp

Exp

Dig

2 +

Exp Exp

*

Dig

3

Exp Exp + Exp | Exp – Exp | Exp * Exp

| Exp / Exp | Dig Dig [0-9]

C-32Jim Hogg - UW - CSE - P501

Page 33: CSE P501 – Compilers

Spring 2014

2+3*4 – part 8

ExpExp => Exp + Exp => Dig + Exp => 2 + Exp => 2 + Exp * Exp => 2 + Dig * Exp => 2 + 3 * Exp => 2 + 3 * Dig

Exp

Exp

Exp

Dig

2 +

Exp Exp

*

Dig

3

Dig

Exp Exp + Exp | Exp – Exp | Exp * Exp

| Exp / Exp | Dig Dig [0-9]

C-33Jim Hogg - UW - CSE - P501

Page 34: CSE P501 – Compilers

2+3*4 – part 9

Exp

Exp

Exp

ExpExp => Exp Op Exp => Dig Op Exp => 2 Op Exp => 2 + Exp => 2 + Exp Op Exp => 2 + Dig Op Exp => 2 + 3 Op Exp => 2 + 3 * Exp => 2 + 3 * Dig => 2 + 3 * 4

Dig

2 +

Exp Exp

Op

Dig

3 *

Dig

4

Exp Exp + Exp | Exp – Exp | Exp * Exp

| Exp / Exp | Dig Dig [0-9]

Spring 2014 C-34Jim Hogg - UW - CSE - P501

Page 35: CSE P501 – Compilers

2+3*4 – part 10

Give a different leftmost derivation of 2+3*4

Exp

Exp

Exp

Exp => Exp * Exp => Exp + Exp * Exp => 2 + Exp * Exp => 2 + 3 * Exp => 2 + 3 * 4 Dig

4*

Exp Exp

Dig

2 +

Dig

3

Exp Exp + Exp | Exp – Exp | Exp * Exp

| Exp / Exp | Dig Dig [0-9]

Spring 2014 C-35Jim Hogg - UW - CSE - P501

Page 36: CSE P501 – Compilers

Are derivations equivalent?

+

2 3

4

* +

2

3 4

*

Result = 2 + (3 * 4) = 14Result = (2 + 3) * 4 = 20

Spring 2014 Jim Hogg - UW - CSE - P501 C-36

Page 37: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-37

Another example

Give two different derivations of 5 – 6 – 7

Exp Exp + Exp | Exp – Exp | Exp * Exp

| Exp / Exp | Dig Dig [0-9]

Page 38: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-38

Another example

Give two different rightmost derivations of 5 – 6 – 7

Exp Exp + Exp | Exp – Exp | Exp * Exp

| Exp / Exp | Dig Dig [0-9]

Exp => Exp - Exp => Exp - Exp - Exp => Exp - Exp - 7 => Exp - 6 - 7 => 5 - 6 - 7

-

-5

6 7

-

-

5 6

7

result = 6

result = -8

Exp => Exp - Exp => Exp - 7 => Exp - Exp - 7 => Exp - 6 - 7 => 5 - 6 - 7

Page 39: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-39

Another example

Give two different leftmost derivations of 5 – 6 – 7

Exp Exp + Exp | Exp – Exp | Exp * Exp

| Exp / Exp | Dig Dig [0-9]

Exp => Exp - Exp => 5 - Exp => 5 - Exp - Exp => 5 - 6 - Exp => 5 - 6 - 7

-

-5

6 7

-

-

5 6

7

result = 6

result = -8 Exp => Exp - Exp

=> Exp - Exp - Exp => 5 - Exp - Exp => 5 - 6 - Exp => 5 - 6 - 7

Page 40: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-40

What went wrong?

Grammar did not capture precedence or associativity Eg: 2 + (3 * 4) = 14 versus (2 + 3) * 4 =

20 Eg: 5 - (6 - 7) = 6 versus (5 - 6) - 7 = -8

Solution Create a non-terminal for each level of precedence Isolate the corresponding part of the grammar Force the parser to recognize higher precedence

sub-expressions first

Page 41: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-41

Classic Expression Grammar

exp exp + term | exp – term | termterm term * factor | term / factor |

factorfactor int | ( exp )int 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7

E E + T | E – T | TT T * F | T / F | FF ( E ) | DD [0-9]

Page 42: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-42

Derive 2 + 3 * 4 E E + T | E – T | TT T * F | T / F | FF ( E ) | DD [0-9]

E => E + T => E + T * F => E + T * D => E + T * 4 => E + F * 4 => E + D * 4 => E + 3 * 4 => T + 3 * 4 => F + 3 * 4 => D + 3 * 4 => 2 + 3 * 4

+

2

3 4

*

Result = 2 + (3 * 4) = 14

This grammar yields the correct, expected (school algebra) result

Page 43: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-43

Derive 5 - 6 - 7

E => E - T => E - F => E - D => E - 7 => E - T - 7 => E - F - 7 => E - D - 7 => E - 6 - 7 => F - 6 - 7 => D - 6 - 7 => 5 - 6 - 7

E E + T | E – T | TT T * F | T / F | FF ( E ) | DD [0-9]

-

-

5 6

7

result = -8

• This grammar yields the correct, expected (school algebra) result• Note how left-recursive rules yield left-associativity

Page 44: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-44

Classic Example of Ambiguous Grammar

Grammar for conditional statements

stm if ( cond ) stm | if ( cond ) stm else stm

Exercise: show that this is ambiguous How?

“The Dangling Else” - a 'weakness' in C, Pascal, etc

Page 45: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-45

Two Derivations

if (cond) if (cond) stm else stm

stm if ( cond ) stm | if ( cond ) stm else stm

stm

if stm( cond

)

if stm( cond

) else stm

stm

if stm( cond

)

if stm( cond

) else stm

Page 46: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-46

Solving the Dangling Else

Fix the grammar to separate if statements with else clause from those without

Done in Java reference grammar Adds lots of non-terminals

Use some ad-hoc rule in parser “else matches closest unpaired if”

Change the language Only possible if you 'own' the language

Page 47: CSE P501 – Compilers

Resolving Ambiguity with Grammar (1)

Stm IfElse | IfNoElse IfElse if ( Exp ) IfElse else IfElse IfNoElse if ( Exp ) Stm | if ( Exp ) IfElse else IfNoElse

formal, no additional rules beyond syntax

sometimes obscures original grammar

Spring 2014 Jim Hogg - UW - CSE - P501 C-47

Page 48: CSE P501 – Compilers

Resolving Ambiguity with Grammar (2)

If you can (re-)design the language, avoid the problem entirely

IfStm if Exp then Stm end | if Exp then Stm else Stm end

formal, clear, elegant allows sequence of Stms in then and else branches, no

{ } needed extra end required for every if

Spring 2014 Jim Hogg - UW - CSE - P501 C-48

Page 49: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-49

Parser Tools and Operators

Most parser tools cope with ambiguous grammars Earlier productions chosen before later ones Longest match used if there is a choice Makes life simpler if used with discipline But be sure the tool does what you really want

Specify operator precedence & associativity Allows simpler, ambiguous grammar with fewer non-

terminals Used in CUP

Page 50: CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE - P501 C-50

Next LR (bottom-up / shift-reduce) parsing

Reading Continue Cooper&Torczon chapter 3

Note Note: LR parsing is the toughest session in

P501

Next