edan65: compilers, lecture03 context...

52
EDAN65: Compilers, Lecture 03 Context-free grammars, introduction to parsing Görel Hedin Revised: 2017-09-04

Upload: others

Post on 20-Jan-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

EDAN65:Compilers,Lecture 03

Context-free grammars,introductionto parsingGörelHedinRevised:2017-09-04

Page 2: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Courseoverview

Semantic analyzer

Intermediatecode generator

Optimizer

Targetcodegenerator

2

Lexical analyzer(scanner)

Syntactic analyzer(parser)

Regularexpressions

Context-freegrammar

Attributegrammar

machine

runtime system

stack

heap

codeanddata

objects

activationrecords

Interpreter

target code

tokens

Attributed AST

intermediate code

sourcecode (text)

AST(Abstractsyntaxtree)

intermediate code

garbagecollection

Virtualmachine

This lecture

Page 3: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Analyzing programtext

3

sum = sum + k ;

AssignStmt

Exp

Add

Exp Exp

ID EQ ID PLUS ID SEMIprogram text

tokens

parse tree

Page 4: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Recall:Generatingthecompiler:

Semantic analyzer

Lexical analyzer(scanner)

Syntactic analyzer(parser)

Regularexpressions

ScannergeneratorJFlex

Context-freegrammar

ParsergeneratorBeaver

Attributegrammar

Attribute evaluatorgenerator

We will use aparsergeneratorcalled Beaver

4

tokens

text

tree

Page 5: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

5

Context-Free Grammars

Page 6: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Regular ExpressionsvsContext-Free Grammars

6

AnREcan have iteration

ACFGcan also have recursion(itispossible toderive asymbol,e.g.,Stmt,fromitself)

Example REs:WHILE = "while"ID = [a-z][a-z0-9]*LPAR = "("RPAR = ")"PLUS = "+"...

Example CFG:Stmt –> WhileStmtStmt –> AssignStmtWhileStmt –> WHILE LPAR Exp RPAR StmtExp –> IDExp –> Exp PLUS Exp...

Page 7: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

ElementsofaContext-Free Grammar

7

Production rules:X –> s1 s2 … sn

where sk isasymbol(terminalornonterminal)

Nonterminal symbols

Terminalsymbols(tokens)

Startsymbol(one ofthenonterminals,usually theleft-handside of thefirst production)

Example CFG:Stmt –> WhileStmtStmt –> AssignStmtWhileStmt –> WHILE LPAR Exp RPAR StmtAssignStmt –> ID EQ Exp SEMIC…

Page 8: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Shorthand foralternatives

8

Stmt –> WhileStmtStmt –> AssignStmt

Stmt –> WhileStmt | AssignStmt

isequivalent to

Page 9: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Shorthand forrepetition

9

Stmt*

StmtList –> e | Stmt StmtList

isequivalent to

StmtList

where

Page 10: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

ExerciseConstruct agrammar covering this programandsimilar ones:

10

Example program:while (k <= n) {sum = sum + k; k = k+1;}

CFG:Stmt –> WhileStmt | AssignStmt | CompoundStmtWhileStmt –> "while" "(" Exp ")" StmtAssignStmt –> ID "=" Exp ";"CompoundStmt –> ...Exp –> ...LessEq –> ...Add –> ...

(Often,simpletokensarewritten directly astextstrings)

Page 11: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

SolutionConstruct agrammar covering this programandsimilar ones:

11

CFG:Stmt –> WhileStmt | AssignStmt | CompoundStmtWhileStmt –> "while" "(" Exp ")" StmtAssignStmt –> ID "=" Exp ";"CompoundStmt –> "{" Stmt* "}"Exp –> LessEq | Add | ID | INTLessEq –> Exp "<=" ExpAdd –> Exp "+" Exp

Example program:while (k <= n) {sum = sum + k; k = k+1;}

Page 12: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

ParsingUse thegrammar toderive atree foraprogram:

12sum = sum + k ;

StmtExample program:sum = sum + k;

Startsymbol

Page 13: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Parse treeUse thegrammar toderive aparse tree foraprogram:

13sum = sum + k ;

Stmt

AssignStmt

Exp

Add

Exp Exp

Example program:sum = sum + k;

Nonterminalsareinnernodes

Startsymbol

Terminalsareleafs

Aparse tree includes allthetokensasleafs.

Page 14: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Corresponding abstractsyntaxtree(will bediscussed inlaterlecture)

14sum = sum + k ;

AssignStmt

Add

IdExp IdExp

Example program:sum = sum + k;

IdExp

Anabstractsyntaxtree issimilarto aparse tree,but simpler.

Itdoes notinclude allthetokens.

Page 15: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

EBNFvsCanonical Form

15

EBNF:Stmt –> AssignStmt | CompoundStmtAssignStmt –> ID "=" Exp ";"CompoundStmt –> "{" Stmt* "}"Exp –> Add | IDAdd –> Exp "+" Exp

Canonical form:Stmt –> ID "=" Exp ";"Stmt –> "{" Stmts "}"Stmts –> eStmts –> Stmt StmtsExp –> Exp "+" ExpExp –> ID

(Extended)Backus-Naur Form:• Compact,easytoreadandwrite• EBNFhasalternatives,repetition,optionals,parentheses (likeREs)

• Commonnotationforpracticaluse

Canonical form:• Core formalismforCFGs• Useful forproving properties andexplaining algorithms

Page 16: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Realworldexample:TheJavaLanguageSpecification

16

See http://docs.oracle.com/javase/specs/jls/se8/html/index.html• See Chapter 2about theJavagrammar notation.• Lookatsome other chapters to see other syntaxexamples.

CompilationUnit:[PackageDeclaration]{ImportDeclaration}{TypeDeclaration}

PackageDeclaration:{PackageModifier}package Identifier {.Identifier};

PackageModifier:Annotation

Page 17: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

17

Formaldefinitionof CFGs

Page 18: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

FormaldefinitionofCFGs (canonical form)

18

Acontext-free grammar G=(N,T,P,S),whereN– thesetofnonterminalsymbolsT– thesetofterminalsymbolsP– thesetofproduction rules,each withtheform

X–>Y1 Y2 …Ynwhere X∈ N,n≥ 0,andYk∈ N∪ T

S– thestartsymbol(one ofthenonterminals).I.e.,S∈ N

So,theleft-hand side Xofarule isanonterminal.

Andtheright-hand side Y1 Y2 …Yn isasequence of nonterminalsandterminals.

If therhs foraproduction isempty,i.e.,n=0,we writeX–>e

Page 19: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

AgrammarG defines alanguageL(G)

19

A context-free grammar G=(N,T,P,S),whereN– thesetofnonterminalsymbolsT– thesetofterminalsymbolsP– thesetofproduction rules,each withtheform

X–>Y1 Y2 …Ynwhere X∈ N,n≥ 0,andYk∈ N∪ T

S– thestartsymbol(one ofthenonterminals).I.e.,S∈ N

Gdefines alanguage L(G) overthealphabet T

T*isthesetofallpossible sequences ofTsymbols.

L(G)isthesubsetofT*thatcan bederived fromthestartsymbolS,byfollowing theproduction rules P.

Page 20: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Exercise

20

G = (N, T, P, S)

P = {Stmt –> ID "=" Exp ";",Stmt –> "{" Stmts "}" ,Stmts –> e ,Stmts –> Stmt Stmts ,Exp –> Exp "+" Exp ,Exp –> ID

}

N = { }

T = { }

S =

L(G) = {

}

Page 21: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Solution

21

G = (N, T, P, S)

P = {Stmt –> ID "=" Exp ";",Stmt –> "{" Stmts "}" ,Stmts –> e ,Stmts –> Stmt Stmts ,Exp –> Exp "+" Exp ,Exp –> ID

}

N = {Stmt, Exp, Stmts}

T = {ID, "=", "{", "}", ";", "+"}

S = Stmt

L(G) = {"{" "}","{" "{" "}" "}",ID "=" ID ";","{" ID "=" ID ";" "}",ID "=" ID "+" ID ";","{" "{" "}" "{" "}" "}","{" "{" "{" "}" "}" "}","{" ID "=" ID "+" ID ";" "}",ID "=" ID "+" ID "+" ID ";",...

}

Thesequences inL(G)areusually called sentences orstrings

Page 22: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

22

Derivations

Page 23: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Derivationstep

23

If we have asequence ofterminalsandnonterminals,e.g.,

XaYY b

we can replace one ofthenonterminals,applying aproductionrule.Thisiscalled aderivationstep.(Swedish:Härledningssteg)

Supposethere isaproduction

Y–>Xa

andwe apply itforthefirstYinthesequence.We write thederivationstepasfollows:

XaYY b=>XaXaYb

Page 24: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Derivation

24

Aderivation,issimply asequence ofderivationsteps,e.g.:

g0 =>g1 =>…=>gn (n≥0)

where each gi isasequence ofterminalsandnonterminals

Ifthere isaderivationfromg0 togn,we can write thisas

g0 =>*gn

Sothismeans itispossible togetfromthesequence g0 tothesequence gn byfollowing theproduction rules.

Page 25: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Definitionofthelanguage L(G)

25

Recall that:

G=(N,T,P,S)

T* isthesetofallpossible sequences ofT symbols.

L(G)isthesubsetofT*thatcan bederived fromthestartsymbol S,byfollowing theproduction rules P.

Using theconcept ofderivations,we can formally define L(G) asfollows:

L(G)={w∈ T*| S=>*w}

Page 26: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Exercise:Prove thatasentence belongs toalanguage

26

Prove that

INT+INT*INT

Proof (byshowing allthederivationstepsfromthestartsymbolExp):

Exp=>

belongs tothelanguage ofthefollowing grammar:

p1: Exp –>Exp "+" Expp2: Exp –>Exp "*" Expp3: Exp –> INT

Page 27: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Solution:Prove thatasentence belongs toalanguage

27

Prove that

INT+INT*INT

Proof:(byshowing allthederivationstepsfromthestartsymbolExp)

Exp=>p1 Exp "+" Exp=>p3 INT"+"Exp=>p2 INT"+"Exp "*"Exp=>p3 INT"+"INT"*"Exp=>p3 INT"+"INT"*"INT

belongs tothelanguage ofthefollowing grammar:

p1: Exp –>Exp "+" Expp2: Exp –>Exp "*" Expp3: Exp –> INT

Page 28: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Leftmost andrightmost derivations

28

Inaleftmost derivation,theleftmost nonterminalisreplacedineach derivationstep,e.g.,:

Exp =>Exp "+" Exp =>INT"+"Exp =>INT"+"Exp "*"Exp =>INT"+"INT"*"Exp =>INT"+"INT"*"INT

LLparsingalgorithms use leftmost derivation.LRparsingalgorithms use rightmost derivation.Willbediscussed inlaterlectures.

Inarightmost derivation,therightmost nonterminalisreplaced ineach derivationstep,e.g.,:

Exp =>Exp "+" Exp =>Exp "+"Exp "*"Exp =>Exp "+"Exp "*"INT=>Exp "+"INT"*"INT=>INT"+"INT"*"INT

Page 29: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Aderivationcorresponds tobuilding aparse tree

29

Grammar:Exp –>Exp "+" ExpExp –>Exp "*" ExpExp –> INT

Example derivation:

Exp =>Exp "+" Exp =>INT"+"Exp =>INT"+"Exp "*"Exp =>INT"+"INT"*"Exp =>INT"+"INT"*"INT

Exercise:build theparse tree(also called derivationtree).

Page 30: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Aderivationcorresponds tobuilding aparse tree

30

Grammar:Exp –>Exp "+" ExpExp –>Exp "*" ExpExp –> INT

Example derivation:

Exp =>Exp "+" Exp =>INT"+"Exp =>INT"+"Exp "*"Exp =>INT"+"INT"*"Exp =>INT"+"INT"*"INT

Parse tree (derivationtree):

Exp

Exp Exp

Exp Exp

"+"

INT"*"

INT INT

Page 31: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

31

Ambiguities

Page 32: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Exercise:Canwe do another derivationofthesamesentence,

thatgivesadifferentparse tree?

32

Grammar:Exp –>Exp "+" ExpExp –>Exp "*" ExpExp –> INT

Parse tree:

Anotherderivation:

Exp =>

Page 33: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Solution:Canwe do another derivationofthesamesentence,

thatgivesadifferentparse tree?

33

Grammar:Exp –>Exp "+" ExpExp –>Exp "*" ExpExp –> INT

Parse tree:

Exp

Exp"*"

INT

Exp

Exp Exp"+"

INT INT

Anotherderivation:

Exp =>Exp "*" Exp =>Exp "+"Exp "*"Exp =>INT"+"Exp "*"Exp =>INT"+"INT"*"Exp =>INT"+"INT"*"INT

Which parse tree would we prefer?

Page 34: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Ambiguous context-freegrammars

34

ACFGisambiguous if asentence inthelanguage can bederived bytwo (ormore)differentparse trees.

ACFGisunambiguous if each sentence inthelanguage canbederived byonly one parse tree.

(Swedish:tvetydig,otvetydig)

Note!There can bemany differentderivationsthat give thesameparse tree.

Page 35: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

HowcanweknowifaCFGisambiguous?

35

Ifwe find anexample of anambiguity,we know thegrammar isambiguous.

There are algorithms fordeciding if aCFGbelongs to certainsubsets of CFGs,e.g.LL,LR,etc.(See laterlectures.)Thesegrammarsare unambiguous.

But inthegeneralcase,theproblemisundecidable:itisnotpossible to construct ageneralalgorithm that decidesambiguity foranarbitrary CFG.

Strategies foreliminating ambiguities,next lecture.

Page 36: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

36

Parsing

Page 37: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Differentparsingalgorithms

37

Ambiguous

Unambiguous

Allcontext-freegrammars

LR

LL

LL:Left-to-rightscanLeftmost derivationBuilds tree top-downSimpleto understand

LR:Left-to-rightscanRightmost derivationBuilds tree bottom-upMore powerful

Page 38: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

LLandLRparsers:main idea

38

...if IDthen ID=ID;ID...

LR(1):decidestobuildAssignafterseeingthefirsttokenfollowingitssubtree.Thetreeisbuiltbottomup.

Id Assign

Id Id

Thetokeniscalled lookahead.LL(k)andLR(k)use k lookahead tokens.

...if IDthen ID=ID;ID...

IfStmt

Id Assign

LL(1):decidestobuildAssignafterseeingthefirsttokenofitssubtree.Thetreeisbuilttopdown.

CompoundStmt

Page 39: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Recursive-descentparsingAwayofprogramminganLL(1)parserbyrecursivemethodcalls

39

Assume aBNFgrammar with exactly one production rule foreach nonterminal.(Can easily begeneralized to EBNF.)

Each production rule RHSiseither1. asequence of token/nonterminal symbols,or2. asetof nonterminal symbolalternatives

Foreach nonterminal,amethod isconstructed.Themethod1. matches tokensandcallsnonterminal methods,or2. callsone of thenonterminal methods – which one depends onthe

lookahead token.

Ifthelookahead tokendoes notmatch,aparsingerror isreported.

A–>B|C|DB–>eCfDC–>...D–>...

Page 40: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

ExampleJavaimplementation:overview

40

statement –>assignment |compoundStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACE...

class Parser{privateint token; //current lookahead tokenvoid accept(int t){...} //accepttandreadinnext tokenvoid error(Stringstr){...} //generate error messagevoid statement(){...}void assignment (){...}void compoundStmt (){...}...

}

Page 41: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Example:recursivedescentmethods

41

statement –>assignment |compoundStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACE

class Parser{void statement(){switch(token){case ID:assignment();break;case LBRACE:compoundStmt();break;default:error("Expecting statement,found:"+token);}}void assignment(){accept(ID);accept(ASSIGN);expr();accept(SEMICOLON);}void compoundStmt(){accept(LBRACE);while (token!=RBRACE){statement();}accept(RBRACE);}...}

Page 42: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Example:Parserskeletondetails

42

statement –>assignment |compoundStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACEexpr –>...

class Parser{finalstatic int ID=1,WHILE=2,DO=3,ASSIGN=4,...;privateint token; //current lookahead tokenvoid accept(int t){ //accepttandreadinnext tokenif (token==t){token=nextToken();}else {error("Expected "+t+",but found "+token);}}void error(Stringstr){...} //generate error messageprivateint nextToken(){...}//readnext tokenfromscannervoid statement()......

}

Page 43: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

ArethesegrammarsLL(1)?

43

expr –>name params |name

Whatwouldhappeninarecursive-descentparser?

Could they beLL(2)?LL(k)?

Commonprefix

expr –>expr "+" term Left recursion

Page 44: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Dealingwithcommonprefixoflimitedlength:Locallookahead

44

LL(2)grammar:statement –>assignment |compoundStmt |callStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACEcallStmt –> IDLPAR expr RPAR SEMICOLON

voidstatement()...

Page 45: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

45

LL(2)grammar:statement –>assignment |compoundStmt |callStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACEcallStmt –> IDLPAR expr RPAR SEMICOLON

voidstatement(){switch(token){caseID:if(lookahead(2) ==ASSIGN){assignment();}else{callStmt();}break;caseLBRACE:compoundStmt();break;default:error("Expectingstatement,found:"+token);}}

Dealingwithcommonprefixoflimitedlength:Locallookahead

Page 46: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Generatingtheparser:

Syntactic analyzer(parser)

Context-freegrammar Parsergenerator

46

tokens

tree

Page 47: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Beaver:anLR-based parsergenerator

ParserinJava

Context-freegrammar,

with semanticactionsinJava

Beaver

47

tokens

tree

Page 48: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Example beaver specification

48

%class "LangParser";%package "lang";...%terminalsLET,IN,END,ASSIGN,MUL,ID,NUMERAL;

%goal program;//Thestartsymbol

//Context-free grammarprogram=exp;exp =factor |exp MULfactor;factor =let |numeral |id;let =LETidASSIGNexp INexp END;numeral =NUMERAL;id=ID;

Lateron,we will extend this specification with semantic actionsto build thesyntaxtree.

Page 49: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

49

RE CFGTypicalAlphabet

characters terminalsymbols(tokens)

Language isasetof ...

strings(charsequences)

sentences(tokensequences)

Used for... tokens parsetreesPower iteration recursionRecognizer DFA DFAwith stack

RegularExpressionsvsContext-FreeGrammars

Page 50: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

50

Grammar Rulepatterns Typeregular X –>aY orX –>a orX –>e 3

contextfree X–> g 2context sensitive a X b –>a g b 1

arbitrary g –>d 0

TheChomskyhierarchy of formalgrammars

a – terminalsymbola, b, g, d – sequences of (terminalornonterminal)symbols

Type(3)⊂ Type (2)⊂ Type(1)⊂ Type(0)

Regular grammarshave thesamepower asregular expressions(tail recursion =iteration).

Type 2and3are of practicaluse incompiler construction.Type 0and1are only of theoretical interest.

Page 51: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Courseoverview

Semantic analyzer

51

Lexical analyzer(scanner)

Syntactic analyzer(parser)

Regularexpressions

Context-freegrammar

Attributegrammar

tokens

sourcecode (text)

AST(Abstractsyntaxtree)

What we have covered:Context-free grammars,derivations,parse treesAmbiguous grammarsIntroduction to parsing,recursive-descent

You can now finishassignment 1

Page 52: EDAN65: Compilers, Lecture03 Context …fileadmin.cs.lth.se/cs/Education/EDAN65/2017/lectures/L...Course overview Semanticanalyzer Intermediate codegenerator Optimizer Target code

Summary questions

52

• Construct aCFGforasimplepartof aprogramming language.• What isanonterminal symbol?Aterminalsymbol?Aproduction?Astartsymbol?Aparse tree?• What isaleft-handside of aproduction?Aright-handside?• Givenagrammar G,what ismeant bythelanguage L(G)?• What isaderivationstep?Aderivation?Aleftmost derivation?Arighmostderivation?• How does aderivationcorrespond to aparse tree?• What does itmean foragrammar to beambiguous?Unambiguous?• Give anexample anambiguous CFG.• What isthedifference between anLLandanLRparser?• What isthedifference between LL(1)andLL(2)?Orbetween LR(1)andLR(2)?• Construct arecursive descent parserforasimplelanguage.• Give typical examples of grammarsthat cannot behandledbyarecursive-descent parser.• Explain why context-free grammarsare more powerful than regularexpressions.• Inwhat senseare context-free grammars"context-free"?