Lesson 10

CDT301 – Compiler Theory, Spring 2011
Teacher: Linus Källberg


Outline

• Flex
• Bison
• Abstract syntax trees


FLEX


Flex

• Tool for automatic generation of scanners
• Open-source version of Lex
• Takes regular expressions as input
• Outputs a C (or C++) file for the scanner


Flex

Regexps (mylexer.l) → Flex → int yylex() … (mylexer.c) → C compiler → 01101000110101010… (mylexer.obj)


The input file to Flex

Definitions
%%
Rules
%%
User code


The definitions section

• Macro definitions:
  – Specify a letter:
      letter [A-Za-z]
  – Specify a delimiter:
      delimiter [ ,:;.]
  – Specify a digit:
      digit [0-9]
  – Specify an identifier (macro names are referenced inside braces):
      id {letter}({letter}|{digit})*


The definitions section

• User code:
    %{
    #include <stdio.h>
    int a_nice_global_variable = 0;
    int my_favourite_function(void) { return 42; }
    %}


The rules section

• Rule = regexp + C code
• Longest matching pattern is used
• If two equally long patterns match, the first one in the file is used
• Examples:
    =|>=?|<(=|>)?   { return RELOP; }
    {id}            { return ID; }
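
The two matching rules above (longest match first, then file order) can be sketched in plain C. This is only an illustration of the selection logic, not the table-driven code Flex actually generates. The token codes, the two matcher functions, and pick_rule() are hypothetical names; a keyword rule and an identifier rule are used because they tie on input such as "while".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; a real scanner would take these from the parser. */
enum { TOK_NONE = 0, TOK_WHILE, TOK_ID };

/* Each matcher returns the length of the prefix of s that it matches, or 0. */
static size_t match_while(const char *s) {
    return strncmp(s, "while", 5) == 0 ? 5 : 0;
}

static size_t match_id(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

/* Rules in file order: the keyword rule is listed before the identifier rule. */
static const struct {
    size_t (*match)(const char *);
    int token;
} rules[] = {
    { match_while, TOK_WHILE },
    { match_id,    TOK_ID    },
};

/* Returns the token of the winning rule and stores the lexeme length in *len. */
int pick_rule(const char *s, size_t *len) {
    int best = TOK_NONE;
    *len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(s);
        if (n > *len) {   /* only a strictly longer match replaces an earlier rule */
            *len = n;
            best = rules[i].token;
        }
    }
    return best;
}
```

For the input "while (x)" both rules match five characters, so the earlier keyword rule wins. For "while1" the identifier rule matches six characters and wins on length.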


The regexp language of Flex

?    Previous regexp is optional
{}   Macro expansion (defined in the definitions section)
.    Matches any character that is not end of line
$    Matches the end of a line
^    Matches the beginning of a line
[]   Matches any enclosed character


The [] syntax

• Similar to | but more powerful (supports ranges and negation)
• Example:
    digit [0123456789]
  is the same as
    digit 0|1|2|3|4|5|6|7|8|9
• Special characters inside the brackets: - and ^
    digit [0-9]
    letter [A-Za-z]
    non_digit [^0-9]


The user code section

• Only C code is valid here
• Will be copied unchanged to the generated C file


The generated scanner

• By default, a function called yylex() is defined
  – Works similarly to your GetNextToken() from lab 1
  – The name can be changed with options
• Some globals are defined as well (can be changed into local variables with options):
    yyin      The file to read from
    yytext    The matched lexeme (char*)
    yyleng    The length of yytext
    yylineno  Line number of the match (maintained when %option yylineno is set)


The yywrap() function

• Called upon end-of-file
• Should be supplied by the user
• Suppressed with %option noyywrap or --noyywrap


Scanner states in Flex

• Affects what tokens should be recognized
• Example from the language ALF:
    { fref 32 DEADC0DE }    <- Identifier
    { hex_val DEADC0DE }    <- Hex constant


Scanner states in Flex

• Declare state:
    %x READ_HEX
• Use the state to make rules conditional:
    hex_val                  { BEGIN(READ_HEX); return HEX_VAL_KW; }
    [a-zA-Z_][a-zA-Z0-9_]*   { return ID; }
    <READ_HEX>[0-9a-fA-F]+   { BEGIN(INITIAL); return NUM; }
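
The effect of the exclusive READ_HEX state can be imitated by a small hand-written scanner that keeps the current state in a variable. This is a sketch, not Flex output. The Scanner struct, next_token(), the token codes, and T_OTHER are hypothetical, and skipping blanks in every state is an assumption (the rules above do not say how whitespace is handled).

```c
#include <ctype.h>
#include <string.h>

enum { T_EOF = 0, T_HEX_VAL_KW, T_ID, T_NUM, T_OTHER };  /* hypothetical token codes */
enum { INITIAL_STATE, READ_HEX_STATE };                  /* stand-ins for INITIAL and READ_HEX */

typedef struct {
    const char *pos;   /* next unread character */
    int state;         /* current scanner state */
} Scanner;

int next_token(Scanner *sc) {
    /* Assumption: blanks are skipped in every state. */
    while (isspace((unsigned char)*sc->pos)) sc->pos++;
    if (*sc->pos == '\0') return T_EOF;

    const char *start = sc->pos;
    if (sc->state == READ_HEX_STATE) {
        /* Only the <READ_HEX> rule is active here: [0-9a-fA-F]+ */
        if (isxdigit((unsigned char)*sc->pos)) {
            while (isxdigit((unsigned char)*sc->pos)) sc->pos++;
            sc->state = INITIAL_STATE;            /* BEGIN(INITIAL) */
            return T_NUM;
        }
    } else if (isalpha((unsigned char)*sc->pos) || *sc->pos == '_') {
        /* [a-zA-Z_][a-zA-Z0-9_]* */
        while (isalnum((unsigned char)*sc->pos) || *sc->pos == '_') sc->pos++;
        size_t len = (size_t)(sc->pos - start);
        if (len == 7 && strncmp(start, "hex_val", 7) == 0) {
            sc->state = READ_HEX_STATE;           /* BEGIN(READ_HEX) */
            return T_HEX_VAL_KW;
        }
        return T_ID;
    }
    sc->pos++;                                    /* any other single character */
    return T_OTHER;
}
```

With this sketch, DEADC0DE comes back as T_NUM directly after hex_val but as T_ID after fref, which is the distinction in the ALF example.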


Online resources

http://flex.sourceforge.net/manual/index.html


BISON


Bison

• Tool for automatic generation of parsers
• Open-source alternative to Yacc
• Takes an SDT scheme as input
• Outputs C (or C++) source code for an LALR parser
• Commonly used together with Flex


Bison

SDT scheme (myparser.y) → Bison → int yyparse() … (myparser.c) → C compiler → 01101000110101010… (myparser.obj)
Bison also outputs the token definitions (myparser.h)


The input file to Bison

Definitions
%%
SDT scheme
%%
User code


Definitions section

• Define tokens
• Define operator precedence
• Define operator associativity
• Define the types of grammar symbol attributes
• Write C code between %{ and %}
• Issue certain commands to Bison


Token definition

• Normal case:
    %token IDENTIFIER
    %token WHILE
• Token, precedence, associativity, and type:
    %left <Operator> RELOP
    %left <Operator> MINUSOP PLUSOP
    %right <Operator> NOTOP
• Enables use of ambiguous grammars!


Defining types

• Just enter the type inside <> before the list of tokens:
    %left <Operator> RELOP
    %left <Operator> MULOP
    %right <Operator> NOTOP UNOP
    %token <String> ID STRING
• Or the same for non-terminals:
    %type <Node> stmnt expr actuals exprs


The variable yylval

• Used by the lexical analyzer to store token attributes
• Default type is int
• May be given other types using %union:
    %union {
        int Operator;
        char *String;
        NODE_TYPE Node;
    }
• The type (member name) is then used like this:
    %token <String> ID STRING
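
How a scanner might fill in such a union can be sketched in plain C. SemanticValue, set_operator(), and set_string() are hypothetical names, and the Node member is left out to keep the example self-contained. The lexeme is copied because yytext points into the scanner's buffer, whose contents change on the next call to yylex().

```c
#include <stdlib.h>
#include <string.h>

/* Plain C counterpart of the %union above (the Node member is omitted here). */
typedef union {
    int   Operator;
    char *String;
} SemanticValue;

/* Attribute of an operator token such as RELOP. */
void set_operator(SemanticValue *val, int op) {
    val->Operator = op;
}

/* Attribute of an ID or STRING token: a private copy of the lexeme.
   Returns 1 on success and 0 if memory could not be allocated. */
int set_string(SemanticValue *val, const char *lexeme, size_t length) {
    char *copy = malloc(length + 1);
    if (copy == NULL) return 0;
    memcpy(copy, lexeme, length);
    copy[length] = '\0';
    val->String = copy;
    return 1;
}
```

In a Flex rule the corresponding call would look roughly like set_string(&yylval, yytext, yyleng) followed by return ID; the caller owns the copy and is responsible for freeing it.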


Code provided by the user

• yyerror(char* msg)
  – Function called on syntax errors
• yylex()
  – Function called to get the next token


Options to Bison

• Given on the command line or in the grammar file
• --defines or %defines: Output a C header file with definitions useful to a scanner
  – Tokens (#defines) and the type of yylval
• %error-verbose: More detailed error messages
• --name-prefix or %name-prefix: Change the default “yy” prefix on all names
• %define api.pure: Do not use globals
• --verbose or %verbose: Write detailed information to an extra output file


Translation scheme section

decl   : BASIC_TYPE idents ';'
       ;

idents : idents ',' ident
       | ident
       ;

ident  : ID
       ;


Semantic actions

• Written in C
• Executed when the production is used in a reduction
• $$, $1, $2, etc. refer to the attributes of the grammar symbols
  – Can be used as regular C variables
  – $$ refers to the attribute of the head, $1 to the attribute of the first symbol in the body, etc.

    E : E '+' T { $$ = $1 + $3; } ;
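
What the action { $$ = $1 + $3; } amounts to at run time can be sketched with an explicit value stack. This is a simplified model, not the code Bison emits; ValueStack, push_value(), and reduce_add() are hypothetical names, and all attributes are assumed to be of type int.

```c
#include <stddef.h>

enum { STACK_MAX = 64 };

typedef struct {
    int values[STACK_MAX];  /* one attribute per grammar symbol on the parse stack */
    size_t top;             /* number of entries currently on the stack */
} ValueStack;

/* Shift: push the attribute of the symbol just read. Returns 0 on overflow. */
int push_value(ValueStack *s, int v) {
    if (s->top >= STACK_MAX) return 0;
    s->values[s->top++] = v;
    return 1;
}

/* Reduce by E : E '+' T with the action { $$ = $1 + $3; }.
   Returns 0 if fewer than three entries are on the stack. */
int reduce_add(ValueStack *s) {
    if (s->top < 3) return 0;
    const int *body = &s->values[s->top - 3];  /* body[0] is $1, body[1] is '+', body[2] is $3 */
    int head = body[0] + body[2];              /* $$ = $1 + $3 */
    s->top -= 3;                               /* pop the three body symbols */
    s->values[s->top++] = head;                /* push the attribute of the head */
    return 1;
}
```

Pushing 2, a placeholder for '+', and 3, and then calling reduce_add(), leaves a single entry with the value 5 on the stack.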


Using ambiguous grammars in Bison

• Default actions:
  – Reduce/reduce: choose first rule in file
  – Shift/reduce: always shift
• With explicit precedence and associativity:
  – Shift/reduce: compare the precedence and associativity of the rule with that of the lookahead token


The %expect declaration

• To suppress shift/reduce warnings:
    %expect n
  where n is the exact number of shift/reduce conflicts


Contextual precedence

• Same token might have different precedence depending on context:
    expr → expr - expr
         | expr * expr
         | - expr
         | id

    Stack         Input
    … - expr      * expr …


Contextual precedence

• Define dummy token:
    %left '-'
    %left '*'
    %left UMINUS
• Use the %prec modifier:
    expr → - expr %prec UMINUS


Examples of parser configurations

Stack                 Input     Action
… if (cond) stmt      else …    shift
… expr + expr         * …       shift
… expr * expr         + …       reduce expr → expr * expr
… expr * expr         * …       reduce expr → expr * expr
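
The decisions in these configurations follow from one comparison, sketched here in C. The function and type names are hypothetical, and precedence level 0 is used to mean "no precedence declared". A rule normally takes the precedence of its last terminal unless %prec overrides it.

```c
typedef enum { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE } Assoc;
typedef enum { ACT_SHIFT, ACT_REDUCE, ACT_ERROR } Action;

/* rule_prec:  precedence level of the rule that could be reduced (0 = none)
   tok_prec:   precedence level of the lookahead token (0 = none)
   tok_assoc:  associativity declared for the lookahead token */
Action resolve_shift_reduce(int rule_prec, int tok_prec, Assoc tok_assoc) {
    if (rule_prec == 0 || tok_prec == 0)
        return ACT_SHIFT;            /* no precedence information: default is shift */
    if (tok_prec > rule_prec)
        return ACT_SHIFT;            /* lookahead binds tighter */
    if (tok_prec < rule_prec)
        return ACT_REDUCE;           /* rule binds tighter */
    switch (tok_assoc) {             /* same level: associativity decides */
    case ASSOC_LEFT:  return ACT_REDUCE;
    case ASSOC_RIGHT: return ACT_SHIFT;
    default:          return ACT_ERROR;   /* non-associative operators may not be chained */
    }
}
```

With '+' on level 1 and '*' on level 2, both left-associative, resolve_shift_reduce(1, 2, ASSOC_LEFT) gives shift for "expr + expr" followed by '*', and resolve_shift_reduce(2, 2, ASSOC_LEFT) gives reduce for "expr * expr" followed by '*'.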


Online resources

http://www.gnu.org/software/bison/manual/html_node/index.html


ABSTRACT SYNTAX TREES


Abstract syntax trees

• “AST” or just “syntax tree”

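[Figure: a parse tree with interior nodes E, shown next to the corresponding syntax tree, for an expression over a, b and 5 using the operators + and *]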


Syntax trees vs. parse trees

Parse trees:
• Interior nodes are nonterminals, leaves are terminals
• Rarely constructed as an explicit data structure
• Represents the concrete syntax

Syntax trees:
• Interior nodes are “operators”, leaves are operands
• Commonly constructed as an explicit data structure
• Represents the abstract syntax


Why syntax trees?

• Simplifies subsequent analyses
• Independent of the parsing strategy
• Makes it easier to add new analysis passes without having to modify the parser
• More compact representation than parse trees


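Syntax tree example

    if (a < 1) b = 2 + 3;
    else { c = d * 4; e(f, 5); }

[Figure: syntax tree for the statement above. The root is an if node whose children are the condition < (a, 1), the then-branch = (b, + (2, 3)), and the else-branch, which consists of = (c, * (d, 4)) followed by the call to e with the arguments f and 5. Statement and argument lists end in null.]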


Exercise (1)

• Draw an abstract syntax tree for the statement

while (i < 100) { x = 2 * x; i = i + 1; }


Constructing a syntax tree in Bison

expr : expr '+' expr { $$ = createOpNode($1, '+', $3); }
     | expr '*' expr { $$ = createOpNode($1, '*', $3); }
     | ID            { $$ = createIdNode($1.name); }
     ;
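
The slides do not show how createOpNode() and createIdNode() are implemented. One possible version, with an assumed Node layout and two hypothetical helpers (renderTree() and freeTree()) for inspecting and releasing the result, could look like this:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed node layout: operator nodes have two children, identifier leaves have none. */
typedef struct Node {
    char op;                    /* operator character, or 0 for an identifier leaf */
    char name[32];              /* identifier name (leaves only) */
    struct Node *left, *right;
} Node;

Node *createIdNode(const char *name) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) abort();
    strncpy(n->name, name, sizeof n->name - 1);   /* calloc keeps the string terminated */
    return n;
}

Node *createOpNode(Node *left, char op, Node *right) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) abort();
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

/* Appends a prefix-notation rendering of the tree to buf, which holds cap bytes. */
void renderTree(const Node *n, char *buf, size_t cap) {
    size_t used = strlen(buf);
    if (n->op == 0) {
        snprintf(buf + used, cap - used, "%s", n->name);
        return;
    }
    snprintf(buf + used, cap - used, "(%c ", n->op);
    renderTree(n->left, buf, cap);
    used = strlen(buf);
    snprintf(buf + used, cap - used, " ");
    renderTree(n->right, buf, cap);
    used = strlen(buf);
    snprintf(buf + used, cap - used, ")");
}

void freeTree(Node *n) {
    if (n == NULL) return;
    freeTree(n->left);
    freeTree(n->right);
    free(n);
}
```

Building createOpNode(createIdNode("a"), '+', createOpNode(createIdNode("b"), '*', createIdNode("c"))) mirrors the reductions a parser would perform for a + b * c when '*' has higher precedence, and renderTree() on an empty buffer then produces (+ a (* b c)).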


Constructing a syntax tree in Bison

stmt  : RETURN expr ';' { $$ = mReturn($2, $1); }
      ;

stmts : stmts stmt { $$ = connectStmts($1, $2); }
      |            { $$ = NULL; }
      ;
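
Because stmts is left-recursive, $1 is the list built so far and $2 is the statement that was just parsed, and the empty alternative starts the list as NULL. A possible connectStmts() under that reading is sketched here; the Stmt struct and its fields are assumptions, since the slides do not show the node type.

```c
#include <stddef.h>

/* Assumed statement node: only the link to the following statement matters here. */
typedef struct Stmt {
    int id;               /* hypothetical payload used to tell statements apart */
    struct Stmt *next;    /* next statement in the sequence, or NULL */
} Stmt;

/* Appends stmt to the end of list and returns the head of the list. */
Stmt *connectStmts(Stmt *list, Stmt *stmt) {
    if (list == NULL) return stmt;      /* first statement: it becomes the head */
    Stmt *tail = list;
    while (tail->next != NULL) tail = tail->next;
    tail->next = stmt;
    return list;
}
```

Walking to the tail on every call is linear in the list length. Keeping a tail pointer, or building the list in reverse and flipping it once at the end, are common alternatives.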


Conclusion

• Flex generates C source code for a scanner given a set of regular expressions
• Bison generates C source code for a bottom-up parser given a syntax-directed translation scheme
• Building syntax trees simplifies subsequent analyses of the program
• Syntax trees can be built in semantic actions


Next time

• Syntax-directed definitions and translation schemes

• Semantic analysis and type analysis
