UMBC Introduction to Compilers CMSC 431 Shon Vick 01/28/02


UMBC

Introduction to Compilers

CMSC 431, Shon Vick

01/28/02

What is a compiler?

• Translates source code to target code
  – Source code is typically a high-level programming language (Java, C++, etc.), but does not have to be
  – Target code is often a low-level language such as assembly or machine code, but does not have to be
• Can you think of other compilers that you have used, according to this definition?

Other Compilers

• Javadoc -> HTML
• SQL query output -> table
• PostScript -> PDF
• High-level description of a circuit -> machine instructions to fabricate the circuit

[Figure: The Compilation Process]

The Analysis Stage

• Broken up into four phases
  – Lexical analysis (also called scanning or tokenization)
  – Parsing
  – Semantic analysis
  – Intermediate code generation

Lexing Example

    double d1;
    double d2;
    d2 = d1 * 2.0;

    Lexeme    Token              Description
    double    TOK_DOUBLE         reserved word
    d1        TOK_ID             variable name
    ;         TOK_PUNCT          has value of ";"
    double    TOK_DOUBLE         reserved word
    d2        TOK_ID             variable name
    ;         TOK_PUNCT          has value of ";"
    d2        TOK_ID             variable name
    =         TOK_OPER           has value of "="
    d1        TOK_ID             variable name
    *         TOK_OPER           has value of "*"
    2.0       TOK_FLOAT_CONST    has value of 2.0
    ;         TOK_PUNCT          has value of ";"
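The lexeme-to-token mapping in the lexing example can be sketched as a small regular-expression scanner. This is a minimal sketch, not the course's implementation: the token names come from the slide, but the regular expressions, the `SKIP` pseudo-token, and the `tokenize` function are assumptions made for illustration.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
# Order matters: the reserved word is tried before the general identifier.
TOKEN_SPEC = [
    ("TOK_FLOAT_CONST", r"\d+\.\d+"),
    ("TOK_DOUBLE",      r"\bdouble\b"),
    ("TOK_ID",          r"[A-Za-z_]\w*"),
    ("TOK_OPER",        r"[=*+\-/]"),
    ("TOK_PUNCT",       r";"),
    ("SKIP",            r"\s+"),          # whitespace separates lexemes
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token_name, lexeme) pairs for the given source text."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

for name, lexeme in tokenize("double d1; double d2; d2 = d1 * 2.0;"):
    print(f"{lexeme:8}{name}")
```

Listing the reserved word before the identifier pattern is one common way to keep `double` from being scanned as a `TOK_ID`; a real scanner might instead look identifiers up in a keyword table.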

Syntax and Semantics

• Syntax: the form or structure of expressions, that is, whether an expression is well formed

• Semantics: the meaning of an expression

Syntactic Structure

• Syntax is almost always expressed using some variant of a notation called a context-free grammar (CFG), or simply a grammar
  – BNF
  – EBNF

A CFG has 4 parts

• A set of tokens (lexemes), known as terminal symbols

• A set of non-terminals

• A set of rules (productions), where each production consists of a left-hand side (LHS) and a right-hand side (RHS). The LHS is a non-terminal and the RHS is a sequence of terminals and/or non-terminal symbols.

• A special non-terminal symbol designated as the start symbol
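The four parts can be written down directly as a data structure. The sketch below is illustrative only: the `Grammar` class, its field names, and the `check` method are hypothetical, and the sample grammar (a sequence of digits) is chosen just to have something small to validate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # part 1: terminal symbols
    nonterminals: frozenset   # part 2: non-terminal symbols
    productions: tuple        # part 3: (lhs, rhs) pairs; rhs is a tuple of symbols
    start: str                # part 4: the start symbol

    def check(self):
        """Return a list of problems; an empty list means the four parts agree."""
        problems = []
        if self.terminals & self.nonterminals:
            problems.append("terminals and non-terminals overlap")
        if self.start not in self.nonterminals:
            problems.append("start symbol is not a non-terminal")
        symbols = self.terminals | self.nonterminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                problems.append(f"LHS {lhs} is not a non-terminal")
            for symbol in rhs:
                if symbol not in symbols:
                    problems.append(f"unknown symbol {symbol} in RHS of {lhs}")
        return problems

# Illustrative grammar: ds -> d | d ds, d -> 0 | 1 | ... | 9
digit_rules = tuple(("d", (ch,)) for ch in "0123456789")
DIGIT_SEQUENCE = Grammar(
    terminals=frozenset("0123456789"),
    nonterminals=frozenset({"ds", "d"}),
    productions=(("ds", ("d",)), ("ds", ("d", "ds"))) + digit_rules,
    start="ds",
)
print(DIGIT_SEQUENCE.check())
```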

An example of BNF syntax for real numbers

    <r>  ::= <ds> . <ds>
    <ds> ::= <d> | <d> <ds>
    <d>  ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

    < >   encloses non-terminal symbols
    ::=   'is' or 'is made up of' or 'derives' (sometimes denoted with an arrow ->)
    |     or
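One way to see what the real-number grammar accepts is to turn each non-terminal into a function. The following is a sketch under that assumption; the function names `match_d`, `match_ds`, and `is_real` are invented for illustration and mirror `<d>`, `<ds>`, and `<r>`.

```python
DIGITS = set("0123456789")

def match_d(text, pos):
    """<d> ::= 0 | 1 | ... | 9. Return the position after one digit, or None."""
    if pos < len(text) and text[pos] in DIGITS:
        return pos + 1
    return None

def match_ds(text, pos):
    """<ds> ::= <d> | <d> <ds>. Return the position after the longest digit run, or None."""
    after_digit = match_d(text, pos)
    if after_digit is None:
        return None
    rest = match_ds(text, after_digit)   # try the recursive alternative <d> <ds>
    return after_digit if rest is None else rest

def is_real(text):
    """<r> ::= <ds> . <ds>. True when the whole string can be derived from <r>."""
    pos = match_ds(text, 0)
    if pos is None or pos >= len(text) or text[pos] != ".":
        return False
    return match_ds(text, pos + 1) == len(text)

print(is_real("3.14"), is_real("3."), is_real(".5"))
```

Note that the grammar requires at least one digit on each side of the point, so strings such as `3.` and `.5` are rejected.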

Example

• On the example from the previous slide:
  – What are the tokens?
  – What are the lexemes?
  – What are the non-terminals?
  – What are the productions?

BNF Points

• A non-terminal can have more than one RHS, or an OR (|) can be used

• Lists or sequences are expressed via recursion

• A derivation is a sequence of production (rule) applications

• Examples

Example Grammar

    <program> -> <stmts>
    <stmts>   -> <stmt> | <stmt> ; <stmts>
    <stmt>    -> <var> = <expr>
    <var>     -> a | b | c | d
    <expr>    -> <term> + <term> | <term> - <term>
    <term>    -> <var> | const

Example Derivation

    <program> => <stmts>
              => <stmt>
              => <var> = <expr>
              => a = <expr>
              => a = <term> + <term>
              => a = <var> + <term>
              => a = b + <term>
              => a = b + const
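The derivation above always rewrites the leftmost non-terminal. A short sketch can replay it mechanically: the `GRAMMAR` table encodes the example grammar, while the `derive` helper and its list of alternative indices are assumptions introduced here for illustration.

```python
# Each non-terminal maps to its list of alternatives (right-hand sides).
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>":   [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>":    [["<var>", "=", "<expr>"]],
    "<var>":     [["a"], ["b"], ["c"], ["d"]],
    "<expr>":    [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>":    [["<var>"], ["const"]],
}

def derive(start, choices):
    """Apply one production per step to the leftmost non-terminal.

    choices[i] is the index of the alternative used at step i.
    Returns every sentential form, beginning with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps

# The same alternatives the slide picks, in order.
for line in derive("<program>", [0, 0, 0, 0, 0, 0, 1, 1]):
    print("=>", line)
```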

Parse Trees

• Alternative representation for a derivation

• Example parse tree for the previous example

                stmts
                  |
                stmt
              /   |   \
           var    =    expr
            |        /   |   \
            a     term   +   term
                   |          |
                  var       const
                   |
                   b
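A parse tree is convenient to hold as nested tuples, with the node label first and the children after it. The sketch below assumes that representation (the `TREE`, `frontier`, and `render` names are hypothetical) and shows that reading the leaves from left to right gives back the derived sentence.

```python
# (label, child, child, ...) for interior nodes; plain strings for leaves.
TREE = ("stmts",
        ("stmt",
         ("var", "a"),
         "=",
         ("expr",
          ("term", ("var", "b")),
          "+",
          ("term", "const"))))

def frontier(node):
    """Return the leaves of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node[1:] for leaf in frontier(child)]

def render(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    label = node if isinstance(node, str) else node[0]
    lines = ["  " * depth + label]
    if not isinstance(node, str):
        for child in node[1:]:
            lines.extend(render(child, depth + 1))
    return lines

print(" ".join(frontier(TREE)))
print("\n".join(render(TREE)))
```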

Another Example

    Expression -> Expression + Expression
                | Expression - Expression
                | ...
                | Variable
                | Constant
                | ...
    Variable   -> T_IDENTIFIER
    Constant   -> T_INTCONSTANT
                | T_DOUBLECONSTANT

The Parse

    Expression -> Expression + Expression
               -> Variable + Expression
               -> T_IDENTIFIER + Expression
               -> T_IDENTIFIER + Constant
               -> T_IDENTIFIER + T_INTCONSTANT

    a + 2

Parse Trees

    PS -> P | P PS

    P  ->  | '(' PS ')' | '<' PS '>' | '[' PS ']'

(The first alternative of P is empty.)

What is the parse tree for this statement?

    < [ ] [ < > ] >
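A small recursive-descent parser can build a tree for the bracket grammar. This is a sketch, not the intended classroom answer: because `P` may be empty the grammar admits more than one tree, so the code assumes `P` is empty only when no opening bracket follows. The names `PAIRS`, `parse`, and `leaves` are invented for illustration.

```python
PAIRS = {"(": ")", "<": ">", "[": "]"}

def parse(tokens):
    """Parse a list of bracket tokens and return a parse tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_ps():
        # PS -> P | P PS
        first = parse_p()
        if peek() in PAIRS:
            return ("PS", first, parse_ps())
        return ("PS", first)

    def parse_p():
        # P -> (empty) | '(' PS ')' | '<' PS '>' | '[' PS ']'
        nonlocal pos
        opener = peek()
        if opener not in PAIRS:
            return ("P",)                     # assumption: choose the empty alternative
        pos += 1
        inner = parse_ps()
        if peek() != PAIRS[opener]:
            raise SyntaxError(f"expected {PAIRS[opener]!r} at token {pos}")
        pos += 1
        return ("P", opener, inner, PAIRS[opener])

    tree = parse_ps()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r} at token {pos}")
    return tree

def leaves(tree):
    """Collect the terminal symbols of a parse tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]

print(parse("< [ ] [ < > ] >".split()))
```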

EBNF - Extended BNF

• Like BNF except that:
  – Non-terminals start with an uppercase letter
  – Parentheses are used for grouping terminals
  – Braces {} represent zero or more occurrences (iteration)
  – Brackets [] represent an optional construct, that is, a construct that appears either once or not at all

EBNF example

    Exp    -> Term { ('+' | '-') Term }
    Term   -> Factor { ('*' | '/') Factor }
    Factor -> '(' Exp ')' | variable | constant
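The braces in the EBNF rules map naturally onto `while` loops in a recursive-descent parser. The sketch below evaluates an expression while parsing it; the `evaluate` function, its tokenizing regular expression, and the `variables` dictionary are assumptions for illustration, and characters the regular expression does not recognize are silently skipped.

```python
import re

def evaluate(text, variables=None):
    """Evaluate an arithmetic expression with a parser shaped like the EBNF rules."""
    variables = variables or {}
    # Assumed lexical rules: numbers, identifiers, and single-character operators.
    tokens = re.findall(r"\d+\.?\d*|[A-Za-z_]\w*|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def exp():
        # Exp -> Term { ('+' | '-') Term }
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        # Term -> Factor { ('*' | '/') Factor }
        value = factor()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():
        # Factor -> '(' Exp ')' | variable | constant
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        token = advance()
        if token == "(":
            value = exp()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if token[0].isdigit():
            return float(token)
        if token[0].isalpha() or token[0] == "_":
            return variables[token]
        raise SyntaxError(f"unexpected token {token!r}")

    result = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

print(evaluate("2 + 3 * 4"), evaluate("(2 + 3) * 4"))
```

Because each loop folds operands from left to right, `a - b - 1` is treated as `(a - b) - 1`, and the `Term` rule sitting below `Exp` gives `*` and `/` higher precedence than `+` and `-`.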

EBNF/BNF

• EBNF and BNF are equivalent in expressive power: any grammar written in one can be rewritten in the other

• How can {} be expressed in BNF?

• How can ( ) be expressed?

• How can [ ] be expressed?

Semantic Analysis

• The syntactically correct parse tree (or derivation) is checked for semantic errors

• Checks for constructs that, while syntactically valid, do not obey the semantic rules of the source language

• Examples:
  – Use of an undeclared or uninitialized variable
  – Function called with improper arguments
  – Incompatible operands and type mismatches

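Examples

    int i;
    int j;
    i = i + 2;          /* i is read before it is initialized */

    int arr[2], c;
    c = arr * 10;       /* an array is not a valid operand of * */

    void fun1(int i);
    double d;
    d = fun1(2.1);      /* fun1 returns void, so there is no value to assign;
                           2.1 is a double passed where an int is expected */

Most semantic analysis pertains to the checking of types.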
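A type check of this kind can be sketched as a walk over an expression tree with a symbol table. Everything here is an assumption made for illustration: the tuple-based tree shape, the type names stored as strings, and the `check_expr` function are not from the slides.

```python
NUMERIC = {"int", "double"}

def check_expr(expr, symbols, errors):
    """Return the type of expr, appending messages to errors when a rule is broken."""
    kind = expr[0]
    if kind == "const":                      # ("const", type_name)
        return expr[1]
    if kind == "var":                        # ("var", name)
        name = expr[1]
        if name not in symbols:
            errors.append(f"undeclared variable {name}")
            return None
        return symbols[name]
    if kind == "binop":                      # ("binop", op, left, right)
        _, op, left, right = expr
        left_type = check_expr(left, symbols, errors)
        right_type = check_expr(right, symbols, errors)
        if left_type is None or right_type is None:
            return None                      # an operand already produced an error
        if left_type not in NUMERIC or right_type not in NUMERIC:
            errors.append(f"invalid operands to {op}: {left_type} and {right_type}")
            return None
        return "double" if "double" in (left_type, right_type) else "int"
    raise ValueError(f"unknown expression kind {kind!r}")

# Symbol table for: int arr[2], c;
symbols = {"arr": "int[2]", "c": "int"}
errors = []
check_expr(("binop", "*", ("var", "arr"), ("const", "int")), symbols, errors)
print(errors)
```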

Intermediate Code Generation

• Where the intermediate representation of the source program is created

• The representation can have a variety of forms, but a common one is called three-address code (TAC)

• Like assembly, TAC is a sequence of simple instructions, each of which can have at most three operands

Example

Source:

    a = b * c + b * d

TAC:

    _t1 = b * c
    _t2 = b * d
    _t3 = _t1 + _t2
    a = _t3

Note the temps.
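The temporaries fall out of a post-order walk over the expression tree: each operator node gets a fresh temp once both of its operands have an address. The sketch below assumes a tuple-based tree and a hypothetical `gen_tac` helper; only the `_tN` naming is taken from the slide.

```python
def gen_tac(target, expr):
    """Return TAC lines that evaluate expr and assign the result to target."""
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, str):        # a variable or constant is already an address
            return node
        op, left, right = node           # interior node: (op, left, right)
        left_addr = emit(left)
        right_addr = emit(right)
        counter += 1
        temp = f"_t{counter}"
        code.append(f"{temp} = {left_addr} {op} {right_addr}")
        return temp

    code.append(f"{target} = {emit(expr)}")
    return code

# a = b * c + b * d
for line in gen_tac("a", ("+", ("*", "b", "c"), ("*", "b", "d"))):
    print(line)
```

Each emitted instruction has one operator and at most three addresses, which is where the name three-address code comes from.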

Another Example

Source:

    if (a <= b)
        a = a - c;
    c = b * c;

TAC:

        _t1 = a > b
        if _t1 goto L0
        _t2 = a - c
        a = _t2
    L0: _t3 = b * c
        c = _t3

Note:
• Temps
• Symbolic addresses
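Running the TAC by hand shows why the condition is inverted: the jump skips the body when `a > b`. A tiny interpreter makes that checkable. This is only a sketch; the `(label, instruction)` encoding, the `OPS` table, and the `run` function are assumptions, and it handles just the instruction forms that appear in this example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, ">": operator.gt}

# (label, instruction) pairs for the example above.
TAC = [
    (None, "_t1 = a > b"),
    (None, "if _t1 goto L0"),
    (None, "_t2 = a - c"),
    (None, "a = _t2"),
    ("L0", "_t3 = b * c"),
    (None, "c = _t3"),
]

def run(program, env):
    """Execute the program and return the final variable values."""
    env = dict(env)
    labels = {label: index for index, (label, _) in enumerate(program) if label}
    pc = 0
    while pc < len(program):
        words = program[pc][1].split()
        pc += 1
        if words[0] == "if":                 # if <cond> goto <label>
            if env[words[1]]:
                pc = labels[words[3]]
        elif len(words) == 3:                # <dest> = <src>
            env[words[0]] = env[words[2]]
        else:                                # <dest> = <left> <op> <right>
            dest, _, left, op, right = words
            env[dest] = OPS[op](env[left], env[right])
    return env

print(run(TAC, {"a": 1, "b": 5, "c": 2}))
```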

Next Time

• Finish introduction to compilation stages

• Read Aho/Sethi/Ullman Chapter 1

Selected References

• Compilers: Principles, Techniques, and Tools, by Aho, Sethi, and Ullman

• http://www.stanford.edu/class/cs143/