UMBC
Introduction to Compilers
CMSC 431
Shon Vick
01/28/02
2
What is a compiler?
• Translates source code to target code
– Source code is typically a high-level programming language (Java, C++, etc.) but does not have to be
– Target code is often a low-level language like assembly or machine code but does not have to be
• Can you think of other compilers that you have used, according to this definition?
3
Other Compilers
• Javadoc -> HTML
• SQL Query output -> Table
• PostScript -> PDF
• High-level description of a circuit -> machine instructions to fabricate the circuit
[Figure: The Compilation Process]
5
The Analysis Stage
• Broken up into four phases
– Lexical Analysis (also called scanning or tokenization)
– Parsing
– Semantic Analysis
– Intermediate Code Generation
6
Lexing Example
double d1;
double d2;
d2 = d1 * 2.0;

lexemes   tokens
double    TOK_DOUBLE        reserved word
d1        TOK_ID            variable name
;         TOK_PUNCT         has value of ";"
double    TOK_DOUBLE        reserved word
d2        TOK_ID            variable name
;         TOK_PUNCT         has value of ";"
d2        TOK_ID            variable name
=         TOK_OPER          has value of "="
d1        TOK_ID            variable name
*         TOK_OPER          has value of "*"
2.0       TOK_FLOAT_CONST   has value of 2.0
;         TOK_PUNCT         has value of ";"
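A minimal sketch of a scanner that produces the token stream above. The token names come from the slide; the regular expressions, the tokenize function, and the choice to return (token, lexeme) pairs are assumptions made for illustration, not the course's implementation.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
# Order matters: the reserved word must be tried before the identifier rule.
TOKEN_SPEC = [
    ("TOK_FLOAT_CONST", r"\d+\.\d+"),
    ("TOK_DOUBLE", r"\bdouble\b"),
    ("TOK_ID", r"[A-Za-z_]\w*"),
    ("TOK_OPER", r"[=*+\-/]"),
    ("TOK_PUNCT", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token, lexeme in tokenize("double d1;double d2;d2 = d1 * 2.0;"):
        print(f"{lexeme:8}{token}")
```

Listing TOK_DOUBLE ahead of TOK_ID is what turns the lexeme double into a reserved word instead of a variable name.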
7
Syntax and Semantics
• Syntax: the form or structure of expressions, i.e. whether an expression is well formed
• Semantics: the meaning of an expression
8
Syntactic Structure
• Syntax is almost always expressed using some variant of a notation called a context-free grammar (CFG), or simply a grammar
– BNF
– EBNF
9
A CFG has 4 parts
• A set of tokens (lexemes), known as terminal symbols
• A set of non-terminals
• A set of rules (productions), where each production consists of a left-hand side (LHS) and a right-hand side (RHS). The LHS is a non-terminal and the RHS is a sequence of terminal and/or non-terminal symbols.
• A special non-terminal symbol designated as the start symbol
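The four parts can be written down directly as a data structure. The sketch below is an illustration only: the Grammar class, its field names, the check method, and the toy grammar are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # part 1: the tokens
    nonterminals: frozenset   # part 2
    productions: tuple        # part 3: (lhs, rhs) pairs, rhs is a tuple of symbols
    start: str                # part 4: the start symbol

    def check(self):
        """Return a list of problems; an empty list means well formed."""
        problems = []
        if self.terminals & self.nonterminals:
            problems.append("terminals and non-terminals overlap")
        if self.start not in self.nonterminals:
            problems.append("start symbol is not a non-terminal")
        symbols = self.terminals | self.nonterminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                problems.append(f"LHS {lhs!r} is not a non-terminal")
            for sym in rhs:
                if sym not in symbols:
                    problems.append(f"unknown symbol {sym!r} in RHS of {lhs!r}")
        return problems


# Hypothetical toy grammar: a list of one or more 'x' tokens.
toy = Grammar(
    terminals=frozenset({"x"}),
    nonterminals=frozenset({"List"}),
    productions=(("List", ("x",)), ("List", ("x", "List"))),
    start="List",
)

if __name__ == "__main__":
    print(toy.check())
```

The check method enforces the constraints stated in the bullets: every LHS is a non-terminal, every RHS symbol is known, and the start symbol is a non-terminal.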
10
An example of BNF syntax for real numbers
<r> ::= <ds> . <ds>
<ds> ::= <d> | <d> <ds>
<d> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

< > encloses non-terminal symbols
::= 'is' or 'is made up of' or 'derives' (sometimes denoted with an arrow ->)
| 'or'
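One way to see how the three productions fit together is to write one function per non-terminal. This is a sketch under assumptions: the function names parse_d, parse_ds, and parse_r are invented here, and the input is treated as a plain character string with no separate lexer.

```python
DIGITS = set("0123456789")


def parse_d(s, i):
    """<d> ::= 0 | 1 | ... | 9. Return the index after the digit, or None."""
    if i < len(s) and s[i] in DIGITS:
        return i + 1
    return None


def parse_ds(s, i):
    """<ds> ::= <d> | <d> <ds>. Consumes the longest run of digits."""
    j = parse_d(s, i)
    if j is None:
        return None
    rest = parse_ds(s, j)          # try the recursive alternative <d> <ds>
    return j if rest is None else rest


def parse_r(s):
    """<r> ::= <ds> . <ds>. True when the whole string is a real number."""
    i = parse_ds(s, 0)
    if i is None or i >= len(s) or s[i] != ".":
        return False
    j = parse_ds(s, i + 1)
    return j == len(s)


if __name__ == "__main__":
    for text in ["3.14", "42", ".5", "7."]:
        print(text, parse_r(text))
```

The recursion in parse_ds mirrors the recursive production for <ds>, which is how BNF expresses a sequence of digits.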
11
Example
• On the example from the previous slide:
– What are the tokens?
– What are the lexemes?
– What are the non-terminals?
– What are the productions?
12
BNF Points
• A non-terminal can have more than one RHS, or an OR (|) can be used to combine the alternatives
• Lists or sequences are expressed via recursion
• A derivation is just a repeated sequence of production (rule) applications
• Examples
13
Example Grammar
<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term> | <term> - <term>
<term> -> <var> | const
14
Example Derivation
<program> => <stmts>
          => <stmt>
          => <var> = <expr>
          => a = <expr>
          => a = <term> + <term>
          => a = <var> + <term>
          => a = b + <term>
          => a = b + const
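The derivation can be replayed mechanically: at each step, replace the leftmost non-terminal with one alternative of its production. In the sketch below the grammar is the one from the Example Grammar slide; the dictionary encoding, the leftmost_derivation function, and the list of alternative indices are assumptions chosen for illustration.

```python
# Example grammar, encoded as non-terminal -> list of alternatives.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}


def leftmost_derivation(choices, start="<program>"):
    """Apply one production per step to the leftmost non-terminal.

    `choices` gives, for each step, the index of the alternative to use.
    Returns every sentential form, beginning with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that is still a non-terminal.
        idx = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form[idx:idx + 1] = GRAMMAR[form[idx]][choice]
        steps.append(" ".join(form))
    return steps


if __name__ == "__main__":
    # Alternative indices that reproduce the derivation of: a = b + const
    for step in leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 1]):
        print("=>", step)
```

Each entry in the choice list corresponds to one => arrow in the derivation.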
15
Parse Trees
• Alternative representation for a derivation
• Example parse tree for the previous example

<program>
  <stmts>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
16
Another Example
Expression -> Expression + Expression
            | Expression - Expression
            | ...
            | Variable
            | Constant
            | ...
Variable -> T_IDENTIFIER
Constant -> T_INTCONSTANT | T_DOUBLECONSTANT
17
The Parse
Input: a + 2
Expression -> Expression + Expression
           -> Variable + Expression
           -> T_IDENTIFIER + Expression
           -> T_IDENTIFIER + Constant
           -> T_IDENTIFIER + T_INTCONSTANT
18
Parse Trees
PS -> P | P PS
P -> | '(' PS ')' | '<' PS '>' | '[' PS ']'
What’s the parse tree for this statement? < [ ] [ < > ] >
19
EBNF - Extended BNF
• Like BNF except that
– Non-terminals start with an uppercase letter
– Parens are used for grouping terminals
– Braces {} represent zero or more occurrences (iteration)
– Brackets [] represent an optional construct, that is, a construct that appears either once or not at all
20
EBNF example
Exp -> Term { ('+' | '-') Term }
Term -> Factor { ('*' | '/') Factor }
Factor -> '(' Exp ')' | variable | constant
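EBNF of this shape maps closely onto a recursive-descent parser: each non-terminal becomes a function and each { } becomes a loop. The sketch below is one possible rendering, not the course's parser. The Parser class, the tokenize helper, the lexical rules for variable and constant, and the nested-tuple tree format are all assumptions.

```python
import re


def tokenize(text):
    # Assumed lexical rules: identifiers, unsigned numbers, operators, parens.
    # Characters that match none of these are silently skipped in this sketch.
    return re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def exp(self):
        # Exp -> Term { ('+' | '-') Term }
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        # Term -> Factor { ('*' | '/') Factor }
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # Factor -> '(' Exp ')' | variable | constant
        tok = self.take()
        if tok == "(":
            node = self.exp()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or tok in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.exp()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse("a + b * 2"))
```

Because Term is parsed inside Exp, multiplication and division bind tighter than addition and subtraction, and the loops make each operator left-associative.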
21
EBNF/BNF
• EBNF and BNF are equivalent
• How can {} be expressed in BNF?
• How can ( ) be expressed?
• How can [ ] be expressed?
22
Semantic Analysis
• The syntactically correct parse tree (or derivation) is checked for semantic errors
• Checks for constructs that, while syntactically valid, do not obey the semantic rules of the source language
• Examples:
– Use of an undeclared or uninitialized variable
– Function called with improper arguments
– Incompatible operands and type mismatches
23
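Examples
int i;
int j;
i = i + 2;          /* i is read before it is initialized */

int arr[2], c;
c = arr * 10;       /* an array cannot be multiplied by an integer */

void fun1(int i);
double d;
d = fun1(2.1);      /* fun1 returns void, and 2.1 is not an int */

Most semantic analysis pertains to the checking of types.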
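A small checker can flag the three kinds of error shown in these examples. Everything in the sketch below is an assumption made for illustration: the check function, the tuple encoding of statements and expressions, the type names, and the error messages. It is also stricter than C, which would implicitly convert 2.1 to an int.

```python
NUMERIC = {"int", "double"}


def check(decls, funcs, stmts):
    """Return a list of semantic error messages.

    decls: variable name -> type, e.g. {"i": "int", "arr": "int[]"}
    funcs: function name -> (return type, [parameter types])
    stmts: ("assign", target, expr) tuples, where expr is a variable name,
           ("const", type), ("binop", op, left, right), or ("call", name, [args])
    """
    errors = []
    initialized = set()

    def type_of(expr):
        if isinstance(expr, str):                      # variable reference
            if expr not in decls:
                errors.append(f"undeclared variable {expr}")
                return None
            if decls[expr] in NUMERIC and expr not in initialized:
                errors.append(f"{expr} used before initialization")
            return decls[expr]
        kind = expr[0]
        if kind == "const":
            return expr[1]
        if kind == "binop":
            _, op, left, right = expr
            lt, rt = type_of(left), type_of(right)
            if lt is None or rt is None:
                return None
            if lt not in NUMERIC or rt not in NUMERIC:
                errors.append(f"bad operands for {op}: {lt} and {rt}")
                return None
            return "double" if "double" in (lt, rt) else "int"
        if kind == "call":
            _, name, args = expr
            if name not in funcs:
                errors.append(f"undeclared function {name}")
                return None
            ret, params = funcs[name]
            arg_types = [type_of(arg) for arg in args]
            if arg_types != params:
                errors.append(f"{name} expects {params}, got {arg_types}")
            return ret
        raise ValueError(f"unknown expression kind {kind!r}")

    for _, target, expr in stmts:
        result = type_of(expr)
        if target not in decls:
            errors.append(f"undeclared variable {target}")
        elif result == "void":
            errors.append(f"void result assigned to {target}")
        initialized.add(target)
    return errors


if __name__ == "__main__":
    # d = fun1(2.1); with void fun1(int i); and double d;
    print(check({"d": "double"}, {"fun1": ("void", ["int"])},
                [("assign", "d", ("call", "fun1", [("const", "double")]))]))
```

Most of the checks compare types, which matches the observation that semantic analysis is largely type checking.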
24
Intermediate Code Generation
• The phase in which the intermediate representation of the source program is created
• The representation can take a variety of forms, but a common one is called three-address code (TAC)
• Like assembly, TAC is a sequence of simple instructions, each of which can have at most three operands
25
Example
a = b * c + b * d

_t1 = b * c
_t2 = b * d
_t3 = _t1 + _t2
a = _t3

Note temps
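The temporaries fall out of a post-order walk over the expression tree: each operator node gets a fresh temp once both of its operands have been translated. A minimal sketch, assuming a nested (op, left, right) tuple as the tree format; the gen_tac function is a hypothetical name and only the _tN temp naming is taken from the slide.

```python
import itertools


def gen_tac(target, expr):
    """Translate `target = expr` into a list of three-address code lines.

    expr is a variable name or a nested (op, left, right) tuple.
    """
    code = []
    counter = itertools.count(1)

    def walk(node):
        if isinstance(node, str):          # a plain variable needs no temp
            return node
        op, left, right = node
        left_name = walk(left)             # operands first (post-order)
        right_name = walk(right)
        temp = f"_t{next(counter)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = walk(expr)
    code.append(f"{target} = {result}")
    return code


if __name__ == "__main__":
    tree = ("+", ("*", "b", "c"), ("*", "b", "d"))   # b * c + b * d
    print("\n".join(gen_tac("a", tree)))
```

No instruction produced this way has more than three operands, which is the defining property of TAC.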
26
Another Example
if (a <= b)
    a = a - c;
c = b * c;

    _t1 = a > b
    if _t1 goto L0
    _t2 = a - c
    a = _t2
L0: _t3 = b * c
    c = _t3

Note temps and symbolic addresses
27
Next Time
• Finish introduction to compilation stages
• Read Aho/Sethi/Ullman Chapter 1
28
Selected References
• Compilers: Principles, Techniques, and Tools, Aho, Sethi, and Ullman
• http://www.stanford.edu/class/cs143/