COMPILERS
Mrs K.M. Sanghavi, SNJB's KBJ COE
Posted on 30-Jul-2018
INTRODUCTION
BASIC ELEMENTS RECOGNITION
SYNTACTIC UNIT RECOGNITION & MEANING INTERPRETATION
INTERMEDIATE REPRESENTATION
STORAGE ALLOCATION
CODE GENERATION
OPTIMIZATION
GENERAL MODEL OF COMPILER
PHASES OF COMPILER: DATABASES, TASKS, ALGORITHMS
INTRODUCTION
“Compilation”: translation of a program written in a source language into a semantically equivalent program written in a target language.
Source Program (input) → Compiler → Target Program (output); the compiler also reports error messages.
ROLE OF COMPILER
Recognize certain strings as basic elements
Recognize combinations of elements as syntactic units and interpret their meaning
Allocate storage and assign locations
Generate appropriate object code
PHASES OF COMPILER
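Lexical Analysis → Syntax Analysis → Semantic Analysis → Intermediate Code Generation → Code Optimization → Code Generation
Symbol table management and error handling are shared by all phases.
The phases up to intermediate code generation are m/c independent; code generation is m/c dependent.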
RECOGNISING BASIC ELEMENTS
Break the source program into small constituent pieces, i.e. scan the input and identify tokens
This is known as lexical analysis
It removes white space, newline characters, …
The identified tokens (basic elements) are placed into symbol tables, which are then used by the other phases
Discover lexical errors (e.g. invalid characters, improper identifiers) and send tokens to the parser
BASIC TERMINOLOGIES OF LEXICAL ANALYSIS
Token: a classification for a common set of strings. Examples include <identifier>, <number>, etc.
Pattern: the rules that characterize the set of strings for a token. Recall file and OS wildcards ([A-Z]*.*).
Lexeme: the actual sequence of characters that matches a pattern and is classified by a token. Identifiers: x, count, name, etc.
RECOGNISING BASIC ELEMENTS
For example, scanning the following program might produce the tables given below:
program foo(input,output);
var x:integer;
begin
  readln(x);
  writeln('value read =',x)
end.
Tokens of: program foo(input,output);
Lexeme | Token | Pattern
program | program | p,r,o,g,r,a,m
foo | id(foo) | letter followed by a sequence of alphanumerics
( | leftpar | a left parenthesis
input | input | i,n,p,u,t
, | comma | a comma
output | output | o,u,t,p,u,t
) | rightpar | a right parenthesis
; | semicolon | a semicolon
Tokens of: var x : integer; begin
Lexeme | Token | Pattern
var | var | v,a,r
x | id(x) | letter followed by a sequence of alphanumerics
: | colon | a colon
integer | integer | i,n,t,e,g,e,r
; | semicolon | a semicolon
begin | begin | b,e,g,i,n
Tokens of: readln(x); writeln('value read =',x)
Lexeme | Token | Pattern
readln | readln | r,e,a,d,l,n
( | leftpar | a left parenthesis
x | id(x) | letter followed by a sequence of alphanumerics
) | rightpar | a right parenthesis
; | semicolon | a semicolon
writeln | writeln | w,r,i,t,e,l,n
( | leftpar | a left parenthesis
Tokens of: readln(x); writeln('value read =',x) end. (continued)
Lexeme | Token | Pattern
'value read =' | literal('value read =') | sequence of characters enclosed in quotes
, | comma | a comma
x | id(x) | letter followed by a sequence of alphanumerics
) | rightpar | a right parenthesis
end | end | e,n,d
. | fullstop | a fullstop
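The scanning shown in these tables can be sketched in Python with one regular expression per pattern. This is a minimal illustration, not the authors' implementation: the names `scan`, `KEYWORDS` and `PUNCTUATION` are hypothetical, the keyword set simply copies the words that the tables above give a token of their own, and only unsigned integers and quoted literals are recognized besides words and punctuation.

```python
import re

# Words that the tables above classify as tokens of their own (assumption:
# copied from the tables, including readln, writeln, input, output, integer).
KEYWORDS = {"program", "var", "integer", "begin", "end",
            "readln", "writeln", "input", "output"}
PUNCTUATION = {"(": "leftpar", ")": "rightpar", ";": "semicolon",
               ",": "comma", ":": "colon", ".": "fullstop"}

# One named group per pattern; the first alternative that matches wins.
TOKEN_RE = re.compile(r"""
      (?P<space>\s+)
    | (?P<word>[A-Za-z][A-Za-z0-9]*)
    | (?P<number>[0-9]+)
    | (?P<literal>'[^']*')
    | (?P<punct>[();,:.])
""", re.VERBOSE)


def scan(source: str) -> list[tuple[str, str]]:
    """Return (lexeme, token) pairs; raise ValueError on a lexical error."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"lexical error: invalid character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        if kind == "word":
            pairs.append((lexeme, lexeme if lexeme in KEYWORDS else f"id({lexeme})"))
        elif kind == "number":
            pairs.append((lexeme, "number"))
        elif kind == "literal":
            pairs.append((lexeme, f"literal({lexeme})"))
        elif kind == "punct":
            pairs.append((lexeme, PUNCTUATION[lexeme]))
        # kind == "space": white space and newlines are discarded
        pos = match.end()
    return pairs
```

For example, `scan("program foo(input,output);")` returns the (lexeme, token) pairs of the first table, and `scan("x @ y")` raises a lexical error for the invalid character `@`.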
RECOGNISING SYNTACTIC UNITS
Recognize phrases (the syntax) in the tokens received from the lexical analyzer
This is known as syntax analysis
This is associated with the construction of an intermediate form
The rules that specify the syntax of the source language are used to recognize the syntactic units
Discover syntax errors and sometimes also recover from them
RECOGNISING SYNTACTIC UNITS
program foo(input,output); | valid program heading
var x:integer; | valid declaration
begin | valid begin statement
readln(x); | valid procedure call
writeln('value read =',x) | valid procedure call
end | valid end statement
INTERPRETING MEANINGS FROM SYNTACTIC UNITS
Interpret the meaning of a construct
This is known as semantic analysis
This is also associated with the construction of an intermediate form, now carrying meanings
This includes type checking of statements
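As an illustration of type checking, the sketch below computes the type of an expression tree from a table of declared types. The expression format (a leaf is a name or an integer literal, an inner node is an `(operator, left, right)` tuple), the function name `type_of`, and both typing rules (int is widened to float; a float value may not be assigned to an int variable, as in Pascal) are assumptions made for this example.

```python
def type_of(expr, symbols: dict[str, str]) -> str:
    """Return "int" or "float" for expr, or raise on a semantic error."""
    if isinstance(expr, str):
        if expr.isdigit():
            return "int"                       # integer literal
        if expr not in symbols:
            raise NameError(f"undeclared identifier {expr!r}")
        return symbols[expr]                   # declared type of the variable
    op, left, right = expr
    left_type, right_type = type_of(left, symbols), type_of(right, symbols)
    if op == "=":
        # Assumed rule: int may be widened to float, never the reverse.
        if left_type == "int" and right_type == "float":
            raise TypeError(f"cannot assign a float to int variable {left!r}")
        return left_type
    # Assumed rule for + - *: the result is float if either operand is float.
    return "float" if "float" in (left_type, right_type) else "int"
```

With `symbols = {"a": "int", "rate": "float", "x": "int"}`, `type_of(("*", "rate", "x"), symbols)` is `"float"`, while checking `("=", "a", ("*", "rate", "x"))` reports a type error.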
INTERMEDIATE REPRESENTATION
Generation of an intermediate form of the program, from which the object code is later produced
This is known as intermediate code generation
It facilitates optimization of the object code
It allows a logical separation between the m/c dependent and m/c independent phases
INTERMEDIATE REPRESENTATION OF…
• Arithmetic statements: parse tree or matrix
• Non-arithmetic statements: matrix
• Non-executable statements: no intermediate form as such; their information is kept in identifier tables
INTERMEDIATE REPRESENTATION OF ARITHMETIC STATEMENTS
Rules for converting an arithmetic statement into a parse tree:
• Any variable (or constant) is a terminal node of the tree.
• For every operator, a binary tree is constructed with the left node as operand1 and the right node as operand2.
INTERMEDIATE REPRESENTATION OF ARITHMETIC STATEMENTS
a = rate * (initial - final) + 2 * rate * (initial - final) is represented in parse tree form as follows (under each operator, operand1 is listed first, then operand2):
=
  a
  +
    *
      rate
      -
        initial
        final
    *
      *
        2
        rate
      -
        initial
        final
However, this method is not practical.
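INTERMEDIATE REPRESENTATION OF ARITHMETIC STATEMENTS
A second way of representing arithmetic statements in intermediate form is the 'matrix'. Entries are numbered, and a number used as an operand refers to the result of that entry (in entry 2, operand 1 means the result of entry 1); the 2 in entry 3 is the literal constant 2.
No | Operator | Operand1 | Operand2
1 | - | initial | final
2 | * | rate | 1
3 | * | 2 | rate
4 | - | initial | final
5 | * | 3 | 4
6 | + | 2 | 5
7 | = | a | 6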
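The matrix is what a post-order walk of the parse tree produces: each operator node is emitted after its two operands, and the node is then referred to by its entry number. A small Python sketch, assuming a hypothetical `Node` class for the tree, strings as leaves, and integers as references to earlier entries:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Node:
    op: str          # operator at this node
    left: "Tree"     # operand1
    right: "Tree"    # operand2


Tree = Union[Node, str]       # a leaf is a variable name or a constant
Operand = Union[int, str]     # an int refers to an earlier matrix entry


def to_matrix(tree: Tree) -> list[tuple[str, Operand, Operand]]:
    rows: list[tuple[str, Operand, Operand]] = []

    def emit(t: Tree) -> Operand:
        if isinstance(t, str):
            return t                              # leaves are used directly
        operand1 = emit(t.left)                   # post-order: operands first...
        operand2 = emit(t.right)
        rows.append((t.op, operand1, operand2))   # ...then the operator
        return len(rows)                          # entries are numbered from 1

    emit(tree)
    return rows


# a = rate * (initial - final) + 2 * rate * (initial - final)
TREE = Node("=", "a",
            Node("+",
                 Node("*", "rate", Node("-", "initial", "final")),
                 Node("*", Node("*", "2", "rate"), Node("-", "initial", "final"))))
```

Calling `to_matrix(TREE)` reproduces the seven entries of the matrix above, in the same order.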
INTERMEDIATE REPRESENTATION OF NON-ARITHMETIC STATEMENTS
Non-arithmetic statements are statements like: • do-while • return • if, if-else • while • goto, etc.
These are represented by the 'matrix'.
INTERMEDIATE REPRESENTATION OF NON-ARITHMETIC STATEMENTS
Operator Operand1 Operand2
return X
end
< a B
if 3
1
2
3
4
INTERMEDIATE REPRESENTATION OF NON-EXECUTABLE STATEMENTS
Non-executable statements are statements like: • declare • dimension • include
These have no intermediate form; their information is stored in tables instead.
STORAGE ALLOCATION
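The semantic analysis phase updates the symbol table entries of identifiers with their name, type, base, etc.
The storage allocation routine then scans the symbol table and assigns a location to each identifier.
For example, it starts at location 0 and gives each identifier the next free location after the previous one, advancing by the size of that identifier's type (in the table below, int takes 2 locations and float takes 4).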
Name | Base | Type | Location
a | binary | int | 0
rate | binary | float | 2
start | binary | int | 6
initial | binary | int | 8
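The allocation routine can be sketched as a single pass over the symbol table. The sizes used (2 locations for int, 4 for float) are inferred from the locations in the table above and are an assumption; the name `allocate` is hypothetical, and real allocators also handle alignment, which this sketch ignores.

```python
# Assumed sizes, inferred from the table above.
TYPE_SIZE = {"int": 2, "float": 4}


def allocate(identifiers: list[tuple[str, str]]) -> list[tuple[str, str, int]]:
    """Assign consecutive locations, starting at 0, in symbol-table order."""
    location = 0
    table = []
    for name, type_name in identifiers:
        table.append((name, type_name, location))
        location += TYPE_SIZE[type_name]   # next free location
    return table
```

`allocate([("a", "int"), ("rate", "float"), ("start", "int"), ("initial", "int")])` yields the locations 0, 2, 6 and 8 shown in the table.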
CODE GENERATION
One scheme for generating code is to associate each type of matrix operation with a fixed piece of object code, kept in a code production table.
The matrix is then scanned, and code is generated for each entry using the table.
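CODE GENERATION
Code production table. &OPERAND1 and &OPERAND2 are replaced by the operands of the matrix entry, and M&N names the temporary that holds the result of entry N. MUL 0 presumably follows IBM 360 style, where multiplying into the register pair 0-1 leaves the product in register 1, hence STORE 1.
1. Operator -
   LOAD 1,&OPERAND1
   SUB 1,&OPERAND2
   STORE 1,M&N
2. Operator *
   LOAD 1,&OPERAND1
   MUL 0,&OPERAND2
   STORE 1,M&N
3. Operator +
   LOAD 1,&OPERAND1
   ADD 1,&OPERAND2
   STORE 1,M&N
4. Operator =
   LOAD 1,&OPERAND2
   STORE 1,&OPERAND1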
CODE GENERATION
As shown previously, a = rate * (initial - final) + 2 * rate * (initial - final) is represented in matrix form as follows. The code is then generated using the code production table (#2 denotes the literal constant 2).
No | Operator | Operand1 | Operand2
1 | - | initial | final
2 | * | rate | 1
3 | * | 2 | rate
4 | - | initial | final
Generated code:
LOAD 1,initial
SUB 1,final
STORE 1,M1
LOAD 1,rate
MUL 0,M1
STORE 1,M2
LOAD 1,#2
MUL 0,rate
STORE 1,M3
LOAD 1,initial
SUB 1,final
STORE 1,M4
No | Operator | Operand1 | Operand2
5 | * | 3 | 4
6 | + | 2 | 5
7 | = | a | 6
Generated code (continued):
LOAD 1,M3
MUL 0,M4
STORE 1,M5
LOAD 1,M2
ADD 1,M5
STORE 1,M6
LOAD 1,M6
STORE 1,a
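Table-driven code generation of this kind takes only a few lines. The sketch below stores the code production table as string templates and substitutes &OPERAND1, &OPERAND2 and &N for each matrix entry. The operand encoding is an assumption: integers refer to earlier entries (temporaries M1, M2, ...), digit strings are literals (written #2), and other strings are identifiers. `CODE_PRODUCTIONS`, `operand_text` and `generate` are hypothetical names.

```python
# Code production table: target-code templates for each matrix operator.
CODE_PRODUCTIONS = {
    "-": ["LOAD 1,&OPERAND1", "SUB 1,&OPERAND2", "STORE 1,M&N"],
    "*": ["LOAD 1,&OPERAND1", "MUL 0,&OPERAND2", "STORE 1,M&N"],
    "+": ["LOAD 1,&OPERAND1", "ADD 1,&OPERAND2", "STORE 1,M&N"],
    "=": ["LOAD 1,&OPERAND2", "STORE 1,&OPERAND1"],
}

# Matrix for a = rate * (initial - final) + 2 * rate * (initial - final).
MATRIX = [("-", "initial", "final"), ("*", "rate", 1), ("*", "2", "rate"),
          ("-", "initial", "final"), ("*", 3, 4), ("+", 2, 5), ("=", "a", 6)]


def operand_text(operand) -> str:
    if isinstance(operand, int):
        return f"M{operand}"      # temporary holding that entry's result
    if operand.isdigit():
        return f"#{operand}"      # literal constant
    return operand                # identifier


def generate(matrix) -> list[str]:
    code = []
    for n, (op, operand1, operand2) in enumerate(matrix, start=1):
        for template in CODE_PRODUCTIONS[op]:
            code.append(template.replace("&OPERAND1", operand_text(operand1))
                                .replace("&OPERAND2", operand_text(operand2))
                                .replace("&N", str(n)))
    return code


if __name__ == "__main__":
    for line in generate(MATRIX):
        print(line)
```

Run on `MATRIX`, it prints the 20 instructions listed above, from LOAD 1,initial to STORE 1,a.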
CODE OPTIMIZATION
Is it good to generate code directly from the matrix?
• It may give rise to redundant code, as in matrix entries 1 and 4 above, which compute the same value.
Is the best use of the machine made?
• Lines 12 and 14 of the generated code show that M4 is stored only to be read once, immediately afterwards, so the store is unnecessary.
Can the machine code be generated using other techniques?
This gives rise to optimization.
CODE OPTIMIZATION
The first issue refers to m/c independent optimization
• Optimality of the matrix
The second refers to m/c dependent optimization
• Optimality of the m/c code
M/C INDEPENDENT CODE OPTIMIZATION
Common subexpression elimination
• Remove redundant code, as in matrix entries 1 and 4 above, and use entry 1 in place of entry 4.
Constant folding, i.e. compile-time evaluation of operations whose operands are constants
• For example, x = 2 + 4 becomes x = 6.
Code motion
• Moving computations whose results do not change inside a loop to outside the loop
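Constant folding can be applied directly to the matrix: whenever both operands of an arithmetic entry are literal constants, the entry is evaluated at compile time and later references to it are replaced by the computed constant. A minimal sketch, assuming a matrix of (operator, operand1, operand2) tuples in which integers refer to earlier entries and strings are identifiers or integer literals; it handles only +, - and * on integers, and the names are hypothetical.

```python
import operator

# Operators this sketch knows how to evaluate at compile time.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def as_literal(operand):
    """Return the integer value of a literal operand, or None."""
    if isinstance(operand, str):
        try:
            return int(operand)
        except ValueError:
            return None      # an identifier such as "x"
    return None              # an int is a reference to a matrix entry


def fold_constants(matrix):
    """Return the surviving entries as {entry number: (op, operand1, operand2)}."""
    kept = {}
    folded = {}              # entry number -> literal text of its value
    for n, (op, a, b) in enumerate(matrix, start=1):
        # Replace references to folded entries by their constant values.
        if isinstance(a, int):
            a = folded.get(a, a)
        if isinstance(b, int):
            b = folded.get(b, b)
        x, y = as_literal(a), as_literal(b)
        if op in FOLDABLE and x is not None and y is not None:
            folded[n] = str(FOLDABLE[op](x, y))   # evaluated at compile time
        else:
            kept[n] = (op, a, b)
    return kept
```

For x = 2 + 4, the matrix `[("+", "2", "4"), ("=", "x", 1)]` folds to the single entry `("=", "x", "6")`.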
COMMON SUBEXPRESSION
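Entries 1 and 4 compute the same value (initial - final). Replace references to 4 with 1 and delete the 4th entry:
No | Operator | Operand1 | Operand2
1 | - | initial | final
2 | * | rate | 1
3 | * | 2 | rate
4 | - | initial | final   (deleted)
5 | * | 3 | 4 → 1
6 | + | 2 | 5
7 | = | a | 6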
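Local common subexpression elimination on the matrix can be sketched as follows: remember each (operator, operand1, operand2) triple already computed, and when it appears again, delete the new entry and redirect later references to the earlier one. Assumptions: integers in the tuples refer to earlier entries, commutativity is not exploited, and an assignment "=" makes every remembered expression that reads the assigned variable unavailable. The function name is hypothetical.

```python
def eliminate_common_subexpressions(matrix):
    """Return the surviving entries as {entry number: (op, operand1, operand2)}."""
    kept = {}
    replaced_by = {}   # deleted entry number -> earlier entry with the same value
    available = {}     # (op, operand1, operand2) -> entry that computed it

    def resolve(operand):
        # Redirect references to deleted entries.
        if isinstance(operand, int):
            return replaced_by.get(operand, operand)
        return operand

    for n, (op, a, b) in enumerate(matrix, start=1):
        a, b = resolve(a), resolve(b)
        if op == "=":
            kept[n] = (op, a, b)
            # `a` gets a new value, so expressions that read it are no longer available.
            available = {k: v for k, v in available.items() if a not in k[1:]}
            continue
        key = (op, a, b)
        if key in available:
            replaced_by[n] = available[key]   # reuse the earlier result
        else:
            available[key] = n
            kept[n] = key
    return kept
```

On the matrix above, entry 4 disappears and entry 5 becomes `("*", 3, 1)`, matching the table.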
CODE MOTION
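For example, in
a = 1;
while (a + 3 <= 10) { cout << "Hello"; }
the value a + 3 is recomputed on every iteration although a does not change inside the loop. Code motion computes it once, before the loop:
a = 1;
b = a + 3;
while (b <= 10) { cout << "Hello"; }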
M/C DEPENDENT CODE OPTIMIZATION
Using proper m/c instructions: instead of
MOV a,R
ADD R,1
MOV R,a
we can just use INC a.
Making efficient use of registers instead of temporaries, and hence reducing the number of loads and stores.
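Both improvements are typically made by a peephole optimizer that slides over the generated instructions looking for known patterns. A hedged sketch: instructions are hypothetical (opcode, operand, operand) tuples, MOV a,R is read as "move a into R", the INC rewrite assumes R is not read afterwards, and the second pattern drops a LOAD that reloads, into the same register, the value just stored (as with STORE 1,M6 followed by LOAD 1,M6 in the generated code earlier).

```python
def peephole(code: list[tuple]) -> list[tuple]:
    """Apply two peephole patterns to a list of instruction tuples."""
    out = []
    i = 0
    while i < len(code):
        first = code[i]
        second = code[i + 1] if i + 1 < len(code) else None
        third = code[i + 2] if i + 2 < len(code) else None
        # MOV a,R ; ADD R,1 ; MOV R,a  ->  INC a   (assumes R is dead afterwards)
        if (first[0] == "MOV" and second is not None and third is not None
                and second == ("ADD", first[2], "1")
                and third == ("MOV", first[2], first[1])):
            out.append(("INC", first[1]))
            i += 3
            continue
        # STORE r,m ; LOAD r,m  ->  STORE r,m   (register r still holds the value)
        if (first[0] == "STORE" and second is not None
                and second == ("LOAD", first[1], first[2])):
            out.append(first)
            i += 2
            continue
        out.append(first)
        i += 1
    return out
```

The STORE is kept because another instruction might read M6 later; deciding that it can go as well needs liveness information that this sketch does not track.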
GENERAL MODEL OF A COMPILER
The model consists of seven phases:
1) Lexical analysis
2) Syntax analysis
3) Interpretation
4) M/c independent optimization
5) Storage assignment
6) Code selection
7) Assembly and output
Input: source code. Output: relocatable machine code.
Databases and intermediate products: uniform symbol table, terminal table, reductions, identifier table, literal table, matrix, optimized matrix, assembly code.
GENERAL MODEL OF A COMPILER
Sr.No | Database | Description
1 | Source code | Any program
2 | Uniform symbol table | Consists of a full or partial list of the tokens. Created by lexical analysis
3 | Terminal table | Permanent table listing the keywords and special symbols of the language
4 | Identifier table | Contains all variables in the program, temporary storage, and any other information about them. Created by lexical analysis and modified by interpretation
5 | Literal table | Contains all constants. Created by lexical analysis and referenced by interpretation
6 | Reductions | Permanent table of decision rules, in the form of patterns matched against the uniform symbol table to discover the syntax
7 | Matrix | Intermediate form of the program, created by the action routines, optimized, and then used for code generation
8 | Code productions | Permanent table of definitions giving the target code for each matrix operation
Phases of Compiler : Lexical Phase
Tasks:
1. Parse the source program into tokens
2. Build the literal and identifier tables
3. Build the uniform symbol table
Databases: source program, terminal table, literal table, identifier table, uniform symbol table
Algorithm: the input string is separated into tokens by break characters; consecutive non-break characters are accumulated into tokens.
Phases of Compiler : Lexical Phase : Database
Database | Fields
Terminal table | Symbol, Indicator, Precedence
Literal table | Literal, Base, Scale, Precision, Other information, Address
Identifier table | Name, Data attributes, Address
Uniform symbol table | Table, Index
Phases of Compiler : Lexical Phase : Algorithm
1. The input string is separated into tokens.
2. Consecutive non-break characters are accumulated into tokens.
3. The tokens are compared against the entries in the terminal table.
4. If a match is found, the analyzer creates a uniform symbol table entry of type 'TRM'; otherwise it checks whether the token is a literal ('LIT') or an identifier ('IDN').
5. For an identifier, it checks whether the entry is already in the identifier table. If not, a new entry is made.
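The algorithm above can be sketched in Python. Everything concrete here is an assumption for illustration: the break characters, the contents of the terminal table, the rule that a literal is an unsigned integer, and the names `split_tokens` and `lexical_phase`. Each uniform symbol table entry is a (class, index) pair pointing into the terminal, literal or identifier table.

```python
# Hypothetical break characters: white space plus operators and punctuation.
BREAK_CHARS = set(" \t\n=+-*/(),;")
# Hypothetical terminal table (keywords and special symbols).
TERMINAL_TABLE = ["=", "+", "-", "*", "/", "(", ")", ",", ";", "begin", "end"]


def split_tokens(source: str) -> list[str]:
    """Accumulate consecutive non-break characters into tokens."""
    tokens, current = [], ""
    for ch in source:
        if ch in BREAK_CHARS:
            if current:
                tokens.append(current)
                current = ""
            if not ch.isspace():
                tokens.append(ch)     # a break character that is also a symbol
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def lexical_phase(source: str):
    """Return (uniform symbol table, literal table, identifier table)."""
    uniform, literals, identifiers = [], [], []
    for token in split_tokens(source):
        if token in TERMINAL_TABLE:
            uniform.append(("TRM", TERMINAL_TABLE.index(token)))
        elif token.isdigit():
            literals.append(token)    # every literal gets its own entry
            uniform.append(("LIT", len(literals) - 1))
        elif token[0].isalpha():
            if token not in identifiers:   # new identifier -> new entry
                identifiers.append(token)
            uniform.append(("IDN", identifiers.index(token)))
        else:
            raise ValueError(f"lexical error: improper token {token!r}")
    return uniform, literals, identifiers
```

For `a = rate * (initial - final) + 2 * rate`, the identifier table becomes `["a", "rate", "initial", "final"]`, the literal table `["2"]`, and the second use of `rate` refers back to identifier entry 1.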
Phases of Compiler : Syntax Phase
Tasks:
1. Recognize the major constructs of the language
2. Call the appropriate action routines to generate the intermediate form
3. Act as an interpreter for the reductions
Databases: uniform symbol table, reduction table
Algorithm: the input buffer is checked against the stack, and reductions are performed according to the reduction table entries.
Phases of Compiler : Syntax Analysis Phase : Database
Database | Description
Reduction table | Syntax rules (reductions) of the form Label: Top of Stack / Action Routine / New Top of Stack
Uniform symbol table | Table, Index
Phases of Compiler : Syntax Analysis Phase : Conventions
Each reduction is written as
Label: Top of Stack / Action Routine / New Top of Stack
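Phases of Compiler : Syntax Analysis Phase : LR Parsing Model
LR parsing algorithm: the parser reads the input a1 ... ai ... an $ and keeps a stack S0 X1 S1 ... Xm-1 Sm-1 Xm Sm, with Sm on top, where each Si is a state and each Xi a grammar symbol. It consults two tables and produces the output.
Action table: rows are states, columns are terminals and $; each entry is one of four different actions (shift, reduce, accept, error).
Goto table: rows are states, columns are non-terminals; each entry is a state number.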
Phases of Compiler : Syntax Analysis Phase : Algorithm
1. Reductions are tested consecutively for a match between the top of stack and the input buffer until a match is found.
2. When a match is found, the action routines specified in the action fields are executed in order from left to right.
3. When control returns to the syntax analyzer, it modifies the top of stack to agree with the New Top of Stack field.
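(SLR) Parsing Tables for Expression Grammar
Columns id to $ form the action table and columns E, T, F the goto table. sN = shift and go to state N, rN = reduce by production N, acc = accept, blank = error.
state | id | + | * | ( | ) | $ | E | T | F
0 | s5 | | | s4 | | | 1 | 2 | 3
1 | | s6 | | | | acc | | |
2 | | r2 | s7 | | r2 | r2 | | |
3 | | r4 | r4 | | r4 | r4 | | |
4 | s5 | | | s4 | | | 8 | 2 | 3
5 | | r6 | r6 | | r6 | r6 | | |
6 | s5 | | | s4 | | | | 9 | 3
7 | s5 | | | s4 | | | | | 10
8 | | s6 | | | s11 | | | |
9 | | r1 | s7 | | r1 | r1 | | |
10 | | r3 | r3 | | r3 | r3 | | |
11 | | r5 | r5 | | r5 | r5 | | |
Productions:
1) E → E + T
2) E → T
3) T → T * F
4) T → F
5) F → ( E )
6) F → id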
Actions of an (S)LR Parser: Example
stack | input | action | output
0 | id*id+id$ | shift 5 |
0 id 5 | *id+id$ | reduce by F → id | F → id
0 F 3 | *id+id$ | reduce by T → F | T → F
0 T 2 | *id+id$ | shift 7 |
0 T 2 * 7 | id+id$ | shift 5 |
0 T 2 * 7 id 5 | +id$ | reduce by F → id | F → id
0 T 2 * 7 F 10 | +id$ | reduce by T → T*F | T → T*F
0 T 2 | +id$ | reduce by E → T | E → T
0 E 1 | +id$ | shift 6 |
0 E 1 + 6 | id$ | shift 5 |
0 E 1 + 6 id 5 | $ | reduce by F → id | F → id
0 E 1 + 6 F 3 | $ | reduce by T → F | T → F
0 E 1 + 6 T 9 | $ | reduce by E → E+T | E → E+T
0 E 1 | $ | accept |
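The table-driven LR parsing algorithm, using the SLR tables above, can be sketched as follows. It keeps only states on the stack (the grammar symbols shown in the trace are implied by the states) and returns the sequence of reductions, which is the output column of the trace, written with ASCII "->". The names `ACTION`, `GOTO`, `PRODUCTIONS` and `lr_parse` are hypothetical.

```python
# Productions 1-6 of the expression grammar: (head, length of body, text).
PRODUCTIONS = {1: ("E", 3, "E->E+T"), 2: ("E", 1, "E->T"), 3: ("T", 3, "T->T*F"),
               4: ("T", 1, "T->F"), 5: ("F", 3, "F->(E)"), 6: ("F", 1, "F->id")}

# Action table: (state, terminal) -> "sN" (shift), "rN" (reduce) or "acc".
ACTION = {(1, "+"): "s6", (1, "$"): "acc", (2, "*"): "s7", (9, "*"): "s7",
          (8, "+"): "s6", (8, ")"): "s11"}
for s in (0, 4, 6, 7):
    ACTION[(s, "id")], ACTION[(s, "(")] = "s5", "s4"
for s, r in ((2, "r2"), (9, "r1")):
    for t in ("+", ")", "$"):
        ACTION[(s, t)] = r
for s, r in ((3, "r4"), (5, "r6"), (10, "r3"), (11, "r5")):
    for t in ("+", "*", ")", "$"):
        ACTION[(s, t)] = r

# Goto table: (state, non-terminal) -> state.
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def lr_parse(tokens: list[str]) -> list[str]:
    stack = [0]                  # states only
    tokens = tokens + ["$"]      # end marker
    i = 0
    output = []
    while True:
        action = ACTION.get((stack[-1], tokens[i]))
        if action is None:       # blank entry = syntax error
            raise SyntaxError(f"unexpected {tokens[i]!r} at position {i}")
        if action == "acc":
            return output
        n = int(action[1:])
        if action[0] == "s":     # shift: push state n, advance the input
            stack.append(n)
            i += 1
        else:                    # reduce by production n
            head, length, text = PRODUCTIONS[n]
            del stack[-length:]  # pop one state per body symbol
            stack.append(GOTO[(stack[-1], head)])
            output.append(text)
```

`lr_parse(["id", "*", "id", "+", "id"])` returns the eight reductions of the trace above, from F->id to E->E+T.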
Phases of Compiler : Interpretation Phase
A collection of routines that are called when a construct is recognized in the syntactic phase
It creates an intermediate form of the source program
It adds information to the identifier table
Phases of Compiler : Interpretation Phase : Database
Database | Fields
Temporary storage table | Attributes of temporary computations: Data type, Base, Scale, Precision, Storage class, Other information, Address
Matrix | Operator, Operand 1, Operand 2
Identifier table | Previous info, Storage class, Array bound, Structure info, Literal value, Other information, Address
Uniform symbol table | Table, Index
Phases of Compiler : Optimization Phase : Database
Database | Fields
Matrix | Operator, Operand 1, Operand 2, Forward pointer, Backward pointer
Identifier table | Previous info, Storage class, Array bound, Structure info, Literal value, Other information, Address
Literal table | Literal, Base, Scale, Precision, Other information, Address
Phases of Compiler : Optimization Phase : CSE Algorithm
Global: Common Subexpression Elimination (CSE)
Goal: eliminate recomputations of an expression
Example code (the control-flow graph layout of the original slide is lost):
r3 = r4 / r7
r2 = r2 + 1
r3 = r3 + 1
r1 = r3 * 7
r5 = r2 * r6
r8 = r4 / r7
r9 = r3 * 7
r1 = r2 * r6
Rules, for an operation X and a later operation Y:
1. X and Y have the same opcode, and X dominates Y
2. src(X) = src(Y) for all sources
3. For all sources, there is no definition of a source on any path between X and Y (excluding Y)
4. Insert rx = dest(X) immediately after X, for a new register rx
5. Replace Y with the move dest(Y) = rx
In the example, with X: r3 = r4 / r7 and Y: r8 = r4 / r7, this inserts r10 = r3 after X and replaces Y with r8 = r10.
Phases of Compiler : Optimization Phase : LSR Algorithm
Local: Strength Reduction
Goal: replace expensive operations with cheaper ones
Example:
r7 = 5
r5 = 2 * r4   becomes   r5 = r4 + r4
r6 = r4 * 4   becomes   r6 = r4 << 2
Rules (common):
1. X is a multiplication operation where src1(X) or src2(X) is a constant 2^k integer literal
2. Change X to use a shift operation
3. For k = 1, an add can be used instead
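The three rules can be expressed as a small rewrite function. This is a sketch on hypothetical (dest, op, src1, src2) tuples with integer literals written as digit strings; `reduce_strength` is an illustrative name, and the function applies exactly these rules for integer operands and nothing else.

```python
def reduce_strength(instr: tuple[str, str, str, str]) -> tuple[str, str, str, str]:
    """instr is (dest, op, src1, src2); return a cheaper equivalent if the rules apply."""
    dest, op, src1, src2 = instr
    if op != "*":
        return instr                      # rule 1: only multiplications
    if src1.isdigit() and not src2.isdigit():
        src1, src2 = src2, src1           # put the constant in src2
    if not src2.isdigit():
        return instr                      # no constant operand
    c = int(src2)
    if c < 2 or c & (c - 1):
        return instr                      # not 2**k for some k >= 1
    k = c.bit_length() - 1
    if k == 1:
        return (dest, "+", src1, src1)    # rule 3: x * 2 -> x + x
    return (dest, "<<", src1, str(k))     # rule 2: x * 2**k -> x << k
```

Applied to the example, `("r5", "*", "2", "r4")` becomes `("r5", "+", "r4", "r4")` and `("r6", "*", "r4", "4")` becomes `("r6", "<<", "r4", "2")`.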
Phases of Compiler : Optimization Phase : Code Motion Algorithm
Global: Code Motion
Goal: move loop-invariant computations to the preheader
Example (block layout partly lost): the preheader contains r1 = 0, and the loop contains
r4 = M[r5]
r7 = r4 * 3
r8 = r2 + 1
r7 = r8 * r4
r3 = r2 + 1
r1 = r1 + r7
M[r1] = r3
The slide shows r4 = M[r5] being moved to the preheader.
Rules:
1. Operation X is in a block that dominates all exit blocks
2. X is the only operation to modify dest(X) in the loop body
3. No source of X has a definition in any of the basic blocks in the loop body
4. Move X to the end of the preheader
5. Note 1: if one source of X is a memory load, check for stores in the loop body
6. Note 2: X must be movable and must not cause exceptions
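Rules 2 and 3, together with Note 1, are easy to check mechanically. The sketch below does only that: it does not model rule 1 (dominance of exit blocks) or Note 2 (exceptions), and it assumes the loop body consists of the instructions from r4 = M[r5] through M[r1] = r3 in the example. By default it treats the load r4 = M[r5] as unsafe because the loop stores to M[r1]; passing `assume_no_alias=True` also reports the load, as the slide does. `Op`, `loop_invariant` and `BODY` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Op(NamedTuple):
    text: str               # the instruction as written
    dest: Optional[str]     # register written, or None for a store
    srcs: tuple[str, ...]   # registers read
    kind: str = "alu"       # "alu", "load" or "store"


def loop_invariant(body: list[Op], assume_no_alias: bool = False) -> list[str]:
    """Return the operations that rules 2, 3 and Note 1 allow hoisting."""
    defs = [op.dest for op in body if op.dest is not None]
    has_store = any(op.kind == "store" for op in body)
    movable = []
    for op in body:
        if op.dest is None:
            continue                       # stores are never moved here
        if defs.count(op.dest) != 1:
            continue                       # rule 2: only definition of dest(X)
        if any(src in defs for src in op.srcs):
            continue                       # rule 3: no source defined in the loop
        if op.kind == "load" and has_store and not assume_no_alias:
            continue                       # note 1: a store might change memory
        movable.append(op.text)
    return movable


BODY = [
    Op("r4 = M[r5]", "r4", ("r5",), "load"),
    Op("r7 = r4 * 3", "r7", ("r4",)),
    Op("r8 = r2 + 1", "r8", ("r2",)),
    Op("r7 = r8 * r4", "r7", ("r8", "r4")),
    Op("r3 = r2 + 1", "r3", ("r2",)),
    Op("r1 = r1 + r7", "r1", ("r1", "r7")),
    Op("M[r1] = r3", None, ("r1", "r3"), "store"),
]
```

`loop_invariant(BODY)` reports r8 = r2 + 1 and r3 = r2 + 1; r7 is defined twice in the loop and r1 reads its own previous value, so neither qualifies.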
For more details, visit our blogs:
Mahesh Sanghavi: https://maheshsanghavi.wordpress.com/
Kainjan Sanghavi: https://kainjan1.wordpress.com/
Thank You