Chap. 4, Intermediate Code Generation
Compilation in a Nutshell 1
• Source code (character stream): if (b == 0) a = b;
• Lexical analysis → token stream: if ( b == 0 ) a = b ;
• Parsing → abstract syntax tree (AST): if ( ==(b, 0), =(a, b) )
• Semantic analysis → decorated AST: if ( boolean ==(int b, int 0), int =(int lvalue a, int b) )
Compilation in a Nutshell 2
• Input: the decorated AST for if (b == 0) a = b;
• Intermediate code generation → IR tree: CJUMP( ==(MEM(fp + 8), CONST 0), MOVE(MEM(fp + 4), MEM(fp + 8)), NOP )
• Optimization → CJUMP( ==(CX, CONST 0), MOVE(DX, CX), NOP )
• Code generation →
    CMP CX, 0
    CMOVZ DX, CX
Outline
• Intermediate Code Representation
• Expressions Translation
• Array Element Translation
• Type Conversion
• Boolean Expression Translation
• Procedure Translation
Role of Intermediate Code
• Closer to target language.
  – Simplifies code generation.
• Machine-independent.
  – Simplifies retargeting of the compiler.
  – Allows a variety of optimizations to be implemented in a machine-independent way.
• Many compilers use several different intermediate representations.
Different Kinds of IRs
• Graphical IRs: the program structure is represented as a graph (or tree) structure. Examples: parse trees, syntax trees, DAGs.
• Linear IRs: the program is represented as a list of instructions for some virtual machine. Example: three-address code.
• Hybrid IRs: combine elements of graphical and linear IRs.
Graphical IRs 1: Parse Trees
• A parse tree is a tree representation of a derivation during parsing.
• Constructing a parse tree:
  – The root is the start symbol S of the grammar.
  – Given a parse tree for αXβ, if the next derivation step is αXβ ⇒ αγ1…γnβ, then the new parse tree is obtained by adding γ1, …, γn as the children of the leaf X.
Graphical IRs 2: Abstract Syntax Trees (AST)
A syntax tree shows the structure of a program by abstracting away irrelevant details from a parse tree.
– Each node represents a computation to be performed;
– The children of the node represent what that computation is performed on.
Graphical IRs 3: Directed Acyclic Graphs (DAGs)
A DAG is a contraction of an AST that avoids duplication of nodes.
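• reduces compiler memory requirements;
• exposes redundancies.
E.g.: for the expression (x+y)*(x+y), the AST contains two separate subtrees for x+y, whereas the DAG contains a single x+y node that serves as both operands of *.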
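The node sharing described above can be sketched with a lookup table keyed on (operator, operands). This is a minimal illustration, not the slides' own algorithm; the class name DagBuilder and its fields are assumptions made for the example.

```python
class DagBuilder:
    """Creates expression nodes, reusing a node when an identical one already exists."""

    def __init__(self):
        self.nodes = []   # node id -> (op, left, right)
        self.index = {}   # (op, left, right) -> node id

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.index:      # first occurrence of this subexpression
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]


dag = DagBuilder()
# Build (x + y) * (x + y); leaves are nodes without children.
left = dag.node("+", dag.node("x"), dag.node("y"))
right = dag.node("+", dag.node("x"), dag.node("y"))
root = dag.node("*", left, right)
```

Both operands of * end up as the same node id, so the DAG holds four nodes (x, y, +, *) where the AST would hold seven.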
Linear IR 1: Three Address Code
• Instructions are of the form ‘x = y op z’, where x, y, z are variables, constants, or “temporaries”.
• At most one operator is allowed on the RHS, so there are no “built-up” expressions.
Three Address Code: Example
• Source:
    if ( x + y*z > x*y + z)
        a = 0;
• Three Address Code:
    t1 = y*z
    t2 = x+t1 // x + y*z
    t3 = x*y
    t4 = t3+z // x*y + z
    if (t2 <= t4) goto L
    a = 0
  L:
An Example Intermediate Instruction Set
• Assignment:
  – x = y op z (op binary)
  – x = op y (op unary)
  – x = y
• Jumps:
  – if ( x op y ) goto L (L a label)
  – goto L
• Pointer and indexed assignments:
  – x = y[ z ]
  – y[ z ] = x
  – x = &y
  – x = *y
  – *y = x
• Procedure call/return:
  – param x, k (x is the kth param)
  – retval x
  – call p
  – enter p
  – leave p
  – return
  – retrieve x
• Type Conversion:
  – x = cvt_A_to_B y (A, B base types), e.g.: cvt_int_to_float
• Miscellaneous:
  – label L
Three Representations of Instructions
• Three representations of instructions in a data structure:
  – Quadruples
  – Triples
  – Indirect triples
Quadruples
• Quadruple (quad): four fields
  – op, arg1, arg2, result
• Exceptions:
  – Unary operators: no arg2
  – Param: no arg2 and no result
  – Conditional and unconditional jumps: put the target label in result
Quadruples for a=c*b+c*b
       op   arg1   arg2   result
(1)    *    c      b      T1
(2)    *    c      b      T2
(3)    +    T1     T2     a
Quadruples for a:=(b+c)*e+(b+c)/f
Triple
• Triple: three fields
  – op, arg1, arg2
        op       arg1   arg2
(14)    uminus   c
(15)    *        b      (14)
(16)    uminus   c
(17)    *        b      (16)
(18)    +        (15)   (17)
(19)    assign   a      (18)
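To make the difference between the two layouts concrete, the sketch below rewrites a list of quadruples as triples: a temporary is replaced by the index of the triple that computed it, and a named result gets an explicit assign triple. The names Quad and quads_to_triples are hypothetical, and the rule "temporaries are names starting with T" is an assumption made only for this example.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]
    result: str


def quads_to_triples(quads, is_temp=lambda name: name.startswith("T")):
    """Return triples (op, arg1, arg2); an int argument refers to an earlier triple."""
    triples = []
    defined_at = {}  # temporary name -> index of the triple that computes it

    def ref(arg):
        # Replace a temporary by the position of its defining triple.
        return defined_at.get(arg, arg)

    for q in quads:
        arg2 = ref(q.arg2) if q.arg2 is not None else None
        triples.append((q.op, ref(q.arg1), arg2))
        idx = len(triples) - 1
        if is_temp(q.result):
            defined_at[q.result] = idx
        else:
            # A named variable has no result field in a triple: assign explicitly.
            triples.append(("assign", q.result, idx))
    return triples


quads = [
    Quad("*", "c", "b", "T1"),
    Quad("*", "c", "b", "T2"),
    Quad("+", "T1", "T2", "a"),
]
triples = quads_to_triples(quads)
```

Here indices start at 0 rather than at (14) as on the slide; the result is three computation triples followed by one assign triple.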
SDD for Expression Translation
• The synthesized attribute S.code represents the three-address code for non-terminal S.
• Each non-terminal E has two attributes:
  – E.place represents the place to store E’s value.
  – E.code represents the three-address code for non-terminal E.
  – Function newtemp returns a different temp variable, such as T1, T2, …, for each call.
Three-address Code Generation SDD for Expression Translation
Production     Semantic Rules
S→id:=E        S.code:=E.code || gen(id.place ‘:=’ E.place)
E→E1+E2        E.place:=newtemp;
               E.code:=E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘+’ E2.place)
E→E1*E2        E.place:=newtemp;
               E.code:=E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘*’ E2.place)
E→-E1          E.place:=newtemp;
               E.code:=E1.code || gen(E.place ‘:=’ ‘uminus’ E1.place)
E→(E1)         E.place:=E1.place; E.code:=E1.code
E→id           E.place:=id.place; E.code:=‘ ’
Three-address Code Generation SDT for Expression Translation
S→id:=E    { p:=lookup(id.name);
             if p≠nil then emit(p ‘:=’ E.place) else error }
E→E1+E2    { E.place:=newtemp;
             emit(E.place ‘:=’ E1.place ‘+’ E2.place) }
E→E1*E2    { E.place:=newtemp;
             emit(E.place ‘:=’ E1.place ‘*’ E2.place) }
Three-address Code Generation SDT for Expression Translation
E→-E1      { E.place:=newtemp;
             emit(E.place ‘:=’ ‘uminus’ E1.place) }
E→(E1)     { E.place:=E1.place }
E→id       { p:=lookup(id.name);
             if p≠nil then E.place:=p else error }
a:=(b+c)*e+(b+c)/f
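The SDT actions for expressions can be sketched as a post-order walk over an expression tree, applied here to a:=(b+c)*e+(b+c)/f. This is only an illustration: the tuple encoding of expressions and the class name ExprTranslator are assumptions, the symbol-table lookup is omitted, and the division operator (which the grammar on the slides does not list) is handled like any other binary operator.

```python
import itertools


class ExprTranslator:
    def __init__(self):
        self.code = []                      # emitted three-address instructions
        self._temps = itertools.count(1)

    def newtemp(self):
        return f"T{next(self._temps)}"

    def emit(self, line):
        self.code.append(line)

    def expr(self, e):
        """Return E.place for e, emitting code for its subexpressions first."""
        if isinstance(e, str):              # E -> id
            return e
        if e[0] == "uminus":                # E -> -E1
            p1 = self.expr(e[1])
            place = self.newtemp()
            self.emit(f"{place} := uminus {p1}")
            return place
        op, e1, e2 = e                      # E -> E1 op E2
        p1, p2 = self.expr(e1), self.expr(e2)
        place = self.newtemp()
        self.emit(f"{place} := {p1} {op} {p2}")
        return place

    def assign(self, name, e):              # S -> id := E
        self.emit(f"{name} := {self.expr(e)}")


t = ExprTranslator()
t.assign("a", ("+", ("*", ("+", "b", "c"), "e"), ("/", ("+", "b", "c"), "f")))
```

Note that b+c is computed twice (T1 and T3); this scheme does not detect common subexpressions.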
Addressing Array Elements
A: array[1..2, 1..3]
• Column major:
  A[1, 1], A[2, 1], A[1, 2], A[2, 2], A[1, 3], A[2, 3]
• Row major:
  A[1, 1], A[1, 2], A[1, 3], A[2, 1], A[2, 2], A[2, 3]
• Address of A[i1, i2] (row major):
  base + ((i1 - low1) × n2 + (i2 - low2)) × w
  = ((i1 × n2) + i2) × w + (base - ((low1 × n2) + low2) × w)
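Addressing Array Elements
• For an array A[low..low+n-1] with n elements of width w:
  – A[i] begins at: base + (i-low)*w
• For k-dimensional arrays:
  – lowi is the lower bound of the i-th dimension and ni is its number of elements;
  – the address of A[i1, i2, …, ik] is VARPART + CONSPART, where
    VARPART = ((…((i1×n2 + i2)×n3 + i3)…)×nk + ik) × w
    CONSPART = base - ((…((low1×n2 + low2)×n3 + low3)…)×nk + lowk) × w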
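The split into VARPART and CONSPART can be checked numerically. The sketch below assumes row-major layout; the function names varpart, conspart and address, and the sample values base = 1000 and w = 4, are chosen for illustration only.

```python
def varpart(indices, dims, w):
    """((...((i1*n2 + i2)*n3 + i3)...)*nk + ik) * w, evaluated Horner-style."""
    acc = 0
    for i, n in zip(indices, dims):
        acc = acc * n + i      # n1 is multiplied by 0, so it never contributes
    return acc * w


def conspart(base, lows, dims, w):
    """base - ((...((low1*n2 + low2)*n3 + low3)...)*nk + lowk) * w."""
    return base - varpart(lows, dims, w)


def address(base, indices, lows, dims, w):
    # Only varpart depends on the run-time indices; conspart is a constant.
    return varpart(indices, dims, w) + conspart(base, lows, dims, w)


# A: array[1..2, 1..3], so lows = (1, 1) and dims = (2, 3).
first = address(1000, (1, 1), (1, 1), (2, 3), 4)
last = address(1000, (2, 3), (1, 1), (2, 3), 4)
```

A[1, 1] lands at the base address and A[2, 3], the sixth element, lands five element widths later.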
Array Element Processing Grammar
• L → id [ Elist ] | id
  Elist → Elist, E | E
• To facilitate processing, we rewrite the grammar as
  L → Elist ] | id
  Elist → Elist, E | id [ E
• New attributes and functions:
  – Elist.array: symbol table entry of id
  – Elist.ndim: number of dimensions
  – Elist.place: a temporary variable that stores the value calculated from the index expressions
  – limit(array, j): returns the length of the j-th dimension
• Each non-terminal L has two attribute values:
  – L.place:
    • symbol table entry of L if L is a simple variable
    • CONSPART value if L is an indexed variable
  – L.offset:
    • null if L is a simple variable
    • VARPART value if L is an indexed variable
(1) S→L:=E
(2) E→E+E
(3) E→(E)
(4) E→L
(5) L→Elist ]
(6) L→id
(7) Elist→Elist, E
(8) Elist→id [ E
(1) S→L:=E
    { if L.offset=null then /* L is a simple variable */
        emit(L.place ‘:=’ E.place)
      else
        emit(L.place ‘[’ L.offset ‘]’ ‘:=’ E.place) }
(2) E→E1+E2
    { E.place:=newtemp;
      emit(E.place ‘:=’ E1.place ‘+’ E2.place) }
(3) E→(E1)
    { E.place:=E1.place }
(4) E→L
    { if L.offset=null then
        E.place:=L.place
      else begin
        E.place:=newtemp;
        emit(E.place ‘:=’ L.place ‘[’ L.offset ‘]’)
      end }
Address of A[i1, i2, …, ik]:
  ((…((i1×n2 + i2)×n3 + i3)…)×nk + ik) × w
  + base - ((…((low1×n2 + low2)×n3 + low3)…)×nk + lowk) × w
(8) Elist→id [ E
    { Elist.place:=E.place;
      Elist.ndim:=1;
      Elist.array:=id.place }
(7) Elist→Elist1, E
    { t:=newtemp;
      m:=Elist1.ndim+1;
      emit(t ‘:=’ Elist1.place ‘*’ limit(Elist1.array, m));
      emit(t ‘:=’ t ‘+’ E.place);
      Elist.array:=Elist1.array;
      Elist.place:=t;
      Elist.ndim:=m }
(5) L→Elist ]
    { L.place:=newtemp;
      emit(L.place ‘:=’ Elist.array ‘-’ C);
      L.offset:=newtemp;
      emit(L.offset ‘:=’ w ‘*’ Elist.place) }
    Here C = ((…((low1×n2 + low2)×n3 + low3)…)×nk + lowk) × w, so L.place holds base - C.
(6) L→id
    { L.place:=id.place; L.offset:=null }
a:=B[i,j]
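As a worked illustration of rules (8), (7), (5), (4) and (1), the sketch below generates code for a:=B[i,j]. The declaration B: array[1..10, 1..20] with element width 4 is an assumption (the slides do not give one), and ArrayInfo and translate_index are hypothetical names standing in for the symbol-table entry and the Elist/L actions.

```python
import itertools


class ArrayInfo:
    """Hypothetical symbol-table entry for an array."""

    def __init__(self, name, bounds, width):
        self.name, self.bounds, self.width = name, bounds, width

    def limit(self, j):
        """Length of the j-th dimension (1-based)."""
        low, high = self.bounds[j - 1]
        return high - low + 1

    def c(self):
        """C = ((...(low1*n2 + low2)...)*nk + lowk) * w, known at compile time."""
        acc = 0
        for j, (low, _) in enumerate(self.bounds, start=1):
            acc = acc * self.limit(j) + low
        return acc * self.width


def translate_index(array, index_places, newtemp, emit):
    """Return (L.place, L.offset) for array[index_places...]."""
    place = index_places[0]                                    # rule (8)
    for m, e_place in enumerate(index_places[1:], start=2):    # rule (7)
        t = newtemp()
        emit(f"{t} := {place} * {array.limit(m)}")
        emit(f"{t} := {t} + {e_place}")
        place = t
    l_place = newtemp()                                        # rule (5)
    emit(f"{l_place} := {array.name} - {array.c()}")
    l_offset = newtemp()
    emit(f"{l_offset} := {array.width} * {place}")
    return l_place, l_offset


code = []
counter = itertools.count(1)
newtemp = lambda: f"T{next(counter)}"

B = ArrayInfo("B", [(1, 10), (1, 20)], 4)
l_place, l_offset = translate_index(B, ["i", "j"], newtemp, code.append)
e_place = newtemp()
code.append(f"{e_place} := {l_place}[{l_offset}]")             # rule (4)
code.append(f"a := {e_place}")                                 # rule (1)
```

With these assumed bounds, C = (1×20 + 1)×4 = 84, so the generated code subtracts 84 from the base of B.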
Type Conversion
• E.type: the data type of non-terminal E
• Suppose there are two data types, integer and real, each with its own operators:
  – int op
  – real op
• The semantic action for E→E1 op E2:
  { if E1.type=integer and E2.type=integer then E.type:=integer else E.type:=real }
Type Conversion Example
• x:=y + i*j, in which x, y are real and i, j are int.
Three-address code:
T1:=i int* j
T3:=inttoreal T1
T2:=y real+ T3
x:=T2
Semantic Action for E→E1 + E2
{ E.place:=newtemp;
  if E1.type=integer and E2.type=integer then begin
    emit(E.place ‘:=’ E1.place ‘int+’ E2.place);
    E.type:=integer
  end
  else if E1.type=real and E2.type=real then begin
    emit(E.place ‘:=’ E1.place ‘real+’ E2.place);
    E.type:=real
  end
  else if E1.type=integer and E2.type=real then begin
    u:=newtemp;
    emit(u ‘:=’ ‘inttoreal’ E1.place);
    emit(E.place ‘:=’ u ‘real+’ E2.place);
    E.type:=real
  end
  else if E1.type=real and E2.type=integer then begin
    u:=newtemp;
    emit(u ‘:=’ ‘inttoreal’ E2.place);
    emit(E.place ‘:=’ E1.place ‘real+’ u);
    E.type:=real
  end
  else E.type:=type_error }
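The semantic action above can be sketched as a single function and applied to x:=y + i*j. The function name binary and the (place, type) pair encoding are assumptions for this example; the action on the slide covers only +, so treating * the same way is also an assumption.

```python
import itertools


def binary(op, e1, e2, newtemp, emit):
    """e1 and e2 are (place, type) pairs; returns (place, type) of E1 op E2."""
    (p1, t1), (p2, t2) = e1, e2
    place = newtemp()                       # E.place:=newtemp comes first
    if not {t1, t2} <= {"integer", "real"}:
        return place, "type_error"
    if t1 == "integer" and t2 == "integer":
        emit(f"{place} := {p1} int{op} {p2}")
        return place, "integer"
    if t1 == "integer":                     # integer op real: widen E1
        u = newtemp()
        emit(f"{u} := inttoreal {p1}")
        p1 = u
    elif t2 == "integer":                   # real op integer: widen E2
        u = newtemp()
        emit(f"{u} := inttoreal {p2}")
        p2 = u
    emit(f"{place} := {p1} real{op} {p2}")
    return place, "real"


code = []
counter = itertools.count(1)
newtemp = lambda: f"T{next(counter)}"

product = binary("*", ("i", "integer"), ("j", "integer"), newtemp, code.append)
total = binary("+", ("y", "real"), product, newtemp, code.append)
code.append(f"x := {total[0]}")
```

Because E.place is allocated before the conversion temporary, the sum lands in T2 and the converted operand in T3, as in the slide's example.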
Two translation methods
• Direct translation: A or B and C=D
  (1) (=, C, D, T1)
  (2) (and, B, T1, T2)
  (3) (or, A, T2, T3)
• Translation with optimization:
  – if (x<100 or x>200 and x<>y) x:=0;
  – is translated into
          if x<100 goto L2
          ifFalse x>200 goto L1
          ifFalse x<>y goto L1
      L2: x=0
      L1:
Outline
• Three-Address Code
• Expressions Translation
• Array Element Translation
• Type Conversion
• Boolean Expression Translation
  – Direct Translation
  – Optimized Translation
  – Backpatching
• Procedure Translation
Direct translation
• a or b and not c can be translated into
    T1:=not c
    T2:=b and T1
    T3:=a or T2
• a<b can be written as
    if a<b then 1 else 0
  Hence, it can be translated into
    100: if a<b goto 103
    101: T:=0
    102: goto 104
    103: T:=1
    104:
Boolean Expression Direct Translation SDT
• emit: prints a three-address instruction to the output file
• nextstat: the address index of the next three-address instruction
• each call to emit generates one new three-address instruction and then adds 1 to nextstat
Boolean Expression Direct Translation SDT
E→E1 or E2     { E.place:=newtemp;
                 emit(E.place ‘:=’ E1.place ‘or’ E2.place) }
E→E1 and E2    { E.place:=newtemp;
                 emit(E.place ‘:=’ E1.place ‘and’ E2.place) }
E→not E1       { E.place:=newtemp;
                 emit(E.place ‘:=’ ‘not’ E1.place) }
E→(E1)         { E.place:=E1.place }
Boolean Expression Direct Translation SDT
E→id1 relop id2   { E.place:=newtemp;
                    emit(‘if’ id1.place relop.op id2.place ‘goto’ nextstat+3);
                    emit(E.place ‘:=’ ‘0’);
                    emit(‘goto’ nextstat+2);
                    emit(E.place ‘:=’ ‘1’) }
E→id              { E.place:=id.place }
a<b is translated into
  100: if a<b goto 103
  101: T:=0
  102: goto 104
  103: T:=1
  104:
a<b or c<d and e<f Direct Translation
  100: if a<b goto 103
  101: T1:=0
  102: goto 104
  103: T1:=1
  104: if c<d goto 107
  105: T2:=0
  106: goto 108
  107: T2:=1
  108: if e<f goto 111
  109: T3:=0
  110: goto 112
  111: T3:=1
  112: T4:=T2 and T3
  113: T5:=T1 or T4
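The listing above can be reproduced mechanically from the SDT. The following sketch keeps nextstat as a derived counter and follows the relop, and, and or actions; the class name DirectBool and its method names are assumptions for illustration.

```python
class DirectBool:
    def __init__(self, start=100):
        self.start = start
        self.code = []
        self.ntemps = 0

    @property
    def nextstat(self):
        # Address of the next instruction to be emitted.
        return self.start + len(self.code)

    def emit(self, text):
        self.code.append(f"{self.nextstat}: {text}")

    def newtemp(self):
        self.ntemps += 1
        return f"T{self.ntemps}"

    def relop(self, a, op, b):              # E -> id1 relop id2
        place = self.newtemp()
        self.emit(f"if {a}{op}{b} goto {self.nextstat + 3}")
        self.emit(f"{place}:=0")
        self.emit(f"goto {self.nextstat + 2}")
        self.emit(f"{place}:=1")
        return place

    def binop(self, op, p1, p2):            # E -> E1 or E2 | E1 and E2
        place = self.newtemp()
        self.emit(f"{place}:={p1} {op} {p2}")
        return place


g = DirectBool()
t1 = g.relop("a", "<", "b")
t2 = g.relop("c", "<", "d")
t3 = g.relop("e", "<", "f")
t4 = g.binop("and", t2, t3)                 # and binds tighter than or
t5 = g.binop("or", t1, t4)
```

Every comparison is evaluated regardless of earlier results, which is what distinguishes this direct translation from the short-circuit schemes.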
Translation for Boolean Expression as Conditional Statement Control
• if E then S1 else S2
  Two exits for E: E.true and E.false
              E.code        (jumps to E.true or to E.false)
  E.true:     S1.code
              goto S.next
  E.false:    S2.code
  S.next:     ……
• Example: if a>c or b<d then S1 else S2
  is translated into the following three-address code:
          if a>c goto L2      (true exit)
          goto L1
  L1:     if b<d goto L2      (true exit)
          goto L3             (false exit)
  L2:     (S1 three-address code)
          goto Lnext
  L3:     (S2 three-address code)
  Lnext:
• newlabel: returns a new label on each call.
• For each Boolean expression E, there are two labels:
  – E.true is the label to reach when E is true
  – E.false is the label to reach when E is false
Three Address Code Generation SDD for Boolean Expression
Production: E→E1 or E2
Semantic rules:
  E1.true:=E.true;
  E1.false:=newlabel;
  E2.true:=E.true;
  E2.false:=E.false;
  E.code:=E1.code || gen(E1.false ‘:’) || E2.code
Code layout:
              E1.code     (to E.true / to E1.false)
  E1.false:   E2.code     (to E.true / to E.false)
Three Address Code Generation SDD for Boolean Expression
Production: E→E1 and E2
Semantic rules:
  E1.true:=newlabel;
  E1.false:=E.false;
  E2.true:=E.true;
  E2.false:=E.false;
  E.code:=E1.code || gen(E1.true ‘:’) || E2.code
Code layout:
              E1.code     (to E1.true / to E.false)
  E1.true:    E2.code     (to E.true / to E.false)
Three Address Code Generation SDD for Boolean Expression
Productions    Semantic Rules
E→not E1       E1.true:=E.false; E1.false:=E.true;
               E.code:=E1.code
E→(E1)         E1.true:=E.true; E1.false:=E.false;
               E.code:=E1.code
Three Address Code Generation SDD for Boolean Expression
Productions         Semantic Rules
E→id1 relop id2     E.code:=gen(‘if’ id1.place relop.op id2.place ‘goto’ E.true) || gen(‘goto’ E.false)
E→true              E.code:=gen(‘goto’ E.true)
E→false             E.code:=gen(‘goto’ E.false)
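The SDD rules for or, and, not, relop, true and false translate directly into a recursive function that receives E.true and E.false as arguments, since these are inherited attributes. The sketch below is one possible encoding; the tuple representation of expressions and the name make_translator are assumptions.

```python
import itertools


def make_translator():
    labels = itertools.count(1)

    def newlabel():
        return f"L{next(labels)}"

    def gen(e, true, false):
        """Return E.code as a list of lines; `true` and `false` are the exit labels."""
        kind = e[0]
        if kind == "or":
            e1_false = newlabel()
            return gen(e[1], true, e1_false) + [f"{e1_false}:"] + gen(e[2], true, false)
        if kind == "and":
            e1_true = newlabel()
            return gen(e[1], e1_true, false) + [f"{e1_true}:"] + gen(e[2], true, false)
        if kind == "not":
            return gen(e[1], false, true)       # swap the two exits
        if kind == "rel":
            _, a, op, b = e
            return [f"if {a}{op}{b} goto {true}", f"goto {false}"]
        if kind == "true":
            return [f"goto {true}"]
        if kind == "false":
            return [f"goto {false}"]
        raise ValueError(f"unknown node kind: {kind}")

    return gen


gen = make_translator()
# Condition of: if a>c or b<d then S1 else S2
code = gen(("or", ("rel", "a", ">", "c"), ("rel", "b", "<", "d")), "Ltrue", "Lfalse")
```

The output has the same shape as the earlier example for if a>c or b<d, with Ltrue and Lfalse standing for the labels of S1 and S2. No code is generated for not; it only swaps the exit labels.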
Backpatching
• Key problem: matching a jump instruction with the target of the jump
  – If labels are passed as inherited attributes, a separate pass is needed to bind labels to addresses
  – Backpatching: pass lists of jumps as synthesized attributes
One-pass Code Generation for Boolean Expression
• Quadruples
    (jnz, a, -, p)    -- if a goto p
    (jrop, x, y, p)   -- if x rop y goto p
    (j, -, -, p)      -- goto p
Translating Short-Circuit Expressions Using Backpatching
E → E or M E | E and M E | not E | ( E ) | id relop id | true | false
M → ε
Synthesized attributes:
  E.code        three-address code
  E.truelist    backpatch list for jumps on true
  E.falselist   backpatch list for jumps on false
  M.quad        location of current three-address quad
Backpatch Operations with Lists
• nextquad: location of the next quadruple
• makelist(i): creates a new list containing three-address location i, and returns a pointer to the list
• merge(p1, p2): concatenates the lists pointed to by p1 and p2, and returns a pointer to the concatenated list
• backpatch(p, i): inserts i as the target label for each of the statements in the list pointed to by p
Backpatching with Lists: Example
a < b or c < d and e < f
Before backpatching:
  100: if a < b goto _
  101: goto _
  102: if c < d goto _
  103: goto _
  104: if e < f goto _
  105: goto _
After backpatching:
  100: if a < b goto TRUE
  101: goto 102
  102: if c < d goto 104
  103: goto FALSE
  104: if e < f goto TRUE
  105: goto FALSE
Backpatching with Lists: Translation Scheme
M → ε            { M.quad := nextquad() }
E → E1 or M E2   { backpatch(E1.falselist, M.quad);
                   E.truelist := merge(E1.truelist, E2.truelist);
                   E.falselist := E2.falselist }
E → E1 and M E2  { backpatch(E1.truelist, M.quad);
                   E.truelist := E2.truelist;
                   E.falselist := merge(E1.falselist, E2.falselist) }
E → not E1       { E.truelist := E1.falselist;
                   E.falselist := E1.truelist }
E → ( E1 )       { E.truelist := E1.truelist;
                   E.falselist := E1.falselist }
Backpatching with Lists: Translation Scheme (cont’d)
E → id1 relop id2  { E.truelist := makelist(nextquad());
                     E.falselist := makelist(nextquad() + 1);
                     emit(‘if’ id1.place relop.op id2.place ‘goto _’);
                     emit(‘goto _’) }
E → true           { E.truelist := makelist(nextquad());
                     E.falselist := nil;
                     emit(‘goto _’) }
E → false          { E.falselist := makelist(nextquad());
                     E.truelist := nil;
                     emit(‘goto _’) }
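The translation scheme can be exercised on a < b or c < d and e < f. In this sketch plain Python lists play the role of makelist and merge, and an expression is represented by its (truelist, falselist) pair; the class name BoolBackpatcher and the trailing underscore in or_, and_, not_ are assumptions made for the example.

```python
class BoolBackpatcher:
    def __init__(self, start=100):
        self.start = start
        self.quads = []                      # entries are [text, target]

    def nextquad(self):
        return self.start + len(self.quads)

    def emit(self, text, target=None):
        self.quads.append([text, target])    # target None means "goto _"

    def backpatch(self, lst, target):
        for i in lst:
            self.quads[i - self.start][1] = target

    def relop(self, a, op, b):               # E -> id1 relop id2
        truelist = [self.nextquad()]
        falselist = [self.nextquad() + 1]
        self.emit(f"if {a} {op} {b} goto")
        self.emit("goto")
        return truelist, falselist

    def or_(self, e1, m_quad, e2):           # E -> E1 or M E2
        self.backpatch(e1[1], m_quad)
        return e1[0] + e2[0], e2[1]

    def and_(self, e1, m_quad, e2):          # E -> E1 and M E2
        self.backpatch(e1[0], m_quad)
        return e2[0], e1[1] + e2[1]

    def not_(self, e1):                      # E -> not E1
        return e1[1], e1[0]

    def listing(self):
        return [
            f"{self.start + i}: {text} {'_' if target is None else target}"
            for i, (text, target) in enumerate(self.quads)
        ]


bp = BoolBackpatcher()
e1 = bp.relop("a", "<", "b")
m1 = bp.nextquad()           # M before "c < d and e < f"
e2 = bp.relop("c", "<", "d")
m2 = bp.nextquad()           # M before "e < f"
e3 = bp.relop("e", "<", "f")
e = bp.or_(e1, m1, bp.and_(e2, m2, e3))
```

After these calls the jumps at 101 and 102 are filled in, while the jumps in e's truelist and falselist stay open until the enclosing statement supplies their targets.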
Flow-of-Control Statements and Backpatching: Grammar
S → if E then S | if E then S else S | while E do S | begin L end | A
L → L ; S | S
Synthesized attributes:
  S.nextlist   backpatch list for jumps to the next statement after S (or nil)
  L.nextlist   backpatch list for jumps to the next statement after L (or nil)
Example: S1 ; S2 ; S3 ; S4 ; S5 …
  100: Code for S1   (jumps out of S1)   backpatch(S1.nextlist, 200)
  200: Code for S2                       backpatch(S2.nextlist, 300)
  300: Code for S3                       backpatch(S3.nextlist, 400)
  400: Code for S4                       backpatch(S4.nextlist, 500)
  500: Code for S5
Flow-of-Control Statements and Backpatching
S → A                 { S.nextlist := nil }
S → begin L end       { S.nextlist := L.nextlist }
S → if E then M S1    { backpatch(E.truelist, M.quad);
                        S.nextlist := merge(E.falselist, S1.nextlist) }
L → L1 ; M S          { backpatch(L1.nextlist, M.quad);
                        L.nextlist := S.nextlist }
L → S                 { L.nextlist := S.nextlist }
M → ε                 { M.quad := nextquad() }
Flow-of-Control Statements and Backpatching (cont’d)
S → if E then M1 S1 N else M2 S2
    { backpatch(E.truelist, M1.quad);
      backpatch(E.falselist, M2.quad);
      S.nextlist := merge(S1.nextlist, merge(N.nextlist, S2.nextlist)) }
S → while M1 E do M2 S1
    { backpatch(S1.nextlist, M1.quad);
      backpatch(E.truelist, M2.quad);
      S.nextlist := E.falselist;
      emit(‘goto’ M1.quad) }
N → ε
    { N.nextlist := makelist(nextquad());
      emit(‘goto _’) }
Example: while (a<b) do if (c<d) then x:=y+z;
Rules used, emitting quadruples:
E→id1 relop id2         { E.truelist:=makelist(nextquad);
                          E.falselist:=makelist(nextquad+1);
                          emit(‘j’ relop.op ‘,’ id1.place ‘,’ id2.place ‘,’ ‘_’);
                          emit(‘j, -, -, _’) }
S→id:=E                 { p:=lookup(id.name);
                          if p≠nil then emit(p ‘:=’ E.place) else error }
E→E1+E2                 { E.place:=newtemp;
                          emit(E.place ‘:=’ E1.place ‘+’ E2.place) }
S→if E then M S1        { backpatch(E.truelist, M.quad);
                          S.nextlist:=merge(E.falselist, S1.nextlist) }
S→A                     { S.nextlist:=makelist( ) }
S→while M1 E do M2 S1   { backpatch(S1.nextlist, M1.quad);
                          backpatch(E.truelist, M2.quad);
                          S.nextlist:=E.falselist;
                          emit(‘j, -, -,’ M1.quad) }
M→ε                     { M.quad:=nextquad }
Quadruples generated for while (a<b) do if (c<d) then x:=y+z;
  100 (j<, a, b, 102)
  101 (j, -, -, 107)
  102 (j<, c, d, 104)
  103 (j, -, -, 100)
  104 (+, y, z, T)
  105 (:=, T, -, x)
  106 (j, -, -, 100)
  107
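The quadruples above can be reproduced by applying the actions in the order a bottom-up parser would reduce them. The sketch below hand-sequences those reductions instead of running a parser; the helper names follow the slides, and patching the loop's nextlist to the quad right after the loop is an assumption (the slides show 107 without stating who fills it in).

```python
START = 100
quads = []                       # each entry: [op, arg1, arg2, result]


def nextquad():
    return START + len(quads)


def emit(op, arg1="-", arg2="-", result=None):
    quads.append([op, arg1, arg2, result])


def backpatch(lst, target):
    for i in lst:
        quads[i - START][3] = target


def relop(op, a, b):             # E -> id1 relop id2
    truelist, falselist = [nextquad()], [nextquad() + 1]
    emit("j" + op, a, b)         # conditional jump, target still open
    emit("j")                    # unconditional jump, target still open
    return truelist, falselist


# while (a<b) do if (c<d) then x:=y+z
m1 = nextquad()                          # M1: start of the loop test
e_true, e_false = relop("<", "a", "b")
m2 = nextquad()                          # M2: start of the loop body
c_true, c_false = relop("<", "c", "d")
m = nextquad()                           # M: start of the then-branch
emit("+", "y", "z", "T")                 # E -> E1 + E2
emit(":=", "T", "-", "x")                # S -> id := E
assign_next = []                         # S -> A has an empty nextlist

backpatch(c_true, m)                     # S -> if E then M S1
if_next = c_false + assign_next

backpatch(if_next, m1)                   # S -> while M1 E do M2 S1
backpatch(e_true, m2)
while_next = e_false
emit("j", "-", "-", m1)                  # jump back to the loop test

backpatch(while_next, nextquad())        # assumed: next statement follows directly
```

The false exit of the inner if (quad 103) jumps back to the loop test at 100, because the if statement's nextlist becomes S1.nextlist of the while.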
Translating Procedure Calls
S → call id ( Elist )
Elist → Elist , E | E
Example: foo(a+1, b, 7) is translated into
  t1 := a + 1
  t2 := 7
  param t1
  param b
  param t2
  call foo 3
Translating Procedure Calls
S → call id ( Elist )   { for each item p on queue do emit(‘param’ p);
                          emit(‘call’ id.place |queue|) }
Elist → Elist , E       { append E.place to the end of queue }
Elist → E               { initialize queue to contain only E.place }
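The queue-based scheme can be sketched as follows for foo(a+1, b, 7): all arguments are evaluated first, their places are queued, and the param instructions are emitted together just before the call. The function name translate_call and the tuple encoding of expressions are assumptions; loading a constant argument into a temporary mirrors the slide's example rather than a stated rule.

```python
import itertools


def translate_call(name, args):
    code, queue = [], []
    temps = itertools.count(1)

    def expr(e):
        """Return the place holding e, emitting code for compound expressions."""
        if isinstance(e, tuple):
            op, left, right = e
            lp, rp = expr(left), expr(right)
            t = f"t{next(temps)}"
            code.append(f"{t} := {lp} {op} {rp}")
            return t
        return str(e)                        # identifier or constant operand

    for arg in args:
        place = expr(arg)
        if isinstance(arg, int):             # constant argument: copy into a temporary
            t = f"t{next(temps)}"
            code.append(f"{t} := {place}")
            place = t
        queue.append(place)                  # Elist actions: append E.place to queue

    code += [f"param {p}" for p in queue]    # S -> call id ( Elist )
    code.append(f"call {name} {len(queue)}")
    return code


call_code = translate_call("foo", [("+", "a", 1), "b", 7])
```

Queuing the places keeps the param instructions contiguous even when evaluating a later argument emits code of its own.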