Post on 20-Jan-2016

Chapter 10 Code Optimization

Zhang Jing, Wang HaiLing

College of Computer Science & Technology

Harbin Engineering University

zhangjing@hrbeu.edu.cn 2

Ideally, the target code produced by a compiler should run faster, take less space, or both.

In practice, this goal is difficult to achieve, or can be reached only in limited cases.

To approach this goal, we use code-improving transformations, which are called optimization. Of course, code optimization cannot guarantee that the resulting code is the best possible; it can only improve the code.

There are two types of code optimization. The first is machine-independent optimization, which does not depend on the properties of the target machine; optimizations of this type work at the level of the intermediate code or the source program.

The second is machine-dependent optimization, which works at the level of the target code. The position of code optimization in the compiler is shown below.

As we all know, a compiler is a program that reads a source program in a high-level language and translates it into (typically) machine language. This is a complicated process involving a number of stages. If the compiler is an optimizing compiler, one of these stages "optimizes" the machine language code so that it takes less time to run, occupies less memory, or sometimes both.

Of course, whatever optimizations the compiler performs, it must not affect the logic of the program; that is, the optimization must preserve the meaning of the program. One might wonder what types of optimizations the compiler uses to produce efficient machine code. Since the meaning of the program being compiled must never change, the compiler must inspect the program very thoroughly and find the optimizations that can safely be applied.

10.1 Classifications of optimizations

Optimizations, whether performed automatically by a compiler or manually by the programmer, can be classified by various characteristics.

The scope of the optimization:

(1) Local optimizations - Performed in a part of one procedure.

1) Common sub-expression elimination.

2) Using registers for temporary results, and if possible for variables.

3) Replacing multiplication and division by shift and add operations.

(2) Global optimizations - Performed with the help of data flow analysis.

1) Code motion (hoisting) outside of loops.

2) Constant propagation.

3) Strength reductions.

The improvement made by the optimization:

(1) Space optimizations - Reduce the size of the executable/object.

1) Constant folding.

2) Dead-code elimination.

3) Redundant code elimination.

4) Unreachable code elimination.

(2) Speed optimizations - Most optimizations belong to this category.

The type of code that is optimized:

(1) Source program optimization.

(2) Three-address code optimization.

(3) Quadruples code optimization.

(4) Target code optimization.

10.2 Source program optimizations

Source program optimizations are applied to the source program itself and work regardless of the processor or compiler.

1. Eliminating common sub-expressions

Register operations are much faster than memory operations, so compilers try to keep in registers the data that is expected to be heavily used, such as temporary variables and array indexes.

To facilitate such register scheduling, the largest sub-expressions may be computed before the smaller ones. This is an old optimization trick that compilers are able to perform quite well:

Example 1:

Before optimization:

X = A * LOG(Y) + (LOG(Y) ** 2)

After optimization:

t = LOG(Y)
X = A * t + (t ** 2)

Example 2:

Before optimization:

/* Sum neighbors of i,j */
up = val[(i-1)*n + j];
down = val[(i+1)*n + j];
left = val[i*n + j-1];
right = val[i*n + j+1];
sum = up + down + left + right;

After optimization:

int inj = i*n + j;
up = val[inj - n];
down = val[inj + n];
left = val[inj - 1];
right = val[inj + 1];
sum = up + down + left + right;
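
The neighbor-sum example can be checked with a small Python sketch. The function names and the 4x4 test grid are hypothetical and only illustrate that computing the common sub-expression once gives the same result as recomputing it.

```python
def neighbor_sum_naive(val, n, i, j):
    # Each index recomputes a product of the form (row * n).
    up = val[(i - 1) * n + j]
    down = val[(i + 1) * n + j]
    left = val[i * n + j - 1]
    right = val[i * n + j + 1]
    return up + down + left + right


def neighbor_sum_cse(val, n, i, j):
    # The common sub-expression i*n + j is computed once and reused.
    inj = i * n + j
    return val[inj - n] + val[inj + n] + val[inj - 1] + val[inj + 1]


n = 4
grid = list(range(n * n))  # hypothetical 4x4 grid stored row by row
```

The second version performs one multiplication instead of four, which is the saving the transformation is after.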

2. Redundant Code Elimination

Before optimization:

i:=m-1
j:=n
t:=4*n
v:=a[t]
s:=m-1
u:=a[s]
i:=i+1

After optimization:

i:=m-1
j:=n
t:=4*n
v:=a[t]
u:=a[i]
i:=i+1

3. Unreachable Code Elimination

A common example of unreachable code elimination involves an if statement. If the compiler finds that the condition of the if can never be true, the body of the if statement will never be executed. In that case, the compiler can eliminate this unreachable code completely, saving the memory space it would occupy.

In the example below, the compiler is assumed to have established that j>0 always holds, so the jump to L1 is always taken and the statements between the jump and L1 can never be reached.

Before optimization:

i:=m-1
if (j>0) goto L1
j:=n
t:=4*n
v:=a[t]
L1:
v:=a[i]
i:=i+1
……..

After optimization:

i:=m-1
v:=a[i]
i:=i+1
……..
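
A minimal Python sketch of this idea follows. The instruction encoding (pairs of a kind and a text) and both function names are assumptions made for illustration, and the conditional branch is written as an unconditional goto on the assumption that it is always taken.

```python
def remove_unreachable(code):
    """Drop instructions that follow an unconditional goto, up to the next label."""
    result = []
    reachable = True
    for kind, text in code:
        if kind == "label":
            reachable = True      # a label may be a jump target, so control can resume here
        if reachable:
            result.append((kind, text))
        if kind == "goto":
            reachable = False     # nothing falls through an unconditional jump
    return result


def remove_jump_to_next(code):
    """Drop a goto whose target label is the very next instruction."""
    result = []
    for index, (kind, text) in enumerate(code):
        following = code[index + 1] if index + 1 < len(code) else None
        if kind == "goto" and following == ("label", text):
            continue
        result.append((kind, text))
    return result


program = [
    ("stmt", "i:=m-1"),
    ("goto", "L1"),       # stands for "if (j>0) goto L1", assumed always taken
    ("stmt", "j:=n"),
    ("stmt", "t:=4*n"),
    ("stmt", "v:=a[t]"),
    ("label", "L1"),
    ("stmt", "v:=a[i]"),
    ("stmt", "i:=i+1"),
]
cleaned = remove_jump_to_next(remove_unreachable(program))
```

The sketch keeps the label L1, since other jumps might still target it; only the three unreachable statements and the now-pointless jump disappear.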

4. Dead Code Elimination

Dead code is code whose result is never used, so it can be removed without changing what the program computes. A typical example is a value that is assigned but never used, as shown below: the value assigned by i:=m-1 is overwritten by i:=v+1 before it is ever read.

Before optimization:

i:=m-1
j:=n
t:=4*n
v:=a[t]
i:=v+1
……..

After optimization:

j:=n
t:=4*n
v:=a[t]
i:=v+1
……..
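
One common way to find such assignments in straight-line code is a backward scan that tracks which variables are still needed. The sketch below assumes that assignments have no side effects and that i, j and v are used after the block; the encoding of each statement as (target, operands) and the function name are hypothetical.

```python
def eliminate_dead_assignments(code, live_out):
    """Backward pass over straight-line code: drop assignments whose target is not live."""
    live = set(live_out)
    kept = []
    for target, operands in reversed(code):
        if target in live:
            kept.append((target, operands))
            live.discard(target)      # this assignment defines the target ...
            live.update(operands)     # ... and needs its operands
        # otherwise the assigned value is never read afterwards, so it is dropped
    kept.reverse()
    return kept


block = [
    ("i", ("m",)),        # i := m-1
    ("j", ("n",)),        # j := n
    ("t", ("n",)),        # t := 4*n
    ("v", ("a", "t")),    # v := a[t]
    ("i", ("v",)),        # i := v+1
]
result = eliminate_dead_assignments(block, live_out={"i", "j", "v"})
```

With a smaller set of live variables after the block, more assignments would be removed; for instance, j := n is also dead if j is not used later.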

5. Strength Reduction

Strength reduction replaces a "costly" operation with an equivalent but cheaper (or shorter) sequence. For example, the evaluation of x^2 is much more efficient if we multiply x by x rather than call the exponentiation routine. One place where this optimization can be applied is in loops.

A typical case uses shift and add instead of multiply or divide:

16*x  =>  x << 4

The utility of this optimization is machine dependent: it depends on the cost of the multiply or divide instruction, whereas a shift or add is usually a single-cycle operation. The compiler can also recognize a sequence of products and turn it into a sequence of adds.

Example 1

Before optimization:

i:=m-1
j:=i
i:=3*i
……..

After optimization:

i:=m-1
j:=i
i:=i+i
i:=j+i
……..

Example 2

r1:=r2*2  =>  r1:=r2+r2  =>  r1:=r2<<1

r1:=r2/2  =>  r1:=r2>>1  (for non-negative integers)

r1:=r2*0  =>  r1:=0
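
The following Python sketch illustrates both forms of strength reduction mentioned above: turning a sequence of products into a sequence of adds, and replacing multiplication or division by a power of two with a shift. All function names are hypothetical.

```python
def multiples_with_multiply(step, count):
    # One multiplication per loop iteration.
    return [step * k for k in range(count)]


def multiples_with_add(step, count):
    # Strength-reduced form: the product step*k is kept as a running sum.
    values, current = [], 0
    for _ in range(count):
        values.append(current)
        current += step          # an addition replaces the multiplication
    return values


def times16(x):
    return x << 4                # same value as 16*x for Python integers


def half(x):
    return x >> 1                # same value as x // 2 (floor division)
```

Python's // rounds toward negative infinity, so the shift matches it for negative values too. In a language whose integer division truncates toward zero, such as C, the division and the shift can differ for negative odd operands, which is why the rewrite is only safe for non-negative integers there.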

6. Constant folding

Constant folding is the simplest code optimization to understand. Suppose that you write the statement x = 45 * 88; in your C program. A non-optimizing compiler will generate code to multiply 45 by 88 and store the value in x. An optimizing compiler will detect that both 45 and 88 are constants, so their product is also a constant. Hence it computes 45 * 88 = 3960 and generates code that simply copies 3960 into x. This is constant folding: the calculation is done just once, at compile time, rather than every time the program is run.

r2:=3*2  =>  r2:=6
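
As an illustration, constant folding can be sketched on Python's own syntax trees with the standard ast module. This is only a toy folder for +, - and * on literal operands, not how any particular compiler implements the optimization; the class name ConstantFolder and the helper fold are hypothetical.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        op = _OPS.get(type(node.op))
        if (op is not None
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            folded = ast.Constant(value=op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def fold(source):
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.fix_missing_locations(tree)


folded = fold("x = 45 * 88")
value_node = folded.body[0].value   # right-hand side of the assignment
```

After folding, the right-hand side of the assignment is a single constant node, so no multiplication remains to be done at run time.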

Elimination of redundant loads and stores:

Before optimization:

r2:=6
i:=r2
r3:=i
r4:=r3*3

After optimization:

r2:=6
i:=r2
r4:=r2*3

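Constant propagation:

Original code:

r2:=4
r3:=r1+r2
r2:=….

After propagating the constant:

r2:=4
r3:=r1+4
r2:=….

After removing the assignment that is no longer needed:

r3:=r1+4
r2:=….

A second example, original code:

r1:=3
r2:=r1*2

After propagating the constant:

r1:=3
r2:=3*2

After constant folding:

r1:=3
r2:=6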

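Copy propagation:

Original code:

r2:=r1
r3:=r1+r2
r2:=5

After propagating the copy:

r2:=r1
r3:=r1+r1
r2:=5

After removing the copy that is no longer needed:

r3:=r1+r1
r2:=5

Elimination of useless instructions (each of the following has no effect and can be deleted):

r1:=r1+0
r1:=r1*1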

7. Loop optimizations

Loops are a very important target for optimization. If we decrease the number of instructions in a loop, the running time of the program improves, even though this sometimes increases the amount of code outside the loop.

There are three techniques for loop optimization: code motion, induction-variable elimination and reduction in strength. Code motion moves code outside a loop; induction-variable elimination removes extra variables from a loop; reduction in strength replaces a complicated operation by a simpler one.

Code motion reduces the frequency with which a computation is performed: if a computation inside a loop always produces the same result, it can be moved out of the loop.

Example 1

Before optimization:

while (i <= 2*4)
……

After optimization:

j = 2*4
while (i <= j)
……

Example 2

Before optimization:

for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[n*i + j] = b[j];

After optimization:

for (i = 0; i < n; i++) {
    int ni = n*i;
    for (j = 0; j < n; j++)
        a[ni + j] = b[j];
}

Example 3

Moving as many computations as possible outside loops saves computing time. In the following example, (2.0 * PI) is an invariant expression, so there is no reason to recompute it 100 times.

Before optimization:

DO I = 1, 100
  ARRAY(I) = 2.0 * PI * I
ENDDO

After optimization:

t = 2.0 * PI
DO I = 1, 100
  ARRAY(I) = t * I
ENDDO

So we can conclude that this loop transformation consists of two steps:

(1) Find an expression in the loop whose result is the same on every iteration, that is, independent of the number of iterations.

(2) Place the expression before the loop.
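
The effect of code motion on Example 2 can be made visible by counting multiplications. The sketch below is a Python rendering of that example; the counters and function names are additions for illustration only.

```python
def fill_naive(b, n):
    a = [0] * (n * n)
    multiplications = 0
    for i in range(n):
        for j in range(n):
            a[n * i + j] = b[j]      # n*i is recomputed on every inner iteration
            multiplications += 1
    return a, multiplications


def fill_hoisted(b, n):
    a = [0] * (n * n)
    multiplications = 0
    for i in range(n):
        ni = n * i                   # invariant in the inner loop, so computed once per row
        multiplications += 1
        for j in range(n):
            a[ni + j] = b[j]
    return a, multiplications
```

Both functions fill the array identically, but the hoisted version performs n multiplications instead of n*n.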

10.3 Optimizations of three-address code

A program consists of several blocks. A block is a part of the program with one entrance and one exit, and the statements in a block run in sequence. For example, figure 10.2a is a block, whereas figure 10.2b is not a block, because it has two exits; it is only a part of a program.

Combining constants is done on expressions within one block. The procedure is first to calculate the result of the constants in an expression, and then to use that result to replace every calculation that involves those constants. In detail, the steps are as follows.

Step 1: recognize constant expressions.

Step 2: replace each constant expression by the result of computing its constants.

Step 3: generate target code according to the combined results.

Example 10.1

Because the value of “a” is known at compile time, we can compute the known values and replace them by their results to obtain the optimized three-address code.

Before combining constants, we first create a symbol table “Tab” with two fields: field N stores the variable name and field V stores the variable value. The format of a three-address code is shown below: the first part is the operator ω; the second is operand 1, which we call P1; the third is operand 2, which we call P2.

operator    operand1    operand2
ω           P1          P2

We combine constants from the first three-address code in the block to the last; the pointer to the current three-address code is “i”.

The algorithm for combining constants is:

1. Examine the operator of three-address code i.

If the operator is not “:=”: if P1 or P2 is a variable name in the symbol table “Tab”, replace P1 or P2 in the three-address code by its value V.

If the operator is “:=”:

(1) If P1 is a variable in “Tab”, replace P1 in the three-address code by its value V in “Tab”.

(2) If P1 is a constant, look up P2 in “Tab”. If P2 is in “Tab”, replace the value of P2 by the value of P1; if P2 is not in “Tab”, store (P2, P1) in “Tab” and advance the pointer of “Tab” by 1.

(3) If P1 is not a constant and P2 is in “Tab”, delete P2 and its value from “Tab”.

2. If both P1 and P2 are constants, combine them: replace (ω, P1, P2) by (α, P1ωP2, 0). Here the P1 field of the three-address code holds the result of P1ωP2, the P2 field holds 0, and the operator α marks a three-address code that holds a computed constant.

3. If P1 or P2 is the number of a three-address code whose operator is α, replace P1 or P2 in the current three-address code by the P1 field of that three-address code.

4. If i is the number of the last three-address code, exit; otherwise set i := i+1 and return to step 1.

5. The three-address codes whose operator is not α are the optimized three-address code.

We can optimize Example 10.1 with the algorithm above. The table shows the original three-address code, the code during optimization, and the symbol table after each step.

No.    Three-address code    During optimization    Symbol table
(1)    (:=, 10, a)           (:=, 10, a)            (a,10)
(2)    (+, a, 20)            (α, 30, 0)             (a,10)
(3)    (:=, (2), b)          (:=, 30, b)            (a,10) (b,30)
(4)    (/, b, a)             (α, 3, 0)              (a,10) (b,30)
(5)    (:=, (4), c)          (:=, 3, c)             (a,10) (b,30) (c,3)

The optimized three-address code is

(1) (:=, 10, a)
(3) (:=, 30, b)
(5) (:=, 3, c)
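
The algorithm can be sketched in Python as follows. Several details are assumptions of this sketch rather than part of the text: a reference such as (2) is encoded as ("ref", 2), the marker α is the string "alpha", constants are integers, references to α codes are resolved before the operator is examined (so that (:=, (2), b) can record b in the table), and a division is folded only when it is exact.

```python
ALPHA = "alpha"   # stands for the marker α: a code whose value is a known constant


def is_ref(x):
    return isinstance(x, tuple) and x[0] == "ref"


def is_const(x):
    return isinstance(x, int)


def apply_operator(op, x, y):
    """Return the folded value, or None if the operation should not be folded."""
    if op == "+":
        return x + y
    if op == "-":
        return x - y
    if op == "*":
        return x * y
    if op == "/" and y != 0 and x % y == 0:
        return x // y
    return None


def combine_constants(triples):
    tab = {}                          # symbol table "Tab": variable name -> known value
    code = [list(t) for t in triples]

    def resolve(operand):
        # Step 3 (done first here): a reference to an α code becomes its constant.
        if is_ref(operand) and code[operand[1] - 1][0] == ALPHA:
            return code[operand[1] - 1][1]
        # Step 1: a variable with a known value becomes that value.
        if isinstance(operand, str) and operand in tab:
            return tab[operand]
        return operand

    for triple in code:
        op, p1, p2 = triple
        if op == ":=":
            p1 = resolve(p1)
            if is_const(p1):
                tab[p2] = p1          # case (2): record or update the value of P2
            else:
                tab.pop(p2, None)     # case (3): the value of P2 is no longer known
            triple[1] = p1
        else:
            p1, p2 = resolve(p1), resolve(p2)
            triple[1], triple[2] = p1, p2
            if is_const(p1) and is_const(p2):
                value = apply_operator(op, p1, p2)
                if value is not None:
                    triple[:] = [ALPHA, value, 0]   # step 2
    # Step 5: keep the codes that are not marked α, with their original numbers.
    return [(number, tuple(t)) for number, t in enumerate(code, start=1) if t[0] != ALPHA]


example = [
    (":=", 10, "a"),
    ("+", "a", 20),
    (":=", ("ref", 2), "b"),
    ("/", "b", "a"),
    (":=", ("ref", 4), "c"),
]
optimized = combine_constants(example)
```

Run on the three-address code of Example 10.1, the sketch keeps codes (1), (3) and (5), in agreement with the result above.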

10.4 Optimizations of quadruples

Redundant code elimination is done within a block. We take a block of quadruples as an example to show which instructions are redundant and should be eliminated.

Example 10.2

A block:

a := b*c+a;
d := b*c+a;
c := b*c+a

Four-address code of the block:

(1) (*, b, c, T1)
(2) (+, T1, a, T2)
(3) (:=, T2, , a)
(4) (*, b, c, T3)
(5) (+, T3, a, T4)
(6) (:=, T4, , d)
(7) (*, b, c, T5)
(8) (+, T5, a, T6)
(9) (:=, T6, , c)

From the code above, we can see that instructions 4 and 7 are the same as instruction 1 and produce the same result. Instruction 8 performs the same calculation as instruction 5. To optimize the code, instructions 4, 7 and 8 need not be computed, because their results can be taken from the earlier instructions.

How can redundant instructions be recognized automatically? We can use the depending algorithm.

Depending algorithm

(1) At first, the depending number of every variable X is 0, namely, dep(X)=0.

(2) If the four-address code has the form (ω, A, B, Ti), then dep(Ti) = max(dep(A), dep(B)) + 1.

(3) If variable “a” is assigned a value by instruction i, that is, (:=, b, , a), then dep(a) = i.

(4) If two instructions i and j (i<j) have the same form (ω, P1, P2, ) and their depending numbers are also the same, then instruction j is redundant and need not be computed; it can be changed to

(Same, Ti, Tj, 0)

In particular, if ω is an operator whose operands can be exchanged, (ω, P1, P2, ) has the same form as (ω, P2, P1, ).

With the help of the depending algorithm, the optimized code of Example 10.2 is shown in table 10.1.
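
A possible Python sketch of the depending algorithm is given below. It follows rules (1) to (4) as stated, with two assumptions of its own: once Tj has been marked as the same as Ti, later operands Tj are compared as Ti (this is what lets instruction 8 match instruction 5), and + and * are the only operators treated as exchangeable. The function name and the tuple encoding are hypothetical.

```python
COMMUTATIVE = {"+", "*"}


def mark_redundant(quads):
    """quads: list of (op, a, b, result); instruction numbers start at 1."""
    dep = {}      # depending number of each name; names not yet seen count as 0
    alias = {}    # temporary -> earlier temporary that holds the same value
    seen = {}     # (op, operands, depending number) -> result of the first occurrence
    marked = []
    for number, (op, a, b, result) in enumerate(quads, start=1):
        a, b = alias.get(a, a), alias.get(b, b)
        if op == ":=":
            dep[result] = number                        # rule (3)
            marked.append((op, a, b, result))
            continue
        d = max(dep.get(a, 0), dep.get(b, 0)) + 1       # rule (2)
        dep[result] = d
        operands = tuple(sorted((a, b), key=str)) if op in COMMUTATIVE else (a, b)
        key = (op, operands, d)
        if key in seen:                                 # rule (4)
            alias[result] = seen[key]
            marked.append(("Same", seen[key], result, 0))
        else:
            seen[key] = result
            marked.append((op, a, b, result))
    return marked


block = [
    ("*", "b", "c", "T1"),
    ("+", "T1", "a", "T2"),
    (":=", "T2", None, "a"),
    ("*", "b", "c", "T3"),
    ("+", "T3", "a", "T4"),
    (":=", "T4", None, "d"),
    ("*", "b", "c", "T5"),
    ("+", "T5", "a", "T6"),
    (":=", "T6", None, "c"),
]
marked = mark_redundant(block)
redundant = [n for n, quad in enumerate(marked, start=1) if quad[0] == "Same"]
```

On the block of Example 10.2 the sketch marks instructions 4, 7 and 8, the same instructions identified in the discussion above. Instruction 5 is not matched with instruction 2, because the assignment to a in instruction 3 raises its depending number.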

10.5 Optimizations of target code

The purpose of the optimized intermediate code is to generate target code, so in this section we take an expression as an example to explain how to optimize target code.

Example 10.3 An expression:

a*b+c/d+a*(a*b+c/d)-a*(c/d+b*a)/d

The target code of the expression is:

CLA a     /* push “a” to stack */
MUL b     /* multiply the value from stack by “b”, then push the result to stack */
STO T1    /* store the result of stack to T1 */
CLA c     /* push “c” to stack */
DIV d     /* divide the value from stack by “d”, then push the result to stack */
ADD T1    /* add T1 to the value from stack, then push the result to stack */
STO T1
CLA a
MUL b
STO T2    /* store the result of stack to T2 */
CLA c
DIV d
ADD T2
MUL a     /* multiply the value from stack by “a”, then push the result to stack */
ADD T1
STO T1
CLA c
DIV d
STO T2
CLA b
MUL a
ADD T2    /* add T2 to the value from stack, then push the result to stack */
MUL a
DIV d     /* divide the value from stack by “d”, then push the result to stack */
RUB T1    /* subtract the value from stack from T1, then push the result to stack */

Its three-address code is:

(1) (*, a, b)
(2) (/, c, d)
(3) (+, (1), (2))
(4) (*, a, (3))
(5) (+, (3), (4))
(6) (/, (4), d)
(7) (-, (5), (6))

The number of times the result of a three-address code is used can be found by counting how often its number appears as an operand. For example, numbers 3 and 4 each appear twice; each of the others appears at most once.

Before optimizing the target code, we define a flag AC that describes the state of the stack. At the beginning, AC=0; after an instruction has been completed, AC≠0. If the result of a three-address code is used more than once, store the result in a temporary (STO), and then AC=0.
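
Counting the references is straightforward; a short Python sketch is shown below. The encoding of a reference such as (3) as ("ref", 3) and the function name are assumptions made for illustration.

```python
from collections import Counter


def count_references(triples):
    """Count how often each three-address code's number appears as an operand."""
    counts = Counter()
    for _, p1, p2 in triples:
        for operand in (p1, p2):
            if isinstance(operand, tuple) and operand[0] == "ref":
                counts[operand[1]] += 1
    return counts


triples = [
    ("*", "a", "b"),
    ("/", "c", "d"),
    ("+", ("ref", 1), ("ref", 2)),
    ("*", "a", ("ref", 3)),
    ("+", ("ref", 3), ("ref", 4)),
    ("/", ("ref", 4), "d"),
    ("-", ("ref", 5), ("ref", 6)),
]
counts = count_references(triples)
```

For the three-address code of Example 10.3, numbers 3 and 4 are counted twice, so their results are the ones worth storing.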

Now we begin to generate the optimized target code of Example 10.3 from its three-address code.

AC=0, three-address code (*, a, b), the target code:

CLA a
MUL b

AC≠0, three-address code (/, c, d), the target code:

STO T1
CLA c
DIV d

AC≠0, three-address codes (+, (1), (2)) and (*, a, (3)), the target code:

ADD T1
STO T2
MUL a
STO T3

AC=0, three-address code (+, (3), (4)), the target code:

ADD T2

AC≠0, three-address code (/, (4), d), the target code:

STO T4
CLA T3
DIV d

AC≠0, three-address code (-, (5), (6)), the target code:

RUB T4

So the optimized target code is:

CLA a
MUL b
STO T1
CLA c
DIV d
ADD T1
STO T2
MUL a
STO T3
ADD T2
STO T4
CLA T3
DIV d
RUB T4

From the example above, we can see that there are 25 instructions before optimization but only 14 instructions after optimization.
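
One way to gain confidence that the two instruction sequences are equivalent is to run both on a small interpreter. The Python sketch below assumes the instruction meanings given in the comments of the original target code (in particular, RUB T computes T minus the current value) and uses exact fractions so that division introduces no rounding; the interpreter, the parser and the test values are hypothetical.

```python
from fractions import Fraction


def run(program, memory):
    """Execute (opcode, operand) pairs and return the final value."""
    memory = dict(memory)   # temporaries are written into a private copy
    acc = Fraction(0)
    for opcode, name in program:
        if opcode == "CLA":
            acc = memory[name]
        elif opcode == "STO":
            memory[name] = acc
        elif opcode == "ADD":
            acc = acc + memory[name]
        elif opcode == "MUL":
            acc = acc * memory[name]
        elif opcode == "DIV":
            acc = acc / memory[name]
        elif opcode == "RUB":
            acc = memory[name] - acc   # assumed meaning: operand minus current value
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return acc


def parse(text):
    words = text.split()
    return list(zip(words[0::2], words[1::2]))


ORIGINAL = parse("""
    CLA a  MUL b  STO T1  CLA c  DIV d  ADD T1  STO T1
    CLA a  MUL b  STO T2  CLA c  DIV d  ADD T2  MUL a  ADD T1  STO T1
    CLA c  DIV d  STO T2  CLA b  MUL a  ADD T2  MUL a  DIV d  RUB T1
""")
OPTIMIZED = parse("""
    CLA a  MUL b  STO T1  CLA c  DIV d  ADD T1  STO T2
    MUL a  STO T3  ADD T2  STO T4  CLA T3  DIV d  RUB T4
""")

values = {name: Fraction(v) for name, v in {"a": 2, "b": 3, "c": 5, "d": 7}.items()}
a, b, c, d = (values[k] for k in "abcd")
expected = a*b + c/d + a*(a*b + c/d) - a*(c/d + b*a)/d
```

Both sequences yield the value of the expression for the chosen inputs, and their lengths are 25 and 14 instructions respectively. A single set of inputs is a sanity check, not a proof of equivalence.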
