

BIT 5 (1965), 235-245

GENERATION OF MACHINE CODE IN

ALGOL COMPILERS*

JORN JENSEN

Abstract. The paper describes the method used in the GIER ALGOL compiler for generating machine code from an error free Reverse Polish representation of the source code. The basic method is a pseudo evaluation of the expressions in the text. This imitates the run time processes but works with descriptions of how and where the operands are stored instead of with their values. The flow of control of the process is governed by the operators via a table which for each operator holds one or more pseudo instructions. These are obeyed interpretatively by the central logic of the process.

Introduction.

The final output from an ALGOL compilation is machine code which can perform the algorithm described by the source program. In the GIER ALGOL compiler (ref. 1), which is a multipass translator consisting of 8 passes, this machine code is generated in pass 7. The input to this pass is the source text in Reverse Polish form. This form is generated and used for type checking in pass 6 as described by Peter Naur (ref. 2).

The output is a sequence of machine instructions in symbolic form. The final code uses the machine accumulator for evaluation of expressions and uses explicitly addressed working locations for intermediate results.

The Reverse Polish form can easily be used at run time as shown by E. W. Dijkstra (ref. 3). However, in a machine which has built-in floating point operations but no special facilities for working on a stack, it will normally be faster to perform operations directly in the accumulator of the machine and to use named working locations instead of the anonymous ones which the Reverse Polish form implies.

Therefore the output will be a sequence of machine instructions which uses the machine accumulator for operations and refers directly to named variables.

The aim.

The primary purpose of this paper is to show that the machine code mentioned above may be generated conveniently and systematically through a pseudo-evaluation process. By pseudo-evaluation we here mean a process which combines the operators and operands of the source text in the manner in which an actual evaluation would have to do it, but which operates on descriptions of the operands, not on their values. Our second point is that for the purpose of the pseudo-evaluation the Reverse Polish form is very convenient. Thus although we do not use this form in the final object code we find it extremely useful as an intermediate form during translation.

* This paper was presented at the NordSAM conference in Stockholm, Aug. 1964.

Pseudo-evaluation.

The pseudo-evaluation operates on a stack in the usual manner prescribed for evaluation of expressions in Reverse Polish form. This can roughly be described by the following three rules:

1. Proceed through the expression from left to right.
2. When an operand is encountered: Place it at the top of the stack.
3. When an operator is encountered: Perform the corresponding operation on the top elements of the stack. Leave the result at the top of the stack.

The operations will here normally operate on descriptions of operands and not on their values. The stack hereby keeps track of where and how the operands exist at run time and what machine registers and working locations they occupy.

Whenever changes in the stack situation reflect corresponding changes at run time, we output the code which will effect these changes.

Let us illustrate this scheme by an example: The ALGOL expression

(a+b) × (c+d)

will be input to this process in Reverse Polish form as:

va vb + vc vd + ×

where v announces a simple variable and a, b, c, and d each denotes the address description of such a variable. This address description is of the form ⟨block relative⟩⟨block number⟩. For details about addressing of variables see section 3.2 in ref. 1.

We will follow the actions of the process controlled by the input symbols of the Reverse Polish form. We first give an action number (for reference), the input, the output, and the stack situation for each stage of the process. This is followed by a verbal description of the actions. The output is given as short ALGOL statements representing the generated code. RF denotes the floating point accumulator at run time, w1 denotes a working location.

Action  Input  Output           Stack →
1       va                      va
2       vb                      va vb
3       +      RF := b          va
               RF := RF + a     RF
4       vc                      RF vc
5       vd                      RF vc vd
6       +      w1 := RF         w1 vc vd
               RF := d          w1 vc
               RF := RF + c     w1 RF
7       ×      RF := RF × w1    RF

The actions may be described as follows:

1, 2, 4, and 5. Simple variable in input: Place the description of the operand in the stack. This announces that the variable will be used later as an operand. No output is produced.

3. + in input, add the two top operands:
   A. The stack shows that RF is free to be used for the operation.
   B. Output code: RF := topoperand. Remove top element from stack.
   C. Output code: RF := RF + topoperand. Remove top element.
   D. Place the result description RF in the stack.

6. + in input, add the two top operands:
   A. The stack shows that RF already contains a value. This value is not one of the top operands. We must then save RF in a working location. Let us assume that we have some mechanism which will deliver us the description of a free working location, free w. Output code: free w := RF. Change the stack position which describes RF so that it instead describes the working location.
   B. Proceed as from action 3.B.

7. × in input, multiply the two top operands:
   A. The stack shows that RF already contains a value. This value is the top operand. We can therefore:
   B. Remove top element.
   C. Output code: RF := RF × topoperand. Remove top element.
   D. Place the result description RF in the stack.
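The actions of Example 1 can be imitated in a few lines of Python. This is only a sketch under stated assumptions, not the GIER implementation: the names Operand and generate are hypothetical, only the commutative operators + and × are covered, and a simple counter stands in for the free w mechanism. The exchange of the two top elements when RF is next to the top is the behaviour the paper describes for + in Example 3.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    cls: str        # "v" = variable, "w" = working location, "RF" = accumulator
    addr: str = ""  # address information; irrelevant for class RF


def generate(rpn: str) -> list:
    """Pseudo-evaluate e.g. 'va vb + vc vd + ×' and return the emitted code."""
    stack, code = [], []
    w_count = 0  # stand-in for the paper's free w mechanism (assumption)
    for sym in rpn.split():
        if sym.startswith("v"):
            # Simple variable: stack its description, no output.
            stack.append(Operand("v", sym[1:]))
            continue
        if sym not in ("+", "×"):
            raise ValueError(f"operator not covered by this sketch: {sym}")
        top = len(stack) - 1
        rf = next((i for i, op in enumerate(stack) if op.cls == "RF"), None)
        if rf is not None and rf < top - 1:
            # RF holds a value that is not one of the two top operands: save it.
            w_count += 1
            code.append(f"w{w_count} := RF")
            stack[rf] = Operand("w", f"w{w_count}")
        elif rf == top - 1:
            # RF is next to the top: exchange (valid since + and × commute).
            stack[-1], stack[-2] = stack[-2], stack[-1]
        first = stack.pop()
        if first.cls != "RF":
            code.append(f"RF := {first.addr}")
        code.append(f"RF := RF {sym} {stack.pop().addr}")
        stack.append(Operand("RF"))
    return code


for line in generate("va vb + vc vd + ×"):
    print(line)
```

Run on the Reverse Polish string of Example 1, the sketch emits the same six statements as the Output column of the table.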


Operand descriptions.

In the example above we have already seen some of the possible operand descriptions which may occur in the stack. A complete operand description will normally need two kinds of information, the address and the class:

1. The address information. This is primarily needed in the output for run time references to the operand. If the class uniquely identifies the operand this information is irrelevant. The address information for a variable or working location consists of a block-relative address and a block number. The address information for a constant consists of the value of the constant (in internal machine representation). In the examples the address information is denoted by the name of a variable, the number of a working location, or the value of a constant.

2. The class information. This specifies what kind of operand we have; it is used by the logic to decide what output actions and stack actions to perform.

In the actual implementation the class may assume the 10 different values listed below. Each value is given a short mnemonic name (bold face) which is used in the examples. After this name follows a short description of the corresponding operand. The first 7 class values describe operands where we are interested in the value of a quantity, e.g., for use in expressions. The last 3 (va, wa, and UAa) describe operands where we are interested in the address of a quantity, e.g., for assignments.

v    Variable where the value will be used, e.g., in expressions.
w    Working location containing a value.
UA   Universal Address containing the address of an operand where the value of this operand will be used. The address information is irrelevant. UA denotes a universal working location which at run time will be used for holding the final address of a subscripted variable or the address resulting from a reference to a formal parameter called by name. This is discussed further in section 3 of ref. 1.
UV   Universal Value holding the value of a function designator. The address information is irrelevant. UV denotes another universal working location where function values are delivered at run time.
c    Constant. The address information contains the value of the constant. This is the only case where we work on the value of an operand and not on the description of it.
RF   The machine accumulator containing a floating point value. The address information is irrelevant.
R    The machine accumulator containing a fixed point value, e.g., the result of a logical operation. The address information is irrelevant.
va   Variable where the address will be used, e.g., in assignments.
wa   Working location containing an address.
UAa  Universal address containing the address of an operand where only this address will be used. The address information is irrelevant.
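One possible way to model such operand descriptions is sketched below in Python. The type names OpClass and Operand, the field names, and the function addr are illustrative assumptions; the paper does not give the storage layout of a stack element. The addr function mirrors only the two class changes that appear in the examples (v to va and UA to UAa).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class OpClass(Enum):
    V = "v"
    W = "w"
    UA = "UA"
    UV = "UV"
    C = "c"
    RF = "RF"
    R = "R"
    VA = "va"
    WA = "wa"
    UAA = "UAa"


# Classes for which the paper calls the address information irrelevant.
ADDRESS_IRRELEVANT = {OpClass.UA, OpClass.UV, OpClass.RF, OpClass.R, OpClass.UAA}


@dataclass(frozen=True)
class Operand:
    cls: OpClass
    address: Optional[Tuple[int, int]] = None  # (block-relative address, block number)
    value: Optional[float] = None              # only used for class c


def addr(op: Operand) -> Operand:
    """The addr action: describe the address of the operand instead of its value."""
    to_address = {OpClass.V: OpClass.VA, OpClass.UA: OpClass.UAA}
    return Operand(to_address[op.cls], op.address)


a = Operand(OpClass.V, (5, 1))
print(addr(a))
```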

With the above list at our disposal we can now show examples of the handling of a wider variety of ALGOL texts.

EXAMPLE 2. Formal parameter, saving of values.

This example shows the handling of the assignment statement in the following program:

begin real a, b;
   procedure pip(f); real f;
   begin real c, d, e;
      a := b × (c + d + e × f);
      . . .
   end;
end

Action  Input    Output             Stack →
1       va                          va
2       addr                        vaa
3       vb                          vaa vb
4       vc                          vaa vb vc
5       vd                          vaa vb vc vd
6       +        RF := d            vaa vb vc
                 RF := RF + c       vaa vb RF
7       ve                          vaa vb RF ve
8       ff       w1 := RF           vaa vb w1 ve
                 w2 := b            vaa w2 w1 ve
                 formal(f)          vaa w2 w1 ve UA
9       ×        RF := store[UA]    vaa w2 w1 ve
                 RF := RF × e       vaa w2 w1 RF
10      +        RF := RF + w1      vaa w2 RF
11      ×        RF := RF × w2      vaa RF
12      prepass                     vaa
13      :=       a := RF


We have already seen the actions called for by some of the input symbols. With the above examples as illustration we can now discuss some further actions:

2. addr. This announces that we later on will refer to the address of the top operand and not to the value. Change top element to describe address instead of value.

8. ff. Formal variable:
   A. Although not required by ALGOL we use the rule that operands must be evaluated strictly from left to right. We must therefore now save all values which may be changed by the formal reference. RF must in any case be saved, and as the saving of other values uses this register it must be saved first. Output code: free w := RF. Change corresponding stack element to describe the working location.
   B. We can now search the stack from top to bottom to look for other operands which should be saved. During this search we apply the following 3 rules:
      1. The classes w, wa, c, and va never require saving. This rule shows the use of the class va, for although the formal reference may change the value of a simple variable it can never change its address.
      2. The classes UA, UAa, and UV always require saving when they are encountered during the search.
      3. The class v requires saving only if the variable is global to the procedure in which the formal reference occurs, i.e., the blocknumber of the variable is less than the blocknumber of the formal; otherwise the formal reference cannot change the value of the variable.
      The application of these rules in the example leads to code for saving b in w2 and to the corresponding change in the stack.
   C. Finally we can generate code for referencing the formal variable. The result of such a formal reference will be an address delivered in UA. Output code: formal(f). Place the result description UA in the stack.
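The three saving rules and the order of saving (RF first, then a search from top to bottom) can be sketched as follows. The function names, the tuple layout (class, name, block number), and the block numbers 1 and 2 in the usage lines are assumptions made for illustration. The class UA is left out of the save step because the paper does not show the description that results from saving it; treating a saved UV as class w is likewise an assumption.

```python
def needs_saving(cls, block_number, formal_block):
    """Rules 1 to 3 of action 8.B."""
    if cls in ("w", "wa", "c", "va"):
        return False                        # rule 1: never saved
    if cls in ("UA", "UAa", "UV"):
        return True                         # rule 2: always saved
    if cls == "v":
        return block_number < formal_block  # rule 3: only global variables
    raise ValueError(f"unexpected class: {cls}")


def save_before_formal(stack, formal_block):
    """Save RF first, then search the stack from top to bottom."""
    code, count = [], 0

    def fresh():  # stand-in for free w (assumption)
        nonlocal count
        count += 1
        return f"w{count}"

    for i, (cls, name, blk) in enumerate(stack):
        if cls in ("RF", "R"):
            w = fresh()
            code.append(f"{w} := {cls}")
            stack[i] = ("w", w, None)
    for i in range(len(stack) - 1, -1, -1):
        cls, name, blk = stack[i]
        if not needs_saving(cls, blk, formal_block):
            continue
        if cls == "UA":
            raise NotImplementedError("result description not shown in the paper")
        w = fresh()
        if cls == "UAa":
            code.append(f"{w} := UA")
            stack[i] = ("wa", w, None)
        else:  # v or UV
            code.append(f"{w} := {name}")
            stack[i] = ("w", w, None)
    return code


# Stack of Example 2 just before action 8; a and b are assumed to have
# block number 1, e and the formal f block number 2.
stack = [("va", "a", 1), ("v", "b", 1), ("RF", "RF", None), ("v", "e", 2)]
print(save_before_formal(stack, formal_block=2))
```

With these assumed block numbers the sketch saves RF in w1 and b in w2 and leaves e untouched, as in the table of Example 2.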

12 and 13. See next example.

EXAMPLE 3. Subscripted variables, multiple assignments.

ALGOL statement: A[i, j] := a := B[i]; a, A and B are real, i and j are integer.


Action  Input    Output               Stack →
1       aA                            vaA
2       vi                            vaA vi
3       sub      RF := i              vaA
                 RF := RF × c(A,1)    vaA RF
4       vj                            vaA RF vj
5       ]                             vaA vj RF
                 RF := RF + j         vaA RF
                 index(A)             UA
6       addr                          UAa
7       va                            UAa va
8       addr                          UAa vaa
9       aB       w1 := UA             wa1 vaa vaB
10      vi                            wa1 vaa vaB vi
11      ]        RF := i              wa1 vaa vaB
                 index(B)             wa1 vaa UA
12      prepass  M := store[UA]       wa1 vaa
13      :=       a := M               wa1
14      :=       store[w1] := M

Descriptions of some of the actions:

1. aA  Array in input: A denotes the address of the run time array description word. A is placed in the stack with the class value va, as it does not need to be saved. One more entry in the stack is made but is not shown in the example. This entry is the value of a counter, subcount, belonging to the logic for treatment of subscripts; it will be referred to below. It must be stacked because we may have subscripted variables as subscripts. Now subcount is set to zero to indicate that we have no subscripts yet.

3. sub  subscript comma:
   A. if subcount ≠ 0 then simulate input of a plus; comment this addition takes place in the example in action 5. It shows how the action for + exchanges the two top elements when RF contains the operand next to the top and hereby reduces the case to the ordinary case for + as shown in example 1.
   B. Put top operand in RF if it is not already there. Remove top element.
   C. subcount := subcount + 1; output code: RF := RF × coefficient in dopevector for top operand[subcount]. Place the result description RF in the stack.

5. ]  last subscript:
   A. Same as above in 3.A.
   B. Same as above in 3.B.
   C. Output code: index(topoperand). Remove top element, restore subcount to the stacked value, and remove this. Place the result description UA in the stack. index is a run time subroutine which computes the final address of a subscripted variable, checks it, and places it in UA.
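The subscript actions 1, 3, and 5 can be traced with a small Python sketch. It is an illustration under assumptions: the function name gen_subscripts and the (class, name) tuples are hypothetical, the hidden subcount entry is modelled as an ordinary stack element, and the saving of RF and UA that nested subscripts would need is omitted.

```python
def gen_subscripts(symbols):
    """Generate code for symbols such as ['aA', 'vi', 'sub', 'vj', ']']."""
    stack, code = [], []   # stack entries are (class, name) pairs
    subcount = 0

    def top_to_rf():
        # Put the top operand in RF if it is not already there; remove it.
        cls, name = stack.pop()
        if cls != "RF":
            code.append(f"RF := {name}")

    def plus():
        # The + action: exchange when RF is next to the top, then add.
        if stack[-2][0] == "RF":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        top_to_rf()
        code.append(f"RF := RF + {stack.pop()[1]}")
        stack.append(("RF", ""))

    for sym in symbols:
        if sym in ("sub", "]"):
            if subcount != 0:                    # step A
                plus()
            top_to_rf()                          # step B
            array = stack[-2][1]                 # array lies below the hidden entry
            if sym == "sub":                     # step C of action 3
                subcount += 1
                code.append(f"RF := RF × c({array},{subcount})")
                stack.append(("RF", ""))
            else:                                # step C of action 5
                code.append(f"index({array})")
                subcount = stack.pop()[1]        # restore stacked subcount
                stack.pop()                      # remove the array description
                stack.append(("UA", ""))
        elif sym.startswith("a"):                # array in input
            stack.append(("va", sym[1:]))
            stack.append(("subcount", subcount)) # hidden entry
            subcount = 0
        elif sym.startswith("v"):                # simple variable
            stack.append(("v", sym[1:]))
        else:
            raise ValueError(f"symbol not covered by this sketch: {sym}")
    return code, stack


print(gen_subscripts(["aA", "vi", "sub", "vj", "]"])[0])
```

For A[i, j] the sketch emits the four statements of actions 3 and 5 in the table, and for B[i] it emits RF := i followed by index(B).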

12. prepass  prepare assign:

   assign case := if class[top] = c ∧ address[top] = 0 then 1
                  else if class[top] = R then 2
                  else if class[top] = RF then 3
                  else 4;
   if assign case = 4 then output code (M := topoperand);
   remove top element;
   comment assign case indicates the machine instruction which should be used for each occurrence of := . The possibilities are:
      1: R := 0. operand := R. This describes one machine instruction.
      2: operand := R. Storing of a non arithmetic value.
      3: operand := RF. Normal storing of an arithmetic value.
      4: operand := M. M is the multiplier register of the machine. It is used when we only have to move a word from one cell to another because it is a little faster to load than the R register. M is part of the RF register, therefore RF must be saved before M is used.
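The selection of assign case and its later use by := can be written out as a short Python sketch. The names assign_case, prepass, assign, and STORE are illustrative, and operands are represented simply as (class, text) pairs, which is an assumption; case 1 is one machine instruction in the paper but is printed here as two statements.

```python
def assign_case(cls, address=None):
    """Transliteration of the conditional expression in action 12."""
    if cls == "c" and address == 0:
        return 1
    if cls == "R":
        return 2
    if cls == "RF":
        return 3
    return 4


def prepass(stack, code):
    """Remove the top operand and return the assign case for later := symbols."""
    cls, address = stack.pop()
    case = assign_case(cls, address)
    if case == 4:
        code.append(f"M := {address}")
    return case


# Store templates per assign case.
STORE = {1: "R := 0; {t} := R", 2: "{t} := R", 3: "{t} := RF", 4: "{t} := M"}


def assign(stack, code, case):
    """The := action: store into the top operand and remove it."""
    _, target = stack.pop()
    code.append(STORE[case].format(t=target))


# Tail of Example 3: stack is wa1 vaa UA when prepass arrives.
stack = [("wa", "store[w1]"), ("va", "a"), ("UA", "store[UA]")]
code = []
case = prepass(stack, code)
assign(stack, code, case)
assign(stack, code, case)
print(case, code)
```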

13 and 14. := assignment: Output code for assignment to top operand according to the value of assign case as described above.

EXAMPLE 4. Constant operand.

ALGOL statement: a := 3.1415/6 × a ↑ 2; a is real.


Action  Input     Output              Stack →
1       va                            va
2       addr                          vaa
3       c3.1415                       vaa c3.1415
4       c6                            vaa c3.1415 c6
5       /                             vaa c.5236
6       va                            vaa c.5236 va
7       c2                            vaa c.5236 va c2
8       ↑ real    RF := a             vaa c.5236 va
                  RF := RF × a        vaa c.5236 RF
9       ×         RF := RF × .5236    vaa RF
10      prepass                       vaa
11      :=        a := RF

This example illustrates how constant expressions are evaluated at compile time. It also illustrates the special treatment of ↑. ↑ tests for exponent = 2 ∨ exponent = 3 and performs in these cases the exponentiation by multiplications instead of by subroutine call.
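Both points of Example 4 can be sketched in Python. The function names divide and power and the (class, value) tuples are hypothetical, the code sequence for exponent 3 (two multiplications) is an assumption since the paper only shows exponent 2, and everything outside the two special cases is deliberately left unimplemented.

```python
def divide(stack, code):
    """Fold c/c at compile time; other operand classes are outside this sketch."""
    (cls_a, a), (cls_b, b) = stack[-2], stack[-1]
    if cls_a == "c" and cls_b == "c":
        stack[-2:] = [("c", a / b)]   # work on the values, emit no code
    else:
        raise NotImplementedError("non-constant division is outside this sketch")


def power(stack, code):
    """Expand v ↑ 2 and v ↑ 3 into multiplications."""
    cls_e, exponent = stack.pop()
    cls_b, base = stack.pop()
    if cls_e == "c" and exponent in (2, 3) and cls_b == "v":
        code.append(f"RF := {base}")
        code.extend([f"RF := RF × {base}"] * (exponent - 1))
        stack.append(("RF", None))
    else:
        raise NotImplementedError("general ↑ needs the run time subroutine")


stack, code = [("c", 3.1415), ("c", 6)], []
divide(stack, code)
stack += [("v", "a"), ("c", 2)]
power(stack, code)
print(stack, code)
```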

General actions.

Below we will describe 4 of the more general mechanisms which are implied by the actions in the examples. These are the assignment of working locations, the output of operands, the handling of R and RF, and the saving process.

1. Assignment of working locations.

The base for the assignment of working locations is the relative address of the first location after the local variables. This quantity appears as input after each block begin.

The algorithms used can be described as follows:

integer i, j, first w, last w;
Boolean array used w[1:40];

At each block begin we save the values of first w, last w, and used w in the stack. Then follows:

first w := last w := input;
for i := 1 step 1 until 40 do used w[i] := false;

At each block end we output last w (for use for reservation at run time) and restore first w, last w, and used w from the stack.

RESERVATION OF A WORKING LOCATION.

A call of the function designator free w will reserve a working location and deliver the corresponding relative address as the value of free w.


integer procedure free w;
begin
   for i := 0, i+1 while used w[i] do j := i;
   used w[j+1] := true;
   free w := j := j + first w;
   if j > last w then last w := j
end;

RELEASE A WORKING LOCATION.

Each time an element of type w or wa is removed from the stack we set the corresponding element of used w to false.
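The reservation and release of working locations translate almost directly into Python. The class name WorkingLocations and the method names are hypothetical, the stacking of the state at block begin and block end is left out, and the overflow check is an added assumption (the ALGOL text does not show what happens when all 40 locations are in use).

```python
class WorkingLocations:
    def __init__(self, base, size=40):
        # base is the relative address of the first location after the locals
        self.size = size
        self.first_w = self.last_w = base
        self.used_w = [False] * (size + 2)   # indices 1..size are meaningful

    def free_w(self):
        """Reserve the lowest free working location and return its address."""
        j = 0
        while self.used_w[j + 1]:            # for i := 0, i+1 while used w[i]
            j += 1
        if j >= self.size:
            raise RuntimeError("no free working location (assumed handling)")
        self.used_w[j + 1] = True
        address = j + self.first_w
        if address > self.last_w:
            self.last_w = address
        return address

    def release(self, address):
        """Called when an element of type w or wa leaves the stack."""
        self.used_w[address - self.first_w + 1] = False


wl = WorkingLocations(base=10)
first, second = wl.free_w(), wl.free_w()
wl.release(first)
print(first, second, wl.free_w(), wl.last_w)
```

A released location is reused before a new one is taken, so last w only grows when the number of simultaneously live working locations grows.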

2. The output of operands.

The address information in the stack is used during output actions to specify the address part of the final machine instructions. A procedure, output topoperand and release, selects from the description of the top operand one of the 7 possible address formats, outputs the top operand in this format, releases a working location when necessary, and removes the top operand. This means that the same mechanism can be used for output of an operation regardless of how the operand should be addressed.

3. Handling of R and RF.

An operation which requires that the top operand is in the accumulator must ask several questions to decide what actions to perform. In most cases a call of the procedure top to RF release (or, for logical operations, top to R release) will do the work. These procedures have at their disposal a pointer, R used in, which points at the stack element holding the description R or RF. When no such description exists in the stack R used in is zero.

top to RF release may be described as follows:

procedure top to RF release;
begin comment top denotes the stackpointer;
   if R used in ≠ top ∧ R used in ≠ 0 then
      save RF in a working location and change the corresponding description in the stack;
   if R used in ≠ top then
      output code (RF := topoperand);
   top := top - 1;
   R used in := 0;
   comment the action calling this procedure must provide for a new stacking of RF and setting of R used in when appropriate
end
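A direct Python transliteration of top to RF release is given below as a sketch. The function signature is an assumption: the stack is a list whose position 0 is a dummy so that the value 0 of R used in can mean "no description in the stack", free_w is any callable delivering a working location name, and the new value of R used in is returned instead of being assigned to a global.

```python
def top_to_rf_release(stack, r_used_in, code, free_w):
    """Bring the top operand into RF, remove it, and return the new R used in."""
    top = len(stack) - 1   # stack[0] is a dummy; real elements start at 1
    if r_used_in != top and r_used_in != 0:
        # RF is occupied by an operand deeper in the stack: save it.
        w = free_w()
        code.append(f"{w} := RF")
        stack[r_used_in] = ("w", w)
    if r_used_in != top:
        code.append(f"RF := {stack[top][1]}")
    stack.pop()            # top := top - 1
    return 0               # R used in := 0; the caller restacks RF if needed


# Situation of Example 1, action 6: stack is RF vc vd.
stack = [None, ("RF", "RF"), ("v", "c"), ("v", "d")]
code = []
names = iter(["w1"])
r_used_in = top_to_rf_release(stack, 1, code, lambda: next(names))
print(code, stack[1:], r_used_in)
```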


4. The saving process.

Before references to formal parameters can be generated the stack must be searched for values to be saved as described in example 2. A corresponding saving process takes place in connection with function designator calls and conditional expressions. This process is conveniently taken care of by save (blocklimit) which provides for the appropriate actions for saving of the accumulator and the relevant operands with blocknumbers less than blocklimit.

By assigning fictive blocknumbers to UA and U V this procedure will handle all necessary saving processes.

The implementation of the logic.

The action of a new operator in the input can be described in terms of a piece of program for each operator and a set of subroutines performing the common tasks. In straightforward coding this scheme was too space consuming for our compiler. The final design is therefore based on an interpretive scheme which works as follows: The input symbol for an operator (or the first symbol of an operand) points to a word in a table. Each word contains from 1 to 4 pseudo instructions, which are obeyed interpretively by the central logic of the process. Each pseudo instruction consists of 10 bits which specify one out of three actions and a parameter to this action. The action is specified by the first one or two bits as follows:

11: Output the value of the last 8 bits.
10: Output the value of the last 8 bits followed by the output provided by the procedure output top operand and release.
0x: Call the subroutine whose address is specified by the last 9 bits.

The list of pseudo instructions which await execution may be extended or changed by any of the subroutines and this provides for flexible and economical specification of complicated actions.
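The decoding of a 10-bit pseudo instruction and the interpretive loop can be sketched in Python as follows. The names decode and run, the representation of a table word as a list of integers, and the subroutine interface (a callable receiving the pending list and the output list) are assumptions; the paper does not describe how the pseudo instructions are packed in a machine word.

```python
from collections import deque


def decode(pseudo):
    """Split a 10-bit pseudo instruction into (action, parameter)."""
    if not 0 <= pseudo < 1 << 10:
        raise ValueError("a pseudo instruction has 10 bits")
    if pseudo >> 9 == 0:                       # 0x: subroutine call
        return ("call", pseudo & 0x1FF)        # address in the last 9 bits
    if pseudo >> 8 == 0b11:                    # 11: plain output
        return ("output", pseudo & 0xFF)
    return ("output_with_operand", pseudo & 0xFF)   # 10


def run(word, subroutines, output_top_operand):
    """Obey the pseudo instructions of one table word interpretively."""
    out = []
    pending = deque(word)
    while pending:
        action, parameter = decode(pending.popleft())
        if action == "call":
            # A subroutine may extend or change the pending list.
            subroutines[parameter](pending, out)
        else:
            out.append(parameter)
            if action == "output_with_operand":
                out.append(output_top_operand())
    return out


# Hypothetical subroutine 3 prepends one more pseudo instruction.
subs = {3: lambda pending, out: pending.appendleft(0b1100000111)}
print(run([0b0000000011, 0b1000000001], subs, lambda: "a"))
```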

REFERENCES

1. P. Naur, The Design of the GIER ALGOL Compiler, BIT 3 (1963), 124-140 and 145-166.
2. P. Naur, Checking of Operand Types in ALGOL Compilers, BIT 5 (1965), 151-163.
3. E. W. Dijkstra, ALGOL 60 Translation, ALGOL Bulletin Supplement no. 10, Math. Centrum Amsterdam, Nov. 1961; Annual Review of Automatic Programming Vol. III, 327-356, Pergamon Press, London 1963.

REGNECENTRALEN
COPENHAGEN, DENMARK