assembler


Assembler

CS 230

이준원


System Software

• components

– translator

• assembler

• compiler

• interpreter

– system manager

• operating system

– other utilities

• loader

• linker

• DBMS, editor, debugger, ...

• purpose of this course

– understand how to build system software

– understand how these components work


Issues in System Software

• not many open issues in this area

– mature area

• advanced architectures complicate system software

– superscalar CPU

– memory model

– multiprocessor

• new applications

– embedded systems

– mobile/ubiquitous computing


Assembler Overview

• functions

– translate programs written in assembly language to machine code

• mnemonic code to machine code

• symbols to addresses

– handles

• constants

• literals

• addressing

• 32 bit constant or address

• 32 bit offset


Assembler Overview (cont’d)

• pass 1: loop until the end of the program

1. read in a line of assembly code

2. assign an address to this line

• increment by N (word addressing or byte addressing)

3. save address values assigned to labels

• in symbol tables

4. process assembler directives

• constant declaration

• space reservation

• pass 2: same loop

1. read in a line of code

2. translate op code using op code table

3. change labels to addresses using the symbol table

4. process assembler directives

5. produce object program


Data Structures for Assembler

• op code table

– looked up for the translation of mnemonic code

• key: mnemonic code

• result: bits

– hashing is usually used

• once prepared, the table is not changed

• efficient lookup is desired

• since mnemonic code is predefined, the hashing function can be tuned a priori

– the table may have the instruction format and length

• to decide where to put op code bits, operands bits, offset bits

• for variable instruction size

• used to calculate the address

add $t0, $t1, $t2 => 000000 01001 01010 01000 00000 100000
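As a minimal sketch of how an op code table that also carries format information could drive encoding, the Python below keeps a hypothetical `OPTAB` dictionary (a Python dict is a hash table) with a few R-type MIPS entries and a small register map. The names `OPTAB`, `REGS`, `encode_r` and `fields` are illustrative assumptions, not part of the slides.

```python
# Hypothetical op code table: mnemonic -> (format, opcode bits, funct bits).
# Only a few R-type MIPS instructions are listed for illustration.
OPTAB = {
    "add": ("R", 0b000000, 0b100000),
    "sub": ("R", 0b000000, 0b100010),
    "and": ("R", 0b000000, 0b100100),
}

# Registers $t0..$t7 are numbered 8..15 in the MIPS convention.
REGS = {f"$t{i}": 8 + i for i in range(8)}

def encode_r(mnemonic, rd, rs, rt):
    """Pack an R-type instruction: opcode | rs | rt | rd | shamt | funct."""
    fmt, opcode, funct = OPTAB[mnemonic]
    if fmt != "R":
        raise ValueError(f"{mnemonic} is not an R-type instruction")
    return ((opcode << 26) | (REGS[rs] << 21) | (REGS[rt] << 16)
            | (REGS[rd] << 11) | (0 << 6) | funct)

def fields(word):
    """Render a 32-bit word as the six R-type bit fields."""
    bits = format(word, "032b")
    return " ".join([bits[0:6], bits[6:11], bits[11:16],
                     bits[16:21], bits[21:26], bits[26:32]])

print(fields(encode_r("add", "$t0", "$t1", "$t2")))
# 000000 01001 01010 01000 00000 100000
```

The format tag stored with each entry is what tells the encoder where the op code, register and function bits belong.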


Data Structures for Assembler (cont’d)

• symbol table

– stored and looked up to assign address to labels

• efficient insertion and retrieval is needed

• deletion does not occur

– difficulties in hashing

• non random keys

– problem

• the size varies widely

.text
.globl main
main:
    la $t0, array
    lw $t1, count
    lw $t2, ($t0)
loop:
    lw $t3, 4($t0)
    ble $t3, $t2, loop2
    move $t2, $t3
loop2:
    add $t1, $t1, -1
    add $t0, $t0, 4
    bnez $t1, loop
    ……
.data
array: .word 3, 5, 5, 1, 6, 7, …..
count: .word 15
string1: .asciiz "\nmax = "


Symbol Table Construction

.text
.globl main
main:
    la $t0, array
    lw $t1, count
    lw $t2, ($t0)
loop:
    lw $t3, 4($t0)
    ble $t3, $t2, loop2
    move $t2, $t3
loop2:
    add $t1, $t1, -1
    add $t0, $t0, 4
    bnez $t1, loop
    ……
.data
array: .word 3, 5, 5, 1, 6, 7, …..
count: .word 15
string1: .asciiz "\nmax = "
bad: .word 7

symbol name    value
main           0
loop           12
loop2          24
…
array          408
count          468
string1        472
bad            478


Assembler Algorithm: pass 1

begin
    if starting address is given
        LOCCTR = starting address;
    else
        LOCCTR = 0;
    while OPCODE != END do ;; or EOF
    begin
        read a line from the code
        if there is a label
            if this label is in SYMTAB, then error
            else insert (label, LOCCTR) into SYMTAB
        search OPTAB for the op code
        if found
            LOCCTR += N ;; N is the length of this instruction (4 for MIPS)
        else if this is an assembly directive
            update LOCCTR as directed
        else error
        write line to intermediate file
    end
    program size = LOCCTR - starting address;
end
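The pass 1 loop can be sketched in Python as follows. This is a simplified illustration under several assumptions: statements are already parsed into hypothetical `(label, opcode, operand)` tuples, the op code table is a toy set of mnemonics, every instruction (including `lw` with a label operand, which a real MIPS assembler may expand into more than one instruction) counts as 4 bytes, and the intermediate file is omitted.

```python
def pass1(lines, start=0):
    """Build SYMTAB from (label, opcode, operand) tuples; return (symtab, size)."""
    INSTR_LEN = 4                      # assumed fixed instruction length in bytes
    optab = {"lw", "add", "bnez"}      # toy op code table (mnemonics only)
    symtab = {}
    locctr = start
    for label, opcode, operand in lines:
        if label:
            if label in symtab:
                raise ValueError(f"duplicate label: {label}")
            symtab[label] = locctr
        if opcode in optab:
            locctr += INSTR_LEN
        elif opcode == ".word":        # one 4-byte word per comma-separated value
            locctr += 4 * len(operand.split(","))
        elif opcode == ".space":       # reserve the given number of bytes
            locctr += int(operand)
        else:
            raise ValueError(f"unknown op code: {opcode}")
    return symtab, locctr - start

program = [
    ("main",  "lw",    "$t1, count"),
    ("loop",  "add",   "$t1, $t1, -1"),
    (None,    "bnez",  "$t1, loop"),
    ("array", ".word", "3, 5, 5"),
    ("count", ".word", "15"),
]
symtab, size = pass1(program)
print(symtab, size)
# {'main': 0, 'loop': 4, 'array': 12, 'count': 24} 28
```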


Assembler Algorithm: pass 2

begin
    read a line;
    if op code = START then ;; .globl xxx for MIPS
        write header record;
    while op code != END do ;; or EOF
    begin
        search OPTAB for the op code;
        if found
            if the operand is a symbol then
                replace it with an address using SYMTAB;
            assemble the object code;
        else if is a defined directive
            convert it to object code;
        add object code to the text;
        read next line;
    end
    write End record to the text;
    output text;
end

add $t0, $t1, $t2 => 000000 01001 01010 01000 00000 100000
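The symbol replacement step of pass 2 can be illustrated with MIPS jump instructions, whose 26-bit target field holds the word address (byte address divided by 4). The sketch below is an assumption-laden toy: `SYMTAB` is taken as already filled by pass 1, `OPTAB` lists only `j` and `jal`, and the header and End records are left out.

```python
SYMTAB = {"main": 0, "loop": 12, "loop2": 24}    # assumed output of pass 1
OPTAB = {"j": 0b000010, "jal": 0b000011}         # J-type op codes only

def pass2(lines):
    """Translate (opcode, operand) pairs into 32-bit J-type words."""
    text = []
    for opcode, operand in lines:
        if opcode not in OPTAB:
            raise ValueError(f"unknown op code: {opcode}")
        if operand in SYMTAB:
            address = SYMTAB[operand]            # replace the symbol with its address
        else:
            address = int(operand, 0)            # numeric operand, e.g. "0x40"
        target = (address >> 2) & 0x3FFFFFF      # 26-bit word address field
        text.append((OPTAB[opcode] << 26) | target)
    return text

obj = pass2([("j", "loop"), ("jal", "loop2")])
print([format(word, "08x") for word in obj])
# ['08000003', '0c000006']
```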


Program Relocation

• motivations for relocation

– a program may consist of several pieces of code that are assembled independently

– when a program is assembled, it is impossible to know the exact location where the program starts

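[Figure: the same program shown twice, once loaded at address 0 (occupying 0 to 1076) and once loaded at address 5000 (occupying 5000 to 6076). Both copies contain the instruction "jump to 1004"; location 1004 lies inside the program only when it is loaded at 0.]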


Program Relocation (cont’d)

• distances from the origin of a program do not change

– make the address relative to the origin

– provides loader with information about

• which address needs fixing

• length of address field

– the loader changes those addresses to

• distance + start address of a program

– only absolute addresses need to be changed
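A minimal sketch of what the loader does with this information is shown below. It assumes a hypothetical list of modification records, each giving the byte offset and length of an address field, and a big-endian byte order; both are illustrative choices rather than a real object file format.

```python
def relocate(image, mod_records, load_address):
    """Add load_address to every address field named in mod_records."""
    out = bytearray(image)
    for offset, length in mod_records:
        value = int.from_bytes(out[offset:offset + length], "big")
        value = (value + load_address) % (1 << (8 * length))   # wrap to field width
        out[offset:offset + length] = value.to_bytes(length, "big")
    return bytes(out)

# Two 4-byte words: an address field holding 1004 and a plain constant 7.
image = (1004).to_bytes(4, "big") + (7).to_bytes(4, "big")
loaded = relocate(image, [(0, 4)], 5000)
print(int.from_bytes(loaded[0:4], "big"), int.from_bytes(loaded[4:8], "big"))
# 6004 7
```

Only the field listed in a modification record changes; the constant 7 is left alone because it does not depend on where the program starts.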


Literals

• usage

– encoded as an operand (similar to the immediate in MIPS, but different)

• load $7, =X’0A7F’

– simple way to declare a constant

– assembler does

• declare a constant with a label

• use the label to use the value

• comparison with immediate

– literal is an assembler directive

• immediate is machine-recognizable data

– full word can be used for literals

• immediate: full word minus the bits used by the opcode and registers

– literal values are fetched from data memory, which is slow

• immediate data is within the instruction itself


Literals (cont’d)

• literal pool

– assembler collects all the literals into one or more literal pools

– default location is at the end of the program

• for better code reading

– programmer can declare a place (LTORG)

• to use PC-relative addressing

• to keep data close to instruction

• optimization

– make one literal for the same value

• compare character string or value?

– x’454F46’ = c’EOF’

• value comparison needs evaluation

• literal table

– name(label), operand value, operand length, address in the table

– name and value are all used as a key


Literal Handling Algorithm

pass 1
    at a recognition of a literal
        search LITTAB by name
        if found but different value, error
        else if the same value, no action
        else if not found, insert a new literal (no address yet)
    if the code is LTORG or END
        allocate each literal assigning an address

pass 2
    replace each literal with the address in the LITTAB
    if these addresses are absolute,
        prepare modification for relocation
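One way to sketch a literal table that pools literals by value is shown below. Note the deliberate simplification: this hypothetical `LitTab` class keys entries on the evaluated bytes only, so `x'454F46'` and `c'EOF'` share one entry, whereas the algorithm above searches by name first. The literal syntax handled (`x'..'` and `c'..'`) and all names are assumptions for illustration.

```python
def literal_value(text):
    """Evaluate a literal such as x'454F46' or c'EOF' to its bytes."""
    kind, body = text[0].lower(), text[2:-1]
    if kind == "x":
        return bytes.fromhex(body)
    if kind == "c":
        return body.encode("ascii")
    raise ValueError(f"unsupported literal: {text}")

class LitTab:
    def __init__(self):
        self.entries = {}          # value bytes -> address (None until allocated)

    def see(self, text):
        """Pass 1: record a literal; equal values share one entry."""
        self.entries.setdefault(literal_value(text), None)

    def allocate(self, locctr):
        """At LTORG or END: place every pending literal, return the new LOCCTR."""
        for value, address in list(self.entries.items()):
            if address is None:
                self.entries[value] = locctr
                locctr += len(value)
        return locctr

    def address_of(self, text):
        """Pass 2: look up the address assigned to a literal."""
        return self.entries[literal_value(text)]

littab = LitTab()
for lit in ["x'454F46'", "c'EOF'", "x'0A7F'"]:
    littab.see(lit)
end = littab.allocate(100)
print(littab.address_of("c'EOF'"), littab.address_of("x'0A7F'"), end)
# 100 103 105
```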


Symbol Defining Statement

• MAXLEN EQU 4096

– makes program structure better

– easier to modify a single location

– easier to remember than numbers

– registers can be given meaningful names

– (maxlen = 4096) in MIPS

• assembler

– searches SYMTAB and replaces the symbol with the value in the table

– resulting object code is the same as using the value instead of symbol

– remember that with 2 passes there is a restriction

X EQU Y
Y EQU 100

• X cannot be defined in pass 1
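The restriction can be demonstrated with a small sketch of how pass 1 might process EQU statements in source order. The function name `define_equs` and the `(name, operand)` tuple format are hypothetical.

```python
def define_equs(statements):
    """Process (name, operand) EQU statements in source order, as pass 1 would."""
    symtab = {}
    for name, operand in statements:
        if operand.isdigit():
            symtab[name] = int(operand)          # plain constant
        elif operand in symtab:
            symtab[name] = symtab[operand]       # symbol defined earlier
        else:
            raise ValueError(f"{name}: forward reference to {operand}")
    return symtab

print(define_equs([("Y", "100"), ("X", "Y")]))   # definition precedes use

try:
    define_equs([("X", "Y"), ("Y", "100")])      # the order shown on the slide
except ValueError as err:
    print("error:", err)
```

With Y defined first, both symbols get the value 100; in the slide's order, X is rejected because Y is not yet in SYMTAB when X is processed.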


Expressions

BUFFER: .space 4096 ; reserve 4096 bytes here
BUFEND: ; set current location to BUFEND
(MAXLEN = BUFEND - BUFFER) ; calculate the size of the buffer

• allows simple arithmetic operations in symbol definition

• operands may have relative values for relocation

– relative values should be modified by the loader later

• we need to know which is relative

– symbol table needs a type field to discern absolute symbols from relative symbols


Expression Rules

• basic

– constant is absolute

– address is relative

• using expressions

– expression with absolute arguments is absolute

– multiplication and division are allowed only on absolute terms, so such an expression is absolute

– relative_1 - relative_2 is absolute

• dependencies on starting address are canceled out

– all the other expressions having relative terms are neither relative nor absolute (error?)

• constant - relative

• relative_1 + relative_2

• 3 x relative_1
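These rules can be checked mechanically by tracking, for each term, how many times the program origin is counted in its value. The sketch below is one possible formulation; the `Term` class, its `rel` field and the `classify` helper are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    value: int
    rel: int = 0      # how many times the program origin is counted in value

    def __add__(self, other):
        return Term(self.value + other.value, self.rel + other.rel)

    def __sub__(self, other):
        return Term(self.value - other.value, self.rel - other.rel)

    def __mul__(self, other):
        if self.rel or other.rel:
            raise ValueError("relative term in multiplication")
        return Term(self.value * other.value)

def classify(term):
    """Return the symbol type the assembler would store for this expression."""
    if term.rel == 0:
        return "absolute"
    if term.rel == 1:
        return "relative"
    raise ValueError("expression is neither absolute nor relative")

BUFFER = Term(100, rel=1)      # addresses are relative
BUFEND = Term(4196, rel=1)
MAXLEN = BUFEND - BUFFER       # origins cancel out
print(MAXLEN.value, classify(MAXLEN), classify(BUFFER + Term(4)))
# 4096 absolute relative
```

The three error cases listed above all fail here: `Term(10) - BUFFER` has `rel` of -1, `BUFFER + BUFEND` has `rel` of 2, and `Term(3) * BUFFER` is rejected by the multiplication rule.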


Program Blocks

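[Figure: the source program interleaves pieces of block 0, block 1 and block 2 (in the order 0, 1, 2, 0, 1, 2); after assembly, the object code holds each block contiguously as block 0, block 1, block 2.]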


Program Blocks (cont’d)

• motivation

– programmer’s view may be different from machine’s view

• affects only efficiency not functionality

– addressing can be simplified

• large data area can be moved to the end of code while source code places it close to the instructions that use this data

• data structure and algorithm

– block table (name, block number, address, length)

– pass 1

• maintain separate LOCCTR for each block

– each label is assigned an address relative to the start of the block that contains it

• SYMTAB stores block number for each symbol

• store starting address of each block in block table

– pass 2

• assign address to each symbol by adding the relative address to the block starting address
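The block bookkeeping can be sketched as below, assuming statements are given as hypothetical `(block name, label, length in bytes)` tuples and that blocks are laid out in order of first appearance. The function name `assign_blocks` is illustrative.

```python
def assign_blocks(statements):
    """statements: (block name, label or None, length in bytes)."""
    locctr = {}        # block name -> current LOCCTR within that block
    symtab = {}        # label -> (block name, offset within the block)
    for block, label, length in statements:
        offset = locctr.setdefault(block, 0)
        if label:
            symtab[label] = (block, offset)
        locctr[block] = offset + length
    # Block table: each block starts where the previous one ends.
    blocktab, start = {}, 0
    for block, length in locctr.items():
        blocktab[block] = (start, length)
        start += length
    # Final address = block starting address + offset inside the block.
    addresses = {label: blocktab[block][0] + offset
                 for label, (block, offset) in symtab.items()}
    return blocktab, addresses

source = [
    ("code", "main", 8),
    ("data", "buf", 4096),     # large buffer written next to the code that uses it
    ("code", "loop", 12),
    ("data", "count", 4),
]
blocktab, addresses = assign_blocks(source)
print(blocktab)    # {'code': (0, 20), 'data': (20, 4100)}
print(addresses)   # {'main': 0, 'buf': 20, 'loop': 8, 'count': 4116}
```

Although `buf` sits between `main` and `loop` in the source, the whole data block ends up after the code block.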


Control Sections

• control section is a part of a program that can be assembled independently of other parts

– a large program can be divided into many control sections

– each control section can be developed independently

– each control section can be modified independently

• symbols defined in other control sections

– called external

– assembler prepares those symbols

– loader & linker resolve the values of external symbols


Control Sections (cont’d)

• a table prepared by assembler

– define record

• name of symbol defined in this control section

• relative address of the symbol

– refer record

• name of external symbols

– modification record

• starting address of field to be modified

• length of this field

• name of external symbol

• loader

– for every external symbol

• find the relative address from the define record

• add the starting address of the control section where the symbol is defined

• modify the field
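The loader's steps can be sketched as two sweeps over the control sections. In this toy model each section is a dictionary with hypothetical keys `words` (its object code, one entry per word), `define` (define records) and `modify` (modification records as `(word index, external symbol)` pairs); addresses are counted in words and refer records are omitted for brevity.

```python
def link(sections, load_address=0):
    """Resolve external symbols across control sections; return (estab, memory)."""
    # Sweep 1: give each section a start address and collect defined symbols.
    estab, start = {}, load_address
    for sec in sections:
        for symbol, relative in sec["define"].items():
            estab[symbol] = start + relative     # relative address + section start
        start += len(sec["words"])
    # Sweep 2: apply the modification records of every section.
    memory = []
    for sec in sections:
        words = list(sec["words"])
        for index, symbol in sec["modify"]:
            words[index] += estab[symbol]
        memory.extend(words)
    return estab, memory

section_a = {"words": [10, 0, 30], "define": {"ENTRY": 0}, "modify": [(1, "BUF")]}
section_b = {"words": [0, 0], "define": {"BUF": 1}, "modify": [(0, "ENTRY")]}
estab, memory = link([section_a, section_b], load_address=100)
print(estab)     # {'ENTRY': 100, 'BUF': 104}
print(memory)    # [10, 104, 30, 100, 0]
```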


One-Pass Assembler

• problem

– forward reference: reference to symbols that are not defined yet

• why do we need one-pass assembler?

– fast

• useful for program development and testing

• university computing environment

• load-and-go assembler

– writes the object code to memory, not to a disk file

– since it is in memory, it is easy to modify a part of the object code


One-Pass Assembler (cont’d)

• one-pass assembler for load-and-go

– stores undefined symbols in the SYMTAB with the address of the field that references this symbol

– when the symbol is defined later, look up the SYMTAB and modify the field with correct address

• there may be many places to be modified

• what if object code is written on disk?

– bring back the text to memory

• efficiency of one-pass assembler cannot be justified

– make the loader modify the address at loading time

• modification record again

• optimization

– require all data declarations to be placed at the beginning of the program

• reduces reference resolution
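The load-and-go scheme of patching forward references in memory can be sketched as follows. This is a toy model under stated assumptions: each line is a hypothetical `(label, operand symbol)` pair that assembles to a single word holding only the operand's address, and the list of fields waiting on an undefined symbol is kept in a separate `pending` dictionary rather than inside SYMTAB.

```python
def one_pass(lines):
    """lines: (label or None, operand symbol or None); one word per line."""
    memory = []
    symtab = {}      # symbol -> address, once defined
    pending = {}     # undefined symbol -> addresses of fields that reference it
    for label, operand in lines:
        address = len(memory)
        if label:
            symtab[label] = address
            for field in pending.pop(label, []):
                memory[field] = address          # patch earlier forward references
        if operand is None:
            memory.append(0)
        elif operand in symtab:
            memory.append(symtab[operand])
        else:
            pending.setdefault(operand, []).append(address)
            memory.append(None)                  # placeholder until the symbol is defined
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return memory, symtab

lines = [
    ("start", "done"),    # forward reference
    (None, "start"),      # backward reference
    (None, "done"),       # second forward reference to the same symbol
    ("done", None),
]
memory, symtab = one_pass(lines)
print(memory, symtab)
# [3, 0, 3, 0] {'start': 0, 'done': 3}
```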


Multi-Pass Assembler

• supports forward references even though they are bad for program readability

1. (A = B/2)
2. (B = C-D)
....
8. C .....
9. D .....

at 1, store in a table two tuples
    (A, 1, B/2, 0)
        1: one symbol is missing
        0: no other symbol depends on A
    (B, *, , &LB)
        *: don’t know how many symbols are missing yet
        LB: list of symbols that depend on B (now, there is only A in this list)

at 2,
    insert (C,*, ,&LC), (D,*, ,&LD)
        LC and LD contain only B
    modify (B,*, ,&LB) as (B,2,C-D,&LB)

after 8
    from LC, B is found
    change 2 to 1 in the B tuple, meaning one symbol remains to be defined

after 9
    from LD, B is found
    now evaluate B with defined C, D values
    since B is done
        from LB, A is found
        now A can be evaluated
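The table walk above can be sketched in Python with a missing-symbol count and a dependents list per symbol. The representation is an assumption: each definition is a hypothetical `(name, operand names, function)` tuple, and C and D are given constant values 100 and 40 as stand-ins for the addresses they would receive at lines 8 and 9.

```python
def resolve(definitions):
    """definitions: (name, operand names, function of operand values), in source order."""
    values = {}        # name -> evaluated value
    waiting = {}       # name -> [missing count, operands, function]
    dependents = {}    # name -> symbols whose expressions use it (LB, LC, LD)

    def finish(name, value):
        values[name] = value
        for dep in dependents.pop(name, []):
            waiting[dep][0] -= 1                 # one fewer missing symbol
            if waiting[dep][0] == 0:
                _, operands, func = waiting.pop(dep)
                finish(dep, func(*(values[o] for o in operands)))

    for name, operands, func in definitions:
        missing = [o for o in operands if o not in values]
        if not missing:
            finish(name, func(*(values[o] for o in operands)))
        else:
            waiting[name] = [len(missing), operands, func]
            for operand in missing:
                dependents.setdefault(operand, []).append(name)
    return values, waiting

definitions = [
    ("A", ["B"], lambda b: b // 2),              # 1. A = B/2
    ("B", ["C", "D"], lambda c, d: c - d),       # 2. B = C-D
    ("C", [], lambda: 100),                      # 8. C (value assumed)
    ("D", [], lambda: 40),                       # 9. D (value assumed)
]
values, unresolved = resolve(definitions)
print(values, unresolved)
# {'C': 100, 'D': 40, 'B': 60, 'A': 30} {}
```

Anything left in the second return value would be a symbol that never became fully defined.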
