1
Assembler
CS 230
이준원
2
System Software
• components
– translator
• assembler
• compiler
• interpreter
– system manager
• operating system
– other utilities
• loader
• linker
• DBMS, editor, debugger, ...
• purpose of this course
– understand how to build system software
– understand how these components work
3
Issues in System Software
• not many open issues in this area
– mature area
• advanced architectures complicate system software
– superscalar CPU
– memory model
– multiprocessor
• new applications
– embedded systems
– mobile/ubiquitous computing
4
Assembler Overview
• functions
– translate programs written in assembly language to machine code
• mnemonic code to machine code
• symbols to addresses
– handles
• constants
• literals
• addressing
• 32 bit constant or address
• 32 bit offset
5
Assembler Overview (cont’d)
• pass 1: loop until the end of the program
1. read in a line of assembly code
2. assign an address to this line
• increment N (word addressing or byte addressing)
3. save address values assigned to labels
• in symbol tables
4. process assembler directives
• constant declaration
• space reservation
• pass 2: same loop
1. read in a line of code
2. translate op code
• using op code table
3. change labels to address
• using the symbol table
4. process assembler directives
5. produce object program
6
Data Structures for Assembler
• op code table
– looked up for the translation of mnemonic code
• key: mnemonic code
• result: bits
– hashing is usually used
• once prepared, the table is not changed
• efficient lookup is desired
• since mnemonic code is predefined, the hashing function can be tuned a priori
– the table may have the instruction format and length
• to decide where to put op code bits, operands bits, offset bits
• for variable instruction size
• used to calculate the address
add $t0, $t1, $t2 => 000000 01001 01010 01000 00000 100000
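The op code table lookup and bit placement described above can be sketched in Python. This is a minimal illustration, not a full MIPS assembler: the names `OPTAB`, `REGS`, `encode_r_type`, and `fields` are hypothetical, and only two R-format instructions and four registers are included.

```python
# Hypothetical op code table: mnemonic -> (format, opcode bits, funct bits).
OPTAB = {
    "add": ("R", 0b000000, 0b100000),
    "sub": ("R", 0b000000, 0b100010),
}

# Register numbers for the few MIPS registers used in this sketch.
REGS = {"$t0": 8, "$t1": 9, "$t2": 10, "$t3": 11}


def encode_r_type(mnemonic, rd, rs, rt):
    """Pack an R-format instruction as opcode|rs|rt|rd|shamt|funct."""
    fmt, opcode, funct = OPTAB[mnemonic]  # a dict lookup is a hash lookup
    assert fmt == "R"
    word = (opcode << 26) | (REGS[rs] << 21) | (REGS[rt] << 16)
    word |= (REGS[rd] << 11) | (0 << 6) | funct
    return word


def fields(word):
    """Render a 32-bit word as the six R-format bit fields."""
    bits = format(word, "032b")
    cuts = [0, 6, 11, 16, 21, 26, 32]
    return " ".join(bits[a:b] for a, b in zip(cuts, cuts[1:]))


print(fields(encode_r_type("add", "$t0", "$t1", "$t2")))
# prints: 000000 01001 01010 01000 00000 100000
```

Because the set of mnemonics is fixed, a static dictionary like this stands in for the pre-tuned hash table the slide mentions.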
7
Data Structures for Assembler (cont’d)
• symbol table
– stored and looked up to assign address to labels
• efficient insertion and retrieval is needed
• deletion does not occur
– difficulties in hashing
• non random keys
– problem
• the size varies widely
.text
.globl main
main:
la $t0, array
lw $t1, count
lw $t2, ($t0)
loop:
lw $t3, 4($t0)
ble $t3, $t2, loop2
move $t2, $t3
loop2: add $t1, $t1, -1
add $t0, $t0, 4
bnez $t1, loop
…….
.data
array: .word 3, 5, 5, 1, 6, 7, …..
count: .word 15
string1: .asciiz "\nmax = "
8
Symbol Table Construction
.text
.globl main
main:
la $t0, array
lw $t1, count
lw $t2, ($t0)
loop:
lw $t3, 4($t0)
ble $t3, $t2, loop2
move $t2, $t3
loop2: add $t1, $t1, -1
add $t0, $t0, 4
bnez $t1, loop
…….
.data
array: .word 3, 5, 5, 1, 6, 7, …..
count: .word 15
string1: .asciiz "\nmax = "
bad: .word 7
symbol name value
main 0
loop 12
loop2 24
…
array 408
count 468
string1 472
bad 478
9
Assembler Algorithm: pass 1
begin
  if starting address is given
    LOCCTR = starting address;
  else
    LOCCTR = 0;
  while OPCODE != END do ;; or EOF
  begin
    read a line from the code
    if there is a label
      if this label is in SYMTAB, then error
      else insert (label, LOCCTR) into SYMTAB
    search OPTAB for the op code
    if found
      LOCCTR += N ;; N is the length of this instruction (4 for MIPS)
    else if this is an assembly directive
      update LOCCTR as directed
    else error
    write line to intermediate file
  end
  program size = LOCCTR - starting address;
end
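The pass 1 loop can be tried out with a short Python sketch. It rests on several assumptions that are not in the slides: every instruction, including pseudo-instructions such as la, takes 4 bytes; only the .word and .space directives consume space; and the names `pass1`, `OPTAB`, and `INSTR_LEN` are illustrative. The sample input is the slide's program without its elided middle, so array and count land at lower addresses than in the slide's symbol table.

```python
OPTAB = {"la", "lw", "ble", "move", "add", "bnez"}  # mnemonics only
INSTR_LEN = 4  # assumption: every instruction occupies 4 bytes


def pass1(lines, start=0):
    """Return (SYMTAB, program size) for a list of source lines."""
    symtab = {}
    locctr = start
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ":" in line:                      # leading label
            label, line = line.split(":", 1)
            label = label.strip()
            if label in symtab:
                raise ValueError(f"duplicate label {label}")
            symtab[label] = locctr
            line = line.strip()
            if not line:
                continue
        parts = line.split(None, 1)
        op = parts[0]
        if op in OPTAB:
            locctr += INSTR_LEN
        elif op == ".word":                  # one 4-byte word per value
            locctr += 4 * len(parts[1].split(","))
        elif op == ".space":
            locctr += int(parts[1])
        elif op in (".text", ".data", ".globl"):
            pass                             # no space consumed in this sketch
        else:
            raise ValueError(f"unknown op code {op}")
    return symtab, locctr - start


source = [
    ".text", ".globl main", "main:",
    "la $t0, array", "lw $t1, count", "lw $t2, ($t0)",
    "loop:", "lw $t3, 4($t0)", "ble $t3, $t2, loop2", "move $t2, $t3",
    "loop2: add $t1, $t1, -1", "add $t0, $t0, 4", "bnez $t1, loop",
    ".data", "array: .word 3, 5, 5, 1, 6, 7", "count: .word 15",
]
symtab, size = pass1(source)
print(symtab, size)
```

As in the slide's table, main, loop, and loop2 come out at 0, 12, and 24.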
10
Assembler Algorithm: pass 2
begin
  read a line;
  if op code = START then ;; .globl xxx for MIPS
    write header record;
  while op code != END do ;; or EOF
  begin
    search OPTAB for the op code;
    if found
      if the operand is a symbol then
        replace it with an address using SYMTAB;
      assemble the object code;
    else if is a defined directive
      convert it to object code;
    add object code to the text;
    read next line;
  end
  write End record to the text;
  output text;
end
add $t0, $t1, $t2 => 000000 01001 01010 01000 00000 100000
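A minimal Python sketch of the pass 2 loop follows. To keep it short it targets a hypothetical machine (not MIPS) with an 8-bit op code and a 24-bit address field. The mnemonics, the `pass2` function, and the sample symbol table are all assumptions made for illustration, and header and end records are omitted.

```python
# Hypothetical machine: 8-bit op code followed by a 24-bit address field.
OPTAB = {"LOAD": 0x01, "STORE": 0x02, "JUMP": 0x03}


def pass2(lines, symtab):
    """Translate (mnemonic, operand) pairs into 32-bit object words."""
    text = []
    for mnemonic, operand in lines:
        if mnemonic in OPTAB:
            if operand in symtab:            # operand is a symbol
                address = symtab[operand]
            else:                            # operand is a numeric constant
                address = int(operand, 0)
            text.append((OPTAB[mnemonic] << 24) | (address & 0xFFFFFF))
        elif mnemonic == ".word":            # directive: emit the value as data
            text.append(int(operand, 0) & 0xFFFFFFFF)
        else:
            raise ValueError(f"unknown op code {mnemonic}")
    return text


symtab = {"start": 0, "count": 12}           # as pass 1 would have produced
program = [("LOAD", "count"), ("STORE", "0x20"),
           ("JUMP", "start"), (".word", "15")]
obj = pass2(program, symtab)
print([format(w, "08X") for w in obj])
# prints: ['0100000C', '02000020', '03000000', '0000000F']
```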
11
Program Relocation
• motivations for relocation
– a program may consist of several pieces of code that are assembled independently
– when a program is assembled, it is impossible to know the exact location where the program starts
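[Figure: the same program shown loaded at address 0 (occupying 0 to 1076) and at address 5000 (occupying 5000 to 6076). Both copies contain the instruction "jump to 1004"; address 1004 lies inside the program only when it is loaded at 0.]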
12
Program Relocation (cont’d)
• distances from the origin of a program do not change
– make the address relative to the origin
– provides loader with information about
• which address needs fixing
• length of address field
– the loader changes those addresses to
• distance + start address of the program
– only absolute addresses need to be changed
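The loader-side fix-up can be sketched as follows. The record layout (word index plus field width in bits), the 8-bit op code / 24-bit address format, and the function name `relocate` are assumptions for illustration. The example reuses the "jump to 1004" instruction and the load address 5000 from the slides.

```python
def relocate(words, mod_records, load_address):
    """Return a copy of the object words with address fields fixed up.

    words        -- object code assembled relative to origin 0
    mod_records  -- list of (word index, field width in bits)
    load_address -- where the loader actually places the program
    """
    fixed = list(words)
    for index, width in mod_records:
        mask = (1 << width) - 1
        field = fixed[index] & mask                  # distance from origin
        new_field = (field + load_address) & mask    # distance + start address
        fixed[index] = (fixed[index] & ~mask) | new_field
    return fixed


# Hypothetical format: 8-bit op code, 24-bit address field.
JUMP = 0x03
program = [(JUMP << 24) | 1004, 0x0000002A]          # "jump to 1004", then data
loaded = relocate(program, [(0, 24)], 5000)
print(loaded[0] & 0xFFFFFF)
# prints: 6004
```

Only the word named in a modification record changes; the data word is copied untouched.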
13
Literals
• usage
– encoded as an operand (similar to the immediate in MIPS, but different)
• load $7, =X’0A7F’
– simple way to declare a constant
– what the assembler does
• declares a constant with a label
• uses the label to refer to the value
• comparison with immediate
– a literal is an assembler directive
• an immediate is data the machine recognizes directly
– a full word can be used for a literal
• immediate: full word minus the bits taken by the opcode and register fields
– literal values are fetched from data memory, which is slow
• immediate data is within the instruction itself
14
Literals (cont’d)
• literal pool
– assembler collects all the literals into one or more literal pools
– default location is at the end of the program
• for better code reading
– programmer can declare a place (LTORG)
• to use PC-relative addressing
• to keep data close to instruction
• optimization
– make one literal for the same value
• compare character string or value?
– x’454F46’ = c’EOF’
• value comparison needs evaluation
• literal table
– name(label), operand value, operand length, address in the table
– name and value are both used as keys
15
Literal Handling Algorithm
pass 1
at a recognition of a literal
  search LITTAB by name
  if found but different value, error
  else if the same value, no action
  else if not found, insert a new literal (no address yet)
if the code is LTORG or END
  allocate each literal, assigning an address
pass 2
replace each literal with the address in the LITTAB
if these addresses are absolute,
  prepare modification for relocation
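A small Python sketch of a literal table with pooling is shown below. It simplifies the algorithm above by keying on the evaluated value only, so that =C'EOF' and =X'454F46' share one entry. The class `LitTab`, its method names, and the use of straight quotes in literal spellings are assumptions.

```python
def literal_value(text):
    """Evaluate =X'..' (hex digits) or =C'..' (characters) to raw bytes."""
    kind, body = text[1].upper(), text[3:-1]
    if kind == "X":
        return bytes.fromhex(body)
    if kind == "C":
        return body.encode("ascii")
    raise ValueError(f"unsupported literal {text}")


class LitTab:
    def __init__(self):
        self.pending = {}    # value -> first spelling seen, no address yet
        self.address = {}    # value -> assigned address

    def note(self, text):
        """Pass 1: record a literal unless the same value is already known."""
        value = literal_value(text)
        if value not in self.address and value not in self.pending:
            self.pending[value] = text
        return value

    def place_pool(self, locctr):
        """At LTORG or END: assign addresses and return the new LOCCTR."""
        for value in self.pending:
            self.address[value] = locctr
            locctr += len(value)
        self.pending = {}
        return locctr


tab = LitTab()
tab.note("=C'EOF'")
tab.note("=X'454F46'")      # same value as C'EOF', so no new entry
tab.note("=X'0A7F'")
end = tab.place_pool(100)   # pretend END was reached with LOCCTR = 100
print(tab.address, end)
```

In pass 2, each literal operand would be replaced by `tab.address[literal_value(text)]`.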
16
Symbol Defining Statement
• MAXLEN EQU 4096
– makes program structure better
– easier to modify a single location
– easier to remember than numbers
– registers can be given meaningful names
– (maxlen = 4096) in MIPS
• assembler
– searches SYMTAB and replaces the symbol with the value in the table
– resulting object code is the same as using the value instead of the symbol
– remember that with 2 passes there is a restriction
X EQU Y
Y EQU 100
• X cannot be defined in pass 1
17
Expressions
BUFFER: .space 4096 ; reserve 4096 bytes here
BUFEND: ; set BUFEND to the current location
(MAXLEN = BUFEND - BUFFER) ; calculate the size of the buffer
• allows simple arithmetic operations in symbol definition
• operands may have relative values for relocation
– relative values should be modified by the loader later
• we need to know which is relative
– symbol table needs a type field to discern absolute symbols from relative symbols
18
Expression Rules
• basic
– constant is absolute
– address is relative
• using expressions
– expression with absolute arguments is absolute
– expression that has multiplication or division must use only absolute arguments, and is then absolute
– relative_1 - relative_2 is absolute
• dependencies on starting address are canceled out
– all the other expressions having relative terms are neither relative nor absolute (error?)
• constant - relative
• relative_1 + relative_2
• 3 x relative_1
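One way to mechanize these rules is to count how many start-address contributions survive in an expression. The sketch below assumes an expression has already been flattened into signed terms. The `classify` function and its term encoding are hypothetical, and the case "relative + constant is relative" is a standard rule that the slide leaves implicit.

```python
def classify(terms):
    """Classify an expression given as (coefficient, kind) terms.

    kind is "abs" for a constant or "rel" for an address; the coefficient
    carries the sign and any constant multiplier, for example -1 or 3.
    """
    net = sum(coeff for coeff, kind in terms if kind == "rel")
    if net == 0:
        return "absolute"    # start-address dependence cancels out
    if net == 1:
        return "relative"    # exactly one start address remains
    return "neither"


print(classify([(1, "rel"), (-1, "rel")]))   # relative_1 - relative_2
print(classify([(1, "abs"), (-1, "rel")]))   # constant - relative
print(classify([(1, "rel"), (1, "rel")]))    # relative_1 + relative_2
print(classify([(3, "rel")]))                # 3 x relative_1
```

The four calls print absolute, neither, neither, neither, matching the cases listed above.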
19
Program Blocks
[Figure: a source program whose pieces appear in the order block 0, block 1, block 2, block 0, block 1, block 2 is assembled into object code laid out as block 0, block 1, block 2.]
20
Program Blocks (cont’d)
• motivation
– programmer’s view may be different from machine’s view
• affects only efficiency not functionality
– addressing can be simplified
• large data area can be moved to the end of code while source code places it close to the instructions that use this data
• data structure and algorithm
– block table (name, block number, address, length)
– pass 1
• maintain separate LOCCTR for each block
– each label is assigned address relative to the start of the block that contains it
• SYMTAB stores block number for each symbol
• store starting address of each block in block table
– pass 2
• assign address to each symbol by adding the relative address to the block starting address
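The two-step address assignment for program blocks can be sketched as below. The statement encoding (block name, optional label, size in bytes), the function name `layout_blocks`, and the sample block names are assumptions. Blocks are laid out in order of first appearance.

```python
def layout_blocks(statements):
    """statements: list of (block name, label or None, size in bytes)."""
    locctr = {}    # block name -> its own location counter
    symtab = {}    # label -> (block name, address relative to the block)
    for block, label, size in statements:      # pass 1
        locctr.setdefault(block, 0)
        if label is not None:
            symtab[label] = (block, locctr[block])
        locctr[block] += size
    start, address = {}, 0                     # block table: starting addresses
    for block, length in locctr.items():
        start[block] = address
        address += length
    # pass 2: final address = block start + relative address
    return {label: start[b] + rel for label, (b, rel) in symtab.items()}


statements = [
    ("code", "main", 8),
    ("data", "buf", 100),     # declared next to the code that uses it
    ("code", "loop", 4),
    ("data", "count", 4),
]
addresses = layout_blocks(statements)
print(addresses)
# prints: {'main': 0, 'buf': 12, 'loop': 8, 'count': 112}
```

Although buf is declared between main and loop in the source, it ends up after all of the code.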
21
Control Sections
• a control section is a part of a program that can be assembled independently of other parts
– a large program can be divided into many control sections
– each control section can be developed independently
– each control section can be modified independently
• symbols defined in other control sections
– called external
– assembler prepares records for those symbols
– loader & linker resolve the values of external symbols
22
Control Sections (cont’d)
• a table prepared by assembler
– define record
• name of symbol defined in this control section
• relative address of the symbol
– refer record
• name of external symbols
– modification record
• starting address of field to be modified
• length of this field
• name of external symbol
• loader
– for every external symbol
• find the relative address from the define record
• add the starting address of the control section where the symbol is defined
• modify the field
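The loader steps above can be sketched in Python. This is a simplified model under stated assumptions: addresses are counted in words, the field length is omitted, refer records are implicit in the modification records, and the names `link`, `estab`, MAIN, DATA, ENTRY, and BUFFER are made up for the example.

```python
def link(sections, load_order, base=0):
    """Resolve external references across control sections.

    sections: name -> dict with
      "code":   list of words (fields for external symbols hold 0)
      "define": {symbol: relative address}
      "modify": [(word index, external symbol)]
    """
    start, address = {}, base
    for name in load_order:                    # assign section start addresses
        start[name] = address
        address += len(sections[name]["code"])
    estab = {}                                 # external symbol -> absolute address
    for name in load_order:
        for symbol, rel in sections[name]["define"].items():
            estab[symbol] = start[name] + rel  # relative address + section start
    memory = []
    for name in load_order:
        code = list(sections[name]["code"])
        for index, symbol in sections[name]["modify"]:
            code[index] += estab[symbol]       # modify the field
        memory.extend(code)
    return memory, estab


sections = {
    "MAIN": {"code": [0, 0, 7], "define": {"ENTRY": 0},
             "modify": [(1, "BUFFER")]},
    "DATA": {"code": [0, 0], "define": {"BUFFER": 1}, "modify": []},
}
memory, estab = link(sections, ["MAIN", "DATA"], base=100)
print(memory, estab)
# prints: [0, 104, 7, 0, 0] {'ENTRY': 100, 'BUFFER': 104}
```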
23
One-Pass Assembler
• problem
– forward reference: reference to symbols that are not defined yet
• why do we need one-pass assembler?
– fast
• useful for program development and testing
• university computing environment
• load-and-go assembler
– writes the object code to memory, not to a disk file
– since it is in memory, it is easy to modify a part of the object code
24
One-Pass Assembler (cont’d)
• one-pass assembler for load-and-go
– stores undefined symbols in the SYMTAB with the address of the field that references this symbol
– when the symbol is defined later, look up the SYMTAB and modify the field with correct address
• there may be many places to be modified
• what if object code is written on disk?
– bring back the text to memory
• efficiency of one-pass assembler cannot be justified
– have the loader modify the address at load time
• modification record again
• optimization
– require all the data declarations to be placed at the beginning of the program
• reduces reference resolution
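Forward-reference handling in a load-and-go assembler can be sketched as backpatching. In this illustration each source line assembles to a single word holding its operand's address. That format, the function name `one_pass`, and the `pending` table (kept separate from SYMTAB here for clarity) are assumptions.

```python
def one_pass(lines):
    """lines: list of (label or None, operand symbol or None).

    Each line assembles to one word holding the operand's address.
    """
    memory = []
    symtab = {}     # symbol -> address, once defined
    pending = {}    # undefined symbol -> indexes of fields awaiting it
    for label, operand in lines:
        here = len(memory)
        if label is not None:
            symtab[label] = here
            for index in pending.pop(label, []):   # patch earlier references
                memory[index] = here
        if operand is None:
            memory.append(0)
        elif operand in symtab:
            memory.append(symtab[operand])
        else:                                      # forward reference
            pending.setdefault(operand, []).append(here)
            memory.append(0)                       # placeholder for now
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return memory, symtab


lines = [(None, "done"), ("top", "done"), (None, "top"), ("done", None)]
memory, symtab = one_pass(lines)
print(memory, symtab)
# prints: [3, 3, 1, 0] {'top': 1, 'done': 3}
```

The two references to done are recorded when first seen and both patched once the label is defined.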
25
Multi-Pass Assembler
• supports forward references even though they are bad for program readability
1. (A = B/2)
2. (B = C-D)
....
8. C .....
9. D ..…
at 1, store two tuples in a table
  (A, 1, B/2, 0)
    1: one symbol is missing
    0: no other symbol depends on A
  (B, *, , &LB)
    *: don’t know how many symbols are missing yet
    LB: list of symbols that depend on B (now, there is only A in this list)
at 2,
  insert (C,*, ,&LC), (D,*, ,&LD)
  LC and LD contain only B
  modify (B,*, ,&LB) as (B,2,C-D,&LB)
after 8
  from LC, B is found
  change 2 to 1 in the B tuple, meaning one symbol remains to be defined
after 9
  from LD, B is found
  now evaluate B with defined C, D values
since B is done
  from LB, A is found
  now A can be evaluated
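The table-driven resolution walked through above can be sketched in Python. The tuple fields map onto a missing-symbol count and per-symbol dependent lists. The function `resolve`, the use of lambdas for expressions, and the sample values 300 and 100 for C and D are assumptions (the slides do not give values), and B/2 is taken as integer division.

```python
def resolve(definitions):
    """definitions: ordered list of (name, operands, function).

    operands lists the symbols the definition needs; function computes the
    value from their values. A plain constant has no operands.
    """
    values = {}
    waiting = {}      # name -> [missing count, operands, function]
    dependents = {}   # symbol -> names that depend on it (LB, LC, LD)

    def define(name, value):
        values[name] = value
        for dep in dependents.pop(name, []):
            entry = waiting[dep]
            entry[0] -= 1                      # one fewer symbol missing
            if entry[0] == 0:
                del waiting[dep]
                define(dep, entry[2](*[values[o] for o in entry[1]]))

    for name, operands, function in definitions:
        missing = [o for o in operands if o not in values]
        if not missing:
            define(name, function(*[values[o] for o in operands]))
            continue
        waiting[name] = [len(missing), operands, function]
        for o in missing:
            dependents.setdefault(o, []).append(name)
    return values, sorted(waiting)


definitions = [
    ("A", ["B"], lambda b: b // 2),            # 1. A = B/2
    ("B", ["C", "D"], lambda c, d: c - d),     # 2. B = C-D
    ("C", [], lambda: 300),                    # 8. C (value assumed)
    ("D", [], lambda: 100),                    # 9. D (value assumed)
]
values, unresolved = resolve(definitions)
print(values, unresolved)
```

Defining D drops B's missing count to zero, B is evaluated, and that in turn releases A, in the same order as the slide's walkthrough.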