COSC345 2013 Software Engineering Lecture 19: Linkers, Loaders, & Libraries



Outline
•  Multiple source file projects
•  Compiling
•  Linking
•  Loading
•  Libraries
   –  Static libraries
   –  Overlays
   –  Shared libraries
•  DLLs

From Source To Execution
•  What is responsible for each step (each arrow)?

[Figure: first.m, second.m → first.o, second.o; gui.m, mouse.m → gui.o, mouse.o → libgui.a; first.o, second.o, libgui.a → a.out → running program]

              Unix      Windows
Source        file.m    file.m
Object        file.o    file.obj
Library       file.a    file.lib
Executable    a.out     file.exe

A contingent fact of history
•  Stephen Jay Gould said “human equality is a contingent fact of history”: it is true, but it did not have to be, and history explains why.
•  Separation into compilers, linkers, and loaders is a contingent fact of history. There are other ways to do it.
•  Lisp, Smalltalk, Pop-2, Prolog, APL: the compiler is part of the runtime, and the interactive programmer loads source code into the running system. The programming language can even be the operating system (PL = OS)!
•  BlueJ approximates that for Java.
•  Whole-program compilers (SmartEiffel, MIPS) analyse/optimise the whole program.

Separate compilation
•  Big broken programs, small machines!
•  Break program into replaceable parts (repair).
•  Break program into reusable parts (libraries).
•  Don’t load what you don’t need (configurable).
•  Small pieces ⇒ compiler has room and time
•  Many pieces ⇒ can compile in parallel
•  TANSTAAFL!
•  How do you put the pieces together?

The secret
•  We don’t have to compile all the way to directly executable code.
•  The output of a compiler can be a description of the code.
•  “It’s sort of like this but when you find the Amulet of Yendor do that to it.”
•  Compiler output = meta-program executed by linker/loader to generate actual code.
•  The classic steps are to relocate code from logical addresses to physical addresses, and to resolve references to external names to their actual addresses.

Separate Source Files
•  use.c
       extern int another;

       int main(void) {
           another = 1234;
           return 0;
       }

•  declare.c
       int another;

•  How does the compiler know:
   –  Where another is stored in memory?
•  How can the compiler produce the machine code?

The Compiler
•  Leave gaps in machine code when referencing externs
•  use.c
       extern int another;
       int main(void) { another = 1234; }
•  Compiler output for use.c
       0001 18c0            sect 0
       0002            _main:
       0003 18c0 cc 04 d2   ldd #1234
       0004 18c3 fd 00 00   std _another
       0005 18c6 39         rts
•  On line 0004 another is at location 0000!
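The gap on line 0004 can be pictured as a record the compiler emits next to the code, saying which bytes still need an address. The sketch below is only an illustration under assumptions: the `Reloc` struct and the functions `patch16` and `resolve` are hypothetical names, not the format of any real object file, and addresses are taken to be 16 bits, high byte first, as in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CODE_SIZE 7

/* Hypothetical relocation record: "the 16-bit gap at this offset
   must receive the address of this symbol". */
typedef struct {
    size_t      offset;   /* position of the gap, relative to the code start */
    const char *symbol;   /* external name the gap refers to */
} Reloc;

/* Bytes of main() from the listing (addresses 18c0..18c6).
   The two zero bytes after opcode 0xfd are the gap for _another. */
static const uint8_t use_code[CODE_SIZE] =
    { 0xcc, 0x04, 0xd2, 0xfd, 0x00, 0x00, 0x39 };
static const Reloc use_relocs[] = { { 4, "_another" } };

/* Write a 16-bit address into a gap, high byte first. */
void patch16(uint8_t *code, size_t offset, uint16_t address)
{
    assert(offset + 1 < CODE_SIZE);
    code[offset]     = (uint8_t)(address >> 8);
    code[offset + 1] = (uint8_t)(address & 0xff);
}

/* Copy the object code into out and fill every gap that names symbol. */
void resolve(uint8_t out[CODE_SIZE], const char *symbol, uint16_t address)
{
    memcpy(out, use_code, CODE_SIZE);
    for (size_t i = 0; i < sizeof use_relocs / sizeof use_relocs[0]; i++) {
        if (strcmp(use_relocs[i].symbol, symbol) == 0)
            patch16(out, use_relocs[i].offset, address);
    }
}
```

Calling `resolve(out, "_another", 0x4000)` turns the bytes `fd 00 00` into `fd 40 00`, which is the form the instruction takes in the linked program later in the lecture.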

The Compiler
•  Allocate space for global variables
•  declare.c
       int another;
•  Compiler output for declare.c
       0001 4000            sect 1
       0002            _another:
       0003 4000            rmb 2
•  On line 0003 space (2 bytes) is allocated for another

Revision: Segments
•  Program
   –  The code or text segment
•  Local Variables
   –  The stack segment
•  Global variables
   –  The data segment
•  In the example

   Segment   Example
   Code      sect 0
   Data      sect 1
   Stack

[Figure: Software Model. Labels: Stack (stack segment), Heap, “the break”, Globals (data segment), Program (code (text) segment)]
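As a rough illustration of the three segments, the C fragment below marks where each object conventionally lives. The names `visits`, `record_visit` and `make_counter` are made up for this sketch, and the actual placement is decided by the compiler and platform, so the comments describe the traditional model rather than a guarantee.

```c
#include <stdlib.h>

int visits = 0;                  /* global variable: data segment */

/* The machine code for this function lives in the code (text) segment. */
int record_visit(int weight)
{
    int scaled = weight * 2;     /* local variable: stack segment */
    visits += scaled;            /* updates the global in the data segment */
    return visits;
}

/* Storage from malloc comes from the heap, which traditionally
   grows by moving "the break". */
int *make_counter(int start)
{
    int *counter = malloc(sizeof *counter);
    if (counter != NULL)
        *counter = start;
    return counter;              /* the caller must free() it */
}
```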

The Linker
•  Loads a set of object files and outputs an executable file
•  Each input file is a set of segments (not related to x86 segments)
   –  Code / Data
   –  Symbol table
   –  Debugging information (not loaded)
•  Pass 1
   –  Scan the input files to compute the segment sizes
   –  Collect the symbol tables together
•  Process
   –  Allocate locations for each symbol
   –  Lay the symbols out in the output (executable) file
•  Pass 2
   –  Read and relocate the object code
   –  Replace symbol references with memory locations
   –  Copy segments into the output (executable) file
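The two passes can be sketched in a few lines of C. This is a simplified model, not how any particular linker is written: the types `Entry`, `Object` and `Symbol` and the function `link_objects` are hypothetical, every input file has a single segment, all inputs are laid end to end in one output region, and references are 16-bit addresses stored high byte first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SYMBOLS 16

/* A name defined at, or referenced from, an offset inside one object file. */
typedef struct { const char *name; size_t offset; } Entry;

typedef struct {
    const uint8_t *bytes;  size_t size;    /* the segment contents */
    const Entry   *defs;   size_t ndefs;   /* symbols this file defines */
    const Entry   *refs;   size_t nrefs;   /* 16-bit gaps to be filled in */
} Object;

typedef struct { const char *name; uint16_t address; } Symbol;

/* Links objs[0..nobjs) into out, as if loaded at address base.
   Returns the number of bytes written, or 0 on any error. */
size_t link_objects(const Object *objs, size_t nobjs, uint16_t base,
                    uint8_t *out, size_t capacity)
{
    Symbol table[MAX_SYMBOLS];
    size_t nsyms = 0, total = 0;

    /* Pass 1: add up the sizes and give every defined symbol an address. */
    for (size_t i = 0; i < nobjs; i++) {
        for (size_t d = 0; d < objs[i].ndefs; d++) {
            if (nsyms == MAX_SYMBOLS) return 0;
            table[nsyms].name = objs[i].defs[d].name;
            table[nsyms].address =
                (uint16_t)(base + total + objs[i].defs[d].offset);
            nsyms++;
        }
        total += objs[i].size;
    }
    if (total > capacity) return 0;

    /* Pass 2: copy each segment and replace references with addresses. */
    size_t pos = 0;
    for (size_t i = 0; i < nobjs; i++) {
        memcpy(out + pos, objs[i].bytes, objs[i].size);
        for (size_t r = 0; r < objs[i].nrefs; r++) {
            const Symbol *found = NULL;
            for (size_t s = 0; s < nsyms; s++)
                if (strcmp(table[s].name, objs[i].refs[r].name) == 0)
                    found = &table[s];
            if (found == NULL) return 0;              /* undefined symbol */
            uint8_t *gap = out + pos + objs[i].refs[r].offset;
            gap[0] = (uint8_t)(found->address >> 8);  /* high byte first */
            gap[1] = (uint8_t)(found->address & 0xff);
        }
        pos += objs[i].size;
    }
    return total;
}
```

Pass 1 has to finish before pass 2 starts because a file may refer to a symbol that is defined in a later file, as use.c does with another.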

The Combined (Linked) Program
       0001                 * define starting addresses
       0002 18c7            sect 0   * code
       0003 1800            org $1800
       0004 0000            sect 1   * data
       0005 4000            org $4000
       0006 7ffb            stackbase equ $7ffb
       0007                 *
       0008                 * start of code
       0009                 *
       0010 1800            sect 0
       0011 1800 8e 7f fb   lds #stackbase
       0012 1803 bd 18 c0   jsr _main
       …
       0002            _main:
       0003 18c0 cc 04 d2   ldd #1234
       0004 18c3 fd 40 00   std _another
       0005 18c6 39         rts
       0006            L1.use:
       0001 4000            sect 1
       0002            _another:
       0003 4000            rmb 2

The Loader
•  Read the executable file
•  Allocate memory space for it
•  Load each segment
•  Initialize the stack (if needed)
   –  Create stack segment (if needed)
•  Set up environment, etc.
•  Jump to program start
   –  Initializes the stack (if needed)

[Figure: the a.out file (Header, Code, Data, Other) loaded into memory (Code, Data, HEAP, Stack)]

Relocation
•  In some systems (old and embedded systems)
   –  Multiple programs in memory at one time
   –  No virtual address space
•  Executable format has a patch or relocation table
•  Loader
   –  Loads the executable at some base location
   –  For every direct memory address:
      •  Adds the base to the address
      •  Uses the patch table to do this
•  Some hardware requires special attention
   –  E.g. Intel 8088 segmentation
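The loader's use of the patch table can be sketched as follows. The `Executable` struct and the `relocate` function are hypothetical; the sketch assumes the program was linked as if it started at address 0, and that every patch table entry is the offset of a 16-bit absolute address stored high byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of a relocatable executable. */
typedef struct {
    uint8_t      *image;     /* code and data, linked for base address 0 */
    size_t        size;
    const size_t *patches;   /* offsets of the absolute addresses */
    size_t        npatches;
} Executable;

static uint16_t read16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static void write16(uint8_t *p, uint16_t value)
{
    p[0] = (uint8_t)(value >> 8);
    p[1] = (uint8_t)(value & 0xff);
}

/* Adds base to every address named in the patch table.
   Returns the number of addresses patched. */
size_t relocate(Executable *exe, uint16_t base)
{
    size_t patched = 0;
    for (size_t i = 0; i < exe->npatches; i++) {
        size_t off = exe->patches[i];
        if (off >= exe->size || exe->size - off < 2)
            continue;                       /* entry falls outside the image */
        uint8_t *where = exe->image + off;
        write16(where, (uint16_t)(read16(where) + base));
        patched++;
    }
    return patched;
}
```

Only the bytes listed in the table change. Opcodes and constants such as the 1234 in `ldd #1234` are left alone, which is why the loader needs the table rather than guessing which bytes are addresses.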

Dynamic loaders
•  Do the loading tasks after program starts.
•  Also have to do some linker tasks.
•  Position-independent code: executable code that contains no absolute addresses so that it can be loaded anywhere in memory.
•  PC-relative code for branches; base+displacement addressing for external calls and data; use Global Offset Table in UNIX for resolution.
•  UNIX shared objects require PIC.
•  Windows DLLs are dynamically relocated if not loaded at their preferred address.
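The idea behind resolving external calls through a table can be imitated in plain C with function pointers. This is an analogy only, not how a real Global Offset Table is laid out or filled in: the names `offset_table`, `bind_slots`, `library_exports` and the two library functions are invented for the sketch.

```c
#include <stddef.h>
#include <string.h>

typedef int (*Func)(int);

/* The "shared library": functions exported by name. */
int twice(int x)  { return 2 * x; }
int square(int x) { return x * x; }

typedef struct { const char *name; Func address; } Export;
static const Export library_exports[] = {
    { "twice", twice }, { "square", square }
};

/* The "program": one table slot per external function it uses.
   Its code mentions slot numbers only, never absolute addresses. */
enum { SLOT_TWICE, SLOT_SQUARE, SLOT_COUNT };
static const char *const slot_names[SLOT_COUNT] = { "twice", "square" };
static Func offset_table[SLOT_COUNT];

/* The dynamic loader's job: look up each name and fill in its slot.
   Returns 0 on success, -1 if some name cannot be resolved. */
int bind_slots(void)
{
    size_t nexports = sizeof library_exports / sizeof library_exports[0];
    for (int slot = 0; slot < SLOT_COUNT; slot++) {
        offset_table[slot] = NULL;
        for (size_t e = 0; e < nexports; e++)
            if (strcmp(slot_names[slot], library_exports[e].name) == 0)
                offset_table[slot] = library_exports[e].address;
        if (offset_table[slot] == NULL)
            return -1;
    }
    return 0;
}

/* Program code: every external call goes through the table. */
int program(int x)
{
    return offset_table[SLOT_SQUARE](offset_table[SLOT_TWICE](x));
}
```

Because `program` never contains the address of `twice` or `square`, the library can sit anywhere in memory; only the table has to be written when it is loaded.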

Static Libraries
•  Just a collection of object files stored together plus a combined symbol table (see ranlib(1)).
   –  In Unix they are created using ar
      •  The archive program
   –  In Windows they are created using LIB
•  Linux
       gcc -c first.c
       gcc -c second.c
       ar -r my.a first.o second.o
       gcc -c use.c
       gcc use.o my.a
•  Windows
       cl -c first.c
       cl -c second.c
       lib /out:my.lib first.obj second.obj
       cl -c use.c
       link use.obj my.lib

Overlays
•  What if the program is larger than memory?
•  Used:
   –  When no virtual memory manager (VMM) available
   –  Before VMMs existed (e.g. DOS/360, MS-DOS)
•  Loader calls A or D (and loads one or the other)
   –  If A calls B then load B
   –  If A calls C then load C
   –  B cannot call C
      •  C cannot call B
   –  A/B/C cannot call D
      •  D cannot call A/B/C
•  A/B/C/D/Loader are sets of methods / objects

[Figure: overlay tree. Loader at the root; A and D below the Loader; B and C below A]
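The loading rules above can be modelled as a tiny state machine. The sketch below only records which overlay occupies which region; a real overlay manager would also copy code in from disk. The names `Overlay`, `ensure_loaded`, `upper`, `lower` and `loads` are hypothetical.

```c
/* Two overlay regions: the upper one holds A or D,
   the lower one (usable only while A is resident) holds B or C. */
typedef enum { OV_NONE, OV_A, OV_B, OV_C, OV_D } Overlay;

static Overlay upper = OV_NONE;
static Overlay lower = OV_NONE;
int loads = 0;                    /* how many times something was loaded */

/* Make sure the wanted overlay is resident before calling into it.
   Returns 0 on success, -1 if the overlay structure forbids the call. */
int ensure_loaded(Overlay want)
{
    switch (want) {
    case OV_A:
    case OV_D:
        if (upper != want) {
            upper = want;         /* replaces whatever was there */
            lower = OV_NONE;      /* B or C go away together with A */
            loads++;
        }
        return 0;
    case OV_B:
    case OV_C:
        if (upper != OV_A)
            return -1;            /* only A may call B or C */
        if (lower != want) {
            lower = want;         /* B and C share the same memory */
            loads++;
        }
        return 0;
    default:
        return -1;
    }
}
```

B and C can never be resident at the same time, which is why neither may call the other, and the same holds for D against A/B/C.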

Dynamic unloading
•  An overlay may be unloaded when no procedure call is using it. (See dlclose(3) in UNIX.)
•  Fortran and COBOL: “static” variables may be reinitialised on re-entry to a procedure, because the procedure might be in an overlay. When an overlay is unloaded it is not just the code that goes away; the data does too.
•  The ability to dynamically unload and reload a module means that a running program can be patched.
•  Erlang “hot loading”: modules can be replaced even while they are in use.

Static Shared Libraries
•  Linker:
   –  Loads the library
   –  Binds addresses (entry points) to the executable
      •  Often via a branch table
   –  Throws the library away
•  Loader
   –  Load the library when program starts
•  Advantage:
   –  The library is only stored on disc once
   –  The program cannot be broken by library changes
•  Problems:
   –  Must be present when program is run
   –  Can’t change the library (much) once bound

DLL / OCX
•  DLL: Dynamic link libraries
   –  Load and bind at run time
      •  Allows library to change after program written
•  VBX / OCX / ActiveX
   –  Dynamic link and load of objects
   –  Load and bind at run time
      •  Allows library to change after program written

References
•  J. Levine, Linkers and Loaders
•  https://developer.apple.com/library/mac/#documentation/DeveloperTools/Conceptual/MachOTopics/1-Articles/loading_code.html#//apple_ref/doc/uid/TP40001830-SW1
•  Ian Lance Taylor’s notes on building a linker at http://www.airs.com/blog/archives/38