Kernfach System Software WS04/05


Page 1: Kernfach System Software WS04/05

Kernfach System Software
WS04/05

P. Reali, M. Corti

Page 2: Kernfach System Software WS04/05

System-Software WS 04/05

2

© P. Reali / M. Corti

Introduction: Admin

Lecture
– Mo 13-14, IFW A 36
– We 10-12, IFW A 36

Exercises (always on Thursday)
– 14-15, IFW A34, C. Tuduce (E)
– 14-15, IFW C42, V. Naoumov (E)
– 15-16, IFW A32.1, I. Chihaia (E)
– 15-16, RZ F21, C. Tuduce (E)
– 16-17, IFW A34, T. Frey (E)
– 16-17, IFW A32.1, K. Skoupý (E)

Page 3: Kernfach System Software WS04/05

Introduction: Additional Info

Internet
– Homepage: http://www.cs.inf.ethz.ch/ssw/
– Inforum VIS site

Textbooks & Co.
– Lecture slides
– A. Tanenbaum, Modern Operating Systems
– Silberschatz / Galvin, Operating System Concepts
– Selected articles and book chapters

Page 4: Kernfach System Software WS04/05

Introduction: Exercises

Exercises are optional (feel free to shoot yourself in the foot)

– Weekly paper exercises
    test the knowledge acquired in the lecture
    identify troubles early
    exercise questions are similar to the exam ones
– Monthly programming assignment
    feel the gap between theory and practice

Page 5: Kernfach System Software WS04/05

Introduction: Exam

Sometime in March 2005
Written, 3 hours
Allowed aids
– 2 A4 page summary
– calculator

Official Q&A session 2 weeks before the exam

Page 6: Kernfach System Software WS04/05

Introduction: Lecture Goals

Operating system concepts
– bottom-up approach
– not an operating system course
– learn the most important concepts
– feel the complexity of operating systems
    there is no silver bullet!

Basic knowledge for other lectures / term assignments
– Compilerbau (compiler construction)
– Component Software
– ...
– OS-related assignments

Page 7: Kernfach System Software WS04/05

Introduction: What is an operating system?

An operating system has two goals:

Provide an abstraction of the hardware
– ABI (application binary interface)
– API (application programming interface)
– hide details

Manage resources
– time and space multiplexing
– resource protection

Page 8: Kernfach System Software WS04/05

Introduction: Operating system target machines

Targets
– mainframes
– servers
– multiprocessors
– desktops
– real-time systems
– embedded systems

Different goals and requirements!
– memory efficiency
– reaction time
– abstraction level
– resources
– security
– ...

Page 9: Kernfach System Software WS04/05

Introduction: Memory vs. Speed Tradeoff

Example: retrieve a list of names

                 memory     time
1. Array         N*n        N
2. List          N*(n+4)    N/2
3. Bin. Tree     N*(n+8)    log(N)
4. Hash Table    3*N*n      1

N = number of names, n = name length

Page 10: Kernfach System Software WS04/05

Introduction: Operating System as resource manager

... in the beginning was the hardware!

Most relevant resources:
– CPU
– Memory
– Storage
– Network

Page 11: Kernfach System Software WS04/05

Introduction: Lecture Topics

[Figure: lecture topics arranged by resource (CPU, Memory, Disk, Network) and by abstraction level. Topics shown: Scheduling, Virtual Memory, Demand Paging, Thread, Process, Coroutine, Memory Management, Garbage Collection, Concurrency Support, File System, Object-Oriented Runtime Support, Distributed File-System, Distributed Object-System, Virtual Machine, Runtime support.]

Page 12: Kernfach System Software WS04/05

Introduction: A word of warning...

Most of the topics may seem simple...

... and in fact they are!

Problems are mostly due to:
– complexity when integrating the system
– low-level (“bit fiddling”) details
– bootstrapping (X needs Y, Y needs X)

Page 13: Kernfach System Software WS04/05

Introduction: Bootstrapping (Aos)

[Figure: Aos bootstrapping, components ordered by level: Locks, Storage, Modules, Processor, Memory, Interrupts, Active, Traps, Timers, SMP.]

Page 14: Kernfach System Software WS04/05

Introduction: Lecture Topics

[Figure: timeline of the lecture from Oct ’04 to Feb ’05. Topics shown: Overview, Runtime Support, Virtual Addressing, Memory Management, Distributed Obj. System, Concurrency (two blocks), Disc / Filesystem, Case Study: JVM.]

Page 15: Kernfach System Software WS04/05

Run-time Support: Overview

Support for programming abstractions
– Procedures
    calling conventions
    parameters
– Object-oriented model
    objects
    methods (dynamic dispatching)
– Exception handling
– ... more ...

Page 16: Kernfach System Software WS04/05

Run-time Support: Application Binary Interface (ABI)

Objects a, b, c, … with methods P, Q, R, … and internal procedures p, q, r, …

Call sequence:
Call a.P, Call b.Q, Call b.q, Call b.q, Return b.q, Return b.q, Call c.R, Return c.R, Return b.Q, Return a.P

[Figure: the stack at four points of the call sequence. Each entry is a procedure activation frame (PAF); the stack pointer (SP) marks the top.
1: a.P, b.Q, b.q
2: a.P, b.Q, b.q, b.q
3: a.P, b.Q
4: a.P, b.Q, c.R]

Page 17: Kernfach System Software WS04/05

Run-time Support: Procedure Activation Frame

[Figure: stack layout of a procedure activation frame. Above the caller frame lie params, PC (return address), FP' (dynamic link, the saved frame pointer) and locals. The frame pointer (FP) points at the dynamic link, the stack pointer (SP) at the end of the locals.]

Call
– caller: save registers, push parameters, save PC, branch
– callee: save FP, FP := SP, allocate locals

Return
– remove locals, restore FP, restore PC, remove parameters, restore registers

Page 18: Kernfach System Software WS04/05

Run-time Support: Procedure Activation Frame, Optimizations

Many optimizations are possible
– use registers instead of the stack
– register windows
– procedure inlining
– use SP instead of FP addressing

Page 19: Kernfach System Software WS04/05

Run-time Support: Procedure Activation Frame (Oberon / x86)

Caller:
    push params
    call P                   ; push pc; pc := P
    ...

Callee:
    push fp
    mov  fp, sp
    sub  sp, size(locals)
    ...
    mov  sp, fp
    pop  fp
    ret  size(params)        ; pop pc; add sp, size(params)

Page 20: Kernfach System Software WS04/05

Run-time Support: Calling Convention

Convention between caller and callee
– how parameters are passed
    data layout
    left-to-right, right-to-left
    registers
    register window
– stack layout
    dynamic link
    static link
– register saving
    reserved registers

Page 21: Kernfach System Software WS04/05

Run-time Support: Calling Convention (Oberon)

Parameter passing
– on stack (exception: Oberon/PPC uses registers)
– left-to-right
– self (methods only) as last parameter
– structs and arrays passed by reference; value parameters are copied by the callee

Stack
– dynamic link
– static link as last parameter (for local procedures)

Registers
– saved by caller

Page 22: Kernfach System Software WS04/05

Run-time Support: Calling Convention (C)

Parameter passing
– on stack
– right-to-left
– arrays passed by reference (an array argument decays to a pointer)

Stack
– dynamic link

Registers
– some saved by caller

Page 23: Kernfach System Software WS04/05

Run-time Support: Calling Convention (Java)

Parameter passing
– left-to-right
– self as first parameter
– parameters pushed as operands
– parameters accessed as locals
– access through symbolic, type-safe operations

Page 24: Kernfach System Software WS04/05

Run-time Support: Object Oriented Support, Definitions

Obj x = new ObjA();

– static type of x is Obj
– dynamic type of x is ObjA

x is compiled as being compatible with Obj, but executes as ObjA.

Polymorphism
– static and dynamic type can be different
– the system must keep track of the dynamic type with a hidden “type descriptor”

[Figure: class hierarchy. Obj extends Obj0; ObjA and ObjB extend Obj.]

Page 25: Kernfach System Software WS04/05

Run-Time Support: Polymorphism

VAR
  t: Triangle;
  s: Square;
  o: Figure;
BEGIN
  t.Draw();
  s.Draw();
  o.Draw();
END;

Type is statically known!

WHILE p # NIL DO
  p.Draw(); p := p.next
END;

Type is discovered at runtime!

Page 26: Kernfach System Software WS04/05

Run-time Support: Object Oriented Support, Definitions

Obj x = new ObjA();

if (x IS ObjA) { ... }    // type test

ObjA y = (ObjA)x;         // type cast

x = y;                    // type coercion (automatic conversion)

[Figure: class hierarchy. Obj extends Obj0; ObjA and ObjB extend Obj.]

Page 27: Kernfach System Software WS04/05

Run-time Support: Object Oriented Support (High-level Java)

Type Test Implementation

.... a IS T ....

if (a != null) {
    Class c = a.getClass();
    while ((c != null) && (c != T)) {
        c = c.getSuperclass();
    }
    return c == T;
} else {
    return false;
}

Page 28: Kernfach System Software WS04/05

Run-Time Support: Type Descriptors

struct TypeDescriptor {
    int level;
    type[] extensions;
    method[] methods;
}

class Object {
    TypeDescriptor type;
}

– many type-descriptor layouts are possible
– the layout depends on the optimizations chosen

Page 29: Kernfach System Software WS04/05

Run-Time Support: Type Tests and Casts

Class hierarchy with “extension level”:
  level 0: Obj0
  level 1: Obj
  level 2: ObjA, ObjB

Extension tables in the type descriptors:

            0       1      2      3
TD(Obj0)    Obj0    NIL    NIL    NIL
TD(Obj)     Obj0    Obj    NIL    NIL
TD(ObjA)    Obj0    Obj    ObjA   NIL

(obj IS T) is implemented as
obj.type.extension[ T.level ] = T

mov EAX, obj
mov EAX, -4[EAX]
cmp T, -4 * T.level - 8[EAX]
bne ....
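The extension-table test above can be sketched in Python. This is only an illustration under assumptions: the class `TypeDescriptor`, the function `is_type` and the fixed table size of 4 are hypothetical names and values chosen to mirror the slide, not the Oberon runtime's actual layout.

```python
class TypeDescriptor:
    """Hypothetical type descriptor with a fixed-size extension table."""
    MAX_LEVEL = 4  # assumption: 4 entries, as in the tables on the slide

    def __init__(self, name, base=None):
        self.name = name
        self.level = 0 if base is None else base.level + 1
        # Copy the ancestors from the base type, then enter this type at its level.
        self.extension = list(base.extension) if base else [None] * self.MAX_LEVEL
        self.extension[self.level] = self


def is_type(obj_type, t):
    """(obj IS T): one indexed load and one comparison, no loop."""
    return obj_type.extension[t.level] is t


obj0 = TypeDescriptor("Obj0")
obj = TypeDescriptor("Obj", obj0)
obj_a = TypeDescriptor("ObjA", obj)
obj_b = TypeDescriptor("ObjB", obj)
```

Compared with walking the superclass chain, the test takes constant time; the price is one table per type whose size bounds the depth of the hierarchy.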

Page 30: Kernfach System Software WS04/05

Run-time Support: Object Oriented Support (High-level Java)

Method Call Implementation

.... a.M(.....) ....

Class[] parTypes = new Class[params.length];
for (int i = 0; i < params.length; i++) {
    parTypes[i] = params[i].getClass();
}
Class c = a.getClass();
Method m = c.getDeclaredMethod("M", parTypes);
res = m.invoke(a, params);

Use the method implementation for the actual class (dynamic type)

Page 31: Kernfach System Software WS04/05

Run-Time Support: Handlers / Function Pointers

TYPE
  SomeType = POINTER TO SomeTypeDesc;
  Handler = PROCEDURE (self: SomeType; param: Par);
  SomeTypeDesc = RECORD
    handler: Handler;
    next: SomeType;
  END

[Figure: a linked list starting at root; each record has a handler field pointing to a procedure (PROC Q or PROC R) and a next field.]

Advantages:
– instance bound
– can be changed at run-time

Disadvantages:
– memory usage
– bad integration (explicit self)
– non constant

Page 32: Kernfach System Software WS04/05

Run-Time Support: Method tables (vtables)

TYPE
  A = OBJECT
    PROCEDURE M0;
    PROCEDURE M1;
  END A;

  B = OBJECT (A)
    PROCEDURE M0;    (* B.M0 overrides A.M0 *)
    PROCEDURE M2;    (* B.M2 is new *)
  END B;

A.MethodTable      B.MethodTable
0: A.M0            0: B.M0
1: A.M1            1: A.M1
                   2: B.M2

Idea: have a per-type table of function pointers.
– New methods add a new entry to the method table
– Overrides replace an entry in the method table
– Each method has a unique entry number

Page 33: Kernfach System Software WS04/05

Run-Time Support: Method tables

TYPE
  A = OBJECT
    PROCEDURE M0;
    PROCEDURE M1;
  END A;

  B = OBJECT (A)
    PROCEDURE M0;
    PROCEDURE M2;
  END B;

A.MethodTable      B.MethodTable
0: A.M0            0: B.M0
1: A.M1            1: A.M1
                   2: B.M2

Virtual Dispatch

o.M0;    is compiled as    call o.Type.Methods[0]

mov eax, VALUE(o)
mov eax, type[eax]
mov eax, off + 4*mno[eax]
call eax

[Figure: object o consists of its fields and a Type pointer to the type descriptor.]
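A small Python sketch of how such method tables could be built and used. The helper names `make_method_table` and `dispatch` are hypothetical; real compilers assign the entry numbers at compile time rather than through a dictionary.

```python
def make_method_table(base_table, base_slots, methods):
    """Build a per-type method table from the base type's table.

    base_table: list of callables inherited from the base type
    base_slots: dict mapping method name -> entry number
    methods:    dict mapping method name -> implementation in this type
    """
    table = list(base_table)
    slots = dict(base_slots)
    for name, impl in methods.items():
        if name in slots:
            table[slots[name]] = impl      # override: replace the existing entry
        else:
            slots[name] = len(table)       # new method: append a new entry
            table.append(impl)
    return table, slots


a_table, a_slots = make_method_table([], {}, {
    "M0": lambda self: "A.M0",
    "M1": lambda self: "A.M1",
})
b_table, b_slots = make_method_table(a_table, a_slots, {
    "M0": lambda self: "B.M0",
    "M2": lambda self: "B.M2",
})


def dispatch(method_table, mno, obj=None):
    """o.M: call o.Type.Methods[mno] with o as the receiver."""
    return method_table[mno](obj)
```

Because an override keeps the entry number of the method it replaces, a call site compiled against A works unchanged for objects of type B.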

Page 34: Kernfach System Software WS04/05

Run-Time Support: Oberon Type Descriptors

Type descriptor contents
– method table
– superclass table
– pointers in object for GC

The type descriptor is also an object!

[Figure: an object (type desc pointer, obj fields) and its type descriptor (td size, type desc pointer, type name, obj size, ext table, mth table, ptr offsets). Annotations: obj size is used for object allocation, ext table for type checks, mth table for method invocation, ptr offsets for garbage collection.]

Page 35: Kernfach System Software WS04/05

Run-Time Support: Interfaces, itables

interface A {
    void m();
}

interface B {
    void p();
}

Object x;
A y = (A)x;    // does x implement A?
y.m();         // multiple itables: how is the right itable discovered?

x has a method table (itable) for each implemented interface

Page 36: Kernfach System Software WS04/05

Run-Time Support: Interface support

How to retrieve the right method table (if any)?
– global table indexed by [class, interface]
– local (per type) table / list indexed by [interface]

Many optimizations are available

Use the usual trick: enumerate interfaces

Page 37: Kernfach System Software WS04/05

Run-Time Support: Interface support (I)

[Figure: the type descriptor holds the method table (vtable) and an interfaces pointer to a linked list of method tables (itables), here for Intf0 and Intf7.]

Intf0 y = (Intf0)x;
y.M();

interface i = x.type.interfaces;
while ((i != null) && (i != Intf0)) {
    i = i.next;
}
if (i != null) i.method[mth_nr]();

The call is expensive because it requires traversing a list: O(N) complexity

Page 38: Kernfach System Software WS04/05

Run-Time Support: Interface support (II)

[Figure: the type descriptor holds the vtable and an interfaces array indexed by interface number 0..7; only entries 2 and 7 point to itables (itable2, itable7). Sparse array!]

Intf0 y = (Intf0)x;
y.M();

interface i = x.type.interfaces[Intf0];
if (i != null) i.method[mth_nr]();

Lookup is fast (O(1)), but wastes memory
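The two lookup variants, the per-type list and the per-type array indexed by interface number, can be contrasted in a short Python sketch. The class `Itable` and both functions are hypothetical names; the interface numbers 2 and 7 follow the slide.

```python
class Itable:
    """Hypothetical itable: the methods of one implemented interface."""
    def __init__(self, interface_id, methods):
        self.interface_id = interface_id
        self.methods = methods


def find_itable_list(itables, interface_id):
    """Variant (I): linear search through the type's itable list, O(N)."""
    for itable in itables:
        if itable.interface_id == interface_id:
            return itable
    return None


def build_itable_array(itables, interface_count):
    """Variant (II): array indexed by interface number, mostly empty."""
    array = [None] * interface_count
    for itable in itables:
        array[itable.interface_id] = itable
    return array


# A type implementing interfaces 2 and 7 out of 8 enumerated interfaces.
itables = [Itable(2, ["Intf2.M"]), Itable(7, ["Intf7.M"])]
array = build_itable_array(itables, 8)
```

Here six of the eight array slots stay empty for this type, which is the memory cost that motivates overlapping the sparse arrays of several types.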

Page 39: Kernfach System Software WS04/05

Run-Time Support: Interface Implementation (III)

[Figure: type descriptors t and u each hold a vtable and an interfaces pointer into one shared interface table index with slots 0..7. Type t uses slots 2 and 7 (itable t,2 and itable t,7); type u uses slots 0 and 2 (itable u,0 and itable u,2). The sparse arrays of both types overlap in the same index.]

Page 40: Kernfach System Software WS04/05

Run-Time Support: Interface Implementation (III)

[Figure: two type descriptors, each with a vtable and an interfaces pointer into the overlapped interface table index; the itables of both types share the same index array.]

Page 41: Kernfach System Software WS04/05

Run-Time Support: Interface Implementation (III)

[Figure: a type descriptor with vtable and interfaces pointer into the overlapped interface tables; the index also contains itables that belong to other types.]

Intf0 y = (Intf0)x;
y.M();

itable i = x.type.interfaces[Intf0];
if ((i != null) && (i in x.type))
    i.method[mth_nr]();

Page 42: Kernfach System Software WS04/05

Run-Time Support: Exceptions

void catchOne() {
    try {
        tryItOut();
    } catch (TestExc e) {
        handleExc(e);
    }
}

Bytecode of catchOne():
   0  aload_0
   1  invokevirtual tryItOut
   4  return
   5  astore_1
   6  aload_0
   7  aload_1
   8  invokevirtual handleExc
  11  return

Exception table:
  From  To  Target  Type
  0     4   5       TestExc

Page 43: Kernfach System Software WS04/05

Run-Time Support: Exception Handling / Zero Overhead

try {                    <- pcstart
    .....
} catch (Exp1 e) {       <- pcend, pchandler1
    .....
} catch (Exp2 e) {       <- pchandler2
    .....
}

Global Exception Table
  start     end     exception   handler
  pcstart   pcend   Exp1        pchandler1
  pcstart   pcend   Exp2        pchandler2

void ExceptionHandler(state) {
    pc = state.pc; exc = state.exception;
    i = 0;
    while (!Match(table[i], pc, exc)) {
        i++;
        if (i == TableLength) {
            PopActivationFrame(state);
            pc = state.pc;
            i = 0;
        }
    }
    state.pc = table[i].pchandler;
    ResumeExecution(state);
}

Page 44: Kernfach System Software WS04/05

Run-Time Support: Exception Handling / Zero Overhead

– exception table filled by the loader / linker
– traverse the whole table for each stack frame
– the system has a default handler for uncaught exceptions

no exceptions => no overhead
exception case is expensive

The system is optimized for the normal case
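A minimal Python sketch of the table-driven lookup, assuming half-open pc ranges (start inclusive, end exclusive) and a call stack given as a list of pcs, innermost frame first. `Entry`, `find_handler` and the addresses are hypothetical.

```python
from collections import namedtuple

Entry = namedtuple("Entry", "start end exception handler")


def find_handler(table, call_stack_pcs, exc):
    """Search the whole table for each frame, innermost frame first.

    Returns (frames_popped, handler), or None for an uncaught exception.
    """
    for popped, pc in enumerate(call_stack_pcs):
        for entry in table:
            if entry.start <= pc < entry.end and issubclass(exc, entry.exception):
                return popped, entry.handler
    return None


class Exp1(Exception):
    pass


class Exp2(Exception):
    pass


# Hypothetical addresses: one try block covering pcs 100..119.
table = [Entry(100, 120, Exp1, 200), Entry(100, 120, Exp2, 300)]
```

Nothing in this scheme runs while no exception is raised; all the cost sits in the search, which grows with the table size times the number of frames unwound.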

Page 45: Kernfach System Software WS04/05

Run-Time Support: Exception Handling / Fast Handling

Source:

try {
    .....
} catch (Exp1 e) {       <- pchandler1
    .....
} catch (Exp2 e) {       <- pchandler2
    .....
}

Instrumented code:

try {
    save (FP, SP, Exp1, pchandler1)
    save (FP, SP, Exp2, pchandler2)
    .....
    remove catch descr.
    jump end
} catch (Exp1 e) {
    .....
    remove catch descr.
    jump end
} catch (Exp2 e) {
    .....
    remove catch descr.
    jump end
}
end:

– push catch descriptors on the stack
– add code instrumentation
– use an exception stack to keep track of the handlers

Page 46: Kernfach System Software WS04/05

Run-Time Support: Exception Handling / Fast Handling

void ExceptionHandler(ThreadState state) {
    int FP, SP, handler;
    Exception e;

    do {
        // pop next exception descriptor from exception stack
        retrieve(FP, SP, e, handler);
    } while (!Match(state.exp, e));

    // can resume in a different activation frame
    state.fp = FP;         // set frame to the one
    state.sp = SP;         // containing the handler
    state.pc = handler;    // resume with the handler
    ResumeExecution(state);
}

Page 47: Kernfach System Software WS04/05

Run-Time Support: Exception Handling / Fast Handling

– code instrumentation
    insert exception descriptor at try
    remove descriptor before catch
– fast exception handling
– overhead even when no exceptions occur

The system is optimized for the exception case

Page 48: Kernfach System Software WS04/05

Virtual Addressing: Overview

Virtual addressing: abstraction of the MMU (Memory Management Unit)

Work with virtual addresses, where address_real = f(address_virtual)

Provides decoupling from real memory
– virtual memory
– demand paging
– separate address spaces

Page 49: Kernfach System Software WS04/05

Virtual Addressing: Pages

Memory as array of pages

[Figure: two virtual address spaces (1 and 2), each an array of pages, mapped onto real memory, a pool of page frames numbered 0 to 7. A page is either mapped to a page frame or is an unmapped (invalid) page; a run of unmapped pages forms an unmapped range. Programs use and run in these address spaces.]

Page 50: Kernfach System Software WS04/05

Virtual Addressing: Page mapping

[Figure: the MMU translates a virtual address (page-no, off) into a real address (frame, off). The page table pointer register locates the page table, page-no indexes the page table to obtain the frame, and off selects the location within the page frame in real memory. The Translation Lookaside Buffer (TLB) is an associative cache of (PT, VA, RA) entries that maps a virtual address directly to a real address.]

Page 51: Kernfach System Software WS04/05

Virtual Addressing: Definitions

page          smallest unit in the virtual address space
page frame    unit in the physical memory
page table    table mapping pages into page frames
page fault    access to a non-mapped page
working set   pages a process is currently using

Page 52: Kernfach System Software WS04/05

Virtual Addressing: Alternate Page Mapping

Multilevel page tables
– multipart virtual address (pno1, pno2, off)
– page table as (B*-)tree

[Figure: a 64-bit address space; pno1 indexes the 1st-level table, pno2 indexes a 2nd-level table; unused ranges are unassigned.]

Inverted page table

[Figure: one entry per page frame (0..N), each holding the process (pr) and the virtual page (vp) mapped to it. A hash of (pr, vp) leads through a hashtable to the page frame (pf); on a collision the next probe is tried.]

Page 53: Kernfach System Software WS04/05

Virtual Addressing: What for?

Decoupling from real memory
– virtual memory (cheat: use more virtual memory than the available real memory)
– dynamically allocated contiguous memory blocks (for multiple stacks in multitasking systems)
– some optimizations
    null reference checks
    garbage collection (using dirty flag)

Virtual addressing is not free!
– address mapping may require additional memory accesses
– page table takes space

Page 54: Kernfach System Software WS04/05

Virtual Addressing: Virtual Memory

Use secondary storage (disc) to keep currently unused pages (swapping)

The page table usually keeps some per-page flags
– invalid       page not mapped
– referenced    page has been referenced
– dirty         page has been modified

Accessing an invalid page causes a page-fault interrupt
– select the page frame to be swapped out (victim or candidate)
– swap in the requested page

Page 55: Kernfach System Software WS04/05

Virtual Addressing: Virtual Memory / Demand Paging

[Figure: “page-out” writes the victim page from real memory to disc and sets its page-table entry to invalid; “page-in” loads the requested page from disc into real memory.]

Page 56: Kernfach System Software WS04/05

Virtual Addressing: Demand Paging Sequence

MMU (TLB, then page table):

IF VA IN TLB THEN
  RETURN RA
ELSE
  Access Page Table;
  IF Page invalid THEN Page-Fault ELSE RETURN RA END
END

OS page-fault handler:

IF Free Page Frame exists THEN
  Assign frame to VA
ELSE
  Search victim page;
  IF victim page modified THEN page-out to secondary storage END;
  Invalidate victim page;
  Assign frame to VA
END;
Page-in from secondary storage;
Reset invalid flag

Expected time to translate VA into RA:

E[t] = P_TLB * t_TLB + P_PT * t_PT + P_disc * t_disc

Page 57: Kernfach System Software WS04/05

Virtual Addressing: Example

Page size: 4 KB
Address size: 32 bits

addressable memory: 2^32 = 4 GB
page offset: 12 bits (4 KB = 2^12)
page number: 20 bits (32 - 12)
page table size: 2^20 * 32 bits = 4 MB

Real memory: 128 MB
page table overhead: ca. 3%
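The arithmetic of this example can be checked with a few lines of Python. The function names `page_table_stats` and `split` are illustrative, and a single-level page table with one entry per page is assumed.

```python
def page_table_stats(address_bits, page_size, entry_bits, real_memory):
    """Sizes for a single-level page table; page_size must be a power of two."""
    offset_bits = page_size.bit_length() - 1          # log2(page_size)
    page_number_bits = address_bits - offset_bits
    entries = 2 ** page_number_bits
    table_bytes = entries * entry_bits // 8
    return {
        "offset_bits": offset_bits,
        "page_number_bits": page_number_bits,
        "table_bytes": table_bytes,
        "overhead": table_bytes / real_memory,
    }


KB, MB = 1024, 1024 * 1024
stats = page_table_stats(address_bits=32, page_size=4 * KB,
                         entry_bits=32, real_memory=128 * MB)


def split(virtual_address, offset_bits=12):
    """Split a virtual address into (page number, offset)."""
    return virtual_address >> offset_bits, virtual_address & ((1 << offset_bits) - 1)
```

With these parameters the table occupies 4 MB, which is 3.125% of 128 MB of real memory; `split(0x12345678)` yields page number 0x12345 and offset 0x678.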

Page 58: Kernfach System Software WS04/05

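Virtual Addressing: Example

mov EAX, @Addr

[Figure: decision tree for one memory access. With probability P_TLB the translation is found in the TLB; with probability 1 - P_TLB the page table is consulted. After the page-table lookup, with probability P_pagefault the page must be fetched from disc (1 disc read, 1 disc write); with probability 1 - P_pagefault it is in memory (1 memory read).]

E[t] = P_TLB * t_TLB + (1 - P_TLB) * (t_PT + P_PF * t_disc + (1 - P_PF) * t_mem)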

Page 59: Kernfach System Software WS04/05

Virtual Addressing: Demand Paging: Page Replacement

Optimal strategy (longest unused)
– take the page that will remain unused for the longest time
– requires an oracle

NRU: “Not Recently Used”
– reset the referenced flag at each tick
– create page categories (good candidate to bad candidate)
– choose the best candidate

  Pref   ref   mod
  3      0     0
  2      0     1
  1      1     0
  0      1     1

Page 60: Kernfach System Software WS04/05

Virtual Addressing: Demand Paging: Page Replacement (2)

LRU: “Least Recently Used”
– assumption: not used in the past ==> not used in the future
– hardware implementation
    64-bit time-stamp for each page
– software implementation
    “aging” algorithm
    choose the page with the lowest value

[Figure: aging. At each tick the per-page value t(i) is shifted right by one bit and the reference flag (set if the page was accessed) is inserted as the new leftmost bit, giving t(i+1).]
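The aging algorithm is commonly implemented as a shift register per page. The Python sketch below assumes 8-bit counters and a set of pages referenced since the last tick; `age` and `victim` are hypothetical names.

```python
def age(counters, referenced, bits=8):
    """One clock tick of the aging algorithm.

    counters:   dict page -> counter value
    referenced: set of pages accessed since the last tick
    """
    for page in counters:
        counters[page] >>= 1                       # shift right by one bit
        if page in referenced:
            counters[page] |= 1 << (bits - 1)      # ref flag becomes the top bit
    return counters


def victim(counters):
    """Choose the page with the lowest counter value."""
    return min(counters, key=counters.get)


counters = {"A": 0, "B": 0, "C": 0}
age(counters, {"A", "B"})    # tick 1: A and B were accessed
age(counters, {"A"})         # tick 2: only A was accessed
age(counters, {"C"})         # tick 3: only C was accessed
```

After the three ticks, B holds the lowest value and would be chosen as the victim. The counter only approximates LRU: accesses older than `bits` ticks are forgotten.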

Page 61: Kernfach System Software WS04/05

Virtual Addressing: Demand Paging: Page Replacement (3)

LRC: “Least Recently Created” (FIFO)
– page lifespan as metric (old pages are swapped out)
– chain sorted by creation time
– bad handling of often-used pages

Fix: “second chance” when accessed (ref flag set) during the last tick

cur := earliest;
WHILE cur.ref DO
  cur.ref := FALSE; cur := cur.next
END

[Figure: chain of pages linked by next, starting at earliest; each page has a ref flag.]
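A Python sketch of the second-chance loop. It uses the common variant in which a page that gets its second chance is moved to the end of the chain; that detail, like the name `second_chance_victim`, is an assumption and is not stated on the slide.

```python
from collections import deque


def second_chance_victim(chain):
    """chain: deque of [page, ref] pairs, earliest created first.

    A page with the ref flag set loses the flag and moves to the end of
    the chain; the first page found with ref cleared is removed and returned.
    """
    while True:
        page, ref = chain.popleft()
        if not ref:
            return page
        chain.append([page, False])


chain = deque([["A", True], ["B", False], ["C", True]])
v = second_chance_victim(chain)
```

The loop always terminates: if every page has its flag set, one full pass clears all flags and the earliest page is evicted on the second pass.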

Page 62: Kernfach System Software WS04/05

Virtual Addressing: Demand Paging: Page Replacement (4)

Strategies
– optimal
– LRU / NRU / LRC

Exceptions
– “page pinning”: the page cannot be swapped out
    kernel code

Page 63: Kernfach System Software WS04/05

Virtual Addressing: Example

Accessed pages: 1, 2, 1, 3, 4, 1, 2, 3, 4
Available page frames: 3
Working set: {1, 2, 3, 4}

Page frame contents after each access (PF! marks a page fault):

Access   Ideal           FIFO            LRU
1        1        PF!    1        PF!    1        PF!
2        1, 2     PF!    1, 2     PF!    1, 2     PF!
1        1, 2            1, 2            1, 2
3        1, 2, 3  PF!    1, 2, 3  PF!    1, 2, 3  PF!
4        1, 2, 4  PF!    2, 3, 4  PF!    1, 3, 4  PF!
1        1, 2, 4         3, 4, 1  PF!    1, 3, 4
2        1, 2, 4         4, 1, 2  PF!    1, 4, 2  PF!
3        2, 3, 4  PF!    1, 2, 3  PF!    1, 2, 3  PF!
4        2, 3, 4         2, 3, 4  PF!    2, 3, 4  PF!
Faults   5               8               7
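The three strategies can be compared with a small simulator. This Python sketch is an illustration only; `count_faults` and the three victim-selection functions are hypothetical names, and the optimal strategy is given the future accesses as its oracle.

```python
def count_faults(accesses, frames, choose_victim):
    """Simulate demand paging; choose_victim(resident, history, future) picks a page."""
    resident, faults = [], 0
    for i, page in enumerate(accesses):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            resident.remove(choose_victim(resident, accesses[:i], accesses[i + 1:]))
        resident.append(page)
    return faults


def fifo(resident, history, future):
    return resident[0]                  # resident is kept in load order


def lru(resident, history, future):
    # The resident page whose most recent access lies furthest in the past.
    return min(resident, key=lambda p: max(i for i, q in enumerate(history) if q == p))


def optimal(resident, history, future):
    # The page whose next use is farthest away (or that is never used again).
    return max(resident, key=lambda p: future.index(p) if p in future else len(future))


accesses = [1, 2, 1, 3, 4, 1, 2, 3, 4]
```

Calling `count_faults(accesses, 3, lru)` replays the access sequence of this slide with three page frames; swapping in `fifo` or `optimal` changes only the victim selection.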

Page 64: Kernfach System Software WS04/05

Demand Paging: Belady’s Anomaly

LRC strategy

Page access sequence: 0 1 2 3 0 1 4 0 1 2 3 4

3 page frames: 9 page faults

Access    0  1  2  3  0  1  4  0  1  2  3  4
newest    0  1  2  3  0  1  4  4  4  2  3  3
          -  0  1  2  3  0  1  1  1  4  2  2
oldest    -  -  0  1  2  3  0  0  0  1  4  4
Victim    -  -  -  0  1  2  3  -  -  0  1  -
Fault     x  x  x  x  x  x  x  -  -  x  x  -

4 page frames: 10 page faults

Access    0  1  2  3  0  1  4  0  1  2  3  4
newest    0  1  2  3  3  3  4  0  1  2  3  4
          -  0  1  2  2  2  3  4  0  1  2  3
          -  -  0  1  1  1  2  3  4  0  1  2
oldest    -  -  -  0  0  0  1  2  3  4  0  1
Victim    -  -  -  -  -  -  0  1  2  3  4  0
Fault     x  x  x  x  -  -  x  x  x  x  x  x

Belady’s anomaly: more page frames can cause more page faults
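The anomaly is easy to reproduce. The Python sketch below counts page faults under FIFO (LRC) replacement for the access sequence of this slide; `fifo_faults` is an illustrative name.

```python
from collections import deque


def fifo_faults(accesses, frames):
    """Count page faults under FIFO (LRC) replacement."""
    queue, faults = deque(), 0
    for page in accesses:
        if page not in queue:
            faults += 1
            if len(queue) == frames:
                queue.popleft()        # victim: the page loaded earliest
            queue.append(page)
    return faults


sequence = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
```

`fifo_faults(sequence, 3)` gives 9 and `fifo_faults(sequence, 4)` gives 10, matching the two tables.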

Page 65: Kernfach System Software WS04/05

Demand Paging: How many page frames per process?

Even distribution
– every process has the same amount of memory

Thrashing
– every memory access causes a page fault
– not enough page frames for the current working set
– the system is swapping instead of running

[Figure: CPU load versus process count; the load rises towards 100% up to n processes and collapses at n+1.]

Page 66: Kernfach System Software WS04/05

Demand Paging: How many page frames per process? (2)

Depending on the process needs (1): use the working set

Page frames are assigned according to the process’ working-set size.
Swap out a process when not enough memory is available.

Page access sequence:
1 3 2 2 3 3 1 2 2 3 3 3 4 2 2 1 1 1 2 1 3 3 3 1 3 1 2 3 4 1

[Figure: a sliding window over the page access sequence determines the working set; two window positions are shown, giving the working sets { 1, 2, 3, 4 } and { 2, 3, 4 }.]

Page 67: Kernfach System Software WS04/05

Demand Paging: How many page frames per process? (3)

Depending on the process needs (2): use the page-fault rate

[Figure: page-fault rate over time with a HIGH and a LOW threshold; above HIGH, swap out one process; below LOW, swap one in.]

Page 68: Kernfach System Software WS04/05

Virtual Addressing: Aos/Bluebottle, Memory Layout Example

[Figure: 4 GB virtual address space; kernel and heap below 2 GB, stacks between 2 GB and 4 GB.]

PROCEDURE PageFault;
BEGIN
  IF adr > 2GB THEN
    add page to stack
  ELSE
    Exception(NilTrap)
  END
END PageFault;

– 128 KB per stack
– max. 32768 active objects
– first stack page allocated on process creation

Page 69: Kernfach System Software WS04/05

Virtual Addressing: Example: UNIX, Fork

A UNIX program consists of ..... code, text, data

[Figure: after Fork(), the page tables of process A and process B map the same page frames, all marked read-only. When one process writes to a data page, it gets its own read-write copy (data’): “copy on write”.]

Page 70: Kernfach System Software WS04/05

Virtual Addressing: OS Control

Oberon
– no virtual memory

Windows
– Virtual Memory configuration
– Task Manager

Linux
– swap partition / swap files
– ps / top

Page 71: Kernfach System Software WS04/05

Virtual Addressing: Segmentation

e.g. Intel x86

Problem
– 640 KB max memory
– 16-bit addresses (i.e. 64 KB)

Solution
– work in a segment
– code / data segments
– check segment boundaries

Addr_real = Seg_base + Offset

[Figure: real memory containing a code segment and a data segment, each delimited by its segment base and segment limit.]

Page 72: Kernfach System Software WS04/05

Virtual Addressing: Summary

Virtual addresses: address_real = f(address_virtual)

Decoupling from real memory
– virtual memory
– demand paging
– separate address spaces

Keywords
– page
– page frame
– page table
– page fault
– page flags
    dirty, used, unmapped
– page replacement strategy
    LRC, LRU, ideal, ...
– swapping
– thrashing, Belady’s anomaly

Page 73: Kernfach System Software WS04/05

Memory Management: Overview

Abstractions for applications
– heap
– memory blocks (<< memory pages)

Operations
– Allocate
– Deallocate

Topics
– memory organization
– free lists
– allocation strategies
– explicit deallocation
– garbage collection
    type-aware
    conservative
    copying / moving
    incremental
    generational

Page 74: Kernfach System Software WS04/05

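Memory Management: Objects on the heap

Object instances: a, b, c, d, …

Sequence:
NEW(a)
NEW(b)
NEW(c)
DISPOSE(b)
NEW(d)
NEW(e)

NEW is dynamic allocation; DISPOSE is explicit disposal.

[Figure: the “heap” during the sequence. After DISPOSE(b) a hole remains between a and c. NEW(e) fails in two ways: in case 1, e does not fit into the remaining heap; in case 2, the free space is split into pieces and there is not enough contiguous space.]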

Page 75: Kernfach System Software WS04/05

Memory Management: Problem overview

Problems
– heap size limitation (e, case 1)
– external fragmentation (e, case 2)
– dangling pointers (a points to b)

Solutions
– system-managed list of free blocks (“free list”)
– vector of blocks with fixed size (bitmap, with 0 = free, 1 = used)
– automated detection and reclamation of unused blocks (“garbage collection”)

Page 76: Kernfach System Software WS04/05

Memory Management: Theory: 50% rule

Assumption: stable state
– M free blocks, N allocated blocks
– 50% rule: M = 1/2 N

Classify each allocated block by its neighbours:
  A: free block on both sides
  B: free block on one side
  C: no free neighbour

N = A + B + C
M = 1/2 (2A + B + e), with e = 0, 1, or 2

Block disposal:    ΔM = (C - A) / N
Block allocation:  ΔM = -(1 - p), where p is the splitting likelihood

In the stable state the two changes cancel:
(C - A) / N = 1 - p
C - A - N + pN = 0

2M = 2A + B + e
2M = 2A + N - A - C + e
2M = N + A - C + e
2M = pN + e

For p close to 1 and negligible e this gives M = 1/2 N.

Page 77: Kernfach System Software WS04/05

Memory Management: Theory: Memory Fragmentation

[Figure: plot not recoverable; it marks a critical point.]

Writing α for the ratio F/B and φ for the fill ratio b*B/H:

{ 50% rule }
(b/2) * F = H - b*B
(α/2) * b*B = H - b*B
H / (b*B) = 1 + α/2
α = 2/φ - 2

Page 78: Kernfach System Software WS04/05

Memory Management: Free-list management with a Bitmap

Idea
– partition the heap into blocks of size s
– use a bitmap to track allocated blocks
    bitmap[i] = true: block i is allocated

Problems
– internal fragmentation
    round up the block size to the next multiple of s
– map size
    size is (heap_size / s) bits

[Figure: an allocated block; the unused tail of its last unit is the loss due to internal fragmentation.]
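A bitmap allocator along these lines might look as follows in Python. The class `BitmapHeap` is hypothetical, and a simple first-fit scan over the bitmap is assumed, since the slide does not fix a search strategy.

```python
class BitmapHeap:
    """Heap of fixed-size units tracked by one flag per unit."""

    def __init__(self, heap_size, s):
        self.s = s
        self.bitmap = [False] * (heap_size // s)    # False = free, True = used

    def allocate(self, size):
        """Return the start address of a run of free units, or None."""
        need = -(-size // self.s)                   # round up to a multiple of s
        run = 0
        for i, used in enumerate(self.bitmap):
            run = 0 if used else run + 1
            if run == need:
                start = i - need + 1
                self.bitmap[start:i + 1] = [True] * need
                return start * self.s
        return None

    def free(self, address, size):
        start, need = address // self.s, -(-size // self.s)
        self.bitmap[start:start + need] = [False] * need


heap = BitmapHeap(heap_size=256, s=32)
a = heap.allocate(50)     # occupies 2 units (64 bytes); 14 bytes are lost
b = heap.allocate(32)
heap.free(a, 50)
c = heap.allocate(100)    # needs 4 units; the 2-unit hole at 0 is too small
```

The example shows both costs named on the slide: the 50-byte request wastes 14 bytes inside its second unit, and the map needs heap_size / s = 8 flags.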

Page 79: Kernfach System Software WS04/05

Memory Management: Free-list management with a list

List organization
– sorted / non-sorted
    merging of empty blocks is simpler with a sorted list
– one list / many lists (per size)
    search is simpler, merging is more difficult
– management data stored in the free block
    size, next pointer

Operations
– allocation
– disposal with merge
    find free blocks next to the current block, merge into a bigger free block

Page 80: Kernfach System Software WS04/05

Memory Management: Memory allocation strategies

block splitting
– if a free block is bigger than the requested block, it is split

first-fit
– use the first free block that is big enough

best-fit
– take the smallest fitting block
– causes a lot of fragmentation

worst-fit
– take the biggest available block

quick-fit
– best-fit, but with multiple free lists (one per block size)
– fast allocation!

[Figure: heap as a sequence of used and free blocks; the unused remainder inside an allocated block is internal fragmentation.]
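The first three strategies differ only in which free block they pick. The Python sketch below models the free list as a plain list of block sizes in address order; all function names are illustrative.

```python
def first_fit(free_blocks, size):
    """Index of the first free block that is big enough, or None."""
    for i, block in enumerate(free_blocks):
        if block >= size:
            return i
    return None


def best_fit(free_blocks, size):
    """Index of the smallest block that still fits, or None."""
    fitting = [i for i, block in enumerate(free_blocks) if block >= size]
    return min(fitting, key=lambda i: free_blocks[i]) if fitting else None


def worst_fit(free_blocks, size):
    """Index of the biggest block if it fits, or None."""
    i = max(range(len(free_blocks)), key=lambda i: free_blocks[i], default=None)
    return i if i is not None and free_blocks[i] >= size else None


def allocate(free_blocks, size, strategy):
    """Take a block and split it; the remainder stays in the free list."""
    i = strategy(free_blocks, size)
    if i is None:
        return False
    free_blocks[i] -= size
    if free_blocks[i] == 0:
        del free_blocks[i]
    return True


free = [100, 40, 200, 48]   # sizes of the free blocks, in address order
```

For a request of 45, best-fit picks the 48-byte block and leaves a 3-byte remainder, which illustrates why it tends to produce many small, hard-to-use fragments.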

Page 81: Kernfach System Software WS04/05

Memory Management: Buddy System (for fast block merging)

– blocks have size 2^k
– a block of size 2^i has address j*2^i (the last i bits are 0)
– blocks with addresses x = j*2^i and (j XOR 1)*2^i are buddies (they can be merged into a block of size 2^(i+1))

buddy = x XOR 2^i

Example:
b1 = xxxx 0 0000
b2 = xxxx 1 0000

[Figure: splitting turns a block of size 2^(k+1) into two blocks of size 2^k, and those into blocks of size 2^(k-1); merging goes the other way. Example heap of size 64: 64, then 32 + 32, then 32 + 16 + 16, then 32 + 16 + 8 + 8.]
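The address arithmetic of the buddy system fits in a few lines of Python. The function names are illustrative; addresses are offsets from the start of the heap.

```python
def buddy_of(address, size):
    """Address of the buddy of the block at `address`; size must be 2**i."""
    return address ^ size


def merge(address, size):
    """Start address and size of the block formed by a block and its buddy."""
    return min(address, buddy_of(address, size)), 2 * size


def split(address, size):
    """Split a block into its two halves, which are buddies of each other."""
    half = size // 2
    return (address, half), (address + half, half)
```

Note that adjacency is not enough: the 16-byte blocks at addresses 16 and 32 are neighbours, but `buddy_of(16, 16)` is 0, so they cannot be merged.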

Page 82: Kernfach System Software WS04/05

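Memory Management: Buddy System (for fast block merging)

Problem: only buddies can be merged

[Figure: in a heap split into blocks of 32, 16, 8 and 8, two adjacent free blocks of equal size are either buddies, which can be merged, or no buddies, which cannot.]

Cascading merge

[Figure: merging two buddies of size 8 gives a 16, which merges with its buddy into a 32, which merges again: 32 + 16 + 8 + 8, then 32 + 16 + 16, then 32 + 32.]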

Page 83: Kernfach System Software WS04/05

Memory Management: Buddy System (for fast block merging)

Allocation
– allocate(8)

[Figure: the quick-fit lookup finds no free block of size 8, so a larger block is split repeatedly (32 + 32, then 32 + 16 + 16, then 32 + 16 + 8 + 8) until a block of size 8 is available]
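
Allocation with splitting and disposal with cascading merge can be combined in a toy allocator. This is a sketch under assumptions: the class `BuddyHeap` is hypothetical, sizes are given as exponents (order k means 2^k bytes), and one Python set per order stands in for the per-size free-lists.

```python
class BuddyHeap:
    """Toy buddy allocator over a heap of size 2**max_order (hypothetical)."""

    def __init__(self, max_order):
        self.max_order = max_order
        self.free = {k: set() for k in range(max_order + 1)}
        self.free[max_order].add(0)

    def allocate(self, order):
        """Return the address of a block of size 2**order, or None."""
        k = order
        while k <= self.max_order and not self.free[k]:
            k += 1                          # look for a bigger block
        if k > self.max_order:
            return None
        addr = min(self.free[k])
        self.free[k].remove(addr)
        while k > order:                    # split down to the requested size
            k -= 1
            self.free[k].add(addr + (1 << k))   # upper half stays free
        return addr

    def release(self, addr, order):
        """Free a block and merge with its buddy as long as the buddy is free."""
        k = order
        while k < self.max_order:
            buddy = addr ^ (1 << k)
            if buddy not in self.free[k]:
                break
            self.free[k].remove(buddy)
            addr = min(addr, buddy)
            k += 1
        self.free[k].add(addr)
```

With `BuddyHeap(6)`, `allocate(3)` splits the block of 64 into 32, 16, 8 and 8 and returns address 0; releasing both blocks of size 8 cascades back to a single free block of 64.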

Page 84: Kernfach System Software WS04/05

Memory Management: Example: Oberon / Aos

Block size = k*32
– free-lists for k = 1..9, one list for blocks > 9*32
Allocate: quick-fit, splitting may be required
Free-list management and block-merging done by the Garbage Collector

[Figure: free-lists for block sizes 32, 64, 96, ..., k*32 in the initial state and after ALLOCATE(50), with the allocated block marked]

Page 85: Kernfach System Software WS04/05

Memory Management: Garbage Collection

Two steps:
1. Free block detection
– type-aware
  collector is aware of the types traversed, i.e. knows which values are pointers
– conservative
  collector doesn't know which values are pointers
2. Block disposal
  return unused blocks to the free-lists

GC Characteristics
– incremental
  gc is performed in small steps to minimize program interruption
– moving / copying / compacting
  blocks are moved around
– generational
  blocks are grouped in generations; different treatment or collection priority

Barriers
– read
  intercept and check every pointer read operation
– write
  intercept and check every pointer write operation

Page 86: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Reference Counting

Every object has a reference counter rc
rc = 0: object is "garbage"

Problems
– overhead
– no support for circular structures

Useful for...
– module hierarchies
– DAG structures (e.g. LISP)

Write barrier for the assignment q := p (p, q pointers to objects):

INC p.rc
DEC q.rc
IF q.rc = 0 THEN
  Collect q^
END;
q := p

[Figure: pointers p and q and the rc fields of the objects they reference; object graph M, A, B, C, D in which objects keep rc >= 1]
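
The write barrier and the problem with circular structures can be tried out in a small simulation. The names `Obj`, `assign`, `release` and `collect` are hypothetical, and the counters are maintained by hand rather than by a compiler-inserted barrier.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0          # reference counter
        self.fields = {}     # outgoing pointers by field name


collected = []               # names of reclaimed objects, for inspection


def collect(obj):
    """Reclaim obj and drop the references it holds."""
    collected.append(obj.name)
    for target in obj.fields.values():
        release(target)
    obj.fields.clear()


def release(obj):
    """DEC obj.rc and collect the object when the counter reaches 0."""
    if obj is not None:
        obj.rc -= 1
        if obj.rc == 0:
            collect(obj)


def assign(holder, field, p):
    """Write barrier for holder.field := p."""
    if p is not None:
        p.rc += 1                        # INC p.rc
    old = holder.fields.get(field)
    holder.fields[field] = p             # q := p
    release(old)                         # DEC q.rc, collect at 0
```

Dropping the last pointer to a chain reclaims the whole chain, while two objects that point to each other keep rc = 1 forever, which is the circular-structure problem named above.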

Page 87: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Mark & Sweep

Mark-Phase (Garbage Detection)
– Compute the root-set consisting of
  global pointers (statics) in each module
  local pointers on the stack in each PAF (procedure activation frame)
  temporary pointers in the CPU's registers
– Traverse the graph of the live objects starting from the root-set with depth-first strategy; mark all reached objects.

Sweep-Phase (Garbage Collection)
– Linear heap traversal. Non-marked blocks are inserted into free-lists.
– Optimization: lazy sweeping (sweep during allocation, allocation gets slower)
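
Both phases fit in a short sketch. It assumes a hypothetical `Node` class with an explicit list of outgoing pointers and models the heap as a Python list; a real collector would find the pointers through type descriptors.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.refs = []          # outgoing pointers
        self.marked = False


def mark(roots):
    """Mark phase: depth-first traversal from the root-set."""
    stack = list(roots)
    while stack:
        node = stack.pop()
        if not node.marked:
            node.marked = True
            stack.extend(node.refs)


def sweep(heap):
    """Sweep phase: linear heap traversal, returns (live, free_list)."""
    live, free_list = [], []
    for node in heap:
        if node.marked:
            node.marked = False     # reset the mark for the next cycle
            live.append(node)
        else:
            free_list.append(node)
    return live, free_list
```

Unlike reference counting, an unreachable cycle ends up on the free-list because it is never marked.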

Page 88: Kernfach System Software WS04/05

Memory Management: Garbage Collection: root-set

Run-time support from object-system. Hidden data structures with (compiler generated) information about pointers (metadata).

Conservative approach. Guess which values could be pointers and treat them as such.

[Figure: a Module Descriptor lists the offsets (off) of the global pointers in the Module Data; the Type Tag of an object instance refers to a Type Descriptor that lists the offsets (off1, off2) of the pointers in the instance]

Page 89: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Mark with Pointer Rotation/1

Problem: garbage collection is called when free memory is low, but mark may require a lot of memory

Solution: pointer rotation algorithm (Deutsch, Schorr, Waite)
+ memory efficient
+ iterative
– structures are temporarily inconsistent
– non-concurrent
– non-incremental

Page 90: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Mark with Pointer Rotation/2

Simple case: list traversal

[Figure: list traversal with the pointers q and p and the link field p.link]
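
For the list case, pointer rotation can be sketched with a three-way assignment. The class `Cell` and the function `mark_list` are hypothetical; the sketch covers only a singly linked list, not the generic structure traversal.

```python
class Cell:
    def __init__(self, link=None):
        self.link = link
        self.marked = False


def mark_list(head):
    """Mark every cell of a linked list without a stack or recursion."""
    q, p = None, head
    while p is not None:
        p.marked = True
        # Rotate: the link of p now points back to its predecessor q.
        p.link, q, p = q, p, p.link
    p = None
    while q is not None:
        # Rotate back: restore the original forward links.
        q.link, p, q = p, q, q.link
```

During the first loop the list is temporarily inconsistent (its links point backwards), which is why the algorithm is neither concurrent nor incremental.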

Page 91: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Mark with Pointer Rotation/3

Generic case: structure traversal

[Figure: pointers q and p during the traversal of a structure with several pointer fields]

Page 92: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Memory Compaction

nextavail pointer: partition heap between allocated and free space
Allocate: increment nextavail
Garbage Collector performs memory compaction

[Figure (MS .NET): ALLOC advances nextavail; GC compacts the allocated blocks and resets nextavail]

Page 93: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Stop & Copy

Partition heap in from and to regions
Collection:
– traverse objects in from, copy to to
– leave forwarding pointer behind
– requires read barrier
– swap from and to

Characteristics
– copying
– incremental
– (generational)

Instrument code with read barrier: every "access p" becomes

IF p is moved THEN
  replace p with forwarding pointer
END;
access p
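
The copy step with forwarding pointers can be sketched as below. This is a simplified, non-incremental variant (the mutator is stopped, so no read barrier is needed); the names `Obj`, `copy` and `collect` are hypothetical and the to region is modelled as a Python list.

```python
class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)
        self.forward = None      # forwarding pointer, set once the object is copied


def copy(obj, to_space):
    """Copy obj into to_space at most once and return the copy."""
    if obj.forward is None:
        clone = Obj(obj.name, obj.refs)
        obj.forward = clone      # leave forwarding pointer behind
        to_space.append(clone)
    return obj.forward


def collect(roots, to_space):
    """Copy everything reachable from roots; returns the new roots."""
    new_roots = [copy(r, to_space) for r in roots]
    scan = 0
    while scan < len(to_space):  # objects before scan are fully processed
        obj = to_space[scan]
        obj.refs = [copy(r, to_space) for r in obj.refs]
        scan += 1
    return new_roots
```

After `collect` returns, the from region contains only garbage and stale originals, so the two regions can swap roles.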

Page 94: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Stop & Copy

[Figure: four steps (1 to 4) of a collection; live objects are copied from the from region to the to region, and in step 4 the roles of the two regions are swapped]

Page 95: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Concurrent GC

"Stop-and-Go" approach
"Incremental" approach

[Figure: timelines of Mutator and GC activity within the user process for both approaches; real-time constraint]

Page 96: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Tricolor marking

"Wave-front" model

State                                    Color
already traversed, behind wave           black
being traversed, on the wave             grey
not reached yet, in front of the wave    white

Page 97: Kernfach System Software WS04/05

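Memory Management: Garbage Collection: Tricolor marking / Isolation

Mutator can change pointers at any time
Critical case: a black object is made to point to a white object (black → white)
Remedy: write barrier
– color B gray, or
– color W gray

[Figure: black object B receives a pointer to white object W, which would otherwise be considered unreachable; the write barrier intercepts the assignment]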
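
The critical case and one of the two remedies can be played through in a small incremental marker. The sketch assumes hypothetical names (`Obj`, `Collector`, `shade`, `step`, `write`) and implements the "color W gray" variant of the write barrier.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.color = WHITE


class Collector:
    def __init__(self, roots):
        self.grey = []               # the wave front
        for r in roots:
            self.shade(r)

    def shade(self, obj):
        """Put a white object on the wave front."""
        if obj.color == WHITE:
            obj.color = GREY
            self.grey.append(obj)

    def step(self):
        """One increment of marking: blacken a single grey object."""
        if not self.grey:
            return False
        obj = self.grey.pop()
        for child in obj.refs:
            self.shade(child)
        obj.color = BLACK
        return True

    def write(self, src, target):
        """Write barrier for adding a pointer src -> target."""
        src.refs.append(target)
        if src.color == BLACK:
            self.shade(target)       # never leave a black -> white pointer
```

Without the call to `shade` in `write`, an object stored into an already black object would stay white and be reclaimed although it is reachable.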

Page 98: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Baker's Treadmill

Heap: double-linked chain of objects

[Figure: the chain is divided into To-Space, From-Space and Free-Space; pointers cur and scan]

Page 99: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Baker's Treadmill

[Figure: To-Space, From-Space and Free-Space with the pointers cur and scan; allocation points for conservative allocation and progressive allocation]

Page 100: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Baker's Treadmill

[Figure: collect; a referenced object is relinked from From-Space into To-Space; positions of cur and scan before and after]

Page 101: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Baker's Treadmill

State transitions after GC is complete
– From-Space + Free-Space → Free-Space
– To-Space → From-Space

Fragmentation
– External: not removed
– Internal: depends on supported block sizes

Allocation
– conservative: black
– progressive: white

[Figure: Root Set with the pointers x and y; objects created by NEW(x) and NEW(y) relative to cur and scan]

Page 102: Kernfach System Software WS04/05

Memory Management: Generational Garbage Collection

Generations: expected object life
– young: short life (temp data)
– old: long life

Generations G0, G1, G2

Generation    GC frequency
G0            high
G1            medium
G2            low

Collect where garbage is most likely to be found
Special handling for pointers across different generations required

[Figure: object sets (A B C D E), (A D F G) and (A G H I J) and the generations G2, G1, G0]

Page 103: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Finalization

Finalization (after-use cleanup)
– User-defined routine called when object is collected
– Establish consistency
  save buffers
  flush caches
– Release resources
  close connections
  release file descriptors

Dangers
– Resurrection of objects: objects added to live structures
– Finalization sequence is undefined

Page 104: Kernfach System Software WS04/05

Memory Management: Garbage Collection: .NET Finalization Example

Rules:
– objects with finalizer belong to older generation
– finalizer only called once (ReRegisterForFinalize)
– FinalizationQueue: live objects with finalizer
– FreachableQueue: collected objects to be finalized
– Finalization executed by different process for security reasons

[Figure: heap objects A B C D E, some of them garbage; FinalizationQueue before and after the GC; FreachableQueue served by a thread]

Page 105: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Weak Pointers

"Weak" pointers
– Objects referenced only through a weak pointer can be collected by the GC in case of need
– Used for caches and buffers

Implementation
1. Weak pointers are not registered to the GC
2. Use a weak reference table (indirect access)

[Figure: objects in use and garbage; a weak pointer, and a weak reference that goes through the weak reference table]

Page 106: Kernfach System Software WS04/05

Memory Management: Garbage Collection: Weak Pointers Example

Oberon: internal file list
– system must keep track of open files to avoid buffer duplication
– file descriptor must be collected once user has no more reference to it
– use weak pointer in the system (otherwise would keep file alive!)
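
The same pattern can be illustrated with Python's standard `weakref` module instead of Oberon. The names `FileDescriptor`, `open_files` and `open_file` are hypothetical; `weakref.WeakValueDictionary` plays the role of the system's internal file list.

```python
import gc
import weakref


class FileDescriptor:
    def __init__(self, name):
        self.name = name


# System-side file list: holds the descriptors only weakly.
open_files = weakref.WeakValueDictionary()


def open_file(name):
    """Return the shared descriptor for name, creating it if needed."""
    fd = open_files.get(name)
    if fd is None:
        fd = FileDescriptor(name)
        open_files[name] = fd
    return fd


fd = open_file("notes.txt")
same = open_file("notes.txt")    # no duplicate descriptor or buffer
del fd, same                     # the user drops the last strong references
gc.collect()                     # the entry disappears from open_files
```

With a normal dictionary the entry in `open_files` would keep every descriptor alive for the lifetime of the system.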

Page 107: Kernfach System Software WS04/05

Memory Management: Object Pools

Application keeps a pool of preallocated object instances; handles allocation and disposal
– simulation of discrete events
– buffers in a file system
– provide dynamic allocation in real-time systems

PROCEDURE NewT (VAR p: ObjectT);
BEGIN
  IF freeT = NIL THEN
    NEW(p)
  ELSE
    p := freeT; freeT := freeT.next
  END
END NewT;

PROCEDURE DisposeT (p: ObjectT);
BEGIN
  p.next := freeT; freeT := p
END DisposeT;

Page 108: Kernfach System Software WS04/05

Garbage Collection, Recap

GC kinds:
  compacting
  copying
  incremental
  generational

Helpers:
  write barrier
  read barrier
  forwarding pointer
  pointer rotation

Algorithms:
  Ref-Count
  Mark & Sweep
  Stop & Copy
  Mark & Copy (.NET)
  Baker's Treadmill
  – Dijkstra / Lamport
  – Steele

Page 109: Kernfach System Software WS04/05

Distributed Object Systems: Overview

Goals
– object-based approach
– hide communication details

Advantages
– more space
– more CPU
– redundancy
– locality

Problems
– Coherency
  ensure that same object definition is used
– Interoperability
  serialization, type consistency, type mapping
– Object life-time
  distributed garbage collection

Page 110: Kernfach System Software WS04/05

Distributed Object Systems: Architecture

[Figure: components of the architecture: Client and Server applications, Proxy / Stub, Skeleton and Impl., IDL and IDL-Compiler on both sides, an Object Broker on both sides, a Naming Service, and the Message with its Call Context exchanged between the brokers]

Page 111: Kernfach System Software WS04/05

Remote Procedure Invocation: Overview

Problem
– send structured information from A to B
– A and B may have different memory layouts
– "endianness"
– How is 0x1234 (2 bytes) represented in memory?

Byte order       Address 0    Address 1    Used by
Big-Endian       12           34           IBM, Motorola, Sparc; network byte-ordering
Little-Endian    34           12           VAX, Intel

Big-Endian: MSB before LSB
Little-Endian: LSB before MSB (little end first)
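
The two layouts can be reproduced with Python's standard `struct` module, where `>` selects big-endian, `<` little-endian and `!` network byte order:

```python
import struct

value = 0x1234

big = struct.pack(">H", value)       # MSB first
little = struct.pack("<H", value)    # LSB first
network = struct.pack("!H", value)   # network byte order

# Reading big-endian bytes with the wrong byte order swaps the two bytes.
misread = struct.unpack("<H", big)[0]
```

Here `big` is the byte 0x12 followed by 0x34, `little` is 0x34 followed by 0x12, and `misread` is 0x3412, which is exactly the kind of error a common wire format has to prevent.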

Page 112: Kernfach System Software WS04/05

Definitions

Serialization
– conversion of an object's instance into a byte stream

Deserialization
– conversion of a stream of bytes into an object's instance

Marshaling
– gathering and conversion (may require serialization) to an appropriate format of all relevant data, e.g. in a remote method call; includes details like name representation.

Page 113: Kernfach System Software WS04/05

Remote Procedure Invocation: Protocol Overview

Protocols
– RPC + XDR (Sun)
  RFC 1014, June 1987
  RFC 1057, June 1988
– IIOP / CORBA (OMG)
  V2.0, February 1997
  V3.0, August 2002
– SOAP / XML (W3C)
  V1.1, May 2000
– ...

XDR Type System (big-endian representation)
– [unsigned] Integer (32-bit)
– [unsigned] Hyper-Integer (64-bit)
– Enumeration (unsigned int)
– Boolean (Enum)
– Float / Double (IEEE 32/64-bit)
– Opaque
– String
– Array (fix + variable size)
– Structure
– Union
– Void
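
A few of the XDR encodings are easy to reproduce with `struct`. This is a sketch, not an XDR library: the function names are hypothetical, only three types are covered, and the string encoding follows the RFC 1014 rule of a 4-byte length followed by the bytes, padded with zeros to a multiple of four.

```python
import struct


def xdr_int(n):
    """32-bit signed integer, big-endian."""
    return struct.pack(">i", n)


def xdr_hyper(n):
    """64-bit signed integer, big-endian."""
    return struct.pack(">q", n)


def xdr_string(s):
    """Length-prefixed ASCII string, zero-padded to a multiple of 4 bytes."""
    data = s.encode("ascii")
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad
```

Every encoded item occupies a multiple of four bytes, so the receiver can decode a message without knowing the sender's memory layout.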

Page 114: Kernfach System Software WS04/05

Remote Procedure Invocation: RPC Protocol

Remote Procedure Call
– Marshalling of procedure parameters
– Message format
– Authentication
– Naming

Client: PROCEDURE P(a, b, c)
• pack parameters
• send message to server
• await response
• unpack response

Server: P(a, b, c)
• unpack parameters
• find procedure
• invoke
• pack response
• send response
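
The client and server steps can be walked through in a single process. The sketch makes several assumptions: JSON stands in for the message format (not XDR), the transport is a direct function call instead of a network connection, authentication is left out, and the names `PROCEDURES`, `server_handle` and `client_call` are hypothetical.

```python
import json

# Hypothetical server-side procedure table (naming).
PROCEDURES = {"P": lambda a, b, c: a + b * c}


def server_handle(request_bytes):
    msg = json.loads(request_bytes.decode("utf-8"))         # unpack parameters
    proc = PROCEDURES[msg["proc"]]                          # find procedure
    result = proc(*msg["args"])                             # invoke
    return json.dumps({"result": result}).encode("utf-8")   # pack response


def client_call(proc, *args, transport=server_handle):
    request = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")  # pack
    response = transport(request)                           # send, await response
    return json.loads(response.decode("utf-8"))["result"]   # unpack response
```

To the caller, `client_call("P", 1, 2, 3)` looks like a local procedure call; only byte strings cross the boundary between the two functions.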

Page 115: Kernfach System Software WS04/05

Distributed Object Systems: Details

References vs. Values
– client receives reference to remote object
– data values are copied to client for efficiency reasons
– decide whether an object is sent as reference or a value
  value: serializable (Java, .NET), valuetype (CORBA)
  reference: MarshalByRefObject (.NET), java.rmi.Remote (Java), default (CORBA)

object creation
– server creates objects
– client creates objects
– server can return references

object instances
– one object for all requests
– one object for each request
– one object per proxy

conversation state
– stateless
– stateful

Page 116: Kernfach System Software WS04/05

Distributed Object Systems: Distr. Object Systems vs. Service Architecture

Dist. Object System
– object oriented model
– object references
– stateful / stateless
– tight coupling
– internal communication between application's tiers

Service Architecture
– OO-model / RPC
– service references
– stateless
– loose coupling
– external communication between applications

Page 117: Kernfach System Software WS04/05

Distributed Object Systems: Distr. Object Systems vs. Service Architecture

[Figure: CORBA, Remoting, RMI and Web Services placed by environment (homogeneous to heterogeneous) and coupling (tight to loose)]

• components / objects (distributed object system)
• stateful and stateless conversation
• transactions

• services, remote procedure calls
• stateless conversation (session?)
• message

Page 118: Kernfach System Software WS04/05

Distributed Object Systems: Type Mapping

[Figure: Type System 1, the Interoperability Type System and Type System 2, each with its possible types; the mappable types of both systems and the resulting Interop Subset]

Page 119: Kernfach System Software WS04/05

Distributed Object Systems: Type Mapping, Example

[Figure: mapping between the Java, CORBA and CLS type systems for double, char / wchar, enum and union; some mappings need a custom implementation]

Page 120: Kernfach System Software WS04/05

Distributed Object Systems: Examples

Standards
– OMG CORBA
  IIOP
– Web Services
  SOAP

Frameworks
– Java RMI (Sun)
– DCOM (Microsoft)
– .NET Remoting (Microsoft)
  IIOP.NET

Page 121: Kernfach System Software WS04/05

Distributed Object Systems: CORBA

Common Object Request Broker Architecture

[Figure: remote architecture; on the client side the Client Application, Client Stub, CORBA Runtime and ORB with the Interface Repository; on the server side the Object, Object Skeleton, Object Adaptor, CORBA Runtime and ORB with the Implementation Repository; the ORBs communicate via GIOP/IIOP over a TCP/IP socket ("Object-Bus")]

Page 122: Kernfach System Software WS04/05

Distributed Object Systems: CORBA

– CORBA is a standard from OMG
  Object Management Group
  Common Object Request Broker Architecture
– CORBA is useful for...
  building distributed object systems
  heterogeneous environments
  tight integration
– CORBA defines...
  an object-oriented type system
  an interface definition language (IDL)
  an object request broker (ORB)
  an inter-ORB protocol (IIOP) to serialize data and marshal method invocations
  language mappings from Java, C++, Ada, COBOL, Smalltalk, Lisp, Python
  ... and many additional standards and interfaces for distributed security, transactions, ...

Page 123: Kernfach System Software WS04/05

Distributed Object Systems: CORBA

Basic Types
– integers
  16-, 32-, 64-bit integers (signed and unsigned)
– IEEE floating point
  32-, 64-bit and extended-precision numbers
– fixed point
– char, string
  8-bit and wide
– boolean
– opaque (8-bit), any
– enumerations

Compound Types
– struct
– union
– sequence (variable-length array)
– array (fixed-length)
– interface
  concrete (pass-by-reference)
  abstract (pure definition)
– value type
  pass-by-value
  abstract (no state)

Operations
– in / out / inout parameters
– raises

Attributes

Page 124: Kernfach System Software WS04/05

Distributed Object Systems: CORBA / General Inter-ORB Protocol (GIOP)

CDR (Common Data Representation)
– Variable byte ordering
– Aligned primitive types
– All CORBA types supported

IIOP (Internet IOP)
– GIOP over TCP/IP
– Defines Interoperable Object Reference (IOR)
  host, port, key

Message Format
– Defined in IDL
– Messages
  Request, Reply
  CancelRequest
  LocateRequest, LocateReply
  CloseConnection
  MessageError
  Fragment
– Byte ordering flag
– Connection management
  request multiplexing
  asymmetrical / bidirectional connections

Page 125: Kernfach System Software WS04/05

Distributed Object Systems: CORBA / GIOP Message in IDL

module GIOP {
  struct Version {
    octet major;
    octet minor;
  };

  enum MsgType_1_0 {
    Request, Reply, CancelRequest,
    LocateRequest, LocateReply,
    CloseConnection, MessageError
  };

  struct MessageHeader {
    char Magic[4];
    Version GIOP_Version;
    boolean byte_order;
    octet message_type;
    unsigned long message_size;
  };
}; // module end GIOP
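
The header is twelve bytes on the wire: the four magic characters "GIOP", the two version octets, the byte order flag, the message type and a 32-bit message size. A sketch of packing it, assuming the GIOP 1.0 convention that the flag is FALSE for big-endian and TRUE for little-endian and using hypothetical names (`giop_header`, `GIOP_REQUEST`):

```python
import struct

GIOP_REQUEST = 0   # position of Request in MsgType_1_0


def giop_header(msg_type, body_size, little_endian=False, version=(1, 0)):
    """Pack a 12-byte GIOP message header."""
    order = "<" if little_endian else ">"
    return (
        b"GIOP"                                      # Magic
        + bytes(version)                             # GIOP_Version: major, minor
        + bytes([1 if little_endian else 0])         # byte_order flag
        + bytes([msg_type])                          # message_type (one octet)
        + struct.pack(order + "I", body_size)        # message_size
    )
```

The byte order flag is what makes CDR's "variable byte ordering" work: the sender writes in its native order and the receiver swaps only if the flag differs from its own.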

Page 126: Kernfach System Software WS04/05

Distributed Object Systems: CORBA Services

CORBA Services
– System-level services defined in IDL
– Provide functionality required by most applications

Naming Service
– Allows local or remote objects to be located by name
– Given a name, returns an object reference
– Hierarchical directory-like naming tree
– Allows getting initial reference of object

Event Service
– Allows objects to dynamically register interest in an event
– Object will be notified when event occurs
– Push and pull models

... and more
– Trader, LifeCycle, Persistence, Transaction, Security

Page 127: Kernfach System Software WS04/05

Distributed Object Systems: Web Services

Service-oriented architecture
Rely on existing protocols
– SOAP: messaging protocol
– WSDL: service description protocol
– UDDI: service location protocol

[Figure: protocol stack: SOAP over HTTP over TCP/IP]

Page 128: Kernfach System Software WS04/05

Distributed Object Systems: SOAP

Simple Object Access Protocol
– communication protocol
– XML-based
– describes object values
– XML Schemas as interface description language
  basic types: string, boolean, decimal, float, double, duration, dateTime, time, date, hexBinary, base64Binary, anyURI, QName, NOTATION
  structured types: list, union

SOAP Message
– SOAP Envelope
– SOAP Header
– SOAP Body

Method Call
– packed as structure
– messages are self-contained
– no external object references

Page 129: Kernfach System Software WS04/05

Distributed Object Systems: SOAP Message

SOAP Message
– SOAP Envelope
  SOAP Header
  SOAP Body

Example

float Multiply(float a, float b);

Page 130: Kernfach System Software WS04/05

Distributed Object Systems: SOAP Example (Request)

POST /quickstart/aspplus/samples/services/MathService/CS/MathService.asmx HTTP/1.1
Host: samples.gotdotnet.com
Content-Type: text/xml; charset=utf-8
Content-Length: length
SOAPAction: "http://tempuri.org/Multiply"

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Multiply xmlns="http://tempuri.org/">
      <a>float</a>
      <b>float</b>
    </Multiply>
  </soap:Body>
</soap:Envelope>
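
Building and reading such an envelope needs nothing beyond an XML library. The sketch below uses Python's standard `xml.etree.ElementTree`; the function names are hypothetical, the HTTP part is left out, and the namespace prefixes chosen by the serializer may differ from the `soap:` prefix shown above, which does not change the meaning of the document.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
APP_NS = "http://tempuri.org/"


def build_request(a, b):
    """Serialize a call Multiply(a, b) as a SOAP envelope (bytes)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}Multiply")
    ET.SubElement(call, f"{{{APP_NS}}}a").text = repr(a)
    ET.SubElement(call, f"{{{APP_NS}}}b").text = repr(b)
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(data):
    """Extract the two float parameters from a Multiply request."""
    root = ET.fromstring(data)
    call = root.find(f"{{{SOAP_NS}}}Body/{{{APP_NS}}}Multiply")
    a = float(call.find(f"{{{APP_NS}}}a").text)
    b = float(call.find(f"{{{APP_NS}}}b").text)
    return a, b
```

The method call is packed as a structure: the element name carries the method and the child elements carry the parameters, so the message is self-contained.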

Page 131: Kernfach System Software WS04/05

Distributed Object Systems: SOAP Example (Answer)

HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Content-Length: length

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <MultiplyResponse xmlns="http://tempuri.org/">
      <MultiplyResult>float</MultiplyResult>
    </MultiplyResponse>
  </soap:Body>
</soap:Envelope>

Page 132: Kernfach System Software WS04/05

Distributed Object Systems: SOAP Example (Service Description-1)

<?xml version="1.0" encoding="utf-8"?>
<definitions ....>
  <types>
    <s:schema elementFormDefault="qualified" targetNamespace="http://tempuri.org/">
      <s:element name="Multiply">
        <s:complexType>
          <s:sequence>
            <s:element minOccurs="1" maxOccurs="1" name="a" type="s:float" />
            <s:element minOccurs="1" maxOccurs="1" name="b" type="s:float" />
          </s:sequence>
        </s:complexType>
      </s:element>
    </s:schema>
  </types>
  <message name="MultiplySoapIn">
    <part name="parameters" element="s0:Multiply" />
  </message>

Page 133: Kernfach System Software WS04/05

Distributed Object Systems: SOAP Example (Service Description-2)

  <binding name="MathServiceSoap" type="s0:MathServiceSoap">
    <soap:binding transport="http://schemas.xmlsoap.org/soap/http" style="document" />
    <operation name="Multiply">
      <soap:operation soapAction="http://tempuri.org/Multiply" style="document" />
      <input><soap:body use="literal" /></input>
      <output><soap:body use="literal" /></output>
    </operation>
  </binding>
  <service name="MathService">
    <port name="MathServiceSoap" binding="s0:MathServiceSoap">
      <soap:address location="http://samples.gotdotnet.com/quickstart/aspplus/samples/services/MathService/CS/MathService.asmx" />
    </port>
  </service>
</definitions>

Page 134: Kernfach System Software WS04/05

Distributed Object Systems: Web Services

Comments
– XML (easily readable)
– system independent
– standard
– stateless (encouraged design pattern)

– bloated
– big messages (but easily compressed)
– requires expensive parsing

Constraints
– Services
  no object references
  server-activated servant
– Goes over HTTP
  requires web server

Page 135: Kernfach System Software WS04/05

Distributed Object Systems: Web Service Future

Use SOAP-Header to store additional information about message or context

Many standards to come...
– WS-Security
– WS-Policy
– WS-SecurityPolicy
– WS-Trust
– WS-SecureConversation
– WS-Addressing

Page 136: Kernfach System Software WS04/05

Distributed Object Systems: Java RMI

Java Remote Method Invocation

[Figure: remote architecture; client and server each consist of the application or object, an Object Stub, Remote References and a Transport Layer, connected over the network through a TCP/IP socket; Lookup and Register on both sides]

Page 137: Kernfach System Software WS04/05

Distributed Object Systems: Java RMI Details

Framework
– supports various implementations
  e.g. RMI/IIOP
– mapping limited to the Java type system, workarounds needed
– uses reflection to inspect objects

Page 138: Kernfach System Software WS04/05

Distributed Object Systems: Low-Level Details: Java RMI/IIOP

Common Type-System
– restricted CORBA

Marshalling
– name mapping
– remote objects
  only references

Interface Description Language (IDL)
– Java to IDL mapping

Message representation

Underlying protocol
– IIOP (CORBA)

Page 139: Kernfach System Software WS04/05

Distributed Object Systems: Microsoft DCOM

Distributed Component Object Model

[Figure: remote architecture; Client Application with Object Proxy and COM Runtime on the client, Object with Object Stub and COM Runtime on the server, connected over the network by an RPC channel; SCMs and Registration with a Registry on each side; OXID Resolver and Ping Server]

Page 140: Kernfach System Software WS04/05

Distributed Object Systems: Microsoft .NET Remoting

[Figure: the Client obtains a Transparent Proxy via new Instance() or Activator.GetObject(...); the proxy reaches the Instance through Channels across the network and the Application Domain Boundary; the remote object is described by an ObjRef with the fields below]

IChannelInfo ChannelInfo;
IEnvoyInfo EnvoyInfo;
IRemotingTypeInfo TypeInfo;
string URI;

Page 141: Kernfach System Software WS04/05

Distributed Object Systems: Microsoft .NET Remoting

Instance s = new Instance();
s.DoSomething();

[Figure: the call travels from the Client through the Proxy and the sinks of the channel, across the network, and through the matching sinks of the server channel to the Dispatcher and the Instance]

Sinks in the channel (on both sides)
– Message Chan. Sink(s): custom operations
– Formatter: serialize object
– Stream Chan. Sink(s): custom operations
– Transport Sink: handle communication

Page 142: Kernfach System Software WS04/05

Distributed Object Systems: Microsoft .NET Remoting

Activation
– client
  one instance per activation
– server / Singleton
  one instance of object
– server / SingleCall
  one instance per call

Leases (Object Lifetimes)
– renew lease on call
– set maximal object lifetime

Serialization
– SOAP
  Warning: non-standard types, only for .NET use
– binary
– user defined

Transport
– TCP
– HTTP
– user defined

Page 143: Kernfach System Software WS04/05

Distributed Object Systems: Microsoft .NET Remoting (Object Marshalling)

MarshalByRefObject
– remoted by reference
– client receives an ObjRef object, which is a "pointer" to the original object

[Serializable]
– all fields of instance are cloned to the client
– [NonSerialized] fields are ignored

ISerializable
– object has method to define own serialization

[Figure: by reference, Obj in AppDomain 1 is represented by a Proxy in AppDomain 2, created from the serialized ObjRef; by value, Obj in AppDomain 1 is cloned to Obj' in AppDomain 2 from the serialized fields fld1 ... fldn]

Page 144: Kernfach System Software WS04/05

Distributed Object Systems: Microsoft .NET Remoting, Activation

Server-Side Activation (Well-Known Objects): "stateless"
– Singleton Objects
  only one instance is allocated to process all requests
– SingleCall Objects
  one instance per call is allocated

Client-Side Activation: "stateful"
– Client Activated Objects
  the client allocates and controls the object on the server

Page 145: Kernfach System Software WS04/05

Distributed Object Systems: Microsoft .NET Remoting, Limitations

– Server-Activated Objects
  object configuration limited to the default constructor
– Client-Activated Objects
  class must be instantiated, no access over interface
  class hierarchy limitations
  use Factory Pattern
  – to get interface reference
  – to allow parametrization of the constructor
– Furthermore...
  interface information is lost when passing an object reference to another machine
  no control over the channel
  – which channel is used
  – which peer is allowed to connect

Page 146: Kernfach System Software WS04/05

Distributed Object Systems: Case Study: IIOP.NET

Open-source project based on ETH diploma thesis
– http://iiop-net.sourceforge.net/

IIOP.NET (marketing)
– "Provide seamless interoperability between .NET and CORBA-based peers (including J2EE)"

IIOP.NET (technical)
– .NET remoting channel implementing the CORBA IIOP protocol
– Compiler to make .NET stubs from IDL definitions
– IDL definition generator from .NET metadata

Page 147: Kernfach System Software WS04/05

Distributed Object Systems: Case Study: IIOP.NET

IIOP rather than SOAP
– transparent reuse of existing servers
– tight coupling
– object-level granularity
– efficiency

Runtime: standard .NET remoting channel for IIOP
– transport sink
– formatter
– type-mapper

Build tools
– IDL → CLS compiler
– CLS → IDL generator

[Figure: .NET server, .NET client, J2EE server, Java client and CORBA objects communicating via binary IIOP]

[Figure: Java, IDL and CLS type systems with their possible types; the IDL-mappable types of Java and CLS and the resulting Interop Subset]

Page 148: Kernfach System Software WS04/05

Distributed Object Systems: Case Study: IIOP.NET, Interoperability

Interoperability layers, from bottom to top
– Communication Protocols: TCP/UDP, byte stream, point-to-point communication
– Data Model: type system, mapping and conversion issues
– Message Format: RPC, IIOP, HTTP, SOAP, proprietary binary format, messages, unknown data (exceptions), encryption
– Contextual Data, Interception Layer: SessionID, TransactionID, cultureID, logical threadID, …
– Conversation: activation model (EJB, MBR), global naming, distributed garbage collection, conversational state, …
– Services: Distributed Transaction Coordinator, Active Directory, …
– Application

This is what we want

Page 149: Kernfach System Software WS04/05

Distributed Object Systems: Case Study: IIOP.NET, Granularity

Granularity    Coupling, interaction
Service        message-based interface, stateless
Component      strongly-typed interface, stateless or stateful
Object         implementation dependency, stateful

[Figure: a System contains Services, a Service contains Components, a Component contains Objects]

Page 150: Kernfach System Software WS04/05

Distributed Object Systems: Case Study: IIOP.NET

[Figure: chart annotated with the IIOP.NET releases 1.0, 1.1, 1.2, 1.3, 1.4, 1.5 and 1.6 and with the 1st and 2nd article]

Page 151: Kernfach System Software WS04/05

Distributed Object Systems: Case Study: IIOP.NET, Performance

Test case:
– WebSphere 5.0.1 as server
– Clients
  IBM SOAP-RPC Web Services
  IBM Java RMI/IIOP
  IIOP.NET
– Response time
  receiving 100 beans from server
  – WS: 4.0 seconds
  – IIOP.NET: 0.5 seconds
  when sending many more beans, WS are then 200% slower than IIOP.NET

Source: posted on IIOP.NET forum

Page 152: Kernfach System Software WS04/05

Processes and Threads: Introduction

CPU as resource, provide abstraction to it

Allow multiprogramming
– pseudo-parallelism (single-processors)
– real parallelism (multi-processors)

Required abstractions
– multiple activities: execution of instructions
– protection of resources
– synchronization of activities

Topics
– coroutines
– processes
– threads
– scheduling
  fairness
  starvation
– synchronization
  deadlocks

Page 153: Kernfach System Software WS04/05

Processes and Threads: Multithreading

Thread 1
Call a.run
  Call b.Q
    Call b.q
      Call b.q
      Return b.q
    Return b.q
    Call c.R
    Return c.R
  Return b.Q
Return a.run

Thread 2
Call c.run
  Call d.Q
    Call d.q
      Call d.q
      Return d.q
    Return d.q
    Call e.R
    Return e.R
  Return d.Q
Return c.run

[Figure: Stack 1 with the frames a.run, b.Q, b.q, b.q and Stack 2 with the frames c.run, d.Q, e.R; time diagrams showing how the execution of threads 1 and 2 is interleaved]

Page 154: Kernfach System Software WS04/05

Processes and Threads: Coroutines (1)

Coroutines
– each activity has its own stack; the address space is shared
– explicit context switch (stack only) under the programmer's control
– a Transfer call switches to another coroutine
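As a rough illustration of explicit transfer, the sketch below uses Python generators. This is an analogy, not the Oberon mechanism from the slides: each generator keeps its own frame (playing the role of the private stack), all share one address space, and control changes hands only at a `yield`. The names `ping`, `pong` and `run` are made up for this example.

```python
def ping(log, rounds):
    for i in range(rounds):
        log.append(f"ping {i}")
        yield "pong"              # explicit transfer to the coroutine named "pong"


def pong(log, rounds):
    for i in range(rounds):
        log.append(f"pong {i}")
        yield "ping"              # explicit transfer back to "ping"


def run(coroutines, start):
    """Trampoline: resume the named coroutine until one of them finishes."""
    current = start
    while True:
        try:
            # Resume the coroutine; it answers with the name of its successor.
            current = next(coroutines[current])
        except StopIteration:
            return


log = []
run({"ping": ping(log, 2), "pong": pong(log, 2)}, "ping")
print(log)
```

Nothing preempts either routine here: if `ping` never yields, `pong` never runs, which is exactly the cooperative behaviour described above.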

Page 155: Kernfach System Software WS04/05

Processes and Threads: Coroutines (2)

[Figure: comparison of control flow. Subroutines: Call and Return. Coroutines: Start creates a coroutine, Transfer switches between coroutines; inside a coroutine, Call and Return work as usual.]

Page 156: Kernfach System Software WS04/05

Processes and Threads: Coroutines (3)

TYPE Coroutine = POINTER TO RECORD
    FP: LONGINT;
    stack: POINTER TO ARRAY OF SYSTEM.BYTE;
END;

VAR cur: Coroutine; (* Current Coroutine *)

PROCEDURE Transfer*(to: Coroutine);
BEGIN
    SYSTEM.GETREG(SYSTEM.EBP, cur.FP);   (* save FP *)
    cur := to;
    SYSTEM.PUTREG(SYSTEM.EBP, cur.FP);   (* restore FP *)
END Transfer;

Entry code:  PUSH EBP
             SUB ESP, 4

Exit code:   MOV ESP, EBP
             POP EBP
             RET 4

Page 157: Kernfach System Software WS04/05

Processes and Threads: Coroutines (4)

[Figure: stackP and stackQ during Transfer(Q), in four steps.
 1. Transfer(Q) is called on stackP: the frame holds the parameter to', PC', FP' and locals; SP and FP point into stackP.
 2. The old FP is saved; stackQ still holds the frame of Q's last Transfer call (Q, pcx, FP'', locals).
 3. FP := Q.FP: FP now points into stackQ.
 4. return: SP follows FP onto stackQ and execution jumps to the PC' stored there.]

Page 158: Kernfach System Software WS04/05

Processes and Threads: Coroutines (5)

Current stack: current execution state
All other stacks: top PAF (procedure activation frame) contains the last Transfer call
Start: create stack with fake Transfer-like PAF

PROCEDURE Start(C: Coroutine; size: LONGINT);
VAR tos: LONGINT;
BEGIN
    NEW(C.stack, size);
    tos := SYSTEM.ADR(C.stack[0]) + LEN(C.stack);
    SYSTEM.PUT(tos-4, 0);    (* par = null *)
    SYSTEM.PUT(tos-8, 0);    (* PC' = null, not allowed to return *)
    SYSTEM.PUT(tos-12, 0);   (* FP' *)
    C.FP := tos-12;
END Start;

Page 159: Kernfach System Software WS04/05

Processes and Threads: Problems caused by multitasking

Concurrent access to resources
– protection: limit access to a resource
– synchronization: synchronize task with resource state or other task

Concurrent access to CPU
– task priorities
– scheduling

One problem's solution is another problem's cause...
– deadlocks
– fairness
– deadlines / periodicity constraints

Page 160: Kernfach System Software WS04/05

Processes and Threads: Protection: Mutual Exclusion

Mutual exclusion: only one activity is allowed to access one resource at a time
– disable interrupts (single CPU only, avoids switches)
– locks
    flag: lock taken / lock free
    spin lock (uses busy waiting)
    exclusive lock
    read-write lock (multiple readers, one writer)

Page 161: Kernfach System Software WS04/05

Processes and Threads: Protection: Monitor

Shared resources as Monitor
– resources are passive objects
– execution of critical sections inside the monitor is mutually exclusive
    Global Monitor Lock
    Shared Monitor Lock for read-access (optional)
– monitor as a special module [original version (Hoare, Brinch Hansen)]
– object instance as monitor
    method and code block granularity
    Java, C#, Active Oberon, ...

[Figure: task P and task Q each acquire and then release the Resource.]

Page 162: Kernfach System Software WS04/05

Processes and Threads: Protection

Simplistic implementation with coroutines
Non-reentrant lock (no recursion allowed)
One waiting queue per resource is required

PROCEDURE Acquire(r: Resource);
BEGIN
    IF r.taken THEN
        InsertList(r.waiting, cur);
        SwitchToNextRoutine()
    ELSE
        r.taken := TRUE
    END
END Acquire;

PROCEDURE Release(r: Resource);
BEGIN
    next := GetFromList(r.waiting);
    IF next # NIL THEN
        InsertList(ready, next);
        Transfer(GetNextTask());
    ELSE
        r.taken := FALSE
    END
END Release;

Page 163: Kernfach System Software WS04/05

Processes and Threads: Protection

Shared resource as Process
– synchronization during communication
– Communicating Sequential Processes (CSP), C.A.R. Hoare (1978)

Model of communication
– "Rendez-vous" between two processes
    P!x (send x to process P)
    Q?y (ask y from process Q)
– Used in Ada, Occam

[Figure: two timelines of task P executing Q!z and task Q executing P?x.]

Page 164: Kernfach System Software WS04/05

Processes and Threads: Protection

Some variations on the theme...
– Reentrant Locks
– Readers / Writers: one writer or multiple readers allowed
– Binary Semaphores: one activity can get the resource
– Generic Semaphores: N activities are allowed to get the resource
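The difference between a binary and a generic semaphore can be observed with a small experiment. This is a sketch using Python's `threading.Semaphore`; the helper `run_workers` and its parameters are invented for the illustration and do not come from the lecture.

```python
import threading
import time


def run_workers(n_slots, n_workers):
    """Let n_workers compete for a semaphore with n_slots permits.

    Returns the largest number of workers that were inside at the same time.
    """
    slots = threading.Semaphore(n_slots)   # counter = available resources
    guard = threading.Lock()               # protects the two counters below
    inside = 0
    peak = 0

    def worker():
        nonlocal inside, peak
        with slots:                        # P(s) on entry, V(s) on exit
            with guard:
                inside += 1
                peak = max(peak, inside)
            time.sleep(0.01)               # pretend to use the resource
            with guard:
                inside -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


print(run_workers(1, 5))   # binary semaphore: at most one worker inside
print(run_workers(3, 8))   # generic semaphore with N = 3
```

With one permit the peak is always 1; with N permits it never exceeds N.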

Page 165: Kernfach System Software WS04/05

Processes and Threads: Synchronization

Wait on a condition / state
– Signals with Send/Wait methods
    require cooperation from all processes
    example: Producer/Consumer with conditions nonempty/nonfull
    semantics of Send: Send-and-Pass vs. Send-and-Continue
– Generic system-handled conditions (Active Oberon)
    AWAIT(x > y);

Wait on partner process
– CSP

Page 166: Kernfach System Software WS04/05

Processes and Threads: Synchronization: Implementation Example

Process list
– doubly linked list of all coroutines
– cur points to the current (running) coroutine
– each signal has a LIFO list

[Figure: ring of coroutines C1 to C5, each with a ready flag; cur marks the running coroutine; Signal s heads a list of waiting coroutines chained through their link fields.]

Page 167: Kernfach System Software WS04/05

Processes and Threads: Synchronization: Implementation Example

Schedule
    prev := cur;
    WHILE ~cur.ready & cur.next # prev DO
        cur := cur.next
    END;
    IF cur.ready THEN Transfer(cur) ELSE (*deadlock*) END

Terminate
    cur.next.prev := cur.prev;
    cur.prev.next := cur.next;
    Schedule

Page 168: Kernfach System Software WS04/05

Processes and Threads: Synchronization: Implementation Example

Send(s)
    IF s # NIL THEN (*send-and-pass*)
        cur := s; s.ready := TRUE; s := s.link
    END;
    Schedule (*to next ready from cur*)

Wait(s)
    cur.link := s; s := cur; cur.ready := FALSE;
    Schedule (*to next ready from cur*)

Init(s)
    s := NIL

Page 169: Kernfach System Software WS04/05

Processes and Threads: Active Oberon: Bounded Buffer

Buffer* = OBJECT
    VAR
        data: ARRAY BufLen OF INTEGER;
        in, out: LONGINT;

    (* Put - insert element into the buffer *)
    PROCEDURE Put* (i: INTEGER);
    BEGIN {EXCLUSIVE}
        (* AWAIT ~full *)
        AWAIT ((in + 1) MOD BufLen # out);
        data[in] := i; in := (in + 1) MOD BufLen
    END Put;

    (* Get - get element from the buffer *)
    PROCEDURE Get* (VAR i: INTEGER);
    BEGIN {EXCLUSIVE}
        (* AWAIT ~empty *)
        AWAIT (in # out);
        i := data[out]; out := (out + 1) MOD BufLen
    END Get;

    PROCEDURE & Init;
    BEGIN
        in := 0; out := 0;
    END Init;

END Buffer;
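For comparison, the same ring buffer can be sketched with an explicit condition variable. This is not how Active Oberon implements AWAIT; it is an assumed Python equivalent in which `threading.Condition.wait_for` stands in for `AWAIT` and the `with` block stands in for `{EXCLUSIVE}`. The field name `inp` replaces `in`, which is a keyword in Python.

```python
import threading


class Buffer:
    def __init__(self, buf_len):
        self.buf_len = buf_len
        self.data = [None] * buf_len
        self.inp = 0                       # next slot to write
        self.out = 0                       # next slot to read
        self.cond = threading.Condition()  # lock plus wait queue

    def put(self, item):
        with self.cond:                    # mutual exclusion
            # AWAIT ~full: one slot stays unused so full and empty differ
            self.cond.wait_for(lambda: (self.inp + 1) % self.buf_len != self.out)
            self.data[self.inp] = item
            self.inp = (self.inp + 1) % self.buf_len
            self.cond.notify_all()         # explicit signal, unlike Active Oberon

    def get(self):
        with self.cond:
            # AWAIT ~empty
            self.cond.wait_for(lambda: self.inp != self.out)
            item = self.data[self.out]
            self.out = (self.out + 1) % self.buf_len
            self.cond.notify_all()
            return item


def demo(count=10, buf_len=4):
    """One producer and one consumer exchange `count` integers."""
    buf = Buffer(buf_len)
    received = []
    producer = threading.Thread(target=lambda: [buf.put(i) for i in range(count)])
    consumer = threading.Thread(target=lambda: received.extend(buf.get() for _ in range(count)))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return received


print(demo())
```

The notable difference is the `notify_all()` calls: here the programmer must signal waiters, whereas the Active Oberon version leaves condition re-evaluation to the system.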

Page 170: Kernfach System Software WS04/05

Processes and Threads: CSP: Bounded Buffer (I)

[bounded_buffer || producer || consumer]

producer ::
    *[ <produce item>
       bounded_buffer ! item;
    ]

consumer ::
    *[ bounded_buffer ? item;
       <consume item>
    ]

Geoff Coulson, Lancaster University

Page 171: Kernfach System Software WS04/05

Processes and Threads: CSP: Bounded Buffer (II)

bounded_buffer ::
    buffer: (0..9) item;
    in, out: integer;
    in := 0; out := 0;
    *[
        in < out+10; producer ? buffer(in mod 10)
            -> in := in + 1;
      ||
        out < in; consumer ! buffer(out mod 10)
            -> out := out + 1;
    ]

Page 172: Kernfach System Software WS04/05

Processes and Threads: Process State

Process states
1. Running: actually using the CPU
2. Ready: waiting for a CPU
3. Blocked: unable to run, waiting for external event

Process state transitions
1. wait for external event
2. system scheduler
3. system scheduler
4. external event happens

[Figure: state diagram with the states Running, Blocked and Ready, connected by transitions 1 to 4.]
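The state diagram can be written down as a small transition table. The arrow directions are not recoverable from the slide text, so the sketch below assumes the usual three-state model: 1 leads from Running to Blocked, 2 from Running to Ready, 3 from Ready to Running, and 4 from Blocked to Ready. The names `State`, `TRANSITIONS` and `step` are illustrative.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    READY = "ready"
    BLOCKED = "blocked"


# Transition number -> (source state, target state); directions are assumed.
TRANSITIONS = {
    1: (State.RUNNING, State.BLOCKED),   # wait for external event
    2: (State.RUNNING, State.READY),     # system scheduler takes the CPU away
    3: (State.READY, State.RUNNING),     # system scheduler assigns the CPU
    4: (State.BLOCKED, State.READY),     # external event happens
}


def step(state, transition):
    """Apply one numbered transition, rejecting moves the model does not allow."""
    source, target = TRANSITIONS[transition]
    if state is not source:
        raise ValueError(f"transition {transition} not possible from {state.name}")
    return target


state = State.READY
for number in (3, 1, 4, 3, 2):
    state = step(state, number)
print(state.name)
```

Note that the table has no edge from Blocked to Running: under this model a process that was waiting must pass through Ready and be picked by the scheduler.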

Page 173: Kernfach System Software WS04/05

Processes and Threads: Process State (Active Oberon)

Active Oberon provides
– monitor-like object protection
– conditions

Conditions are checked by the system.
No explicit help or knowledge from the user is required (no x.Signal).

[Figure: state diagram with the states Running, Ready, Awaiting Object and Awaiting Condition.]

Page 174: Kernfach System Software WS04/05

Activities

Program (static concept) ≠ Process (dynamic)
Processes, jobs, tasks, threads (differences later)
– program code
– context:
    program counter (PC) and registers
    stack pointer
    state: [new], running, waiting, ready, [terminated]
– stack
– data section (heap)

Page 175: Kernfach System Software WS04/05

Processes vs. Threads

Process or job (heavyweight)
– code
– address space
– processor state
– private data (stack+registers)
– can have multiple threads

Thread (lightweight)
– shared code
– shared address space
– processor state
– private data (stack+registers)

[Figure labels: CPU, Kernel.]

Page 176: Kernfach System Software WS04/05

Processes vs. Threads: Example

[Figure: on one side two processes, PROC 1 and PROC 2, each with its own instructions, HEAP 1 / STACK 1 and HEAP 2 / STACK 2. On the other side a single process PROC with one instruction sequence, one shared HEAP and two stacks, STACK 1 and STACK 2.]

Page 177: Kernfach System Software WS04/05

Multitasking

Programmed events that can cause a task switch
– protection (locks)
    acquire
    release
– synchronization
    wait on a condition
    send a signal (send-and-pass)

System events that can cause a task switch
– voluntary switch ("yield", task termination)
– process with higher priority becomes available
– consumption of the allowed time quantum

[Margin labels in the figure: synchronous, asynchronous, task preemption.]

Page 178: Kernfach System Software WS04/05

Preemption

Assign each process a time quantum (normally in the order of tens of ms)

Asynchronous task switches can happen at any time!
– task can be in the middle of a computation
– save whole CPU state (registers, flags, ...)

Perform switch
– on resource conflict
– on synchronization request
– on timer interrupt (time quantum is over)

Page 179: Kernfach System Software WS04/05

Context switch

Scheduler invocation:
– preemption: interrupt
– cooperation: explicit call

Operations:
– store the process state (PC, regs, …)
– choose the next process (strategy)
– [accounting]
– restore the state of the next process (regs, SP, PC, …)
– jump to the restored PC

A context switch is usually expensive: 1–1000 µs depending on the system and number of processes
– hardware optimizations (e.g., multiple sets of registers: SPARC, DECSYSTEM-20)

Page 180: Kernfach System Software WS04/05

Scheduling algorithms

Three categories of environments:

batch systems (e.g., VPP, DOS)
– usually non-preemptive (i.e., task is not stopped by scheduler, only synchronous switches)

interactive systems (UNIX, Windows, Mac OS)
– cooperative or preemptive
– no task allowed to have the CPU forever

real-time systems (PathWorks, RT Linux)
– timing constraints (deadlines, periodicity)

Page 181: Kernfach System Software WS04/05

Scheduling Performance

CPU utilization
Throughput
– number of jobs per time unit
– minimize context switch penalty
Turnaround time
– = exit time - arrival time
– execution, wait, I/O
Response time
– = start time - request time
Waiting time (I/O, waiting, …)
Fairness

Page 182: Kernfach System Software WS04/05

Scheduling algorithm goals

All systems
– Fairness: give every task a chance
– Policy enforcement
– Balance: keep all subsystems busy

Interactive systems
– Response time: respond quickly
– Proportionality: meet user's expectations

Batch systems
– Throughput: maximize number of jobs
– Turnaround time: minimize time in system
– CPU utilization: keep CPU busy

Real-time systems
– Meet deadlines: avoid losing data
– Predictability: avoid degradation
– Hard- vs. soft-real-time systems

Page 183: Kernfach System Software WS04/05

Batch Scheduling Algorithms

Choose task to run (task is usually not preempted)

First Come First Serve (FCFS)
– fair, may cause long waiting times

Shortest Job First (SJF)
– requires knowledge about job length

Longest Response Ratio
– response ratio = (time in the system / CPU time)
– depends on the waiting time

Highest Priority First
– with or without preemption

Mixed
– the priority is adjusted dynamically (time in queue, length, priority, …)

ETH-VPP is a batch system! Which algorithm does it use?

Page 184: Kernfach System Software WS04/05

Preemptive Scheduling Algorithms

Time sharing
– Each task has a predefined time quantum
– Round-Robin: schedule next task on the ready list
– Quantum choice:
    small: may cause frequent switches
    big: may cause slow response
– Implicit assumption: all tasks have the same importance

[Figure: ready list as a ring of tasks P1 to P4 linked by next pointers.]

Page 185: Kernfach System Software WS04/05

Preemptive Scheduling Algorithms

Priority scheduling
– process with highest priority is scheduled first

Variants
– multilevel queue scheduling
    one list per priority, use round-robin on list
– dynamic priorities
    proportional to time in system
    inversely proportional to part of quantum used
– make time quantum proportional to priority

Page 186: Kernfach System Software WS04/05

Real-Time Scheduling Algorithms

Task needs to meet the deadline!
Task cost is known (or should be)

Two kinds of task:
– aperiodic
– periodic

Reservation
– scheduler decides if the system has enough resources for the task

Algorithms:
– Rate Monotonic Scheduling: assign static priorities (priority proportional to frequency)
– Earliest Deadline First: task with closest deadline is chosen

Page 187: Kernfach System Software WS04/05

Scheduling Algorithm Example

Situation:
– Tasks P1, P2, P3, P4
    arrive at time t = 0
    priority: P1 highest, P4 lowest
    time to process: 10, 2, 5, 3

Page 188: Kernfach System Software WS04/05

Scheduling Algorithm Example: Highest Priority First

[Gantt chart, time axis 0, 10, 12, 17, 20: P1 runs from 0 to 10, P2 from 10 to 12, P3 from 12 to 17, P4 from 17 to 20.]

Page 189: Kernfach System Software WS04/05

Scheduling Algorithm Example: Shortest Job First

[Gantt chart, time axis 0, 2, 5, 10, 20: P2 runs from 0 to 2, P4 from 2 to 5, P3 from 5 to 10, P1 from 10 to 20.]

Page 190: Kernfach System Software WS04/05

Scheduling Algorithm Example: Timesharing with quantum = 2

[Gantt chart, time axis 0, 2, 4, 6, 8, 10, 12, 13, 14, 16, 18, 20: P1 to P4 take turns of at most 2 time units each.]

Page 191: Kernfach System Software WS04/05

Scheduling Algorithm Example: Timesharing with quantum → 0

[Chart, time axis 0, 8, 11, 15, 20: all four tasks run at 1/4 speed until 8, the remaining three at 1/3 until 11, the remaining two at 1/2 until 15, and the last task alone until 20.]

Page 192: Kernfach System Software WS04/05

Scheduling Algorithm Example: Results

Situation:
– Tasks P1, P2, P3, P4
    arrive at time t = 0
    priority: P1 highest, P4 lowest
    time to process: 10, 2, 5, 3

– Results (averages)
                                    turnaround    response time
    Highest Priority First:         14.75         9.75
    Shortest Job First:             9.25          4.25
    Timesharing with Quantum = 2:   12.75         3.0
    Timesharing with Quantum → 0:   13.5          0
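The two non-preemptive rows of the table can be re-derived with a few lines of Python. This is a sketch under the stated situation (all tasks arrive at t = 0, no context-switch cost); the function name `non_preemptive` is made up, and the timesharing rows are left out because they depend on tie-breaking details that the slides do not spell out.

```python
def non_preemptive(order, burst):
    """Run the jobs back to back in the given order; all arrive at t = 0.

    Returns (average turnaround time, average response time).
    """
    t = 0
    start, finish = {}, {}
    for job in order:
        start[job] = t          # response time = start time - arrival time (0)
        t += burst[job]
        finish[job] = t         # turnaround   = exit time  - arrival time (0)
    n = len(order)
    return sum(finish.values()) / n, sum(start.values()) / n


burst = {"P1": 10, "P2": 2, "P3": 5, "P4": 3}

# Highest Priority First: P1 has the highest priority, P4 the lowest.
hpf = non_preemptive(["P1", "P2", "P3", "P4"], burst)

# Shortest Job First: order the jobs by their processing time.
sjf = non_preemptive(sorted(burst, key=burst.get), burst)

print("HPF", hpf)
print("SJF", sjf)
```

For HPF the finish times are 10, 12, 17 and 20, which average to 14.75; for SJF they are 2, 5, 10 and 20, which average to 9.25.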

Page 193: Kernfach System Software WS04/05

Scheduling Examples

UNIX
– preemption
– 32 priority levels (round robin)
– each second the priorities are recomputed (CPU usage, nice level, last run)

BSD similar
– every 4th tick priorities are recomputed (usage estimation)

Windows NT
– "real time" priorities: fixed, may run forever
– variable: dynamic priorities, preemption
– idle: last choice (swap manager)

Page 194: Kernfach System Software WS04/05

Scheduling Examples: Quantum & Priorities

Win2K:
– quantum = 20ms (professional), 120ms (user), configurable
– depending on type (I/O bound)

BSD:
– quantum = 100ms
– priority = f(load, nice, timelast)

Linux:
– quantum = quantum / 2 + priority
– f(quantum, nice)

Page 195: Kernfach System Software WS04/05

Scheduling Problems

Starvation
– a task is never scheduled (although ready)
– "fairness"

Deadlock
– no task is ready (nor will any ever become ready)
– detection+recovery or avoidance

Page 196: Kernfach System Software WS04/05

Deadlock Conditions

Coffman conditions for a deadlock (1971):
– Mutual exclusion
– Hold and wait
– No resource preemption
– Circular wait (cycle)

Example: A holds R and wants S; B holds S and wants R.

[Figure: threads T1 and T2 and resources R1 and R2 forming a cycle. Legend: T = thread, R = resource.]

Page 197: Kernfach System Software WS04/05

Deadlock Remedies

Coarser lock granularity:
– use a single lock for all resources (e.g., Linux 2.0-2.4 "Big Kernel Lock")

Locking order:
– resources are ordered
– resource locking according to the resource order (ticketing)

Two-phase locking:
– try to acquire all the resources
– if successful, lock them; otherwise free them and try again
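The locking-order remedy can be sketched as follows. Each resource gets a fixed rank and every task acquires its locks in ascending rank, whatever order it asked for them. The class `Resource`, the helpers and the rank values are assumptions made for this illustration.

```python
import threading


class Resource:
    def __init__(self, rank, name):
        self.rank = rank                  # position in the global resource order
        self.name = name
        self.lock = threading.Lock()


def acquire_in_order(resources):
    """Lock all requested resources in ascending rank; return them in that order."""
    ordered = sorted(resources, key=lambda r: r.rank)
    for r in ordered:
        r.lock.acquire()
    return ordered


def release_all(ordered):
    for r in reversed(ordered):
        r.lock.release()


def run_tasks(rounds=200):
    """Three tasks with overlapping needs; returns the names of those that finished."""
    R, S, T = Resource(1, "R"), Resource(2, "S"), Resource(3, "T")
    done = []

    def task(name, needs):
        for _ in range(rounds):
            held = acquire_in_order(needs)
            release_all(held)
        done.append(name)

    jobs = (("A", [R, S]), ("B", [S, T]), ("C", [T, R]))
    threads = [threading.Thread(target=task, args=job, daemon=True) for job in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)                 # a deadlock would show up as a missing name
    return sorted(done)


print(run_tasks())
```

Task C asks for T and then R but actually locks R first, so no task can hold a higher-ranked lock while waiting for a lower-ranked one, and a circular wait cannot form.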

Page 198: Kernfach System Software WS04/05

Deadlock Detection, Prevention & Recovery

Deadlock detection: the system keeps a graph of locks and tries to detect cycles.
– time consuming
– the graph has to be kept consistent with the actual state

Deadlock prevention (avoidance): remove one of the four Coffman conditions (e.g., cycles)

Recovery:
– kill processes and reclaim the resources
– rollback: requires saving the states of the processes regularly

Page 199: Kernfach System Software WS04/05

Simple Deadlock Scenario

Example
– Resources R, S, T
– Tasks A, B, C require { R, S }, { S, T }, { T, R } respectively

Case 1: Sequential execution, no deadlock

[Figure: timeline with + for acquire and - for release.
 A: +R +S -R -S
 B: +S +T -S -T
 C: +T +R -T -R]

Page 200: Kernfach System Software WS04/05

Simple Deadlock Scenario

Case 2: Interleaving, deadlock

[Figure: interleaved timeline. A: +R then +S; B: +S then +T; C: +T then +R. The resource graph over the tasks A, B, C and the resources R, S, T forms a cycle.]
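The cycle in this scenario can be found mechanically. The sketch below builds a wait-for graph from two tables, who holds which resource and who wants which resource, and follows the edges until a task repeats. It assumes each task waits for at most one resource at a time; the function names are invented for the example.

```python
def wait_for_graph(holds, wants):
    """Map each waiting task to the task that holds the resource it wants.

    holds: resource -> task currently holding it
    wants: task -> resource it is blocked on
    """
    return {task: holds[res] for task, res in wants.items() if res in holds}


def find_cycle(graph):
    """Return the tasks on a wait-for cycle, or None if there is none."""
    for start in graph:
        path = []
        node = start
        # Follow the single outgoing edge of each task until we leave the
        # graph (someone is not waiting) or revisit a task (a cycle).
        while node in graph and node not in path:
            path.append(node)
            node = graph[node]
        if node in path:
            return path[path.index(node):]
    return None


# Case 2: every task holds its first resource and requests its second.
holds = {"R": "A", "S": "B", "T": "C"}
wants = {"A": "S", "B": "T", "C": "R"}
print(find_cycle(wait_for_graph(holds, wants)))

# Case 1: A runs alone, nobody waits.
print(find_cycle(wait_for_graph({"R": "A", "S": "A"}, {})))
```

In the interleaved case A waits for B, B for C and C for A, so all three tasks are reported; in the sequential case the graph is empty and no cycle exists.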

Page 201: Kernfach System Software WS04/05

Complex Deadlock Scenario

Case with 6 resources and 7 tasks

[Figure: graphical representation of the tasks A, B, C, D, E, F, G and the resources R, S, T, U, V, W.]

Is this a case of deadlock?

Page 202: Kernfach System Software WS04/05

Deadlock Avoidance Strategy in Bluebottle

Each kernel module has a lock to protect its data.

When multiple locks are needed, acquire them according to the module hierarchy.

[Figure: module hierarchy, one module lock per module: Locks, Blocks, Modules, Configuration, Memory, Interrupts, Threads, Traps, Timers, Processors.]

Page 203: Kernfach System Software WS04/05

Priority Inversion

A high-priority task can be blocked by a lower priority one.

Example:

[Figure: timelines of the tasks Low, Medium and High, showing when each is running, ready or waiting.]

Page 204: Kernfach System Software WS04/05

Priority Inversion

Big problem for RTOS

Solutions
– priority inheritance
    low-priority task holding resource inherits priority of high-priority task wanting the resource
– priority ceilings
    each resource has a priority corresponding to the highest priority of the users +1
    the priority of the resource is transferred to the locking process
    can be used instead of semaphores

Page 205: Kernfach System Software WS04/05

Example: Mars Pathfinder (1996–1998)

VxWorks real-time system: preemptive, priorities
Communication bus: shared resource (mutexes)
Low priority task (short): meteorological data gathering
Medium priority task (long): communication
High priority: bus manager

Detection: a watchdog on bus activity triggers a system reset
Fix: activate priority inheritance via an uploaded on-the-fly patch (no memory protection)

Page 206: Kernfach System Software WS04/05

Locking on Multiprocessor Machines

Real parallelism!
Cannot "disable interrupts" like on single processor machines (could stop every task, but not efficient)

Software solutions
– Peterson, Dekker, ...

Hardware support
– bus locking
– atomic instructions (Test And Set, Compare And Swap)

Page 207: Kernfach System Software WS04/05

Locking on Multiprocessor Machines

Test And Set

TAS s:
    IF s = 0 THEN
        s := 1
    ELSE
        CC := TRUE
    END

Compare and Swap (Intel)

CAS R1, R2, A:
    R1: expected value
    R2: new value
    A: address

    IF R1 = M[A] THEN
        M[A] := R2; CC := TRUE
    ELSE
        R1 := M[A]; CC := FALSE
    END

These instructions are atomic even on multiprocessors!
They usually do so by locking the data bus.
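The usual way to build on CAS is a retry loop: read the value, compute the new one, and swap only if nobody changed the word in between. Python has no CAS instruction, so the sketch below emulates the atomicity of the hardware with an internal lock; the class `Cell` and the function `atomic_increment` are hypothetical names for this illustration.

```python
import threading


class Cell:
    """One memory word M[A]. The internal lock only emulates the bus lock."""

    def __init__(self, value=0):
        self.value = value
        self._bus = threading.Lock()

    def compare_and_swap(self, expected, new):
        """Return (success flag, value seen), mirroring CC and R1 in the slide."""
        with self._bus:
            if self.value == expected:
                self.value = new
                return True, expected
            return False, self.value


def atomic_increment(cell):
    expected = cell.value                    # Load R1
    while True:
        ok, seen = cell.compare_and_swap(expected, expected + 1)
        if ok:
            return
        expected = seen                      # somebody interfered: retry


def hammer(n_threads=4, per_thread=1000):
    """Increment one shared cell from several threads; return the final value."""
    cell = Cell(0)

    def work():
        for _ in range(per_thread):
            atomic_increment(cell)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.value


print(hammer())
```

No increment is lost even though the read and the write are separate steps, because a stale `expected` value makes the swap fail and forces a retry.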

Page 208: Kernfach System Software WS04/05

Example: Semaphores on SMP

Counter s: available resources

Binary Semaphores with TAS

Spinning (busy wait):
    Try   TAS s
          JMP Try
          CS

Blocking:
          TAS s
          JMP Queuing
          CS

Page 209: Kernfach System Software WS04/05

Example: Semaphores on SMP

Counter s: available resources

Generic Semaphores with CAS

P(s), enter CS:
          Load R1, s
    TryP  MOVE R1, R2
          DEC R2
          CAS R1, R2, s
          BNE TryP
          CMP R2, 0
          BN Queuing
          [CS]

V(s), exit CS:
          [CS]
          Load R1, s
    TryV  MOVE R1, R2
          INC R2
          CAS R1, R2, s
          BNE TryV
          CMP R2, 0
          BNP Dequeuing

P(S): { S := S - 1 }
    IF S < 0 THEN jump queuing END

V(S): { S := S + 1 }
    IF S <= 0 THEN jump dequeuing END

Page 210: Kernfach System Software WS04/05

Spin-Locks: the Bluebottle/i386 way

PROCEDURE AcquireSpinTimeout(VAR locked: BOOLEAN);
CODE {SYSTEM.i386}
    MOV EBX, locked[EBP]   ; EBX := ADR(locked)
    MOV AL, 1              ; AL := 1
    CLI                    ; switch interrupts off before acquiring lock
test:
    XCHG [EBX], AL         ; set and read the lock atomically.
                           ; LOCK prefix implicit.
    CMP AL, 1              ; was locked?
    JE test                ; retry
    ..
END AcquireSpinTimeout;

(simplified version)

Page 211: Kernfach System Software WS04/05

Active Objects in Active Oberon

Z = OBJECT
    VAR myT: T; i: INTEGER;           (* State *)

    PROCEDURE & NEW (t: T);           (* Initializer *)
    BEGIN myT := t
    END NEW;

    PROCEDURE P (u: U; VAR v: V);     (* Method *)
    BEGIN { EXCLUSIVE }               (* Mutual Exclusion *)
        i := 1
    END P;

BEGIN { ACTIVE }                      (* Object Activity *)
    BEGIN { EXCLUSIVE }
        AWAIT (i > 0);                (* Condition *)
    END
END Z;

Page 212: Kernfach System Software WS04/05

Active Oberon Runtime Structures

[Figure: the ready queue feeds the CPUs (1, 2); each object has a lock queue and a wait queue, both terminated by NIL. Process states: Running, Ready, Awaiting Object, Awaiting Assertion.]

Page 213: Kernfach System Software WS04/05

Active Oberon Implementation

[Figure: state diagram with the states Running, Ready, Awaiting Object and Awaiting Assertion; transitions numbered 0 to 7.]

NEW (transition 0)
    Create object; Create process; Set to ready

END (transitions 6, 1)
    Run next ready

Preempt (transitions 7, 1)
    Set to ready; Run next ready

Page 214: Kernfach System Software WS04/05

Active Oberon Implementation

Enter Monitor
    IF monitor lock set THEN
        Put me in monitor obj wait list;
        Run next ready
    ELSE
        set monitor lock
    END

Exit Monitor
    Find first asserted x in wait list;
    IF x found THEN
        set x to ready
    ELSE
        Find first x in obj wait list;
        IF x found THEN set x to ready
        ELSE clear monitor lock
        END
    END
    Run next ready

[Figure: state diagram with the states Running, Ready, Awaiting Object and Awaiting Assertion; transitions numbered 0 to 7.]

Page 215: Kernfach System Software WS04/05

Active Oberon Implementation

AWAIT (transition 3)
    Put me in monitor assn wait list;
    Call Exit monitor

[Figure: state diagram with the states Running, Ready, Awaiting Object and Awaiting Assertion; transitions numbered 0 to 7.]

Page 216: Kernfach System Software WS04/05

Case Study: Windows CE 3.0

Real-time constraints
– Reaction time on events
– Execution time

Threads with priorities and time quanta
– Priorities: 0 (high), …, 255 (low)
– Time quanta in ms
    default: 100 ms
    0: no quantum

Single processor

[Figure: timeline of threads with priorities p and q < p; a switch occurs at the end of the quantum.]

Page 217: Kernfach System Software WS04/05

Case Study: Windows CE 3.0

Interrupt Handling
– ISR (Interrupt Service Routine)
    1st level handling
    kernel mode, uses kernel stack
    installed at boot-time
    creates event on-demand
    preempted by ISR with higher priority
– IST (Interrupt Service Thread)
    2nd level handling
    user mode
    awaits events

[Figure: an IRQ reaches the ISR in NK.EXE (kernel mode); the ISR raises an event for the IST (user mode).]

Page 218: Kernfach System Software WS04/05

Case Study: Windows CE 3.0

Synchronization on common resources:
– Critical sections: enter, leave operations
– Semaphores and mutexes (binary semaphores)

Synchronization is performed with system/library calls (they are not part of a language).

Priority inversion avoidance
– priority inheritance (thread inherits priority of task wanting the resource)

[Figure: nested and sequential critical sections (CS) drawn as bracket pairs.]

Page 219: Kernfach System Software WS04/05

Case Study: Java

Activities are mapped to threads (no processes)
Synchronization in the language
– locks
– signals
Threads provided by the library
Scheduling depends on the JVM

Page 220: Kernfach System Software WS04/05

Case Study: Java

public class MyThread extends Thread {

    public void run() {
        System.out.println("Running");
    }

    public static void main(String[] arguments) {
        MyThread t = new MyThread();
        t.start();
    }
}

Page 221: Kernfach System Software WS04/05

Case Study: Java

public class MyThread implements Runnable {

    public void run() {
        System.out.println("Running");
    }

    public static void main(String[] arguments) {
        Thread t = new Thread(new MyThread());
        t.start();
    }
}

Page 222: Kernfach System Software WS04/05

Case Study: Java

Protection with monitor-like objects
– with method granularity
    public synchronized void someMethod()
– with statement granularity
    synchronized(anObject) { ... }

Synchronization with signals
– wait() (with optional time-out)
– notify() / notifyAll() ("send and continue" pattern)

Page 223: Kernfach System Software WS04/05

Case Study: Java

private Object o;

public synchronized void consume() {
    while (o == null) {
        try { wait(); } catch (InterruptedException e) {}
    }
    use(o);
    o = null;
    notifyAll();
}

public synchronized void produce(Object p) {
    while (o != null) {
        try { wait(); } catch (InterruptedException e) {}
    }
    o = p;
    notifyAll();
}

Page 224: Kernfach System Software WS04/05

Case Study: POSIX Threads

Standard interface for threads in C
Mostly UNIX, possible on Windows
Provided by a library (libpthread) and not part of the language
IEEE POSIX 1003.1c standard (1995)
Various implementations (both user and kernel level)

Page 225: Kernfach System Software WS04/05

Case Study: POSIX Threads

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void *run(void *arg)
{
    pthread_mutex_lock(&m);
    // critical section
    pthread_mutex_unlock(&m);
    pthread_exit(NULL);
}

int main(int argc, char *argv[])
{
    pthread_t t;
    pthread_create(&t, NULL, run, NULL);
    pthread_exit(NULL);
}

Page 226: Kernfach System Software WS04/05

File Systems

Page 227: Kernfach System Software WS04/05

File Systems: Overview

Hardware
File abstraction
File organization
File systems
– Oberon
– Unix
– FAT
Distributed file systems
– NFS
– AFS
Special topics
– Error recovery
– ISAM
– B* Trees

Page 228: Kernfach System Software WS04/05

Hardware: the ATA Bus

Standards
ATA / IDE (1986)
– Advanced Technology Attachment
– Integrated Drive Electronics
ATA-2 / EIDE
ATA-4 / ATAPI
– ATA Packet Interface (SCSI command set)
ATA-5
– UDMA 66
ATA-6
– UDMA 100
– SATA
ATA-7
– UDMA 133

Properties
bus with 2 devices
– master / slave
low-level interface
– head / cylinder / sector
– support for LBA (logical block addressing)
PIO mode
– read byte by byte through hardware port
DMA mode
– use DMA transfer

Page 229: Kernfach System Software WS04/05

Hardware: the SCSI Bus

SCSI: Small Computer Systems Interface
SCSI-2
– Fast SCSI
– Wide SCSI
SCSI-3

Bus with 8 devices
– wide: 16 / 32 devices
– bus arbitration
– disconnected mode

Device kinds
– direct access
– CD-ROM
– ...

Block-oriented access
– read-block, write-block

Transfer mode selection
– asynchronous (hand-shake)
– synchronous (period / offset)

Page 230: Kernfach System Software WS04/05

Hardware: Hard Disk

Organization
– cylinder (c)
– head (h)
– sector (s)

Addressing
– sector (c, h, s)
– block (LBA)

[Figure: disk stack with rotation axis, surface (head), track (cylinder) and sector.]

Page 231: Kernfach System Software WS04/05

Hardware: Example

Current disk example:
– ATA-100
– 250GB
– 512 bytes per sector (488·10^6 sectors)
– 8MB cache
– 8.9ms average seek time
– 7200 rpm

Page 232: Kernfach System Software WS04/05

Hardware: Hard Disk Improvements

Interleaving
– optimize sequential sector access
Read-ahead
Caching
Sector defect management

[Figure: a cylinder whose sectors 1 to 7 are numbered in interleaved order.]

Page 233: Kernfach System Software WS04/05

Hardware: Disk Scheduling

Disk controllers have a queue of pending requests:
– type: read or write
– block number: translated into the (h,c,s)-tuple
– memory address (where to copy from and to)
– amount to be transferred (byte or block count)

Page 234: Kernfach System Software WS04/05

Hardware: Disk Scheduling

First-come, first-served (FCFS)
Shortest-seek-time-first (SSTF)
SCAN (elevator) & C-SCAN
LOOK & C-LOOK

Performance: minimize head movements, maximize throughput

Scheduling is now in the hardware

Page 235: Kernfach System Software WS04/05

Hardware: Disk Scheduling

Example (head position, track number):

queue = 31, 72, 4, 18, 147, 193, 199, 153, 114, 72
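To compare policies on this queue, the total head movement can be computed directly. The slide text does not preserve the initial head position, so the sketch below assumes a start at track 50; that value and the function names are illustrative only.

```python
def fcfs(start, queue):
    """First-come, first-served: serve requests in arrival order."""
    pos, total = start, 0
    for track in queue:
        total += abs(track - pos)
        pos = track
    return total


def sstf(start, queue):
    """Shortest-seek-time-first: always serve the nearest pending track."""
    pending = list(queue)
    pos, total = start, 0
    while pending:
        track = min(pending, key=lambda t: abs(t - pos))
        pending.remove(track)
        total += abs(track - pos)
        pos = track
    return total


queue = [31, 72, 4, 18, 147, 193, 199, 153, 114, 72]
START = 50   # assumed initial head position, not given in the slide text

print("FCFS:", fcfs(START, queue))
print("SSTF:", sstf(START, queue))
```

SSTF roughly halves the movement here because it first clears the low tracks and then sweeps upward, while FCFS jumps back and forth across the disk. The price is possible starvation of far-away requests, which SCAN and LOOK address.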

Page 236: Kernfach System Software WS04/05


Hardware: Disk Scheduling

Page 237: Kernfach System Software WS04/05

Abstractions

Disk: array of sectors

Block: array of sectors
– some systems call them "clusters"
– user configured
– reduces address space
– increases access speed
– causes internal fragmentation

File: stream of bytes
– sequential access
– random access
– stored on disk
    mapping byte to block
    block allocation management

Page 238: Kernfach System Software WS04/05

Abstraction Layers

Abstraction    Operations                                           Implementations
File System    OpenFile, WriteFile, ReadFile, SeekFile, CloseFile   FAT, Oberon, ISO 9660, ext3, NTFS
Volume         ReadBlock, WriteBlock, AllocateBlock, FreeBlock
Disk           ReadSector, WriteSector                              ATA driver, SCSI driver

Page 239: Kernfach System Software WS04/05

File Organization

How can we map groups of blocks into files?
How do we manage free space?
How can I jump to a certain location?

Operation: read n bytes at position p.

Page 240: Kernfach System Software WS04/05

File Organization: Contiguous Allocation

File is a group of contiguous blocks
Simple management
Fast transfers
IBM MVS (mainframe)

Directory entry: (start, length)

Page 241: Kernfach System Software WS04/05

File Organization: Contiguous Allocation

external fragmentation
allocation
– how much space does a file need?
– first fit, best fit, …?
file growth (error? move? extensions?)
preallocation: internal fragmentation
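
The two placement policies named above can be sketched over a list of free extents. The `(start, length)` representation and the function names are assumptions made for illustration.

```python
def first_fit(holes, size):
    """Index of the first free extent that is large enough, or None."""
    for i, (_start, length) in enumerate(holes):
        if length >= size:
            return i
    return None


def best_fit(holes, size):
    """Index of the smallest free extent that is large enough, or None."""
    candidates = [(length, i) for i, (_start, length) in enumerate(holes)
                  if length >= size]
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    holes = [(0, 5), (10, 3), (20, 8)]  # free extents as (start, length)
    print(first_fit(holes, 3), best_fit(holes, 3))
```

First fit stops at the first hole that works, while best fit scans all holes to leave the smallest remainder; neither removes external fragmentation.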

Page 242: Kernfach System Software WS04/05

File Organization: Linked Allocation

File is a linked list of blocks
– no external fragmentation
– no growth problems

Problems
– sequential files only (positioning requires traversal)
– space for pointers (1TB, 5B addr., 1% with 512B blocks)
– reliability (lost pointers)

Page 243: Kernfach System Software WS04/05

File Organization: Linked Allocation

Clusters: series of contiguous blocks
– faster (fewer jumps)
– less space wasted for pointers
– internal fragmentation

Page 244: Kernfach System Software WS04/05

File Organization: Linked Allocation

Pointer tables
– the list of pointers is stored in a separate table
– can be cached
– is usually stored twice (reliability)
– FAT (MS-DOS, OS/2, Windows, solid-state memory)

Page 245: Kernfach System Software WS04/05

File Organization: Indexed Allocation

Index with block addresses
Fast access for random-access files
No external fragmentation
Problems
– high management overhead
– limited file size (depending on the index structure)
– pointer overhead

Page 246: Kernfach System Software WS04/05

File Organization: Indexed Allocation

Variation:
– linked list of indexes

Advantage:
– no file size limitation

Disadvantage:
– index lookup requires sequential traversal of the index list

Page 247: Kernfach System Software WS04/05

File Organization: Indexed Allocation

Multi-level indexes (index of indexes)
UNIX

Advantage:
– fast index lookup

Disadvantage:
– limited file size

Page 248: Kernfach System Software WS04/05

File Organization: Indexed Allocation

Example: blocks 2KB, addresses 4B (512 entries per index block)

First level index block: 512 entries · 2KB = 1MB

Second level index block: 512 · 512 entries · 2KB = 0.5GB
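
The arithmetic of this example can be checked with a short sketch; the function name `indexed_capacity` is hypothetical.

```python
def indexed_capacity(block_size, addr_size, levels):
    """Bytes addressable through an index tree with the given number of levels."""
    entries = block_size // addr_size  # addresses that fit in one index block
    return entries ** levels * block_size


if __name__ == "__main__":
    KB, MB = 1024, 1024 ** 2
    print(indexed_capacity(2 * KB, 4, 1) // MB, "MB")  # one index level
    print(indexed_capacity(2 * KB, 4, 2) // MB, "MB")  # two index levels
```

Each additional level multiplies the reachable size by the number of entries per index block (512 here), at the cost of one more block access per lookup.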

Page 249: Kernfach System Software WS04/05

Free Space Management

Bitmap (e.g., HFS)
– bit vector to mark free blocks
– simple
– needs caching

Linked lists
– list of free blocks (similar to linked allocation)

Grouping
– free blocks contain n addresses of free blocks (similar to multilevel indexing)

Counting
– list of (start, length) 2-tuples, one per run of free blocks

Page 250: Kernfach System Software WS04/05

Case Study: Oberon File System

Disk module: controller driver
– block management

FileDir module:
– maps files to locations
– implemented with B-trees
– garbage collection (files)
  – the directory is the root set
  – anonymous (nonregistered) files are collected

Files module:
– allows user operations (read, create, write, …)
– access is performed through riders

Module layers (top to bottom): Files, FileDir, Disk

Page 251: Kernfach System Software WS04/05

Case Study: Oberon File System

Characteristics
Block size = 1KB
File organization
– multilevel index: 64 direct, 12 1st level indirect
– 672 data bytes in file header
Block allocation
– allocation table created at boot-time (partition GC)
– no collection at run-time (partition fills up!)

Designed to optimize small files

Page 252: Kernfach System Software WS04/05

Case Study: Oberon File System

Block = 1KB

[Figure: file header holding 672 B of data and index entries 0 to 75. Entries 0 to 63 point directly to 64 data blocks (d); entries 64 to 75 point to 12 index blocks (i1) with 256 data blocks each.]

Page 253: Kernfach System Software WS04/05

Case Study: Oberon File System

Free block management: bitmap

Garbage collection at startup

Bit positions 0 to 31, left to right (1 = free, 0 = allocated):

initial:                11111111111111111111111111111111
after startup / GC:     11010010011110111101110100011100
after allocate 16, 17:  11010010011110110001110100011100
after allocate 19:      11010010011110110000110100011100
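
The bitmap states on this slide can be reproduced with a small sketch. It assumes, as the bit patterns suggest, that 1 marks a free block and 0 an allocated one; the class `FreeBitmap` is illustrative and not the Oberon implementation.

```python
class FreeBitmap:
    """Free-block bitmap; index 0 is the leftmost bit, True means free."""

    def __init__(self, bits):
        self.free = [c == "1" for c in bits]

    def allocate(self, block):
        """Mark a specific block as used."""
        if not self.free[block]:
            raise ValueError(f"block {block} is already allocated")
        self.free[block] = False

    def allocate_first(self):
        """Allocate and return the lowest-numbered free block."""
        block = self.free.index(True)  # raises ValueError when the disk is full
        self.free[block] = False
        return block

    def __str__(self):
        return "".join("1" if f else "0" for f in self.free)


if __name__ == "__main__":
    bitmap = FreeBitmap("11010010011110111101110100011100")  # state after GC
    bitmap.allocate(16)
    bitmap.allocate(17)
    print(bitmap)
    bitmap.allocate(19)
    print(bitmap)
```

Because nothing is freed at run time in this design, the bitmap only loses 1-bits until the next startup garbage collection rebuilds it.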

Page 254: Kernfach System Software WS04/05

Case Study: Oberon File System

Internals
“Rider”: current read or write position
Buffer (cache) for consistency (each file sees the write operations on it)

[Figure: riders (R) refer to a file handle (f); the file handle owns the buffers, and each rider keeps a “hint” to a buffer.]

Page 255: Kernfach System Software WS04/05

Case Study: Oberon RAM Disk

  File = POINTER TO Header;
  Index = POINTER TO Sector;

  Rider = RECORD
    eof: BOOLEAN;
    file: File;
    pos: LONGINT;
    adr: LONGINT;
  END;

  Header = RECORD
    mark: LONGINT;
    name: FileDir.Name;
    len, time, date: LONGINT;
    ext: ARRAY 12 OF Index;
    sec: ARRAY 64 OF SectorTable;
  END;

[Figure: the header holds the primary sector table, which points to sectors 0 - 63, and the ext table, which points to the index sectors; index sector 0 points to sectors 64 - 319, index sector 1 points to sectors 320 - 575.]

Page 256: Kernfach System Software WS04/05

Case Study: Oberon RAM Disk

  PROCEDURE Read(VAR r: Rider; VAR x: SYSTEM.BYTE);
    VAR m: INTEGER;
  BEGIN
    IF r.pos < r.file.len THEN
      SYSTEM.GET(r.adr, x); INC(r.adr); INC(r.pos);
      IF r.adr MOD SS = 0 THEN (* end of sector *)
        m := SHORT(r.pos DIV SS);
        IF m < STS THEN
          r.adr := r.file.sec[m]
        ELSE
          r.adr := r.file.ext[(m-STS) DIV XS].x[(m-STS) MOD XS]
        END
      END
    ELSE x := 0X; r.eof := TRUE
    END
  END Read;

SS = Sector Size
STS = Sector Table Size
XS = Index Size
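
The index arithmetic in Read can be isolated in a short sketch. The constants follow the values given on the earlier slides (1 KB sectors, 64 direct entries, 256 entries per index sector); the function `locate` is a hypothetical helper, not part of the Oberon sources.

```python
SS = 1024   # sector size in bytes (block size = 1KB)
STS = 64    # entries in the primary sector table
XS = 256    # entries per index sector


def locate(pos):
    """Return which table entry holds the sector containing byte position pos."""
    m = pos // SS                      # logical sector number
    if m < STS:
        return ("sec", m)              # direct entry: file.sec[m]
    n, k = divmod(m - STS, XS)
    return ("ext", n, k)               # indirect entry: file.ext[n].x[k]


if __name__ == "__main__":
    for pos in (0, 64 * SS, 320 * SS):
        print(pos, locate(pos))
```

Sector 64 is the first one reached through index sector 0 and sector 320 the first one reached through index sector 1, matching the ranges in the figure of the header layout.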

Page 257: Kernfach System Software WS04/05

Case Study: Oberon RAM Disk

  PROCEDURE Write(VAR r: Rider; x: SYSTEM.BYTE);
    VAR k, m, n: INTEGER; ix: LONGINT;
  BEGIN
    IF r.pos < r.file.len THEN (* overwrite *)
      m := SHORT(r.pos DIV SS); INC(r.pos);
      IF m < STS THEN
        r.adr := r.file.sec[m]
      ELSE
        r.adr := r.file.ext[(m-STS) DIV XS].x[(m-STS) MOD XS]
      END
    ELSE
      ....
    END;
    SYSTEM.PUT(r.adr, x); INC(r.adr);
  END Write;

Page 258: Kernfach System Software WS04/05

Case Study: Oberon RAM Disk

  IF r.pos < r.file.len THEN
    ....
  ELSE (* expand *)
    IF r.adr MOD SS = 0 THEN
      m := SHORT(r.pos DIV SS);
      IF m < STS THEN
        Kernel.AllocSector(0, r.adr); r.file.sec[m] := r.adr
      ELSE
        n := (m-STS) DIV XS; k := (m-STS) MOD XS;
        IF k = 0 THEN
          Kernel.AllocSector(0, ix); r.file.ext[n] := SYSTEM.VAL(Index, ix)
        END;
        Kernel.AllocSector(0, r.adr); r.file.ext[n].x[k] := r.adr
      END
    END;
    INC(r.pos); r.file.len := r.pos
  END;
  SYSTEM.PUT(r.adr, x); INC(r.adr);

Page 259: Kernfach System Software WS04/05

Case Study: UNIX, inodes

File system: files and directories (files with a special content)
A file is represented by an inode

Inode:
– file owner
– file type (regular / directory / special)
– access permissions
– access time
– reference count (links)
– table of contents
– file size

Inode table of contents:
– 10 (12) direct blocks
– 1 indirect block
– 1 double indirect block
– 1 triple indirect block
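
How a logical block number selects an entry in the inode's table of contents can be sketched as follows. The sketch assumes 10 direct entries and 256 addresses per indirect block (for example 1 KB blocks with 4-byte addresses); both values and the function name `classify` are assumptions for illustration.

```python
def classify(block_no, direct=10, per_block=256):
    """Return the indirection level and indices used to reach a logical block."""
    if block_no < direct:
        return ("direct", block_no)
    block_no -= direct
    if block_no < per_block:
        return ("single", block_no)
    block_no -= per_block
    if block_no < per_block ** 2:
        return ("double",) + divmod(block_no, per_block)
    block_no -= per_block ** 2
    if block_no < per_block ** 3:
        i, rest = divmod(block_no, per_block ** 2)
        j, k = divmod(rest, per_block)
        return ("triple", i, j, k)
    raise ValueError("block number beyond the maximum file size")


if __name__ == "__main__":
    for b in (9, 10, 266, 10 + 256 + 256 ** 2):
        print(b, classify(b))
```

Small files need no indirect blocks at all, while large files pay up to three extra block reads per access.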

Page 260: Kernfach System Software WS04/05

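Case Study: UNIX, inodes

[Figure: an inode with its info fields (type, access, refc) and table of contents. Entries 0, 1, … point directly to data blocks (d); the last three entries (10, 11, 12) point to a single indirect block (i1), a double indirect block (i2), and a triple indirect block (i3).]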

Page 261: Kernfach System Software WS04/05

Case Study: UNIX, directories

Directories are normal files with a special content.
The data part contains a list with
– inode
– name

Every directory has two special entries
– .   the directory itself
– ..  the parent directory

Page 262: Kernfach System Software WS04/05

Case Study: UNIX, inodes

inode 2: type: dir, blocks: 132, owner: root, ref count: 1
block 132 (directory /), entries as (inode #, name):
  2  .
  2  ..
  4  bin
  3  root

inode 3: type: dir, blocks: 406, owner: root, ref count: 1
block 406 (directory /root/):
  3  .
  2  ..
  5  .tcshrc
  6  mbox

inode 6: type: file, blocks: 42, 103, owner: root, ref count: 1
blocks 42 and 103: data

Page 263: Kernfach System Software WS04/05

Case Study: UNIX, soft and hard links

Hard links:
– two directory entries with the same inode number
– each file has a reference counter

  42  file
  42  hardlink

Soft links:
– the directory entry points to a special file with the path of the linked file

  42  file
  43  softlink

(inode 43 points to a special file with the path of file)
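
On a POSIX system the difference can be observed with the standard `os` module. This is a minimal sketch that works in a temporary directory; the file names mirror the slide, and the function `link_demo` is hypothetical.

```python
import os
import tempfile


def link_demo():
    """Create a file, a hard link and a soft link, and report what the inodes show."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "file")
        hard = os.path.join(d, "hardlink")
        soft = os.path.join(d, "softlink")
        with open(target, "w") as f:
            f.write("data")
        os.link(target, hard)      # second directory entry, same inode
        os.symlink(target, soft)   # new inode whose content is a path
        return {
            "same_inode": os.stat(target).st_ino == os.stat(hard).st_ino,
            "ref_count": os.stat(target).st_nlink,
            "soft_is_link": os.path.islink(soft),
            "soft_points_to_target": os.readlink(soft) == target,
            "soft_has_own_inode": os.lstat(soft).st_ino != os.stat(target).st_ino,
        }


if __name__ == "__main__":
    print(link_demo())
```

The hard link raises the reference count of the shared inode to 2, while the soft link gets its own inode and only stores the path.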

Page 264: Kernfach System Software WS04/05

Case Study: UNIX, hard links

inode 2: type: dir, blocks: 132, owner: root, ref count: 1
block 132 (directory /):
  2  .
  2  ..
  4  bin
  3  root

inode 3: type: dir, blocks: 406, owner: root, ref count: 1
block 406 (directory /root/):
  3  .
  2  ..
  5  mails
  5  mbox

inode 5: type: file, blocks: 42, 103, owner: root, ref count: 2
blocks 42 and 103: data

Page 265: Kernfach System Software WS04/05

Case Study: UNIX, soft links

inode 2: type: dir, blocks: 132, owner: root, ref count: 1
block 132 (directory /):
  2  .
  2  ..
  4  bin
  3  root

inode 3: type: dir, blocks: 406, owner: root, ref count: 1
block 406 (directory /root/):
  3  .
  2  ..
  5  mbox
  6  mails

inode 5: type: file, blocks: 42, owner: root, ref count: 1
block 42: data

inode 6: type: file, blocks: 43, owner: root, ref count: 1
block 43: /root/mbox

Page 266: Kernfach System Software WS04/05

Case Study: UNIX, Volume Layout

A volume (partition) contains
boot block
– bootstrap code
super block
– size
– max file
– free space
– …
inodes
data blocks

Layout: boot block | super block | inode list | data blocks

Page 267: Kernfach System Software WS04/05

Case Study: UNIX, Functions

Core functions
bread   read block
bwrite  write block
iget    get inode from disk
iput    put inode to disk
bmap    map (inode, offset) to disk block
namei   convert path name to inode

Page 268: Kernfach System Software WS04/05

Case Study: UNIX, namei

  namei (path)
    if (absolute path)
      inode = root;
    else
      inode = current directory inode;
    while (more path to process) {
      read directory (inode);
      if match(directory, name component) {
        inode = directory[name component];
        iget(inode);
      } else {
        return no inode;
      }
    }
    return inode;
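
The lookup loop can be tried out on the directory contents from the earlier inode example (inode 2 for / and inode 3 for /root/). This is a sketch only: directories are modelled as in-memory dictionaries, and `DIRECTORIES`, `ROOT` and the `cwd` parameter are assumptions for illustration.

```python
# Directory contents keyed by inode number, taken from the earlier example.
DIRECTORIES = {
    2: {".": 2, "..": 2, "bin": 4, "root": 3},         # /
    3: {".": 3, "..": 2, ".tcshrc": 5, "mbox": 6},     # /root/
}
ROOT = 2


def namei(path, cwd=ROOT):
    """Resolve a path name to an inode number, or None if it does not exist."""
    inode = ROOT if path.startswith("/") else cwd
    for name in path.split("/"):
        if not name:                   # skip empty components ("//", leading "/")
            continue
        directory = DIRECTORIES.get(inode)
        if directory is None or name not in directory:
            return None                # not a directory, or no matching entry
        inode = directory[name]
    return inode


if __name__ == "__main__":
    print(namei("/root/mbox"), namei("mbox", cwd=3), namei("/root/../bin"))
```

The special entries . and .. need no special handling, since they are ordinary directory entries that happen to name the directory itself and its parent.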

Page 269: Kernfach System Software WS04/05

Case Study: FAT

FATnn: nn is the size of a FAT entry in bits
FAT12, FAT16, FAT32
used by MS-DOS and Windows for disks and floppies

Volume Layout: boot block | FAT1 | FAT2 | root directory | data

Page 270: Kernfach System Software WS04/05

Case Study: FAT, Example

FAT (entry: content):
   0
   1
   2  EOF
   3  EOF
   4  12
   5  FREE
   6  9
   7  BAD
   8  3
   9  11
  10  EOF
  11  10
  12  EOF
  13  FREE

File 1: 6 → 9 → 11 → 10
File 2: 4 → 12
File 3: 8 → 3

Figure label: disk size
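
Following a file's chain through the table above can be sketched as follows. The table is copied from the slide; the string markers and the function name `chain` are illustrative choices, not the on-disk encoding.

```python
EOF, FREE, BAD = "EOF", "FREE", "BAD"

# FAT entries from the example: block number -> next block or marker.
FAT = {2: EOF, 3: EOF, 4: 12, 5: FREE, 6: 9, 7: BAD,
       8: 3, 9: 11, 10: EOF, 11: 10, 12: EOF, 13: FREE}


def chain(first):
    """Return the list of blocks of the file starting at block `first`."""
    blocks = []
    block = first
    while True:
        entry = FAT[block]
        if entry in (FREE, BAD):
            raise ValueError(f"block {block} is not part of a file")
        if len(blocks) > len(FAT):
            raise ValueError("cycle in FAT chain")
        blocks.append(block)
        if entry == EOF:
            return blocks
        block = entry


if __name__ == "__main__":
    for start in (6, 4, 8):
        print(start, chain(start))
```

Positioning inside a file still requires walking the chain, but the walk happens in the table, which can be cached in memory, rather than on the data blocks.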

Page 271: Kernfach System Software WS04/05

Case Study: FAT, Directory

Information about files is kept in the directory

Directory entry (field sizes in bytes):
– File name (8)
– Extension (3)
– Attribute flags: A D V S H R
– Reserved (10)
– Time (2)
– Date (2)
– First block (2)
– File size (4)
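
The entry layout can be decoded with Python's `struct` module. The sketch assumes the classic 32-byte little-endian layout, in which the attribute flags occupy one byte between extension and reserved area (the slide does not state this size); the bit values in `ATTR_BITS` are those of the standard FAT attribute byte, and `parse_entry` is a hypothetical helper.

```python
import struct

# name(8) ext(3) attributes(1) reserved(10) time(2) date(2) first block(2) size(4)
ENTRY = struct.Struct("<8s3sB10sHHHI")

ATTR_BITS = {"R": 0x01, "H": 0x02, "S": 0x04, "V": 0x08, "D": 0x10, "A": 0x20}


def parse_entry(raw):
    """Decode one 32-byte FAT directory entry into a dictionary."""
    name, ext, attr, _reserved, time, date, first_block, size = ENTRY.unpack(raw)
    name = name.decode("ascii").rstrip()
    ext = ext.decode("ascii").rstrip()
    return {
        "name": f"{name}.{ext}" if ext else name,
        "flags": {flag for flag, bit in ATTR_BITS.items() if attr & bit},
        "time": time,
        "date": date,
        "first_block": first_block,
        "size": size,
    }


if __name__ == "__main__":
    raw = ENTRY.pack(b"MBOX    ", b"TXT", 0x21, bytes(10), 0, 0, 6, 4096)
    print(parse_entry(raw))
```

The first-block field is the entry point into the FAT chain of the file, so the directory entry and the FAT together replace the UNIX inode.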

Page 272: Kernfach System Software WS04/05

Case Study: FAT, Max. Partition Size

Block size   FAT-12   FAT-16    FAT-32
0.5 KB       2 MB     -         -
1 KB         4 MB     -         -
2 KB         8 MB     128 MB    -
4 KB         16 MB    256 MB    1 TB
8 KB         -        512 MB    2 TB
16 KB        -        1024 MB   2 TB
32 KB        -        2048 MB   2 TB

Page 273: Kernfach System Software WS04/05

File System Mounting

More than one volume mounted in the same directory tree.

[Figure: directory tree rooted at / with usr, mnt (floppy, dos, cd), home (corti), bin, and afs (ethz.ch); several subtrees are separately mounted volumes.]

Page 274: Kernfach System Software WS04/05

Virtual File System

Support for several file systems
– disk based
– network
– special
VFS: unifies the system calls
Mirrors the traditional UNIX file system model

[Figure: applications on top of the VFS layer, which sits on top of ext3, FAT, NFS, AFS, proc, and pts.]

Page 275: Kernfach System Software WS04/05

File System Mounting

Each file system type has a method table
System calls are indirect function calls through the method table
Common interface (open, write, readdir, lock, …)
Each file is associated with a method table
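
The indirection through a method table can be sketched in a few lines. Everything here is hypothetical (the two toy file system types, the dictionary-based tables, and `sys_read`); it only illustrates the dispatch idea, not a real kernel interface.

```python
def ramfs_read(file, count):
    """Read from an in-memory file and advance its position."""
    data = file["data"][file["pos"]:file["pos"] + count]
    file["pos"] += len(data)
    return data


def procfs_read(file, count):
    """Special file: the content is generated on demand (fixed text here)."""
    return b"uptime 42\n"[:count]


# One method table per file system type.
RAMFS_OPS = {"read": ramfs_read}
PROCFS_OPS = {"read": procfs_read}


def sys_read(file, count):
    """The system call only knows the common interface and dispatches indirectly."""
    return file["ops"]["read"](file, count)


if __name__ == "__main__":
    regular = {"ops": RAMFS_OPS, "data": b"hello", "pos": 0}
    special = {"ops": PROCFS_OPS}
    print(sys_read(regular, 2), sys_read(special, 6))
```

The caller never checks the file system type: each open file carries a reference to its table, so adding a new file system means adding a new table.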

Page 276: Kernfach System Software WS04/05

File System Mounting: Special Files

Devices
– disks
– memory
– USB devices
– serial ports
– …
Kernel communication (e.g., proc)
Uniform interface (open, close, read, write)
Uniform protection (user, groups)

Page 277: Kernfach System Software WS04/05

File Systems: Protection

Restrict: access (who), operations (what), management
– FAT: flags in the directory
  – e.g., read only
  – execution based on name
– UNIX: restrictions in inodes
  – based on users and groups
  – operations: read, write, execute
  – directories: manage files
  – not so flexible
– VMS: access lists
  – list of users and rights per file

Page 278: Kernfach System Software WS04/05

Distributed File Systems

Page 279: Kernfach System Software WS04/05

Distributed File Systems (DFS)

Clients, servers and storage are dispersed among machines in a distributed system.

[Figure: several clients and servers connected by a network.]

Page 280: Kernfach System Software WS04/05

Overview

Naming (dynamic):
– location transparency: file name does not reveal the file location
– location independence: file name does not change when storage is moved

Caching (efficiency)
– write-through
– delayed-write
– write-on-close

Consistency
– client-initiated: poll server for changes
– server-initiated: notify clients

Page 281: Kernfach System Software WS04/05

Naming

Simple approaches:
– file is identified by a host, path pair
– Ibis (host:path)
– SMB (\\host\path)
Transparent
– remote directories are mounted in the local file system
– not uniform (the mount point is not defined)
– NFS (/mnt/home, /home/)
– SMB (\\host\path mounted on Z:)
Global name structure
– uniform and transparent naming
– AFS (/afs/cell/path)

Page 282: Kernfach System Software WS04/05

Caching

Reduces network and disk load
Consistency problems
Granularity:
– How much? Big/small chunks of data? Entire files?
– Big: +hit ratio, +hit penalty, +consistency problems
Location:
– memory: +diskless stations, +speed
– disk: +cheaper, +persistent
– hybrid
Space consumption on the clients

Page 283: Kernfach System Software WS04/05

Caching

Policies:
write-through: +reliability, -performance (cache is effective only for read operations)
delayed-write: +write speed, +unnecessary writes eliminated, -reliability
– write when the cache is full (+performance, -long time in the cache)
– regular intervals
write-on-close

Page 284: Kernfach System Software WS04/05

Consistency

Is my cached copy up-to-date?
Client-initiated approach:
– the client performs validity checks
– when? open/fixed intervals/every access
Server-initiated approach:
– the server keeps track of cached files (parts)
– notifies the clients when conflicts are detected
– should the server allow conflicts?

Page 285: Kernfach System Software WS04/05

Stateless and Stateful Servers

Stateful: the server keeps track of each accessed file
session IDs (e.g., identifying an inode on the server)
fast
– simple requests
– caches
– fewer disk accesses
– read ahead
volatile
– server crash: rebuild structures (recovery protocol)
– client crash: orphan detection and elimination

Page 286: Kernfach System Software WS04/05

Stateless and Stateful Servers

Stateless: each request is self-contained
request: file and position
complex requests
need for uniform low-level naming scheme (to avoid name translations)
need idempotent operations (same results if repeated)
– absolute byte counts
No locking possible

Page 287: Kernfach System Software WS04/05

File Replication

A file can be present on failure-independent machines
Naming scheme manages the mapping
– same high-level name
– different low-level names
Transparency
Consistency

Page 288: Kernfach System Software WS04/05

Distributed File-Systems (mainstream)

NFS: Network File System (Sun)
AFS: Andrew File System (CMU)
SMB: Server Message Block (Microsoft)
NCFS: Network Computer FS (Oberon)

Page 289: Kernfach System Software WS04/05

Network File System (NFS)

UNIX-based (Sun)
mount file system from another machine into local directory
stateless (no open/close)
uses UDP to communicate
based on RPC and XDR (External Data Representation)
– every operation is a remote procedure call
known problems:
– no caching
– no disconnected mode
– efficiency
security: IP based

Page 290: Kernfach System Software WS04/05

NFS: Example

server: directories /home (corti, reali) and /etc
server, file /etc/exports:
  /home/ client(rw)

client before mounting: directories /home and /etc
client, command:
  mount -t nfs server:/home /home

client after mounting: /home (corti, reali) and /etc

Page 291: Kernfach System Software WS04/05

NFS

No special servers (each machine can act as a server and as a client)
Cascading mounts are allowed
– mount -t nfs server1:/home /home
– mount -t nfs server2:/projects/corti /home/corti/projects
Limited scalability (limited number of exports)

Page 292: Kernfach System Software WS04/05

NFS: Stateless Protocol

Each request contains a unique file identifier and an absolute offset
No concurrency control (locking has to be performed by the applications)
Committed information is assumed to be on disk (the server cannot cache writes)
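
Why an absolute offset makes a request idempotent can be shown by contrasting it with a server that keeps a per-session position. Both servers below are toy models with hypothetical names; they are not the NFS protocol itself.

```python
FILES = {17: b"hello, stateless world"}  # hypothetical file identifier -> contents


def stateless_read(file_id, offset, count):
    """Self-contained request: repeating it returns the same bytes."""
    return FILES[file_id][offset:offset + count]


class StatefulSession:
    """Server-side session that remembers the current position."""

    def __init__(self, file_id):
        self.file_id = file_id
        self.pos = 0

    def read(self, count):
        data = FILES[self.file_id][self.pos:self.pos + count]
        self.pos += len(data)  # a retransmitted request would skip data
        return data


if __name__ == "__main__":
    print(stateless_read(17, 0, 5), stateless_read(17, 0, 5))
    session = StatefulSession(17)
    print(session.read(5), session.read(5))
```

If a reply is lost and the client retries, the stateless variant is harmless, whereas the stateful variant needs extra machinery (for example sequence numbers) to detect the duplicate.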

Page 293: Kernfach System Software WS04/05

Network File System (NFS)

Client: System call layer → Virtual file system layer → Local filesystem or NFS client → RPC / XDR
Server: RPC / XDR → NFS server → Virtual file system layer → Local filesystem
Client and server communicate over the network (UDP)

Page 294: Kernfach System Software WS04/05

Remote Procedure Invocation: Overview

Problem
– send structured information from A to B
– A and B may have different memory layouts
– byte order problems
  – How is 0x1234 (2 bytes) represented in memory?

Big-endian: MSB before LSB (bytes 12 34 at addresses 0 1)
• IBM, Motorola, SPARC
• network byte-ordering

Little-endian: LSB before MSB (bytes 34 12 at addresses 0 1; little end first)
• VAX, Intel
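
The two layouts of 0x1234 can be made visible with Python's standard `struct` module; the helper name `layouts` is illustrative.

```python
import struct
import sys


def layouts(value):
    """Return the big-endian and little-endian 2-byte encodings of value."""
    return struct.pack(">H", value), struct.pack("<H", value)


if __name__ == "__main__":
    big, little = layouts(0x1234)
    print("big-endian:   ", big.hex(" "))
    print("little-endian:", little.hex(" "))
    print("this machine: ", sys.byteorder)
    # "!" selects network byte order, which is big-endian.
    print("network order:", struct.pack("!H", 0x1234).hex(" "))
```

A sender and receiver that disagree on the order would read 0x1234 as 0x3412, which is why a wire format fixes one order for everybody.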

Page 295: Kernfach System Software WS04/05

Marshalling / Serialization

Marshalling: packing one or more data items into a buffer using a standard representation

Presentation layer (OSI)

RPC + XDR (Sun)
– RFC 1014, June 1987
– RFC 1057, June 1988
IIOP / CORBA (OMG)
– V2.0, February 1997
– V3.0, August 2002
SOAP / XML (W3C)
– V1.1, May 2000

XDR Type System
– [unsigned] integer (32-bit)
– [unsigned] hyper-integer (64-bit)
– enumeration (unsigned int)
– boolean (enum)
– float / double (IEEE 32/64-bit)
– opaque
– string
– array (fix + variable size)
– structure
– union
– void
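
A few of the XDR types can be marshalled by hand with `struct`. The sketch follows the basic XDR rules (big-endian, every item padded to a multiple of 4 bytes, strings prefixed by their length); the function names are hypothetical and only a small part of the type system is covered.

```python
import struct


def xdr_int(value):
    """32-bit signed integer, big-endian."""
    return struct.pack(">i", value)


def xdr_uint(value):
    """32-bit unsigned integer, big-endian."""
    return struct.pack(">I", value)


def xdr_hyper(value):
    """64-bit signed integer, big-endian."""
    return struct.pack(">q", value)


def xdr_bool(value):
    """Booleans are encoded like a 32-bit integer with value 0 or 1."""
    return xdr_int(1 if value else 0)


def xdr_string(data):
    """Length prefix, the bytes, then zero padding up to a multiple of 4."""
    pad = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * pad


if __name__ == "__main__":
    # Marshal the parameters of a hypothetical call read("mbox", 0, 512).
    message = xdr_string(b"mbox") + xdr_hyper(0) + xdr_uint(512)
    print(message.hex(" "))
```

Because both sides agree on this representation, neither has to know the byte order or word size of the other machine.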

Page 296: Kernfach System Software WS04/05

RPC Protocol

Remote procedure call
Marshalling of procedure parameters
Message format
Authentication
Naming

Client: procedure P(a, b, c)
• pack parameters
• send message to server
• await response
• unpack response

Server:
• unpack parameters
• find procedure
• invoke P(a, b, c)
• pack response
• send response

Page 297: Kernfach System Software WS04/05

NFS

[Figure: each client operation (lookup, read, write) is forwarded through the RPC protocol to the matching server operation.]

Page 298: Kernfach System Software WS04/05

NFS Efficiency

Stateless protocols are inherently slow
– e.g., directory lookup
Caching:
– file blocks (data)
– file attributes (inodes)
– read-ahead
– delayed write
– tradeoff between speed and consistency
It is possible that two machines see different data

Page 299: Kernfach System Software WS04/05

NFS: Security

Exports based on IP addresses
– low security
– low granularity
Data is not encrypted
Permissions based on user and group ID
– uniform naming needed (e.g., NIS)

Page 300: Kernfach System Software WS04/05

Andrew File System (AFS)

1983 CMU (later IBM, now open source)
Scalable (>5000 workstations):
– network divided in clusters (cells)
Client/user mobility (files are accessible from everywhere)
Security: encrypted communication (Kerberos)
Protection: access control lists
Heterogeneity: clear interface to the server

Page 301: Kernfach System Software WS04/05

Andrew File System (AFS)

server provides a cell
world-wide addressing scheme (name cell)
client caches a whole file
server-synchronization on file open and close

AFS is efficient
low network overhead
stateful: consistency is implemented with callbacks
– callback = client is in sync with server
– on store, server changes the callbacks

Page 302: Kernfach System Software WS04/05

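AFS: Logical View

[Figure: local tree rooted at / with a private space (bin, usr) and a shared space under afs; directories (dir) in the shared space contain mount points that attach volumes (vol) holding directories such as bin and files (f).]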

Page 303: Kernfach System Software WS04/05

AFS: Physical View

[Figure: cells ethz.ch, epfl.ch, and cmu.edu connected by a network; each cell contains clients and servers.]

Page 304: Kernfach System Software WS04/05

AFS

[Figure: the client sends open and close to the server through the RPC protocol; read and write operate on the client's cache.]

Page 305: Kernfach System Software WS04/05

AFS: Consistency

Interaction only when opening and closing files.
Writes are not visible on other machines before a close.
Clients assume that cached files are up-to-date.
Servers keep track of caching by the clients (callbacks)
– clients are notified in case of changes

Page 306: Kernfach System Software WS04/05

AFS: Kerberos

Kerberos (Cerberus: the three-headed dog guarding Hades)
– authentication
– accounting
– audit
Needham-Schroeder shared key protocol
Distributed
AFS: communication is encrypted

Page 307: Kernfach System Software WS04/05

AFS: Protection

Access lists:

  %> fs listacl thesis
  Access list for thesis is
  Normal rights:
    system:anyuser l
    trg rlidwk
    corti rlidwka

It’s possible to allow (or deny) access to users or customized groups
Restriction on: read, write, lookup, insert, administer, lock and delete.
Supports UNIX control bits.

Page 308: Kernfach System Software WS04/05

The Eight Fallacies of Distributed Computing (Peter Deutsch)

Network Fallacies

The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
The network topology doesn’t change
There is one administrator
Transport cost is zero
The network is homogeneous

Page 309: Kernfach System Software WS04/05

General Principles (Satyanarayanan)

From DFSs we learned the following lessons:
we should try to move computations to the clients
use caching whenever possible
special files (e.g., temporary) can be specially treated
make scalable systems
trust the fewest possible entities
batch work if possible

Page 310: Kernfach System Software WS04/05

Kernel Structure

Page 311: Kernfach System Software WS04/05

Introduction

Kernel performs “dangerous” operations
– page table mapping
– scheduling
Kernel must be protected against malicious user code
– access to other processes’ data
– increasing own processes’ priority
Kernel must have more rights than user code
Solution:
– distinguish between kernel mode and user mode
– access kernel through system calls
– the system calls define the interface to the kernel

Page 312: Kernfach System Software WS04/05

Kernel Protection

[Figure: applications on top access the kernel (drivers, memory manager, file systems) only through system calls.]

Page 313: Kernfach System Software WS04/05

Kernel Protection

Means:
hardware support
– privileged instructions
– supervisor mode
separate address spaces
– user process has no access to kernel structures
access memory / functions through symbolic names
– user has no access to hardware

Page 314: Kernfach System Software WS04/05

Kernel Protection

Privileged instructions in user mode generate a trap
Mode switch:
– interrupts
– gated calls (user generated sw interrupt calls)
Parameters:
– stack
– registers
Examples:
– Intel x86: 4 protection levels (code/segment attribute), interrupt
– PowerPC: 2 levels (CPU attribute), special instruction

Page 315: Kernfach System Software WS04/05

Linux System Calls (Intel)

System calls are wrapped in libraries (e.g., libc)
The library function
– writes the parameters in registers (up to 5)
– writes the parameters on the stack (more than 5)
– writes the system call number in EAX
– calls int 0x80
The kernel
– jumps to the corresponding function in sys_call_table

Page 316: Kernfach System Software WS04/05

Linux System Calls

Examples:
pid_t fork(void): creates a child process
ssize_t write(int fd, const void *buf, size_t count): writes count bytes from buf to fd
int kill(pid_t pid, int sig): sends a signal to a process
int gettimeofday(struct timeval *tv, struct timezone *tz): gets the current time
int open(const char *pathname, int flags): opens a file
int ioctl(int d, int request, ...): manipulates special devices

Page 317: Kernfach System Software WS04/05

Windows System Calls

Layered system: system call must be performed by a wrapper (NTDLL.DLL).
The system call position in the KiSystemServiceTable is not known (depends on the build)

Call path: application (call WriteFile()) → KERNEL32.DLL → NTDLL.DLL (… int 0x2e) → KiSystemServiceTable → NtWriteFile()

Page 318: Kernfach System Software WS04/05

Kernel Design: API vs. System Calls

Linux
– system-calls are clearly specified (POSIX standard)
– system-calls do not change
– about 100 calls

Windows
– system-calls are hidden
– only Win32 API is published
– Win32 is standard
– “thousands” of API calls, still growing
– some API calls are handled in user space
– More than one API:
  – POSIX
  – OS/2

Page 319: Kernfach System Software WS04/05

Protection and SMP

What happens when two processes (on two CPUs) enter kernel mode?
– Big kernel lock: not allowed (OpenBSD, NetBSD)
– Fine grained locks in the kernel (FreeBSD 5, Linux 2.6)

[Figure: a process on CPU 1 and a process on CPU 2 each execute int 0x80.]

Page 320: Kernfach System Software WS04/05

Kernel Structure

monolithic kernel
– big mess, no structure, one big block, fast
– MS-DOS (no protection), original UNIX
– micro-kernel (AIX, OS X)
layered system
– layer n uses functions from layer n-1
– OS/2 (some degree of layering)
virtual machine
– define artificial environment for programs
client-server
– tiny communication microkernel to access various services

Page 321: Kernfach System Software WS04/05

Monolithic Kernels

[Figure: the same components (terminal controllers, device drivers, memory controllers, scheduler, signal handling, file system, swapping, virtual memory, user-level applications) arranged once for a monolithic kernel and once for a micro-kernel.]

Page 322: Kernfach System Software WS04/05

Layered Systems

THE operating system
A layer uses only functions from below
What goes where?
Less efficient

Layers (top to bottom):
– user programs
– buffering I/O
– console drivers
– memory management
– CPU scheduling
– hardware

Page 323: Kernfach System Software WS04/05

Virtual Machines

VM operating system (IBM)
slow and difficult to implement
complete protection
no sharing of resources
useful for development and research
compatibility

Layers (top to bottom): procs, virtual machine, hardware

Page 324: Kernfach System Software WS04/05

Design: Kernel or User Space?

Big monolithic kernel:
– fast (fewer switches)
– less protection
Examples:
– HTTP server in the Linux kernel
– graphic routines in Windows

Modular and micro-kernels:
– structured
– more separation
– move code to user space
– less efficient
– more secure
Example:
– user level drivers

Page 325: Kernfach System Software WS04/05

Virtual Machines

Machine specification in software
– instruction set
– memory layout
– virtual devices
– ....
JVM (Java Virtual Machine)
.NET / Mono
VMWare
– specified machine is a whole PC
– allows multiple PC environments on same machine
IBM VM/370

Page 326: Kernfach System Software WS04/05

Case Study: JVM

Page 327: Kernfach System Software WS04/05

Virtual Machines

What is a machine?
– does something (...useful)
– programmable
– concrete (hardware)

What is a virtual machine?
– a machine that is not concrete
– a software emulation of a physical computing environment

Reality is somewhat fuzzy! Is a Pentium-II a machine?
[Figure: an instruction decoder translates instructions into operations (Op1, Op2, Op3) for a RISC core.]

Hardware and software are logically equivalent (A. Tanenbaum)

Page 328: Kernfach System Software WS04/05

Virtual Machine, Intermediate Language

Pascal P-Code (1975)
– stack-based processor
– strongly typed machine language
– compiler: one front end, many back ends
– UCSD Apple][ implementation, PDP 11, Z80
Modula M-Code (1980)
– high code density
– Lilith as microprogrammed virtual processor
JVM – Java Virtual Machine (1995)
– Write Once – Run Everywhere
– interpreters, JIT compilers, Hot Spot Compiler
Microsoft .NET (2000)
– language interoperability

Page 329: Kernfach System Software WS04/05

JVM Case Study

compiler (Java to bytecode)
interpreter, ahead-of-time compiler, JIT
dynamic loading and linking
exception handling
memory management, garbage collection
OO model with single inheritance and interfaces
system classes to provide OS-like implementation
– compiler
– class loader
– runtime
– system

Page 330: Kernfach System Software WS04/05

JVM: Type System

Primitive types
– byte
– short
– int
– long
– float
– double
– char
– reference
– boolean mapped to int

Object types
– classes
– interfaces
– arrays

Single class inheritance
Multiple interface implementation
Arrays
– anonymous types
– subclasses of java.lang.Object

Page 331: Kernfach System Software WS04/05

JVM: Java Byte-Code

Memory access
– tload / tstore
– ttload / ttstore
– tconst
– getfield / putfield
– getstatic / putstatic

Operations
– tadd / tsub / tmul / tdiv
– tshifts

Conversions
– f2i / i2f / i2l / ....
– dup / dup2 / dup_x1 / ...

Control
– ifeq / ifne / iflt / ....
– if_icmpeq / if_acmpeq
– invokestatic
– invokevirtual
– invokeinterface
– athrow
– treturn

Allocation
– new / newarray

Casting
– checkcast / instanceof

Page 332: Kernfach System Software WS04/05

JVM: Java Byte-Code Example

bipush

Operation: Push byte
Format: bipush byte
Forms: bipush = 16 (0x10)
Operand Stack: ... => ..., value
Description: The immediate byte is sign-extended to an int value. That value is pushed onto the operand stack.
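The sign extension in the bipush description can be illustrated with a few lines of Java. This is only a sketch of how an interpreter might decode the operand. The class and method names (BipushDemo, decodeBipush) are made up for this example and are not part of any JVM.

```java
// Sketch: decoding the one-byte operand of bipush (opcode 0x10).
class BipushDemo {
    static final int BIPUSH = 0x10; // opcode value from the slide

    // Returns the int that bipush would push, given the code array
    // and the index of the opcode.
    static int decodeBipush(byte[] code, int pc) {
        if ((code[pc] & 0xFF) != BIPUSH) {
            throw new IllegalArgumentException("not a bipush instruction");
        }
        // Java's byte is signed, so widening it to int sign-extends it.
        return code[pc + 1];
    }

    public static void main(String[] args) {
        byte[] code = { 0x10, (byte) 0xFF, 0x10, 0x7F };
        System.out.println(decodeBipush(code, 0)); // -1
        System.out.println(decodeBipush(code, 2)); // 127
    }
}
```

The operand 0xFF becomes -1 rather than 255, which is why bipush covers the range -128 to 127.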

Page 333: Kernfach System Software WS04/05

JVM: Machine Organization

Virtual Processor
– stack machine
– no registers
– typed instructions
– no memory addresses, only symbolic names

Runtime Data Areas
– pc register
– stack (locals, parameters, return values)
– heap
– method area (code)
– runtime constant pool
– native method stack

Page 334: Kernfach System Software WS04/05

JVM: Execution Example

program:
iload 5
iload 6
iadd
istore 4

[Figure: operand stack and locals over time. After iload 5 the operand stack holds v5; after iload 6 it holds v5 and v6; after iadd it holds v5+v6; istore 4 pops that value into local v4.]
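The execution example can be traced with a tiny interpreter. The following is a minimal sketch that supports only the three instructions used on the slide. MiniStackMachine and its textual instruction format are assumptions made for illustration, not how a real JVM represents bytecode.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a stack machine that understands iload, istore and iadd only.
class MiniStackMachine {
    // Executes the program against the given local variable array.
    static void run(String[] program, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack
        for (String instr : program) {
            String[] parts = instr.trim().split("\\s+");
            switch (parts[0]) {
                case "iload":   // push a local onto the operand stack
                    stack.push(locals[Integer.parseInt(parts[1])]);
                    break;
                case "istore":  // pop the top of the stack into a local
                    locals[Integer.parseInt(parts[1])] = stack.pop();
                    break;
                case "iadd": {  // pop two operands, push their sum
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a + b);
                    break;
                }
                default:
                    throw new IllegalArgumentException("unsupported: " + instr);
            }
        }
    }

    public static void main(String[] args) {
        int[] locals = new int[8];
        locals[5] = 30;
        locals[6] = 12;
        run(new String[] { "iload 5", "iload 6", "iadd", "istore 4" }, locals);
        System.out.println(locals[4]); // 42
    }
}
```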

Page 335: Kernfach System Software WS04/05

JVM: Reflection

Load and manipulate unknown classes at runtime.

java.lang.Class
– getFields
– getMethods
– getConstructors

java.lang.reflect.Field
– set / get
– setInt / getInt
– setFloat / getFloat
– .....

java.lang.reflect.Method
– getModifiers
– invoke

java.lang.reflect.Constructor

Page 336: Kernfach System Software WS04/05

JVM: Reflection – Example

import java.lang.reflect.*;

public class ReflectionExample {

    public static void main(String args[]) {
        try {
            // load the class named on the command line
            Class<?> c = Class.forName(args[0]);
            // list the methods declared by that class
            Method m[] = c.getDeclaredMethods();
            for (int i = 0; i < m.length; i++) {
                System.out.println(m[i].toString());
            }
        } catch (Throwable e) {
            System.err.println(e);
        }
    }
}
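Listing methods is only half of what the reflection API offers. The sketch below also invokes a method chosen by name at run time, using Class.getMethod and Method.invoke. The helper class ReflectiveCall is hypothetical, and it handles only public methods without parameters.

```java
import java.lang.reflect.Method;

// Sketch: call a public no-argument method whose name is known only at run time.
class ReflectiveCall {
    static Object call(Object target, String methodName) {
        try {
            Method m = target.getClass().getMethod(methodName);
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            // covers a missing method, an illegal access and exceptions thrown by the callee
            throw new IllegalStateException("cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call("hello", "length"));      // 5
        System.out.println(call("hello", "toUpperCase")); // HELLO
    }
}
```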

Page 337: Kernfach System Software WS04/05

JVM: Java Weaknesses

Transitive closure of java.lang.Object contains

version          classes
1.1              47
1.2              178
1.3              180
1.4              248
5 (1.5)          280
classpath 0.03   299

class Object {
    public String toString();
    ....
}

class String {
    public String toUpperCase(Locale loc);
    ....
}

public final class Locale implements Serializable, Cloneable {
    ....
}

Page 338: Kernfach System Software WS04/05

JVM: Java Weaknesses

Class static initialization is triggered when
– T is a class and an instance of T is created
  T tmp = new T();
– T is a class and a static method of T is invoked
  T.staticMethod();
– a nonconstant static field of T is used or assigned (nonconstant: the field is not both final and initialized with a compile-time constant)
  T.someField = 42;

Problem
– circular dependencies in static initialization code

class A: static { x = B.f(); }
class B: static { y = A.f(); }
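The circular dependency can be reproduced in a few lines. The following is a sketch with made-up class names (InitA, InitB) and a "+ 1" added only to make the values distinguishable. The class whose initialization starts second observes the default value 0 of the other class's field.

```java
// Sketch of circular static initialization between two classes.
class InitA {
    static int x = InitB.f() + 1; // triggers the initialization of InitB
    static int f() { return x; }
}

class InitB {
    // InitA is already being initialized by this thread, so InitA.f()
    // runs against the default value x == 0.
    static int y = InitA.f() + 1;
    static int f() { return y; }
}

class CircularInitDemo {
    public static void main(String[] args) {
        System.out.println(InitA.x); // 2
        System.out.println(InitB.y); // 1
    }
}
```

If InitB were touched first, the values would be swapped. The result depends on which class happens to be used first, which is what makes such cycles fragile.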

Page 339: Kernfach System Software WS04/05

JVM: Java Weaknesses

interface Example {
    final static String labels[] = {"A", "B", "C"};
}

hidden static initializer:
labels = new String[3];
labels[0] = "A"; labels[1] = "B"; labels[2] = "C";

Warning:
– in Java final means write-once!
– interfaces may contain code
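That an interface carries initializer code can be made visible by letting the field initializer call a method. This is a minimal sketch. Config and Tracer are hypothetical names, and the counter exists only to observe when the hidden initializer runs.

```java
// Sketch: the initializer of an interface field is ordinary code
// that runs once, when the field is first used.
class Tracer {
    static int calls = 0;

    static String[] make() {
        calls++; // records each run of the hidden static initializer
        return new String[] { "A", "B", "C" };
    }
}

interface Config {
    // implicitly public static final, but not a compile-time constant
    String[] LABELS = Tracer.make();
}

class InterfaceInitDemo {
    public static void main(String[] args) {
        String first = Config.LABELS[0];
        String again = Config.LABELS[0];
        System.out.println(first + again + " " + Tracer.calls); // AA 1
    }
}
```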

Page 340: Kernfach System Software WS04/05

JVM: Memory Model

The JVM specs define a memory model:
– defines the relationship between variables and the underlying memory
– meant to guarantee the same behavior on every JVM

The compiler is allowed to reorder operations unless synchronized or volatile is specified.

Page 341: Kernfach System Software WS04/05

JVM: Reordering

Reads and writes to ordinary variables can be reordered.

public class Reordering {
    int x = 0, y = 0;

    public void writer() {
        x = 1;
        y = 2;
    }

    public void reader() {
        int r1 = y;
        int r2 = x;
        // without synchronization, r1 == 2 and r2 == 0 is a possible outcome
    }
}

Page 342: Kernfach System Software WS04/05

JVM: Memory Model

synchronized: in addition to specifying a monitor, it defines a memory barrier:
– acquiring the lock implies an invalidation of the caches
– releasing the lock implies a write back of the caches

synchronized blocks on the same object are ordered.

Order among accesses to volatile variables is guaranteed (but not among volatile and other variables).

Page 343: Kernfach System Software WS04/05

JVM: Double Checked Lock

Singleton

public class SomeClass {

    private static Resource resource = null;

    // every call pays for the lock, even after initialization
    public static synchronized Resource getResource() {
        if (resource == null) {
            resource = new Resource();
        }
        return resource;
    }
}

Page 344: Kernfach System Software WS04/05

JVM: Double Checked Lock

Double checked locking

public class SomeClass {

    private static Resource resource = null;

    public static Resource getResource() {
        if (resource == null) {                  // first check, no lock
            synchronized (SomeClass.class) {
                if (resource == null) {          // second check, with lock
                    resource = new Resource();
                }
            }
        }
        return resource;
    }
}

Page 345: Kernfach System Software WS04/05

JVM: Double Checked Lock

Thread 1 and Thread 2 execute the same code:

public class SomeClass {

    private Resource resource = null;

    public Resource getResource() {
        if (resource == null) {
            synchronized (this) {
                if (resource == null) {
                    resource = new Resource();
                }
            }
        }
        return resource;
    }
}

Thread 2 can see a non-null resource in the unsynchronized check while Thread 1 is still executing resource = new Resource(): the object is instantiated but not yet initialized!

Page 346: Kernfach System Software WS04/05

JVM: Immutable Objects are not Immutable

Immutable objects:
– all field types are primitives or references to immutable objects
– all fields are final

Example (simplified): java.lang.String
– contains an array of characters, the length, an offset
– example: s = "abcd", length = 2, offset = 2, string = "cd"

String s1 = "/usr/tmp";
String s2 = s1.substring(4); // should contain "/tmp"

Sequence: s2 is instantiated, the fields are initialized (to 0), the array is copied, the fields are written by the constructor.

What happens if instructions are reordered?

Page 347: Kernfach System Software WS04/05

JVM: Reordering Volatile and Nonvolatile Stores

volatile reads and writes are totally ordered among threads, but not among normal variables

example:

volatile boolean initialized = false;
SomeObject o = null;

Thread 1:
o = new SomeObject();
initialized = true;

Thread 2:
while (!initialized) {
    sleep();
}
o.field = 42;   // ? is o already visible here?

Page 348: Kernfach System Software WS04/05

JVM: JSR 133

Java Community Process
Java memory model revision
– Final means final
– Volatile fields cannot be reordered
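Under the revised memory model (JSR 133, in effect since Java 5), double checked locking becomes safe when the shared field is declared volatile, because the volatile write cannot be reordered with the constructor's stores. The following is a sketch of that variant. LazyResource and ResourceHolder are placeholder names chosen for this example.

```java
// Sketch: double checked locking with a volatile field (valid under JSR 133).
class LazyResource {
    final int value;
    LazyResource() { value = 42; }
}

class ResourceHolder {
    // volatile: a reader that sees a non-null reference also sees
    // the fully constructed object
    private static volatile LazyResource resource;

    static LazyResource getResource() {
        LazyResource r = resource;      // one volatile read on the fast path
        if (r == null) {
            synchronized (ResourceHolder.class) {
                r = resource;           // re-check while holding the lock
                if (r == null) {
                    r = new LazyResource();
                    resource = r;       // volatile write publishes the object
                }
            }
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(getResource() == getResource()); // true
        System.out.println(getResource().value);            // 42
    }
}
```

On JVMs that still implement the old memory model, this variant is not guaranteed to work. There the fully synchronized accessor remains the safe choice.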

Page 349: Kernfach System Software WS04/05

Java JVM: Execution

Interpreted (e.g., Sun JVM)
– bytecode instructions are interpreted sequentially
– the VM emulates the Java Virtual Machine
– slower
– quick startup

Just-in-time compilers (e.g., Sun JVM, IBM JikesVM)
– bytecode is compiled to native code at load time (or later)
– code can be optimized (at compile time or later)
– quicker
– slow startup

Ahead-of-time compilers (e.g., GCJ)
– bytecode is compiled to native code offline
– quick startup
– quick execution
– static compilation

Page 350: Kernfach System Software WS04/05

JVM: Loader – The Classfile Format

ClassFile {
    version
    constant pool
    flags
    super class
    interfaces
    fields
    methods
    attributes
}

Constants:
– Values: String / Integer / Float / ...
– References: Field / Method / Class / ...

Attributes:
– ConstantValue
– Code
– Exceptions
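The first fields of a class file can be read with a DataInputStream. The sketch below parses only the header. It relies on details from the JVM specification that the slide omits: the file starts with the magic number 0xCAFEBABE, followed by the minor version, the major version and the constant pool count, all big-endian. The class name ClassFileHeader is made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Sketch: parse magic, version and constant pool count of a class file.
class ClassFileHeader {
    final int minor;
    final int major;
    final int constantPoolCount;

    ClassFileHeader(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassFileHeader parse(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int count = in.readUnsignedShort();
            return new ClassFileHeader(minor, major, count);
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated class file", e);
        }
    }

    public static void main(String[] args) {
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
            0x00, 0x00,                                         // minor version
            0x00, 0x31,                                         // major version 49
            0x00, 0x23                                          // constant pool count 35
        };
        ClassFileHeader h = parse(header);
        System.out.println(h.major + " " + h.constantPoolCount); // 49 35
    }
}
```

The constant pool count is one larger than the number of entries, so a pool with 34 entries is stored as 35.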

Page 351: Kernfach System Software WS04/05

JVM: Class File Format

class HelloWorld {

    public static void printHello() {
        System.out.println("hello, world");
    }

    public static void main (String[] args) {
        HelloWorld myHello = new HelloWorld();
        myHello.printHello();
    }

}

Page 352: Kernfach System Software WS04/05

JVM: Class File (Constant Pool)

1. String hello, world
2. Class HelloWorld
3. Class java/io/PrintStream
4. Class java/lang/Object
5. Class java/lang/System
6. Methodref HelloWorld.<init>()
7. Methodref java/lang/Object.<init>()
8. Fieldref java/io/PrintStream java/lang/System.out
9. Methodref HelloWorld.printHello()
10. Methodref java/io/PrintStream.println(java/lang/String )
11. NameAndType <init> ()V
12. NameAndType out Ljava/io/PrintStream;
13. NameAndType printHello ()V
14. NameAndType println (Ljava/lang/String;)V
15. Unicode ()V
16. Unicode (Ljava/lang/String;)V
17. Unicode ([Ljava/lang/String;)V
18. Unicode <init>
19. Unicode Code
20. Unicode ConstantValue
21. Unicode Exceptions
22. Unicode HelloWorld
23. Unicode HelloWorld.java
24. Unicode LineNumberTable
25. Unicode Ljava/io/PrintStream;
26. Unicode LocalVariables
27. Unicode SourceFile
28. Unicode hello, world
29. Unicode java/io/PrintStream
30. Unicode java/lang/Object
31. Unicode java/lang/System
32. Unicode main
33. Unicode out
34. Unicode printHello

Page 353: Kernfach System Software WS04/05

JVM: Class File (Code)

Methods

0 <init>()
     0 ALOAD0
     1 INVOKESPECIAL [7] java/lang/Object.<init>()
     4 RETURN

1 PUBLIC STATIC main(java/lang/String [])
     0 NEW [2] HelloWorld
     3 DUP
     4 INVOKESPECIAL [6] HelloWorld.<init>()
     7 ASTORE1
     8 INVOKESTATIC [9] HelloWorld.printHello()
    11 RETURN

2 PUBLIC STATIC printHello()
     0 GETSTATIC [8] java/io/PrintStream java/lang/System.out
     3 LDC1 hello, world
     5 INVOKEVIRTUAL [10] java/io/PrintStream.println(java/lang/String )
     8 RETURN

Page 354: Kernfach System Software WS04/05

JVM: Compilation – Pattern Expansion

Each byte code is translated according to fixed patterns
+ easy
- limited knowledge

Example (pseudocode)

switch (o) {
case ICONST<n>:
    generate("push n"); PC++; break;
case ILOAD<n>:
    generate("push off_n[FP]"); PC++; break;
case IADD:
    generate("pop -> R1");
    generate("pop -> R2");
    generate("add R1, R2 -> R1");
    generate("push R1");
    PC++; break;
…
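The pseudocode above can be turned into a small runnable translator. This is a sketch only. PatternExpander, the textual mnemonics with an underscore (iload_4) and the emitted pseudo-assembly strings are assumptions made for illustration. The istore pattern is not shown in the pseudocode and is added here by analogy.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: each bytecode is expanded on its own, with no knowledge
// of the instructions before or after it.
class PatternExpander {
    static List<String> expand(String[] bytecode) {
        List<String> out = new ArrayList<>();
        for (String instr : bytecode) {
            String[] p = instr.split("_"); // "iload_4" -> ["iload", "4"]
            switch (p[0]) {
                case "iconst":
                    out.add("push " + p[1]);
                    break;
                case "iload":
                    out.add("push off" + p[1] + "[FP]");
                    break;
                case "istore": // assumed pattern, by analogy with iload
                    out.add("pop off" + p[1] + "[FP]");
                    break;
                case "iadd":
                    out.add("pop -> R1");
                    out.add("pop -> R2");
                    out.add("add R1, R2 -> R1");
                    out.add("push R1");
                    break;
                default:
                    throw new IllegalArgumentException("unsupported: " + instr);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : expand(new String[] { "iload_4", "iload_5", "iadd", "istore_6" })) {
            System.out.println(line);
        }
    }
}
```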

Page 355: Kernfach System Software WS04/05

JVM: Optimizing Pattern Expansion

Main Idea: use internal virtual stack
– stack values are consts / fields / locals / array fields / registers / ...
– flush stack as late as possible

Example: iload 4, iload 5, iadd, istore 6

bytecode   virtual stack            emitted code
iload4     local4                   (none)
iload5     local4, local5           (none)
iadd       EAX, local5              MOV EAX, off4[FP]
           EAX                      ADD EAX, off5[FP]
istore6    (empty)                  MOV off6[FP], EAX
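The virtual stack idea can be sketched as a translator that records where each operand lives instead of emitting a push for it. This is a deliberately small sketch with a single register (EAX) and without spilling. VirtualStackCompiler and its string-based operand descriptors are assumptions for illustration. Cases that would need spilling or that would invalidate a deferred load are rejected rather than handled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch: the virtual stack holds operand locations; code is emitted
// only when a value is actually needed.
class VirtualStackCompiler {
    static List<String> compile(String[] bytecode) {
        Deque<String> vstack = new ArrayDeque<>();
        List<String> out = new ArrayList<>();
        for (String instr : bytecode) {
            String[] p = instr.split("_");
            switch (p[0]) {
                case "iload":
                    vstack.push("off" + p[1] + "[FP]"); // deferred, nothing emitted
                    break;
                case "iadd": {
                    String right = vstack.pop();
                    String left = vstack.pop();
                    requireEaxFree(vstack);
                    if (left.equals("EAX")) {
                        out.add("MOV_SKIP".isEmpty() ? "" : "ADD EAX, " + right);
                    } else if (right.equals("EAX")) {
                        out.add("ADD EAX, " + left); // addition is commutative
                    } else {
                        out.add("MOV EAX, " + left);
                        out.add("ADD EAX, " + right);
                    }
                    vstack.push("EAX");
                    break;
                }
                case "istore": {
                    String dst = "off" + p[1] + "[FP]";
                    String src = vstack.pop();
                    if (vstack.contains(dst)) {
                        throw new UnsupportedOperationException("deferred load of " + dst + " is still pending");
                    }
                    if (!src.equals("EAX")) {
                        requireEaxFree(vstack);
                        out.add("MOV EAX, " + src); // no memory-to-memory move
                    }
                    out.add("MOV " + dst + ", EAX");
                    break;
                }
                default:
                    throw new IllegalArgumentException("unsupported: " + instr);
            }
        }
        return out;
    }

    // This sketch has one register and no spilling.
    private static void requireEaxFree(Deque<String> vstack) {
        if (vstack.contains("EAX")) {
            throw new UnsupportedOperationException("register spilling is not part of this sketch");
        }
    }

    public static void main(String[] args) {
        for (String line : compile(new String[] { "iload_4", "iload_5", "iadd", "istore_6" })) {
            System.out.println(line);
        }
    }
}
```

For the sequence on the slide, the sketch emits the same three instructions as the optimized column of the compiler comparison.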

Page 356: Kernfach System Software WS04/05

JVM: Compiler Comparison

bytecode:
iload_4
iload_5
iadd
istore_6

pattern expansion (5 instructions, 9 memory accesses):
push off4[FP]
push off5[FP]
pop EAX
add 0[SP], EAX
pop off6[FP]

optimized (3 instructions, 3 memory accesses):
mov EAX, off4[FP]
add EAX, off5[FP]
mov off6[FP], EAX

Page 357: Kernfach System Software WS04/05

Linking (General)

A compiled program contains references to external code (libraries)

After loading the code, the system needs to link the code to the library
– identify the calls to external code
– locate the callees (and load them if necessary)
– patch the loaded code

Two options:
– the code contains a list of sites for each callee
– the calls to external code are jumps to a procedure linkage table which is then patched (double indirection)

Page 358: Kernfach System Software WS04/05

Linking (General)

Before linking:

list of sites per callee        procedure linkage table
 0 instr                         0 instr
 1 instr                         1 instr
 2 jump -                        2 jump 101
 3 instr                         3 instr
 4 instr                         4 instr
 5 jump -                        5 jump 100
 6 instr                         6 instr
 7 jump 2                        7 jump 101
 9 instr                         9 instr
10 instr                        10 instr

proc 0   5                      100 jump
proc 1   7                      101 jump

Page 359: Kernfach System Software WS04/05

Linking (General)

After linking:

list of sites per callee        procedure linkage table
 0 instr                         0 instr
 1 instr                         1 instr
 2 jump &p1                      2 jump 101
 3 instr                         3 instr
 4 instr                         4 instr
 5 jump &p0                      5 jump 100
 6 instr                         6 instr
 7 jump &p1                      7 jump 101
 9 instr                         9 instr
10 instr                        10 instr

proc 0   5                      100 jump &p0
proc 1   7                      101 jump &p1
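One reading of the first option is that the sites of each callee form a chain threaded through the unpatched jump operands: proc 1 starts at address 7, whose operand 2 names the next site, whose operand "-" ends the chain. Under that interpretation, which is an assumption about the diagram, the patching step can be sketched as follows. FixupLinker and the numeric addresses are made up for this example.

```java
// Sketch: patch call sites that are chained through their own operands.
class FixupLinker {
    static final int END = -1; // stands for "-" in the diagram

    // operands[i] is the operand of the jump at address i. Before linking it
    // holds the address of the next site of the same callee, or END.
    static void link(int[] operands, int[] chainHeads, int[] procAddresses) {
        for (int proc = 0; proc < chainHeads.length; proc++) {
            int site = chainHeads[proc];
            while (site != END) {
                int next = operands[site];            // remember the rest of the chain
                operands[site] = procAddresses[proc]; // patch in the real target
                site = next;
            }
        }
    }

    public static void main(String[] args) {
        int[] operands = new int[11];
        operands[2] = END; // 2 jump -
        operands[5] = END; // 5 jump -
        operands[7] = 2;   // 7 jump 2
        link(operands, new int[] { 5, 7 }, new int[] { 1000, 2000 });
        System.out.println(operands[2] + " " + operands[5] + " " + operands[7]); // 2000 1000 2000
    }
}
```

With a procedure linkage table, the loop over the sites disappears: only one table entry per callee is patched, at the price of an extra jump on every call.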

Page 360: Kernfach System Software WS04/05

JVM: Linking

Bytecode interpreter
– references to other objects are made through the JVM (e.g., invokevirtual, getfield, …)

Native code (ahead of time compiler)
– static linking
– classic native linking

JIT compiler
– only some classes are compiled
– calls could reference classes that are not yet loaded or compiled (delayed compilation)
– code instrumentation

Page 361: Kernfach System Software WS04/05

JVM: Methods and Fields Resolution

– methods and fields are accessed through special VM functions (e.g., invokevirtual, getfield, …)
– the parameters of the special call define the target
– the parameters are indexes in the constant pool
– the VM checks if the call is legal and if the target is present

Page 362: Kernfach System Software WS04/05

JVM: JIT – Linking and Instrumentation

Use code instrumentation to detect first access of static fields and methods

class A {
    ....
    ...B.x
}

class B {
    int x;
}

The access B.x is instrumented as:

CheckClass(B);
B.x

where CheckClass(B) is:

IF ~B.initialized THEN
    Initialize(B)
END;

Page 363: Kernfach System Software WS04/05

Compilation and Linking Overview

[Figure: C toolchain. C source and C headers → Compiler → Object Files → Linker → Object file → Loader → Loaded Code]

Page 364: Kernfach System Software WS04/05

Compilation and Linking Overview

[Figure: Oberon toolchain. Oberon source → Compiler → Object Files (Object & Symbol) → Loader / Linker → Loaded Modules]

Page 365: Kernfach System Software WS04/05

Compilation and Linking Overview

[Figure: Java toolchain. Java source → Compiler → Class File → Class Loader (Loader, Linker, JIT Compiler) → Classes, which are also accessible through the Reflection API]

Page 366: Kernfach System Software WS04/05

Jaos

Jaos (Java on Active Object System) is a Java virtual machine for the Bluebottle system

goals:
– implement a JVM for the Bluebottle system
– show that the Bluebottle kernel is generic enough to support more than one system
– interoperability between the Active Oberon and Java languages
– interoperability between the Oberon System and the Java APIs

Page 367: Kernfach System Software WS04/05

Jaos (Interoperability Framework)

[Figure: Jaos interoperability framework. Oberon side: Oberon source, Compiler, Object & Symbol files, Oberon Metadata Loader, Oberon Loader / Linker, Oberon Browser, Loaded Modules. Java side: Class File, Java Metadata Loader, Loader, Linker, JIT Compiler, Java Reflection API, Loaded Class. A Metadata component sits between the two sides.]

Page 368: Kernfach System Software WS04/05

JVM: Verification

Compiler generates "good" code....
.... that could be changed before reaching the JVM
need for verification

Verification makes the VM simpler (fewer run-time checks):
– no operand stack overflow
– load / stores are valid
– VM types are correct
– no pointer forging
– no violation of access restrictions
– access objects as they are (type)
– local variable initialized before load
– …

Page 369: Kernfach System Software WS04/05

JVM: Verification

Pass 1 (Loading):
– class file version check
– class file format check
– class file complete

Pass 2 (Linking):
– final classes are not subclassed
– every class has a superclass (but Object)
– constant pool references
– constant pool names

Page 370: Kernfach System Software WS04/05

JVM: Verification

Pass 3 (Linking): Byte-Code Verification

For each operation in code (independent of the path):
– operation stack size is the same
– accessed variable types are correct
– method parameters are appropriate
– field assignment with correct types
– opcode arguments are appropriate

Pass 4 (RunTime): delayed for performance reasons

First time a type is referenced:
– load types when referenced
– check access visibility
– class initialization

First member access:
– member exists
– member type same as declared
– current method has right to access member

Page 371: Kernfach System Software WS04/05

JVM: Byte-Code Verification

Verification:
– branch destination must exist
– opcodes must be legal
– access only existing locals
– code does not end in the middle of an instruction
– types in byte-code must be respected
– execution cannot fall off the end of the code
– exception handler begin and end are sound

Page 372: Kernfach System Software WS04/05

Addendum: Security

Page 373: Kernfach System Software WS04/05

Security

internal protection
– memory protection
– file system accesses

external protection
– accessibility

problems:
– program threats

Page 374: Kernfach System Software WS04/05

Security: Program Threats

Trojan horses: a code segment that misuses its environment
– mail attachments
– web downloads (e.g., SEXY.EXE which formats your hard disk)
– programs with the same name as common utilities
– misleading names (e.g., README.TXT.EXE)

Trap door (in programs or compilers): an intentional hole in the software

Page 375: Kernfach System Software WS04/05

Security: System Threats

worms: a standalone program that spawns other processes (copies of itself) to reduce system performance
– example: Morris worm (1988)
  exploited holes in rsh, finger and sendmail to gain access to other machines
  once on the other machine it was able to replicate itself
– used by spammers to spread and distribute spamming applications

viruses: similar to worms but embedded in other programs
– they usually infect other programs and the boot sector

Page 376: Kernfach System Software WS04/05

Security: System Threats

Denial of service
– perform many requests to steal all the available resources
– often distributed (using worms)

Example: SYN flooding attacks
– the attacker tries to connect
– the victim answers with a synchronize and acknowledge packet
– and waits for acknowledgment

Countermeasures
– active filtering
– request dropping
– cookie based protocols (requests must be authenticated)
– stateless protocols

Page 377: Kernfach System Software WS04/05

Security: System Threats

badly implemented and designed software:
– lpr (setuid) with an option to delete the printed file
– mkdir (first create the inode then change the owner)
  it was possible to change the inode before the chown …
– buffer overflows
– password in memory or swap files
– insecure protocols (FTP, SMTP)
– missing sanity checks (syscalls, command in input, …)
– short keys and passwords
– proprietary protocols

Page 378: Kernfach System Software WS04/05

Bad design: A very recent example

Texas Instruments produces RFID tags offering cryptographic functionalities.
– used for cars and electronic payments
– 40 bit keys
– proprietary protocol

Attack from Johns Hopkins University and RSA Labs
– less than 2 hours for 5 keys
– less than $3500

Page 379: Kernfach System Software WS04/05

Security: Buffer Overflows

Overwrite a function’s return address

void foo(int p1, int p2) {
    char array[10];
    strcpy(array, someinput);   /* no length check */
}

[Figure: stack frame of foo, containing array, FP, RET, p1 & p2. Writing past the end of array overwrites FP and RET.]

Avoid strcpy and check the length, e.g., strncpy

Page 380: Kernfach System Software WS04/05

Security: Monitoring

check for suspicious patterns
– login times

audit logs

periodic scans for security holes (bad passwords, set-uid programs, changes to system programs)
– system integrity checks (checksums for executable files) [tripwire]

network services
– monitor network activity

Page 381: Kernfach System Software WS04/05

Example: Firewalling

Many applications use network sockets to communicate (even on a single machine)
Many applications are not protected

Solution: filter all the incoming connections by default and allow only the trusted ones
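The default-deny policy amounts to an allow list: everything that is not explicitly trusted is dropped. The following is a minimal sketch of that decision logic only, not of a real packet filter. DefaultDenyFilter is a hypothetical class, and the addresses are taken from ranges reserved for documentation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch: accept a connection only if its source is explicitly trusted.
class DefaultDenyFilter {
    private final Set<String> trusted;

    DefaultDenyFilter(String... trustedAddresses) {
        trusted = new HashSet<>(Arrays.asList(trustedAddresses));
    }

    // No rule means no access: unknown sources are rejected.
    boolean accept(String remoteAddress) {
        return trusted.contains(remoteAddress);
    }

    public static void main(String[] args) {
        DefaultDenyFilter filter = new DefaultDenyFilter("192.0.2.10");
        System.out.println(filter.accept("192.0.2.10"));  // true
        System.out.println(filter.accept("203.0.113.5")); // false
    }
}
```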

Page 382: Kernfach System Software WS04/05

Security: (some) Design Principles

– Open systems (programs and protocols)
– Default is deny access
– Check for current authority (timeouts, …)
– Give the least privilege possible
– Simple protection mechanisms
– Do not ask too much of the users (or they will avoid protecting themselves)

Page 383: Kernfach System Software WS04/05

Security and Systems: Some Examples

Enhancements to memory management: Intel XD bit, AMD NX bit
– mark pages according to the content (data or code)
– an exception is generated if the PC is moved to a data address
– prevents some buffer overflow attacks
– dynamically generated code has to be generated through special system calls
– Windows XP SP2, Linux, BSD …

Page 384: Kernfach System Software WS04/05

Security and Systems: Some Examples

SELinux
– National Security Agency (USA)
– patches to the Linux kernel to enforce mandatory access control
– open source
– independent from the traditional UNIX roles (users and groups)
– configurable policies restricting what a program is able to do

Page 385: Kernfach System Software WS04/05

Security and Systems: Some Examples

OpenBSD
– audit process (proactive bug search)
– random gaps in the stack
– ProPolice: gcc puts a random integer on the stack in a call prologue and checks it when returning
– W^X: pages are writable xor executable

Page 386: Kernfach System Software WS04/05

Security and Systems: Some Examples

OpenBSD
– randomized shared library order and addresses
– mmap() and malloc() return randomized addresses
– guard pages between objects
– privilege separation and revocation

Page 387: Kernfach System Software WS04/05

Privilege Separation

unprivileged child process to contain and restrict the effects of programming errors
e.g., openssh

[Figure: OpenSSH privilege separation over time. The privileged process listens on *:22. On a network connection it forks an unprivileged child for network processing (key exchange, authentication) and acts as monitor (request auth / auth result). After authentication the state is exported and a user child is forked for user request processing (user network data), while the monitor serves request PTY / pass PTY.]