kernfach system software ws04/05
1
Kernfach System Software
WS04/05
P. Reali
M. Corti
System-Software WS 04/05
2
© P. Reali / M. Corti
Introduction: Admin
Lecture
– Mo 13-14, IFW A 36
– We 10-12, IFW A 36
Exercises
– Always on Thursday
  14-15  IFW A34    C. Tuduce (E)
  14-15  IFW C42    V. Naoumov (E)
  15-16  IFW A32.1  I. Chihaia (E)
  15-16  RZ F21     C. Tuduce (E)
  16-17  IFW A34    T. Frey (E)
  16-17  IFW A32.1  K. Skoupý (E)
Introduction: Additional Info
Internet
– Homepage: http://www.cs.inf.ethz.ch/ssw/
– Inforum vis site
Textbooks & Co.
– Lecture slides
– A. Tanenbaum, Modern Operating Systems
– Silberschatz / Galvin, Operating System Concepts
– Selected articles and book chapters
Introduction: Exercises
Exercises are optional (feel free to shoot yourself in the foot)
– Weekly paper exercises
  test the knowledge acquired in the lecture
  identify troubles early
  exercise questions are similar to the exam ones
– Monthly programming assignment
  feel the gap between theory and practice
Introduction: Exam
Sometime in March 2005
Written, 3 hours
Allowed aids
– 2-page A4 summary
– calculator
Official Q&A session 2 weeks before the exam
Introduction: Lecture Goals
Operating system concepts
– bottom-up approach
– not an operating system course
– learn the most important concepts
– feel the complexity of operating systems
  there is no silver bullet!
Basic knowledge for other lectures / term assignments
– Compilerbau (compiler construction)
– Component Software
– ...
– OS-related assignments
Introduction: What is an operating system?
An operating system has two goals:
Provide an abstraction of the hardware
– ABI (application binary interface)
– API (application programming interface)
– hide details
Manage resources
– time and space multiplexing
– resource protection
Introduction: Operating system target machines
Targets: mainframes, servers, multiprocessors, desktops, real-time systems, embedded systems
Different goals and requirements!
– memory efficiency
– reaction time
– abstraction level
– resources
– security
– ...
Introduction: Memory vs. Speed Tradeoff
Example: retrieve a list of names

Structure       memory    time
1. Array        Nn        N
2. List         N(n+4)    N/2
3. Bin. Tree    N(n+8)    log(N)
4. Hash Table   3Nn       1

N = # names
n = name length
Introduction: Operating System as resource manager
... in the beginning was the hardware!
Most relevant resources:
– CPU
– Memory
– Storage
– Network
Introduction: Lecture Topics
[Figure: lecture topics arranged by resource (Memory, CPU, Network, Disk) and by abstraction level. Topics shown: Scheduling, Virtual Memory, Demand Paging, Thread, Process, Coroutine, Memory Management, Garbage Collection, Concurrency Support, File System, Object-Oriented Runtime Support, Distributed File-System, Distributed Object-System, Virtual Machine, Runtime support]
Introduction: A word of warning...
Most of the topics may seem simple...
... and in fact they are!
Problems are mostly due to:
– complexity when integrating the system
– low-level ("bit fiddling") details
– bootstrapping (X needs Y, Y needs X)
Introduction: Bootstrapping (Aos)
[Figure: Aos bootstrapping levels with the modules Locks, Storage, Modules, Processor, Memory, Interrupts, Active, Traps, Timers, SMP]
Introduction: Lecture Topics
[Figure: lecture timeline from Oct '04 to Feb '05 covering Overview, Runtime Support, Virtual Addressing, Memory Management, Distributed Obj. System, Concurrency, Disc / Filesystem, Case Study: JVM]
Run-time Support: Overview
Support for programming abstractions
– Procedures
  calling conventions
  parameters
– Object-Oriented Model
  objects
  methods (dynamic dispatching)
– Exception Handling
– ... more ...
Run-time Support: Application Binary Interface (ABI)
Objects a, b, c, … with methods P, Q, R, … and internal procedures p, q, r, …
Call sequence: Call a.P, Call b.Q, Call b.q, Call b.q, Return b.q, Return b.q, Call c.R, Return c.R, Return b.Q, Return a.P
[Figure: stack of Procedure Activation Frames (PAF) with the Stack Pointer (SP) at four points of the call sequence: (1) a.P, b.Q, b.q; (2) a.P, b.Q, b.q, b.q; (3) a.P, b.Q; (4) a.P, b.Q, c.R]
Run-time Support: Procedure Activation Frame
Call (steps split between caller and callee)
– Save Registers
– Push Parameters
– Save PC
– Branch
– Save FP
– FP := SP
– Allocate Locals
Return
– Remove Locals
– Restore FP
– Restore PC
– Remove Parameters
– Restore Registers
[Figure: stack layout with the caller frame, params, PC, FP' (dynamic link) and locals; the Frame Pointer (FP) and Stack Pointer (SP) delimit the current frame]
Run-time Support: Procedure Activation Frame, Optimizations
Many optimizations are possible
– use registers instead of stack
– register windows
– procedure inlining
– use SP instead of FP addressing
Run-time Support: Procedure Activation Frame (Oberon / x86)
Caller:
  push params
  call P                 ; push pc; pc := P
Callee:
  push fp
  mov fp, sp
  sub sp, size(locals)
  ...
  mov sp, fp
  pop fp
  ret size(params)       ; pop pc; add sp, size(params)
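The call and return sequence can be traced on a toy machine. This is only a sketch under stated assumptions: `Machine`, `call` and `ret` are hypothetical names, the stack is a Python list that grows upward (a real x86 stack grows downward), and every slot holds one value.

```python
class Machine:
    """Toy machine; the stack is a list that grows upward."""

    def __init__(self):
        self.stack = []
        self.fp = 0   # frame pointer: index of the first local of the current frame
        self.pc = 0   # program counter

    def call(self, params, target, n_locals):
        self.stack.extend(params)           # caller: push params
        self.stack.append(self.pc)          # call P: push pc ...
        self.pc = target                    #         ... pc := P
        self.stack.append(self.fp)          # callee: push fp
        self.fp = len(self.stack)           #         mov fp, sp
        self.stack.extend([0] * n_locals)   #         sub sp, size(locals)

    def ret(self, n_params):
        del self.stack[self.fp:]            # mov sp, fp
        self.fp = self.stack.pop()          # pop fp
        self.pc = self.stack.pop()          # ret: pop pc ...
        del self.stack[len(self.stack) - n_params:]   # ... add sp, size(params)


m = Machine()
m.pc = 7
m.call([10, 20], target=100, n_locals=3)
frame_size = len(m.stack)   # 2 params + saved pc + saved fp + 3 locals
m.ret(n_params=2)
```

After `ret` the stack, `fp` and `pc` are back to their values before the call, which is the invariant the calling convention has to guarantee.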
Run-time Support: Calling Convention
Convention between caller and callee
– how are parameters passed
  data layout
  left-to-right, right-to-left
  registers
  register window
– stack layout
  dynamic link
  static link
– register saving
  reserved registers
Run-time Support: Calling Convention (Oberon)
Parameter passing:
– on stack (exception: Oberon/PPC uses registers)
– left-to-right
– self (methods only) as last parameter
– structs and arrays passed as reference, value-parameters copied by the callee
Stack
– dynamic link
– static link as last parameter (for local procedures)
Registers
– saved by caller
Run-time Support: Calling Convention (C)
Parameter passing:
– on stack
– right-to-left
– arrays passed as reference (array arguments decay to pointers)
Stack
– dynamic link
Registers
– some saved by caller
Run-time Support: Calling Convention (Java)
Parameter passing
– left-to-right
– self as first parameter
– parameters pushed as operands
– parameters accessed as locals
– access through symbolic, type-safe operations
Run-time Support: Object Oriented Support, Definitions
Polymorphism
Obj x = new ObjA();
• static type of x is Obj
• dynamic type of x is ObjA
x is compiled as being compatible with Obj, but executes as ObjA.
Static and dynamic type can be different: the system must keep track of the dynamic type with a hidden "type descriptor".
[Figure: class hierarchy with Obj0, Obj, ObjA, ObjB]
Run-Time Support: Polymorphism
VAR
  t: Triangle;
  s: Square;
  o: Figure;
BEGIN
  t.Draw();
  s.Draw();
  o.Draw();
END;
Type is statically known!

WHILE p # NIL DO
  p.Draw(); p := p.next
END;
Type is discovered at runtime!
Run-time Support: Object Oriented Support, Definitions
Obj x = new ObjA();
if (x IS ObjA) { ... }   // type test
ObjA y = (ObjA)x;        // type cast
x = y;                   // type coercion (automatic conversion)
[Figure: class hierarchy with Obj0, Obj, ObjA, ObjB]
Run-time Support: Object Oriented Support (High-level Java)
Type Test Implementation
.... a IS T ....
if (a != null) {
  Class c = a.getClass();
  while ((c != null) && (c != T)) {
    c = c.getSuperclass();
  }
  return c == T;
} else {
  return false;
}
Run-Time Support: Type Descriptors
struct TypeDescriptor {
  int level;
  type[] extensions;
  method[] methods;
}
class Object {
  TypeDescriptor type;
}
many type-descriptor layouts are possible
layout depends on the optimizations chosen
Run-Time Support: Type Tests and Casts
"Extension level": Obj0 = 0, Obj = 1, ObjA and ObjB = 2
TD(Obj0):  0: Obj0   1: NIL   2: NIL    3: NIL
TD(Obj):   0: Obj0   1: Obj   2: NIL    3: NIL
TD(ObjA):  0: Obj0   1: Obj   2: ObjA   3: NIL

(obj IS T)
obj.type.extension[ T.level ] = T

mov EAX, obj
mov EAX, -4[EAX]
cmp T, -4 * T.level - 8[EAX]
bne ....
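The constant-time type test can be sketched as follows. This is an illustration, not the Oberon implementation: `TypeDescriptor`, `is_type` and the fixed table depth `MAX_LEVEL` are assumptions chosen to mirror the four-slot tables shown for TD(Obj0), TD(Obj) and TD(ObjA).

```python
class TypeDescriptor:
    MAX_LEVEL = 4   # assumption: four table slots, as in the TD tables

    def __init__(self, name, base=None):
        self.name = name
        self.level = 0 if base is None else base.level + 1
        if self.level >= self.MAX_LEVEL:
            raise ValueError("hierarchy deeper than the extension table")
        # Inherit the base type's table, then enter this type at its own level.
        self.extensions = list(base.extensions) if base else [None] * self.MAX_LEVEL
        self.extensions[self.level] = self


def is_type(obj_type, t):
    """obj IS T  <=>  obj.type.extensions[T.level] = T (one load, one compare)."""
    return obj_type.extensions[t.level] is t


obj0 = TypeDescriptor("Obj0")
obj = TypeDescriptor("Obj", obj0)
obj_a = TypeDescriptor("ObjA", obj)
obj_b = TypeDescriptor("ObjB", obj)
```

Unlike the loop over `getSuperclass()`, the cost does not depend on the depth of the hierarchy; the price is one table per type.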
Run-time Support: Object Oriented Support (High-level Java)
Method Call Implementation
.... a.M(.....) ....
Class[] parTypes = new Class[params.length];
for (int i = 0; i < params.length; i++) {
  parTypes[i] = params[i].getClass();
}
Class c = a.getClass();
Method m = c.getDeclaredMethod("M", parTypes);
res = m.invoke(a, params);
Use the method implementation of the actual class (dynamic type)
Run-Time Support: Handlers / Function Pointers
TYPE
  SomeType = POINTER TO SomeTypeDesc;
  Handler = PROCEDURE (self: SomeType; param: Par);
  SomeTypeDesc = RECORD
    handler: Handler;
    next: SomeType;
  END
[Figure: list starting at root; every record has a handler field and a next field, and the handlers point to PROC Q or PROC R]
Advantages:
• instance bound
• can be changed at run-time
Disadvantages:
• memory usage
• bad integration (explicit self)
• non constant
Run-Time Support: Method tables (vtables)
TYPE
  A = OBJECT
    PROCEDURE M0;
    PROCEDURE M1;
  END A;
  B = OBJECT (A)
    PROCEDURE M0;   (* B.M0 overrides A.M0 *)
    PROCEDURE M2;   (* B.M2 is new *)
  END B;

A.MethodTable:  0: A.M0   1: A.M1
B.MethodTable:  0: B.M0 (replaces A.M0)   1: A.M1   2: B.M2

Idea: have a per-type table of function pointers.
• New methods add a new entry in the method table
• Overrides replace an entry in the method table
• Each method has a unique entry number
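The three rules can be sketched in a few lines. The helper names `make_method_table` and `dispatch` are hypothetical, and the lambdas stand in for compiled procedures.

```python
def make_method_table(methods, base=None):
    """Return (table, slots): table lists the implementations,
    slots maps a method name to its fixed entry number."""
    table = list(base[0]) if base else []
    slots = dict(base[1]) if base else {}
    for name, impl in methods.items():
        if name in slots:
            table[slots[name]] = impl    # override: replace the entry
        else:
            slots[name] = len(table)     # new method: add a new entry
            table.append(impl)
    return table, slots


A = make_method_table({"M0": lambda: "A.M0", "M1": lambda: "A.M1"})
B = make_method_table({"M0": lambda: "B.M0", "M2": lambda: "B.M2"}, base=A)


def dispatch(type_, name):
    table, slots = type_
    return table[slots[name]]()          # call o.Type.Methods[mno]
```

Because a subtype copies the base table before changing it, an entry number that is valid for A stays valid for every extension of A.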
Run-Time Support: Method tables
TYPE
  A = OBJECT
    PROCEDURE M0;
    PROCEDURE M1;
  END A;
  B = OBJECT (A)
    PROCEDURE M0;
    PROCEDURE M2;
  END B;

A.MethodTable:  0: A.M0   1: A.M1
B.MethodTable:  0: B.M0   1: A.M1   2: B.M2

Virtual Dispatch
o.M0;
call o.Type.Methods[0]

mov eax, VALUE(o)
mov eax, type[eax]
mov eax, off + 4*mno[eax]
call eax
[Figure: object o with its Fields and its Type pointer to the method table]
Run-Time Support: Oberon Type Descriptors
The type descriptor holds
• method table (for method invocation)
• superclass table (for type checks)
• pointers in object for GC (ptr offsets, for garbage collection)
• obj size (for object allocation)
The type descriptor is also an object!
[Figure: an object (type desc pointer, obj fields) and its type descriptor (td size, type desc, mth table, ext table, type name, obj size, ptr offsets)]
Run-Time Support: Interfaces, itables
interface A {
  void m();
}
interface B {
  void p();
}
Object x;
A y = (A)x;   // does x implement A?
y.m();        // multiple itables: how is the right itable discovered?
x has a method table (itable) for each implemented interface
Run-Time Support: Interface support
How to retrieve the right method table (if any)?
– Global table indexed by [class, interface]
– Local (per type) table / list indexed by [interface]
Many optimizations are available
use the usual trick: enumerate interfaces
Run-Time Support: Interface support (I)
[Figure: type descriptor with a method table (vtable) and an interfaces list; each list entry (Intf0, Intf7) has its own method table (itable)]
Intf0 y = (Intf0)x;
y.M();

interface i = x.type.interfaces;
while ((i != null) && (i != Intf0)) {
  i = i.next;
}
if (i != null) i.method[mth_nr]();

The call is expensive because it requires traversing a list: O(N) complexity
Run-Time Support: Interface support (II)
[Figure: type descriptor with a vtable and an interfaces array indexed 0..7; slots 2 and 7 point to itable2 and itable7, the other slots are empty (sparse array!)]
Intf0 y = (Intf0)x;
y.M();

interface i = x.type.interfaces[Intf0];
if (i != null) i.method[mth_nr]();

Lookup is fast (O(1)), but wastes memory
Run-Time Support: Interface Implementation (III)
[Figure: type descriptors t and u, each with a vtable and an interfaces index over the interface numbers 0..7; t has itables for interfaces 2 and 7, u has itables for interfaces 0 and 2; the two interface table indexes are overlapped]
Run-Time Support: Interface Implementation (III)
[Figure: two type descriptors, each with a vtable, an interfaces pointer and two itables, sharing one overlapped interface table index]
Run-Time Support: Interface Implementation (III)
[Figure: type descriptor with a vtable and an interfaces pointer into the overlapped interface tables]
Intf0 y = (Intf0)x;
y.M();

itable i = x.type.interfaces[Intf0];
if ((i != null) && (i in x.type))
  i.method[mth_nr]();
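One way to read the overlapped index is as a single shared array in which every type gets an offset, chosen so that its used slots fall into the holes left by other types. The sketch below follows that reading; it is an assumption about the figure, and `OverlappedIndex`, `place` and `lookup` are hypothetical names. Each type is assumed to implement at least one interface.

```python
class OverlappedIndex:
    """One shared array of itable slots; each type gets an offset into it."""

    def __init__(self):
        self.slots = []   # entries are (owner_type, itable) or None

    def place(self, owner, itables):
        """itables: dict interface_number -> itable. Returns the type's offset."""
        offset = 0
        # First fit: advance until none of the type's slots is already taken.
        while any(offset + n < len(self.slots) and self.slots[offset + n] is not None
                  for n in itables):
            offset += 1
        needed = offset + max(itables) + 1
        self.slots.extend([None] * (needed - len(self.slots)))
        for n, itable in itables.items():
            self.slots[offset + n] = (owner, itable)
        return offset

    def lookup(self, owner, offset, intf_no):
        """x.type.interfaces[Intf] followed by the 'i in x.type' ownership check."""
        index = offset + intf_no
        if index >= len(self.slots) or self.slots[index] is None:
            return None
        slot_owner, itable = self.slots[index]
        return itable if slot_owner == owner else None


index = OverlappedIndex()
off_t = index.place("t", {2: "itable t,2", 7: "itable t,7"})
off_u = index.place("u", {0: "itable u,0", 2: "itable u,2"})
```

Here both types share eight slots instead of using eight each. The ownership check is what keeps the lookup correct: without it, type t would find u's itable in a slot that t itself leaves empty.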
Run-Time Support: Exceptions
void catchOne() {
try {
tryItOut();
} catch (TestExc e) {
handleExc(e);
}
}
void catchOne()
   0 aload_0
   1 invokevirtual tryItOut();
   4 return
   5 astore_1
   6 aload_0
   7 aload_1
   8 invokevirtual handleExc
  11 return

ExceptionTable
From  To  Target  Type
0     4   5       TestExc
Run-Time Support: Exception Handling / Zero Overhead
void ExceptionHandler(state) {
  pc = state.pc; exc = state.exception;
  i = 0;
  while (!Match(table[i], pc, exc)) {
    i++;
    if (i == TableLength) {
      PopActivationFrame(state);
      pc = state.pc; i = 0;
    }
  }
  state.pc = table[i].pchandler;
  ResumeExecution(state);
}

try {                  // pcstart
  .....
} catch (Exp1 e) {     // pcend, pchandler1
  .....
} catch (Exp2 e) {     // pchandler2
  .....
}

Global Exception Table
start    end    exception  handler
pcstart  pcend  Exp1       pchandler1
pcstart  pcend  Exp2       pchandler2
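The table search can be sketched without a real machine state. In this illustration the stack is reduced to a list of program counters, one per activation frame; `Entry`, `find_handler` and the half-open range `[start, end)` are assumptions (the JVM treats its exception table ranges this way).

```python
from dataclasses import dataclass


@dataclass
class Entry:
    start: int     # pcstart (inclusive)
    end: int       # pcend (exclusive, an assumption)
    exc: type      # exception class handled
    handler: int   # pchandler


def find_handler(table, frame_pcs, exc):
    """frame_pcs: pc of the faulting frame first, then the pcs of its callers.
    Returns (frames_popped, handler_pc), or None if only the default handler is left."""
    for popped, pc in enumerate(frame_pcs):
        for entry in table:   # traverse the whole table for each stack frame
            if entry.start <= pc < entry.end and isinstance(exc, entry.exc):
                return popped, entry.handler
    return None


class Exp1(Exception):
    pass


class Exp2(Exception):
    pass


table = [Entry(0, 4, Exp1, 5), Entry(0, 4, Exp2, 9), Entry(20, 30, Exp2, 40)]
```

Nothing in this scheme runs while no exception is raised; all the work happens in the search.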
Run-Time Support: Exception Handling / Zero Overhead
– exception table filled by the loader / linker
– traverse the whole table for each stack frame
– system has a default handler for uncaught exceptions
no exceptions => no overhead
exception case is expensive
system optimized for the normal case
Run-Time Support: Exception Handling / Fast Handling
try {
  .....
} catch (Exp1 e) {     // pchandler1
  .....
} catch (Exp2 e) {     // pchandler2
  .....
}

add code instrumentation: push catch descriptors on the stack

try {
  save (FP, SP, Exp1, pchandler1)
  save (FP, SP, Exp2, pchandler2)
  .....
  remove catch descr.
  jump end
} catch (Exp1 e) {
  .....
  remove catch descr.
  jump end
} catch (Exp2 e) {
  .....
  remove catch descr.
  jump end
}
end:

use an exception stack to keep track of the handlers
Run-Time Support: Exception Handling / Fast Handling
void ExceptionHandler(ThreadState state) {
  int FP, SP, handler;
  Exception e;
  do {
    retrieve(FP, SP, e, handler);   // pop next exception descriptor from exception stack
  } while (!Match(state.exp, e));
  state.fp = FP;        // set frame to the one
  state.sp = SP;        // containing the handler
  state.pc = handler;   // resume with the handler
  ResumeExecution(state);
}
can resume in a different activation frame
Run-Time Support: Exception Handling / Fast Handling
code instrumentation
– insert exception descriptor at try
– remove descriptor before catch
fast exception handling
overhead even when no exceptions
system optimized for exception case
Virtual Addressing: Overview
Virtual Addressing: abstraction of the MMU (Memory Management Unit)
Work with virtual addresses, where $address_{real} = f(address_{virtual})$
Provides decoupling from real memory
– virtual memory
– demand paging
– separated address spaces
Virtual Addressing: Pages
Memory as array of pages
[Figure: virtual address-space 1 and virtual address-space 2, each an array of pages ordered by memory address, mapped onto real memory, a pool of page frames; labels: page, page frame, mapping, unmapped (invalid) page, unmapped range]
programs use and run in these address spaces
Virtual Addressing: Page mapping
[Figure: the MMU translates a virtual address (page-no, off) into a real address (frame, off). The page table, located through the Page Table Ptr register, maps page-no to the frame of a page frame in real memory. The Translation Lookaside Buffer (TLB) is an associative cache of (PT, VA, RA) entries.]
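The mapping can be sketched as a function from virtual to real addresses. The 4 KB page size and the dictionary-based page table and TLB are assumptions for illustration; `translate` and `PageFault` are hypothetical names.

```python
PAGE_BITS = 12                # assumption: 4 KB pages
PAGE_SIZE = 1 << PAGE_BITS


class PageFault(Exception):
    pass


def translate(va, page_table, tlb):
    """Map a virtual address to a real address; page_table maps page-no -> frame."""
    page_no, off = va >> PAGE_BITS, va & (PAGE_SIZE - 1)
    frame = tlb.get(page_no)
    if frame is None:                     # TLB miss: consult the page table
        frame = page_table.get(page_no)
        if frame is None:
            raise PageFault(hex(va))      # unmapped (invalid) page
        tlb[page_no] = frame              # cache the mapping for next time
    return (frame << PAGE_BITS) | off


page_table = {0: 5, 1: 3}
tlb = {}
ra = translate(0x1ABC, page_table, tlb)   # page 1, offset 0xABC
```

The offset is copied unchanged; only the page number is replaced by the frame number.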
Virtual Addressing: Definitions
page: smallest unit in the virtual address space
page frame: unit in the physical memory
page table: table mapping pages into page frames
page fault: access to a non-mapped page
working set: pages a process is currently using
Virtual Addressing: Alternate Page Mapping
– Multilevel page tables
– Multipart virtual address (pno1, pno2, off)
– Page table as (B*-)Tree
– Inverted Page-Table
[Figure: two-level page table (1. Level Table, 2. Level Table, unassigned entries) for a 64 bit address space; inverted page table with one (pr, vp) entry per page frame pf, numbered 0..N, reached through a hashtable on (pr, vp) with a next-probe link; pr = process]
Virtual Addressing: What for?
Decoupling from real memory
– virtual memory (cheat: use more virtual memory than the available real memory)
– dynamically allocated contiguous memory blocks (for multiple stacks in multitasking systems)
– some optimizations
  null reference checks
  garbage collection (using dirty flag)
Virtual addressing is not free!
– address mapping may require additional memory accesses
– page table takes space
Virtual Addressing: Virtual Memory
Use secondary storage (disc) to keep currently unused pages (swapping)
Page table usually keeps some per-page flags
– invalid: page not mapped
– referenced: page has been referenced
– dirty: page has been modified
Accessing an invalid page causes a page-fault interrupt
– select the page frame to be swapped out (victim or candidate)
– swap in the requested page
Virtual Addressing: Virtual Memory / Demand Paging
[Figure: "page-out" moves the victim page from real memory to disc and its page table entry is set to invalid; "page-in" loads the requested page from disc into real memory]
Virtual Addressing: Demand Paging Sequence
MMU (TLB, then page table):
IF VA IN TLB THEN
  RETURN RA
ELSE
  Access Page Table;
  IF Page invalid THEN Page-Fault ELSE RETURN RA END
END

OS Page-Fault Handler:
IF Free Page Frame exists THEN
  Assign frame to VA
ELSE
  Search victim page;
  IF victim page modified THEN page-out to secondary storage END;
  Invalidate victim page;
  Assign frame to VA
END;
Page-in from secondary storage;
Reset invalid flag

Expected time to translate VA into RA:
$E[t] = P_{TLB} \cdot t_{TLB} + P_{PT} \cdot t_{PT} + P_{disc} \cdot t_{disc}$
Virtual Addressing: Example
Page size: 4 KB
Address size: 32 bits
addressable memory: $2^{32}$ = 4 GB
page offset: 12 bits (4 KB = $2^{12}$)
page number: 20 bits (32 - 12)
page table size: $2^{20}$ * 32 bits = 4 MB
Real memory: 128 MB
page table overhead: ca. 3%
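The arithmetic of this example can be checked directly. The constant names are illustrative; the only assumption beyond the slide is that a 32-bit page table entry occupies 4 bytes.

```python
ADDRESS_BITS = 32
PAGE_SIZE = 4 * 1024             # 4 KB
ENTRY_BYTES = 4                  # one 32-bit entry per page
REAL_MEMORY = 128 * 1024**2      # 128 MB

offset_bits = PAGE_SIZE.bit_length() - 1           # log2(4096) = 12
page_no_bits = ADDRESS_BITS - offset_bits          # 20
table_bytes = (1 << page_no_bits) * ENTRY_BYTES    # 4 MB
overhead = table_bytes / REAL_MEMORY               # 0.03125, i.e. about 3 %
```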
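Virtual Addressing: Example
mov EAX, @Addr
[Figure: decision tree for one memory access. With probability $P_{TLB}$ the TLB answers; with probability $1 - P_{TLB}$ the page table is consulted. Then either a page fault occurs (probability $P_{PF}$: disc access, 1 disc write and 1 disc read) or memory is accessed (probability $1 - P_{PF}$: 1 memory read).]
$E[t] = P_{TLB} \cdot t_{TLB} + (1 - P_{TLB}) (t_{PT} + P_{PF} \cdot t_{disc} + (1 - P_{PF}) \cdot t_{mem})$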
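Plugging numbers into the formula shows how strongly the disc term dominates. The timings below are hypothetical values in nanoseconds, picked only to make the arithmetic easy to follow; `expected_access_time` is an illustrative name.

```python
def expected_access_time(p_tlb, p_pf, t_tlb, t_pt, t_mem, t_disc):
    """E[t] = P_TLB*t_TLB + (1 - P_TLB)*(t_PT + P_PF*t_disc + (1 - P_PF)*t_mem)."""
    return p_tlb * t_tlb + (1 - p_tlb) * (t_pt + p_pf * t_disc + (1 - p_pf) * t_mem)


# Hypothetical timings (ns): TLB 1, page table 100, memory 100, disc 10 ms.
timings = dict(t_tlb=1, t_pt=100, t_mem=100, t_disc=10_000_000)

always_tlb = expected_access_time(p_tlb=1.0, p_pf=0.0, **timings)
no_faults = expected_access_time(p_tlb=0.5, p_pf=0.0, **timings)
with_faults = expected_access_time(p_tlb=0.5, p_pf=0.5, **timings)
```

With these made-up numbers the three cases come out at 1 ns, 100.5 ns and about 2.5 ms, so even a small page-fault probability outweighs everything else.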
Virtual Addressing: Demand Paging: Page Replacement
Optimal Strategy (Longest Unused)
– take the page that will remain unused for the longest time
– requires an oracle
NRU: "Not Recently Used"
– reset the referenced flag at each tick
– create page categories (good candidate to bad candidate)
– choose the best candidate

Pref  ref  mod
3     0    0
2     0    1
1     1    0
0     1    1
Virtual Addressing: Demand Paging: Page Replacement (2)
LRU: "Least Recently Used"
Assumption: not used in the past ==> not used in the future
Hardware implementation
– 64-bit time-stamp for each page
Software implementation
– "Aging" algorithm
– choose the page with the lowest value
[Figure: per-page aging counter at times t(i) and t(i+1); at each tick the Reference Flag (set if the page was accessed) is shifted into the counter]
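A minimal sketch of the aging algorithm, following the usual textbook formulation: at every tick each counter is shifted right and the page's reference flag becomes the new top bit. The 8-bit counter width and the names `tick` and `victim` are assumptions.

```python
def tick(counters, referenced, bits=8):
    """One clock tick: shift every counter right, then put the reference flag into the top bit."""
    top = 1 << (bits - 1)
    for page in counters:
        counters[page] >>= 1
        if page in referenced:
            counters[page] |= top


def victim(counters):
    """The page with the lowest counter value is the replacement candidate."""
    return min(counters, key=counters.get)


counters = {"a": 0, "b": 0, "c": 0}
tick(counters, {"a", "b"})   # a and b accessed during the first tick
tick(counters, {"a"})        # only a accessed
tick(counters, {"c"})        # only c accessed
```

Recent accesses weigh more than old ones because every tick halves the old contribution, which approximates LRU with one small counter per page.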
Virtual Addressing: Demand Paging: Page Replacement (3)
"Least Recently Created" LRC (FIFO)
– page lifespan as metric (old pages are swapped out)
– chain sorted by creation time
– bad handling of often-used pages
Fix: "second chance" when accessed (ref flag set) during the last tick
cur := earliest;
WHILE cur.ref DO
  cur.ref := FALSE; cur := cur.next
END
[Figure: chain of pages starting at earliest and linked by next, each page with a Ref-Flag]
Virtual Addressing: Demand Paging: Page Replacement (4)
Strategies:
– optimal
– LRU / NRU / LRC
Exceptions:
– "page pinning": page cannot be swapped out
  kernel code
Virtual Addressing: Example
Accessed pages: 1, 2, 1, 3, 4, 1, 2, 3, 4
Available page frames: 3
Working set: {1, 2, 3, 4}

Access  Ideal          FIFO           LRU
1       1        PF!   1        PF!   1        PF!
2       1, 2     PF!   1, 2     PF!   1, 2     PF!
1       1, 2           1, 2           1, 2
3       1, 2, 3  PF!   1, 2, 3  PF!   1, 2, 3  PF!
4       1, 2, 4  PF!   2, 3, 4  PF!   1, 3, 4  PF!
1       1, 2, 4        3, 4, 1  PF!   1, 3, 4
2       1, 2, 4        4, 1, 2  PF!   1, 4, 2  PF!
3       2, 3, 4  PF!   1, 2, 3  PF!   1, 2, 3  PF!
4       2, 3, 4        2, 3, 4  PF!   2, 3, 4  PF!

Page faults: Ideal 5, FIFO 8, LRU 7
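The three strategies can be compared with a small simulator. This is a sketch: `count_faults` and the victim-selection functions are hypothetical names, and LRU is implemented strictly (the victim is the resident page whose last access lies furthest back).

```python
def count_faults(accesses, n_frames, choose_victim):
    frames, faults = [], 0   # resident pages, kept in load order (oldest first)
    for t, page in enumerate(accesses):
        if page in frames:
            continue
        faults += 1
        if len(frames) == n_frames:
            frames.remove(choose_victim(frames, accesses[:t], accesses[t + 1:]))
        frames.append(page)
    return faults


def fifo(frames, past, future):
    return frames[0]   # the page loaded earliest


def lru(frames, past, future):
    # Position of the last access of p in the past; the smallest one is least recent.
    return min(frames, key=lambda p: len(past) - 1 - past[::-1].index(p))


def optimal(frames, past, future):
    # The page whose next use is furthest away; never used again counts as furthest.
    return max(frames, key=lambda p: future.index(p) if p in future else len(future))


accesses = [1, 2, 1, 3, 4, 1, 2, 3, 4]
faults = {f.__name__: count_faults(accesses, 3, f) for f in (optimal, fifo, lru)}
```

On this access sequence the simulator reports 5 page faults for the optimal strategy, 8 for FIFO and 7 for strict LRU.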
Demand Paging: Belady's Anomaly
LRC strategy, page access sequence: 0 1 2 3 0 1 4 0 1 2 3 4

• 3 page frames: 9 page faults
Access   0 1 2 3 0 1 4 0 1 2 3 4
Frames   0 1 2 3 0 1 4 4 4 2 3 3
           0 1 2 3 0 1 1 1 4 2 2
             0 1 2 3 0 0 0 1 4 4
Victim         0 1 2 3     0 1
Fault    x x x x x x x     x x

• 4 page frames: 10 page faults
Access   0 1 2 3 0 1 4 0 1 2 3 4
Frames   0 1 2 3 3 3 4 0 1 2 3 4
           0 1 2 2 2 3 4 0 1 2 3
             0 1 1 1 2 3 4 0 1 2
               0 0 0 1 2 3 4 0 1
Victim               0 1 2 3 4 0
Fault    x x x x     x x x x x x

Belady's Anomaly: more page frames cause more page faults
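The anomaly is easy to reproduce with a few lines of FIFO simulation. `fifo_faults` is an illustrative name; the access sequence is the one from the slide.

```python
from collections import deque


def fifo_faults(accesses, n_frames):
    """Count page faults under FIFO (LRC) replacement."""
    frames, faults = deque(), 0
    for page in accesses:
        if page not in frames:
            faults += 1
            if len(frames) == n_frames:
                frames.popleft()   # victim: the page loaded earliest
            frames.append(page)
    return faults


sequence = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
faults_3 = fifo_faults(sequence, 3)
faults_4 = fifo_faults(sequence, 4)
```

With three frames the run produces 9 faults and with four frames 10, matching the two tables.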
Demand Paging: How many page frames per process?
Even distribution
– every process has the same amount of memory
Thrashing
– every memory access causes a page-fault
– not enough page-frames for the current working-set
– system is swapping instead of running
[Figure: CPU-Load (up to 100 %) plotted against the Process Count (1, 2, ..., n, n+1)]
Demand Paging: How many page frames per process? (2)
Depending on the process needs (1): use the working set
– page frames are assigned according to the process' working-set size
– swap out a process when not enough memory is available
[Figure: page access sequence 1 3 2 2 3 3 1 2 2 3 3 3 4 2 2 1 1 1 2 1 3 3 3 1 3 1 2 3 4 1 with a sliding window; example working sets { 1, 2, 3, 4 } and { 2, 3, 4 }]
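The working set at time t can be computed as the set of pages touched inside a sliding window. The window length of 7 accesses is an assumption chosen for illustration; the slide does not state it.

```python
def working_set(accesses, t, window):
    """Pages referenced in the last `window` accesses up to and including index t."""
    return set(accesses[max(0, t - window + 1): t + 1])


accesses = [1, 3, 2, 2, 3, 3, 1, 2, 2, 3, 3, 3, 4, 2, 2,
            1, 1, 1, 2, 1, 3, 3, 3, 1, 3, 1, 2, 3, 4, 1]

ws_at_14 = working_set(accesses, 14, window=7)
ws_at_15 = working_set(accesses, 15, window=7)
```

Moving the window by one access changes the set from {2, 3, 4} to {1, 2, 3, 4}, so the number of frames a process needs varies over time.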
Demand Paging: How many page frames per process? (3)
Depending on the process needs (2): use the page-fault rate
[Figure: page-fault rate over time between a LOW and a HIGH threshold; swap out one process when the rate is high, swap in when it is low]
Virtual Addressing: Aos/Bluebottle, Memory Layout Example
[Figure: 4 GB address space with Kernel and Heap below 2 GB and the Stacks between 2 GB and 4 GB]
PROCEDURE PageFault;
BEGIN
  IF adr > 2GB THEN
    add page to stack
  ELSE
    Exception(NilTrap)
  END
END PageFault;
• 128 KB per stack
• max. 32768 active objects
• first stack page allocated on process creation
Virtual Addressing: Example: UNIX, Fork
A UNIX program consists of code (text) and data.
[Figure: after Fork(), the page tables of Process A and Process B map the same pages read-only; when a data page is written it is copied (data', read-write): "copy on write"]
Virtual Addressing: OS Control
Oberon
– no virtual memory
Windows
– Virtual Memory configuration
– Task Manager
Linux
– swap partition / swap files
– ps / top
Virtual Addressing: Segmentation
e.g. Intel x86
Problem
– 640 KB max memory
– 16-bit addresses (i.e. 64 KB)
Solution
– work in a segment
– code / data segments
– check segment boundaries
$Addr_{real} = Seg_{base} + Offset$
[Figure: real memory containing a data segment and a code segment, each delimited by segment base and segment limit]
Virtual Addressing: Summary
virtual addresses, $address_{real} = f(address_{virtual})$
Decoupling from real memory
– virtual memory
– demand paging
– separate address spaces
Keywords
– page
– page frame
– page table
– page fault
– page flags: dirty, used, unmapped
– page replacement strategy: LRC, LRU, ideal, ...
– swapping
– thrashing, Belady's anomaly
Memory Management: Overview
Abstractions for applications
– heap
– memory blocks ( << memory pages)
Operations:
– Allocate
– Deallocate
Topics:
– memory organization
– free lists
– allocation strategies
– explicit deallocation
– garbage collection
  type-aware, conservative, copying / moving, incremental, generational
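Memory Management: Objects on the heap
Object instances: a, b, c, d, …
Sequence: NEW(a), NEW(b), NEW(c), DISPOSE(b), NEW(d), NEW(e)
[Figure: the "heap" after dynamic allocation of a, b, c, explicit disposal of b and allocation of d. Two cases for NEW(e) are shown, Case 1 and Case 2; in both there is not enough space for e.]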
Memory Management: Problem overview
Problems
– heap size limitation (e, case 1)
– external fragmentation (e, case 2)
– dangling pointers (a points to b)
Solutions
– system-managed list of free blocks ("free list")
– vector of blocks with fixed size (bitmap, with 0 = free, 1 = used)
– automated detection and reclamation of unused blocks ("garbage collection")
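Memory Management: Theory: 50% rule
Assumption: stable state, M free blocks, N allocated blocks
50%-Rule: $M = \frac{1}{2} N$
Allocated blocks are of type A (free block on both sides), B (free block on one side) or C (no free neighbour).
$N = A + B + C$
$M = \frac{1}{2} (2A + B + \epsilon)$, with $\epsilon$ = 0, 1, or 2
Block disposal: $\Delta M = (C - A) / N$
Block allocation ($p$ = splitting likelihood): $\Delta M = -(1 - p)$
Stable state: $(C - A) / N = 1 - p$, i.e. $C - A - N + pN = 0$
$2M = 2A + B + \epsilon$
$2M = 2A + N - A - C + \epsilon$
$2M = N + A - C + \epsilon$
$2M = pN + \epsilon$
For $p \approx 1$ and negligible $\epsilon$ this gives $M \approx \frac{1}{2} N$.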
Memory Management: Theory: Memory Fragmentation
[Figure missing from the transcript; it marks a critical point]
{ 50%-Rule }
$(b/2) \cdot F = H - b \cdot B$
$(\alpha/2) \cdot b \cdot B = H - b \cdot B$
$H / (b \cdot B) = 1 + \alpha/2$
$\alpha = 2/\beta - 2$
Memory Management: Free-list management with a Bitmap
Idea
– partition the heap in blocks of size s
– use a bitmap to track allocated blocks
  bitmap[i] = true  =>  block i allocated
Problems
– internal fragmentation
  round up block size to the next multiple of s (loss due to internal fragmentation)
– map size
  size is (heap_size / s) bits
Memory Management: Free-list management with a list
List organization
– sorted / non-sorted
  merging of empty blocks is simpler with a sorted list
– one list / many lists (per size)
  search is simpler, merging is more difficult
– management data stored in the free block
  size, next pointer
Operations
– Allocation
– Disposal with merge
  find free blocks next to the current block, merge into a bigger free block
Memory Management: Memory allocation strategies
block splitting
– if a free block is bigger than the requested block, then it is split
first-fit
– use the first free block which is big enough
best-fit
– take the smallest fitting block
– causes a lot of fragmentation
worst-fit
– take the biggest available block
quick-fit
– best-fit but multiple free-lists (one per block size)
– fast allocation!
[Figure: sequences of used and free blocks before and after allocation; internal fragmentation]
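The strategies differ only in which free block they pick. In this sketch the free list is reduced to a list of block sizes; `first_fit`, `best_fit`, `worst_fit` and `allocate` are illustrative names.

```python
def first_fit(free_sizes, request):
    """Index of the first free block that is big enough, or None."""
    for i, size in enumerate(free_sizes):
        if size >= request:
            return i
    return None


def best_fit(free_sizes, request):
    fitting = [i for i, size in enumerate(free_sizes) if size >= request]
    return min(fitting, key=lambda i: free_sizes[i]) if fitting else None


def worst_fit(free_sizes, request):
    fitting = [i for i, size in enumerate(free_sizes) if size >= request]
    return max(fitting, key=lambda i: free_sizes[i]) if fitting else None


def allocate(free_sizes, request, strategy):
    """Take `request` units from the chosen block; block splitting leaves the rest free."""
    i = strategy(free_sizes, request)
    if i is None:
        return False
    free_sizes[i] -= request
    if free_sizes[i] == 0:
        del free_sizes[i]
    return True


free = [100, 30, 60, 35]
after_best = list(free)
allocate(after_best, 32, best_fit)
```

Best-fit turns the 35-unit block into a 3-unit remainder, which is the kind of small leftover that makes it fragment the heap.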
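Memory Management: Buddy System (for fast block merging)
Blocks have size $2^k$
A block of size $2^i$ has address $j \cdot 2^i$ (last i bits are 0)
Blocks with addresses $x = j \cdot 2^i$ and $(j \text{ XOR } 1) \cdot 2^i$ are buddies (can be merged into a block of size $2^{i+1}$)
$buddy = x \text{ XOR } 2^i$
Example: b1 = xxxx 0 0000, b2 = xxxx 1 0000
[Figure: a block of size $2^{k+1}$ is split into two buddies of size $2^k$, which can be split again into blocks of size $2^{k-1}$; merging goes the other way. Example block sizes: 64; 32, 32; 32, 16, 16; 32, 16, 8, 8]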
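The buddy address is a single XOR, as this small sketch shows. `buddy_of` and `merge` are illustrative names; addresses are offsets from the start of the heap.

```python
def buddy_of(address, size):
    """Address of the buddy of the block at `address`; size must be a power of two."""
    assert size & (size - 1) == 0 and address % size == 0
    return address ^ size


def merge(address, size):
    """Merging two buddies yields a block of twice the size at the lower address."""
    return min(address, buddy_of(address, size)), 2 * size
```

For blocks of size 16, the blocks at 32 and 48 are buddies, while the adjacent blocks at 16 and 32 are not: the buddy of 16 is 0.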
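Memory Management: Buddy System (for fast block merging)
Problem: only buddies can be merged
Cascading merge
[Figure: heap with blocks of sizes 32, 16, 8, 8; two adjacent blocks of equal size are marked "no buddies", two others "buddies". Cascading merge: 32, 16, 8, 8 becomes 32, 16, 16 and then 32, 32]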
Memory Management: Buddy System (for fast block merging)
Allocation
– allocate(8)
[Figure: allocate(8) with quick-fit: 32, 32 is split into 32, 16, 16 and split again into 32, 16, 8, 8; one block of size 8 is allocated]
Memory Management: Example: Oberon / Aos
Block size = k*32
free-lists for k = 1..9, one list for blocks > 9*32
Allocate: quick-fit, splitting may be required
Free-list management and block-merging done by the Garbage Collector
[Figure: free lists for block sizes 32, 64, 96, ..., k * 32 in the initial state and after ALLOCATE(50), which yields an allocated block]
Memory Management: Garbage Collection
Two steps:
1. Free block detection
– type-aware: the collector is aware of the types traversed, i.e. knows which values are pointers
– conservative: the collector doesn't know which values are pointers
2. Block disposal
– return unused blocks to the free-lists
GC characteristics
– incremental: gc is performed in small steps to minimize program interruption
– moving / copying / compacting: blocks are moved around
– generational: blocks are grouped in generations; different treatment or collection priority
Barriers
– read: intercept and check every pointer read operation
– write: intercept and check every pointer write operation
Memory Management: Garbage Collection: Reference Counting
Every object has a reference counter rc
rc = 0  =>  object is "garbage"
Problems
– overhead
– no support for circular structures
Useful for...
– module hierarchies
– DAG structures (e.g. LISP)
p, q: pointers to objects
The assignment q := p is instrumented with a write barrier:
INC p.rc;
DEC q.rc;
IF q.rc = 0 THEN
  Collect q^
END;
q := p
[Figure: objects M, A, B, C, D with reference counts rc >= 1]
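The write barrier and the problem with circular structures can both be seen in a short sketch. `Obj`, `assign` and `collect` are hypothetical names; pointer fields are kept in a dictionary, and NIL pointers (None) are skipped, a guard the pseudocode leaves implicit.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0        # reference counter
        self.fields = {}   # pointer fields of the object


collected = []


def assign(holder, field, p):
    """Write barrier for holder.field := p."""
    q = holder.fields.get(field)
    if p is not None:
        p.rc += 1                 # INC p.rc
    if q is not None:
        q.rc -= 1                 # DEC q.rc
        if q.rc == 0:
            collect(q)            # Collect q^
    holder.fields[field] = p      # q := p


def collect(obj):
    collected.append(obj.name)
    for field in list(obj.fields):
        assign(obj, field, None)  # release everything the dead object points to


root, a, b, c, d = (Obj(n) for n in ("root", "a", "b", "c", "d"))
assign(root, "x", a)
assign(a, "next", b)
assign(root, "x", None)   # a and then b become garbage and are collected

assign(root, "y", c)
assign(c, "next", d)
assign(d, "next", c)      # cycle c -> d -> c
assign(root, "y", None)   # the cycle is unreachable, but both counters stay at 1
```

The chain a, b is reclaimed immediately, while the cycle c, d is never collected.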
Memory Management: Garbage Collection: Mark & Sweep
Mark phase (garbage detection)
– compute the root set, consisting of
  global pointers (statics) in each module
  local pointers on the stack in each PAF
  temporary pointers in the CPU's registers
– traverse the graph of the live objects starting from the root set with a depth-first strategy; mark all reached objects
Sweep phase (garbage collection)
– linear heap traversal; non-marked blocks are inserted into the free-lists
– optimization: lazy sweeping (sweep during allocation, allocation gets slower)
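Both phases fit in a few lines when the heap is modelled as a list of blocks. This is a sketch with hypothetical names (`Block`, `mark`, `sweep`); it uses an explicit stack for the depth-first traversal and returns the free list instead of threading it through the heap.

```python
class Block:
    def __init__(self, name):
        self.name = name
        self.pointers = []   # outgoing pointers to other blocks
        self.marked = False


def mark(roots):
    stack = list(roots)      # explicit stack: depth-first traversal
    while stack:
        block = stack.pop()
        if not block.marked:
            block.marked = True
            stack.extend(block.pointers)


def sweep(heap):
    """Linear heap traversal; returns the free list and clears marks for the next cycle."""
    free_list = []
    for block in heap:
        if block.marked:
            block.marked = False
        else:
            free_list.append(block)
    return free_list


a, b, c, d = (Block(n) for n in "abcd")
a.pointers = [b]
c.pointers = [d]
d.pointers = [c]             # unreachable cycle
heap = [a, b, c, d]

mark([a])                    # root set: a
free_list = sweep(heap)
```

Unlike reference counting, the unreachable cycle c, d ends up on the free list, because only reachability from the root set matters.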
Memory Management: Garbage Collection: root-set
Run-time support from the object system: hidden data structures with (compiler-generated) information about pointers (metadata).
Conservative approach: guess which values could be pointers and treat them as such.
[Figure: a module descriptor holds the offsets (off) of the global pointers in the module data; an object instance has a type tag pointing to its type descriptor, which holds the offsets (off1, off2) of the instance pointers]
Memory Management: Garbage Collection: Mark with Pointer Rotation/1
Problem: garbage collection is called when free memory is low, but mark may require a lot of memory
Solution: pointer rotation algorithm (Deutsch, Schorr, Waite)
+ memory efficient
+ iterative
– structures are temporarily inconsistent
– non-concurrent
– non-incremental
Memory Management: Garbage Collection: Mark with Pointer Rotation/2
Simple case: list traversal
[Figure: list traversal with the pointers q and p and the rotated field p.link]
Memory Management: Garbage Collection: Mark with Pointer Rotation/3
Generic case: structure traversal
[Figure: traversal of a general structure with the pointers q and p]
Memory Management: Garbage Collection: Memory Compaction
nextavail pointer: partitions the heap between allocated and free space
Allocate: increment nextavail
The garbage collector performs memory compaction
[Figure: heap and nextavail pointer after ALLOC and after GC; labelled MS .NET]
Memory Management: Garbage Collection: Stop & Copy
Partition the heap in from and to regions
Collection:
– traverse objects in from, copy to to
– leave forwarding pointer behind
– requires read barrier
– swap from and to
Characteristics
– copying
– incremental
– (generational)
Instrument code with a read barrier: "access p" becomes
IF p is moved THEN
  replace p with forwarding pointer
END;
access p
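Memory Management: Garbage Collection: Stop & Copy
[Figure: four steps of a collection; in steps 1 to 3 the heap is shown as from and to regions, in step 4 the roles are swapped (to, from)]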
Memory Management: Garbage Collection: Concurrent GC
[Figure: timelines of the user process. "Stop-and-Go" approach: Mutator, GC, Mutator, GC, Mutator. "Incremental" approach: mutator phases interleaved with GC steps under a real-time constraint]
Memory Management: Garbage Collection: Tricolor marking
"Wave-front" model

State                                    Color
already traversed, behind the wave       black
being traversed, on the wave             grey
not reached yet, in front of the wave    white
Memory Management - Garbage Collection: Tricolor marking / Isolation
The mutator can change pointers at any time
Critical case: black → white (a black object B is made to point to a white object W, so W would wrongly be treated as unreachable)
Remedy: write barrier
– color B grey, or
– color W grey
[Figure: black object B, white object W, write barrier]
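A small sketch of tricolor marking with a write barrier, assuming the "color W grey" variant. The names Obj, Collector, shade, step and write are hypothetical; a real collector would hook the barrier into every pointer store.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.color = WHITE


class Collector:
    """Incremental marker: the grey list is the wave front."""
    def __init__(self, roots):
        self.grey = []
        for root in roots:
            self.shade(root)

    def shade(self, obj):
        if obj.color == WHITE:
            obj.color = GREY
            self.grey.append(obj)

    def step(self):
        """Scan one grey object; returns False once the wave front is empty."""
        if not self.grey:
            return False
        obj = self.grey.pop()
        for child in obj.refs:
            self.shade(child)
        obj.color = BLACK
        return True

    def write(self, src, dst):
        """Write barrier: store dst into src and keep black → white from arising."""
        src.refs.append(dst)
        if src.color == BLACK:
            self.shade(dst)
```

Because the mutator goes through write, an object stored into an already scanned (black) object is put back on the wave front instead of being missed.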
Memory Management - Garbage Collection: Baker's Treadmill
Heap: doubly linked chain of objects
[Figure: chain divided into To-Space, From-Space and Free-Space, with the cur and scan pointers]
Memory Management - Garbage Collection: Baker's Treadmill
[Figure: treadmill with To-Space, From-Space and Free-Space and the cur and scan pointers; conservative allocation vs. progressive allocation]
Memory Management - Garbage Collection: Baker's Treadmill
[Figure: collect step on the treadmill (To-Space, From-Space, Free-Space); a reference is followed and the cur and scan pointers advance]
Memory Management - Garbage Collection: Baker's Treadmill
State transitions after GC is complete
– From-Space + Free-Space → Free-Space
– To-Space → From-Space
Fragmentation
– external: not removed
– internal: depends on the supported block sizes
Allocation
– conservative: black
– progressive: white
[Figure: root set with x and y, allocated by NEW(x) and NEW(y); cur and scan pointers]
Memory Management - Generational Garbage Collection
Generations and expected object life
– young: short life (temporary data)
– old: long life
Generations G0, G1, G2

Generation   GC frequency
G0           high
G1           medium
G2           low

Special handling is required for pointers across different generations
Collect where garbage is most likely to be found
[Figure: three heap snapshots with objects A-E, then A, D, F, G, then A, G, H, I, J, distributed over G2, G1 and G0]
Memory Management - Garbage Collection: Finalization
Finalization (after-use cleanup)
User-defined routine called when an object is collected
Establish consistency
– save buffers
– flush caches
Release resources
– close connections
– release file descriptors
Dangers
– resurrection of objects: objects are added to live structures
– the finalization sequence is undefined
Memory Management - Garbage Collection: .NET Finalization Example
Rules:
– objects with a finalizer belong to an older generation
– a finalizer is called only once (ReRegisterForFinalize)
– FinalizationQueue: live objects with a finalizer
– FreachableQueue: collected objects to be finalized
– finalization is executed by a different process for security reasons
[Figure: heap objects A-E with their FinalizationQueue and FreachableQueue entries before and after a GC; garbage objects; GC thread]
Memory Management - Garbage Collection: Weak Pointers
"Weak" pointers
– objects referenced only through a weak pointer can be collected by the GC in case of need
Used for
– caches and buffers
Implementation
1. Weak pointers are not registered with the GC
2. Use a weak reference table (indirect access)
[Figure: objects labeled garbage and in use; a weak pointer; a weak reference going through the weak reference table]
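As an illustration of a cache built on weak pointers, the sketch below uses Python's standard weakref module; it is not the Oberon or .NET mechanism from the lecture. Buffer and the key "block0" are hypothetical, and the explicit gc.collect() call is only there so the effect does not depend on reference counting.

```python
import gc
import weakref


class Buffer:
    """Hypothetical cached object."""
    def __init__(self, data):
        self.data = data


# The cache holds only weak references: an entry disappears once
# nobody else references the buffer any more.
cache = weakref.WeakValueDictionary()

buf = Buffer(b"block 0")
cache["block0"] = buf
hit = cache.get("block0")          # strong reference exists: cache hit

del buf, hit                       # drop all strong references
gc.collect()
miss = cache.get("block0")         # entry has been collected: None
```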
Memory Management - Garbage Collection: Weak Pointers Example
Oberon: internal file list
– the system must keep track of open files to avoid buffer duplication
– a file descriptor must be collected once the user has no more references to it
– use a weak pointer in the system (otherwise the system would keep the file alive!)
Memory Management - Object Pools
The application keeps a pool of preallocated object instances and handles allocation and disposal itself
– simulation of discrete events
– buffers in a file system
– provides dynamic allocation in real-time systems

PROCEDURE NewT (VAR p: ObjectT);
BEGIN
  IF freeT = NIL THEN NEW(p)
  ELSE p := freeT; freeT := freeT.next
  END
END NewT;

PROCEDURE DisposeT (p: ObjectT);
BEGIN
  p.next := freeT; freeT := p
END DisposeT;
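The same free-list idea, transcribed into Python as a rough sketch. Event and EventPool are hypothetical names; the allocated counter is added only to make the reuse visible.

```python
class Event:
    """Hypothetical pooled object; next links free instances together."""
    def __init__(self):
        self.next = None
        self.payload = None


class EventPool:
    def __init__(self):
        self._free = None      # head of the free list (freeT in the slide)
        self.allocated = 0     # number of real allocations, for illustration

    def new(self):
        if self._free is None:
            self.allocated += 1
            return Event()
        obj = self._free
        self._free = obj.next
        obj.next = None
        return obj

    def dispose(self, obj):
        obj.payload = None     # drop references held by the recycled object
        obj.next = self._free
        self._free = obj
```

After the pool has warmed up, new and dispose run in constant time without touching the allocator, which is what makes the pattern attractive for real-time code.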
Garbage Collection, Recap
GC kinds: compacting, copying, incremental, generational
Helpers: write barrier, read barrier, forwarding pointer, pointer rotation
Algorithms:
– Ref-Count
– Mark & Sweep
– Stop & Copy
– Mark & Copy (.NET)
– Baker's Treadmill
– Dijkstra / Lamport
– Steele
Distributed Object Systems - Overview
Goals
– object-based approach
– hide communication details
Advantages
– more space
– more CPU
– redundancy
– locality
Problems
– Coherency: ensure that the same object definition is used
– Interoperability: serialization, type consistency, type mapping
– Object life-time: distributed garbage collection
Distributed Object Systems - Architecture
[Figure: client side (application, proxy, object broker) and server side (object broker, stub / skeleton, implementation) exchanging a message with call context; naming service; IDL compilers generating proxy and skeleton from the IDL]
Remote Procedure Invocation - Overview
Problem
– send structured information from A to B
– A and B may have different memory layouts
– "endianness": how is 0x1234 (2 bytes) represented in memory?

Byte order      Address 0   Address 1   Notes
Big-Endian      12          34          MSB before LSB; IBM, Motorola, Sparc; network byte ordering
Little-Endian   34          12          LSB before MSB ("little end first"); VAX, Intel
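The two layouts of 0x1234 can be reproduced with Python's standard struct module; the snippet below is only an illustration of the byte orders described above.

```python
import struct

value = 0x1234

big = struct.pack(">H", value)       # big-endian: MSB first
little = struct.pack("<H", value)    # little-endian: LSB first
network = struct.pack("!H", value)   # network byte order is big-endian

# Reading bytes with the wrong byte order silently yields another number.
misread = struct.unpack(">H", little)[0]
```

Here misread is 0x3412, which is why a wire format has to fix the byte order or carry a flag that announces it.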
Definitions
Serialization
– conversion of an object's instance into a byte stream
Deserialization
– conversion of a stream of bytes into an object's instance
Marshaling
– gathering and conversion (may require serialization) of all relevant data into an appropriate format, e.g. in a remote method call; includes details like name representation
Remote Procedure Invocation - Protocol Overview
Protocols
– RPC + XDR (Sun): RFC 1014, June 1987; RFC 1057, June 1988
– IIOP / CORBA (OMG): V2.0, February 1997; V3.0, August 2002
– SOAP / XML (W3C): V1.1, May 2000
– ...
XDR type system (big-endian representation)
– [unsigned] Integer (32-bit)
– [unsigned] Hyper-Integer (64-bit)
– Enumeration (unsigned int)
– Boolean (Enum)
– Float / Double (IEEE 32/64-bit)
– Opaque
– String
– Array (fixed + variable size)
– Structure
– Union
– Void
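A few of the XDR encodings can be sketched with the struct module. The helper names (xdr_int, xdr_hyper, xdr_bool, xdr_opaque, xdr_string) are hypothetical; the sketch assumes the rules of RFC 1014 that every item is big-endian and padded to a multiple of 4 bytes, and it does not cover the full type system.

```python
import struct


def xdr_int(value):
    """32-bit signed integer, big-endian."""
    return struct.pack(">i", value)


def xdr_hyper(value):
    """64-bit signed integer, big-endian."""
    return struct.pack(">q", value)


def xdr_bool(value):
    """Booleans are encoded like an enumeration: 0 or 1 as a 32-bit integer."""
    return xdr_int(1 if value else 0)


def xdr_opaque(data):
    """Variable-length opaque data: length, bytes, zero padding to 4 bytes."""
    padding = (-len(data)) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def xdr_string(text):
    """Strings use the same layout as variable-length opaque data."""
    return xdr_opaque(text.encode("ascii"))
```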
Remote Procedure Invocation - RPC Protocol
Remote Procedure Call
– marshalling of procedure parameters
– message format
– authentication
– naming
Client: PROCEDURE P(a, b, c)
– pack parameters
– send message to server
– await response
– unpack response
Server: P(a, b, c)
– unpack parameters
– find procedure
– invoke
– pack response
– send response
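The client and server steps can be mimicked in a single process. This is a toy sketch under strong assumptions: JSON stands in for the marshalling format, a plain function call stands in for the network, and PROCEDURES, server_handle and client_call are hypothetical names; authentication and naming are left out.

```python
import json


def multiply(a, b):
    return a * b


# Server-side table used to find the procedure by name.
PROCEDURES = {"multiply": multiply}


def server_handle(request_bytes):
    request = json.loads(request_bytes.decode("utf-8"))     # unpack parameters
    procedure = PROCEDURES[request["proc"]]                  # find procedure
    result = procedure(*request["args"])                     # invoke
    return json.dumps({"result": result}).encode("utf-8")    # pack response


def client_call(transport, proc, *args):
    message = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")  # pack
    reply = transport(message)                               # send, await response
    return json.loads(reply.decode("utf-8"))["result"]       # unpack response
```

Calling client_call(server_handle, "multiply", 6, 7) goes through all five client and server steps and returns 42.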
Distributed Object Systems - Details
References vs. values
– the client receives a reference to a remote object
– data values are copied to the client for efficiency reasons
– decide whether an object is sent as a reference or as a value
    by value: Serializable (Java, .NET), valuetype (CORBA)
    by reference: MarshalByRefObject (.NET), java.rmi.Remote (Java), default (CORBA)
Object creation
– server creates objects
– client creates objects
– server can return references
Object instances
– one object for all requests
– one object for each request
– one object per proxy
Conversation state
– stateless
– stateful
Distributed Object Systems - Distr. Object Systems vs. Service Architecture
Dist. Object System (internal communication between an application's tiers)
– object-oriented model
– object references
– stateful / stateless
– tight coupling
Service Architecture (external communication between applications)
– OO-model / RPC
– service references
– stateless
– loose coupling
Distributed Object Systems - Distr. Object Systems vs. Service Architecture
[Figure: CORBA, Remoting, RMI and Web Services placed by environment (homogeneous to heterogeneous) and coupling (tight to loose)]
Tight coupling
– components / objects (distributed object system)
– stateful and stateless conversation
– transactions
Loose coupling
– services, remote procedure calls
– stateless conversation (session?)
– message
Distributed Object Systems - Type Mapping
[Figure: Type System 1, the interoperability type system and Type System 2, each with its possible types; the mappable types of both sides overlap in the interop subset]
Distributed Object Systems - Type Mapping, Example
[Figure: mapping of double, char / wchar, enum and union between the Java, CORBA and CLS type systems; types without a direct counterpart need a custom implementation]
Distributed Object Systems - Examples
Standards
– OMG CORBA: IIOP
– Web Services: SOAP
Frameworks
– Java RMI (Sun)
– DCOM (Microsoft)
– .NET Remoting (Microsoft)
IIOP.NET
Distributed Object Systems - CORBA
Common Object Request Broker Architecture
[Figure: remote architecture. Client: client application, client stub, CORBA runtime, ORB. Server: object, object skeleton, object adaptor, CORBA runtime, ORB. Interface repository and implementation repository. The ORBs form the "Object-Bus" and communicate via GIOP/IIOP over a TCP/IP socket]
Distributed Object Systems - CORBA
CORBA is a standard from the OMG
– Object Management Group
– Common Object Request Broker Architecture
CORBA is useful for...
– building distributed object systems
– heterogeneous environments
– tight integration
CORBA defines...
– an object-oriented type system
– an interface definition language (IDL)
– an object request broker (ORB)
– an inter-ORB protocol (IIOP) to serialize data and marshal method invocations
– language mappings for Java, C++, Ada, COBOL, Smalltalk, Lisp, Python
– ... and many additional standards and interfaces for distributed security, transactions, ...
Distributed Object Systems - CORBA
Basic types
– integers: 16-, 32-, 64-bit (signed and unsigned)
– IEEE floating point: 32-, 64-bit and extended-precision numbers
– fixed point
– char, string: 8-bit and wide
– boolean
– opaque (8-bit), any
– enumerations
Compound types
– struct
– union
– sequence (variable-length array)
– array (fixed-length)
– interface: concrete (pass-by-reference), abstract (pure definition)
– value type: pass-by-value, abstract (no state)
Operations
– in / out / inout parameters
– raises
Attributes
Distributed Object Systems - CORBA / General Inter-ORB Protocol (GIOP)
CDR (Common Data Representation)
– variable byte ordering
– aligned primitive types
– all CORBA types supported
IIOP (Internet IOP)
– GIOP over TCP/IP
– defines the Interoperable Object Reference (IOR): host, port, key
Message format
– defined in IDL
– messages: Request, Reply, CancelRequest, LocateRequest, LocateReply, CloseConnection, MessageError, Fragment
– byte ordering flag
– connection management: request multiplexing, asymmetrical / bidirectional connections
Distributed Object Systems - CORBA / GIOP Message in IDL
module GIOP {
  struct Version {
    octet major;
    octet minor;
  };
  enum MsgType_1_0 {
    Request, Reply, CancelRequest,
    LocateRequest, LocateReply,
    CloseConnection, MessageError
  };
  struct MessageHeader {
    char Magic[4];
    Version GIOP_Version;
    boolean byte_order;
    octet message_type;
    unsigned long message_size;
  };
}; // end module GIOP
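To make the header layout concrete, the sketch below packs and parses the 12-byte GIOP 1.0 header with the struct module. It assumes the field order of the GIOP 1.0 specification (magic, version, byte-order flag, one-octet message type, 32-bit message size) and that a TRUE byte-order flag means little-endian; pack_header and unpack_header are hypothetical helper names, not part of any ORB.

```python
import struct

MSG_TYPES = ["Request", "Reply", "CancelRequest", "LocateRequest",
             "LocateReply", "CloseConnection", "MessageError"]


def pack_header(msg_type, body_size, little_endian=False, version=(1, 0)):
    """Build the fixed 12-byte header that precedes every GIOP message."""
    fixed = struct.pack(">4sBBBB", b"GIOP", version[0], version[1],
                        1 if little_endian else 0, MSG_TYPES.index(msg_type))
    order = "<" if little_endian else ">"
    return fixed + struct.pack(order + "I", body_size)


def unpack_header(data):
    """Return (version, little_endian, message type name, body size)."""
    magic, major, minor, flag, msg_type = struct.unpack(">4sBBBB", data[:8])
    if magic != b"GIOP":
        raise ValueError("not a GIOP message")
    order = "<" if flag else ">"
    (size,) = struct.unpack(order + "I", data[8:12])
    return (major, minor), bool(flag), MSG_TYPES[msg_type], size
```

The size field is the only multi-byte value in the header, so it is the only one affected by the byte-order flag.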
Distributed Object Systems - CORBA Services
CORBA Services
– system-level services defined in IDL
– provide functionality required by most applications
Naming Service
– allows local or remote objects to be located by name
– given a name, returns an object reference
– hierarchical directory-like naming tree
– allows getting the initial reference of an object
Event Service
– allows objects to dynamically register interest in an event
– the object will be notified when the event occurs
– push and pull models
... and more
– Trader, LifeCycle, Persistence, Transaction, Security
Distributed Object Systems - Web Services
Service-oriented architecture
Relies on existing protocols
– SOAP: messaging protocol
– WSDL: service description protocol
– UDDI: service location protocol
Protocol stack: SOAP over HTTP over TCP/IP
Distributed Object Systems - SOAP
Simple Object Access Protocol
– communication protocol
– XML-based
– describes object values
– XML Schemas as interface description language
    basic types: string, boolean, decimal, float, double, duration, dateTime, time, date, hexBinary, base64Binary, anyURI, QName, NOTATION
    structured types: list, union
SOAP Message
– SOAP Envelope
– SOAP Header
– SOAP Body
Method call
– packed as a structure
– messages are self-contained
– no external object references
Distributed Object Systems - SOAP Message
SOAP Message
– SOAP Envelope
    SOAP Header
    SOAP Body
Example
  float Multiply(float a, float b);
Distributed Object Systems - SOAP Example (Request)
POST /quickstart/aspplus/samples/services/MathService/CS/MathService.asmx HTTP/1.1
Host: samples.gotdotnet.com
Content-Type: text/xml; charset=utf-8
Content-Length: length
SOAPAction: "http://tempuri.org/Multiply"

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Multiply xmlns="http://tempuri.org/">
      <a>float</a>
      <b>float</b>
    </Multiply>
  </soap:Body>
</soap:Envelope>
Distributed Object Systems - SOAP Example (Answer)
HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Content-Length: length

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <MultiplyResponse xmlns="http://tempuri.org/">
      <MultiplyResult>float</MultiplyResult>
    </MultiplyResponse>
  </soap:Body>
</soap:Envelope>
Distributed Object Systems - SOAP Example (Service Description-1)
<?xml version="1.0" encoding="utf-8"?>
<definitions ....>
  <types>
    <s:schema elementFormDefault="qualified" targetNamespace="http://tempuri.org/">
      <s:element name="Multiply">
        <s:complexType>
          <s:sequence>
            <s:element minOccurs="1" maxOccurs="1" name="a" type="s:float" />
            <s:element minOccurs="1" maxOccurs="1" name="b" type="s:float" />
          </s:sequence>
        </s:complexType>
      </s:element>
    </s:schema>
  </types>
  <message name="MultiplySoapIn">
    <part name="parameters" element="s0:Multiply" />
  </message>
Distributed Object Systems - SOAP Example (Service Description-2)
  <binding name="MathServiceSoap" type="s0:MathServiceSoap">
    <soap:binding transport="http://schemas.xmlsoap.org/soap/http" style="document" />
    <operation name="Multiply">
      <soap:operation soapAction="http://tempuri.org/Multiply" style="document" />
      <input><soap:body use="literal" /></input>
      <output><soap:body use="literal" /></output>
    </operation>
  </binding>
  <service name="MathService">
    <port name="MathServiceSoap" binding="s0:MathServiceSoap">
      <soap:address location="http://samples.gotdotnet.com/quickstart/aspplus/samples/services/MathService/CS/MathService.asmx" />
    </port>
  </service>
</definitions>
Distributed Object Systems - Web Services
Comments
– XML (easily readable)
– system independent
– standard
– stateless (encouraged design pattern)

– bloated
– big messages (but easily compressed)
– requires expensive parsing
Constraints
– Services: no object references, server-activated servant
– Goes over HTTP: requires a web server
Distributed Object Systems - WebService Future
Use the SOAP header to store additional information about the message or context
Many standards to come...
– WS-Security
– WS-Policy
– WS-SecurityPolicy
– WS-Trust
– WS-SecureConversation
– WS-Addressing
Distributed Object Systems - Java RMI
Java Remote Method Invocation
[Figure: remote architecture. Client: client application, object stub, remote references, transport layer, lookup / register. Server: object, object stub, remote references, transport layer, lookup / register. Both sides communicate over the network through a TCP/IP socket]
Distributed Object Systems - Java RMI Details
Framework
– supports various implementations, e.g. RMI/IIOP
– mapping limited to the Java type system, workarounds needed
– uses reflection to inspect objects
Distributed Object Systems - Low-Level Details: Java RMI/IIOP
Common type system
– restricted CORBA
Marshalling
– name mapping
– remote objects: only references
Interface Description Language (IDL)
– Java to IDL mapping
Message representation
Underlying protocol
– IIOP (CORBA)
Distributed Object Systems - Microsoft DCOM
Distributed Component Object Model
[Figure: remote architecture. Client: client application, object proxy, COM runtime, SCM, registry. Server: object, object stub, COM runtime, SCM, registry. SCMs and registration, OXID resolver, ping server; client and server communicate over the network through an RPC channel]
Distributed Object Systems - Microsoft .NET Remoting
[Figure: the client obtains a transparent proxy through new Instance() or Activator.GetObject(...); calls travel through channels over the network, across the application domain boundary, to the instance]
ObjRef:
  IChannelInfo ChannelInfo;
  IEnvoyInfo EnvoyInfo;
  IRemotingTypeInfo TypeInfo;
  string URI;
Distributed Object Systems - Microsoft .NET Remoting
Instance s = new Instance();
s.DoSomething();
[Figure: call path between client and instance through the channel on both sides of the network]
Channel layers (present on both sides)
– proxy (client) / dispatcher (server)
– message channel sink(s): custom operations
– formatter: serialize object
– stream channel sink(s): custom operations
– transport sink: handle communication
Distributed Object Systems - Microsoft .NET Remoting
Activation
– client: one instance per activation
– server / Singleton: one instance of the object
– server / SingleCall: one instance per call
Leases (object lifetimes)
– renew lease on call
– set maximal object lifetime
Serialization
– SOAP (warning: non-standard types, only for .NET use)
– binary
– user defined
Transport
– TCP
– HTTP
– user defined
Distributed Object Systems - Microsoft .NET Remoting (Object Marshalling)
MarshalByRefObject
– remoted by reference
– the client receives an ObjRef object, which is a "pointer" to the original object
[Serializable]
– all fields of the instance are cloned to the client
– [NonSerialized] fields are ignored
ISerializable
– the object has a method to define its own serialization
[Figure: by reference, Obj in AppDomain 1 and Proxy in AppDomain 2, linked by a serialized ObjRef; by value, Obj in AppDomain 1 and its copy Obj' in AppDomain 2, transferred as serialized fields fld1 ... fldn]
Distributed Object Systems - Microsoft .NET Remoting, Activation
Server-side activation (well-known objects): "stateless"
– Singleton objects: only one instance is allocated to process all requests
– SingleCall objects: one instance per call is allocated
Client-side activation: "stateful"
– client-activated objects: the client allocates and controls the object on the server
Distributed Object Systems - Microsoft .NET Remoting, Limitations
Server-activated objects
– object configuration is limited to the default constructor
Client-activated objects
– the class must be instantiated, no access over an interface
– class hierarchy limitations
– use the Factory pattern
    to get an interface reference
    to allow parametrization of the constructor
Furthermore...
– interface information is lost when passing an object reference to another machine
– no control over the channel
    which channel is used
    which peer is allowed to connect
Distributed Object Systems - Case Study: IIOP.NET
Open-source project based on an ETH diploma thesis
– http://iiop-net.sourceforge.net/
IIOP.NET (marketing)
– "Provide seamless interoperability between .NET and CORBA-based peers (including J2EE)"
IIOP.NET (technical)
– .NET remoting channel implementing the CORBA IIOP protocol
– compiler to make .NET stubs from IDL definitions
– IDL definition generator from .NET metadata
Distributed Object Systems - Case Study: IIOP.NET
IIOP rather than SOAP
– transparent reuse of existing servers
– tight coupling
– object-level granularity
– efficiency
Runtime: standard .NET remoting channel for IIOP
– transport sink
– formatter
– type-mapper
Build tools
– IDL → CLS compiler
– CLS → IDL generator
[Figure: .NET server, .NET client, J2EE server, Java client and CORBA objects communicating via binary IIOP]
[Figure: Java, IDL and CLS type systems with their possible types; the IDL-mappable types of both sides overlap in the interop subset]
Distributed Object Systems - Case Study: IIOP.NET, Interoperability
Layers and their interoperability issues
– Communication protocols: TCP/UDP, byte stream, point-to-point communication
– Data model: type system, mapping and conversion issues
– Message format: RPC, IIOP, HTTP, SOAP, proprietary binary format, messages, unknown data (exceptions), encryption
– Contextual data, interception layer: SessionID, TransactionID, cultureID, logical threadID, ...
– Conversation: activation model (EJB, MBR), global naming, distributed garbage collection, conversational state, ...
– Services: Distributed Transaction Coordinator, Active Directory, ...
– Application: this is what we want
Distributed Object Systems - Case Study: IIOP.NET, Granularity

Granularity   Coupling, interaction
Service       message-based interface, stateless
Component     strongly-typed interface, stateless or stateful
Object        implementation dependency, stateful

[Figure: a system made of services, each made of components, each made of objects]
Distributed Object Systems - Case Study: IIOP.NET
[Figure: chart over the IIOP.NET releases 1.0 to 1.6, with the 1st and 2nd articles marked]
Distributed Object Systems - Case Study: IIOP.NET, Performance
Test case:
– WebSphere 5.0.1 as server
– clients: IBM SOAP-RPC Web Services, IBM Java RMI/IIOP, IIOP.NET
– response time when receiving 100 beans from the server
    WS: 4.0 seconds
    IIOP.NET: 0.5 seconds
– when sending many more beans, WS are then 200% slower than IIOP.NET
Source: posted on the IIOP.NET forum
Processes and Threads - Introduction
CPU as a resource, provide an abstraction of it
Allow multiprogramming
– pseudo-parallelism (single-processors)
– real parallelism (multi-processors)
Required abstractions
– multiple activities: execution of instructions
– protection of resources
– synchronization of activities
Topics
– coroutines
– processes
– threads
– scheduling: fairness, starvation
– synchronization: deadlocks
Processes and Threads - Multithreading
Thread 1 (Stack 1): Call a.run, Call b.Q, Call b.q, Call b.q, Return b.q, Return b.q, Call c.R, Return c.R, Return b.Q, Return a.run
Thread 2 (Stack 2): Call c.run, Call d.Q, Call d.q, Call d.q, Return d.q, Return d.q, Call e.R, Return e.R, Return d.Q, Return c.run
[Figure: the stacks of both threads (a.run, b.Q, b.q, b.q and c.run, d.Q, e.R) and three timelines showing how threads 1 and 2 are interleaved over time]
Processes and Threads - Coroutines (1)
Coroutines
– each activity has its own stack, the address space is shared
– explicit context switch (stack only) under the programmer's control
– use a Transfer call to switch to another coroutine
Processes and Threads - Coroutines (2)
[Figure: subroutines are connected by Call and Return; coroutines are created with Start and pass control to each other with Transfer]
Processes and Threads - Coroutines (3)
TYPE Coroutine = POINTER TO RECORD
  FP: LONGINT;
  stack: POINTER TO ARRAY OF SYSTEM.BYTE;
END;

VAR cur: Coroutine; (* current coroutine *)

PROCEDURE Transfer*(to: Coroutine);
BEGIN
  SYSTEM.GETREG(SYSTEM.EBP, cur.FP);  (* save FP *)
  cur := to;
  SYSTEM.PUTREG(SYSTEM.EBP, cur.FP);  (* restore FP *)
END Transfer;

Machine code generated around the body of Transfer:
  on entry: PUSH EBP; SUB ESP, 4
  on exit:  MOV ESP, EBP; POP EBP; RET 4
Processes and Threads - Coroutines (4)
[Figure: four snapshots of stackP and stackQ with SP and FP during Transfer(Q): the call pushes the parameter to', the return address PC' and the saved FP' above P's locals; FP := Q.FP switches to stackQ; the return then jumps to the PC saved on the new stack]
Processes and Threads - Coroutines (5)
Current stack: current execution state
All other stacks: the top PAF (procedure activation frame) contains the last Transfer call
Start: create a stack with a fake Transfer-like PAF

PROCEDURE Start(C: Coroutine; size: LONGINT);
VAR tos: LONGINT;
BEGIN
  NEW(C.stack, size);
  tos := SYSTEM.ADR(C.stack[0]) + LEN(C.stack);
  SYSTEM.PUT(tos-4, 0);   (* par = null *)
  SYSTEM.PUT(tos-8, 0);   (* PC' = null, not allowed to return *)
  SYSTEM.PUT(tos-12, 0);  (* FP' *)
  C.FP := tos-12;         (* frame pointer of the new coroutine *)
END Start;
Processes and Threads - Problems caused by multitasking
Concurrent access to resources
– protection: limit access to a resource
– synchronization: synchronize a task with the resource state or with another task
Concurrent access to the CPU
– task priorities
– scheduling
One problem's solution is another problem's cause...
– deadlocks
– fairness
– deadlines / periodicity constraints
Processes and Threads - Protection: Mutual Exclusion
Mutual exclusion: only one activity is allowed to access one resource at a time
– disable interrupts (single CPU only, avoids switches)
– locks
    flag: lock taken / lock free
    spin lock (uses busy waiting)
    exclusive lock
    read-write lock (multiple readers, one writer)
Processes and Threads - Protection: Monitor
Shared resources as monitors
– resources are passive objects
– execution of critical sections inside the monitor is mutually exclusive
    global monitor lock
    shared monitor lock for read access (optional)
– monitor as a special module [original version (Hoare, Brinch Hansen)]
– object instance as monitor
    method and code block granularity
    Java, C#, Active Oberon, ...
[Figure: task P and task Q each acquire and release the resource]
Processes and Threads - Protection
Simplistic implementation with coroutines
Non-reentrant lock (no recursion allowed)
One waiting queue per resource is required

PROCEDURE Acquire(r: Resource);
BEGIN
  IF r.taken THEN
    InsertList(r.waiting, cur);
    SwitchToNextRoutine()
  ELSE
    r.taken := TRUE
  END
END Acquire;

PROCEDURE Release(r: Resource);
BEGIN
  next := GetFromList(r.waiting);
  IF next # NIL THEN
    InsertList(ready, next);
    Transfer(GetNextTask());
  ELSE
    r.taken := FALSE
  END
END Release;
Processes and Threads - Protection
Shared resource as process
– synchronization during communication
– Communicating Sequential Processes (CSP), C.A.R. Hoare (1978)
Model of communication
– "rendez-vous" between two processes
    P!x (send x to process P)
    Q?y (ask y from process Q)
Used in Ada, Occam
[Figure: two timelines of task P executing Q!z and task Q executing P?x, one for each possible order of arrival at the rendez-vous]
Processes and Threads - Protection
Some variations on the theme...
– reentrant locks
– readers / writers: one writer or multiple readers allowed
– binary semaphores: one activity can get the resource
– generic semaphores: N activities are allowed to get the resource
Processes and Threads - Synchronization
Wait on a condition / state
– signals with Send/Wait methods
    require cooperation from all processes
    example: producer/consumer with conditions nonempty/nonfull
    semantics of Send: Send-and-Pass vs. Send-and-Continue
– generic system-handled conditions (Active Oberon)
    AWAIT(x > y);
Wait on a partner process
– CSP
Processes and Threads - Synchronization: Implementation Example
Process list
– doubly linked list of all coroutines
– cur points to the current (running) coroutine
– each signal has a LIFO list
[Figure: ring of coroutines C1-C5 with their ready flags; cur marks the running coroutine; signal s points to its waiting coroutines, chained through link]
Processes and Threads - Synchronization: Implementation Example
Schedule
  prev := cur;
  WHILE ~cur.ready & cur.next # prev DO cur := cur.next END;
  IF cur.ready THEN Transfer(cur) ELSE (* deadlock *) END

Terminate
  cur.next.prev := cur.prev;
  cur.prev.next := cur.next;
  Schedule
Processes and Threads - Synchronization: Implementation Example
Send(s)
  IF s # NIL THEN (* send-and-pass *)
    cur := s; s.ready := TRUE; s := s.link
  END;
  Schedule (* to next ready from cur *)

Wait(s)
  cur.link := s; s := cur; cur.ready := FALSE;
  Schedule (* to next ready from cur *)

Init(s)
  s := NIL
Processes and Threads - Active Oberon: Bounded Buffer
Buffer* = OBJECT
  VAR
    data: ARRAY BufLen OF INTEGER;
    in, out: LONGINT;

  (* Put - insert element into the buffer *)
  PROCEDURE Put* (i: INTEGER);
  BEGIN {EXCLUSIVE}
    (* AWAIT ~full *)
    AWAIT ((in + 1) MOD BufLen # out);
    data[in] := i; in := (in + 1) MOD BufLen
  END Put;

  (* Get - get element from the buffer *)
  PROCEDURE Get* (VAR i: INTEGER);
  BEGIN {EXCLUSIVE}
    (* AWAIT ~empty *)
    AWAIT (in # out);
    i := data[out]; out := (out + 1) MOD BufLen
  END Get;

  PROCEDURE & Init;
  BEGIN
    in := 0; out := 0;
  END Init;
END Buffer;
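For comparison, here is a rough Python counterpart of the buffer. It assumes that threading.Condition together with wait_for is an acceptable stand-in for an EXCLUSIVE block with AWAIT; unlike Active Oberon, the waiters have to be notified explicitly. The class name BoundedBuffer and the capacity parameter are hypothetical.

```python
import threading


class BoundedBuffer:
    def __init__(self, capacity):
        # One slot always stays empty so that in == out means "empty",
        # exactly as in the ring buffer above.
        self._data = [None] * (capacity + 1)
        self._in = 0
        self._out = 0
        self._cond = threading.Condition()   # plays the role of the object lock

    def put(self, item):
        with self._cond:                     # BEGIN {EXCLUSIVE}
            # AWAIT ~full
            self._cond.wait_for(
                lambda: (self._in + 1) % len(self._data) != self._out)
            self._data[self._in] = item
            self._in = (self._in + 1) % len(self._data)
            self._cond.notify_all()          # conditions are not re-checked implicitly

    def get(self):
        with self._cond:
            # AWAIT ~empty
            self._cond.wait_for(lambda: self._in != self._out)
            item = self._data[self._out]
            self._out = (self._out + 1) % len(self._data)
            self._cond.notify_all()
            return item
```

The explicit notify_all calls are the part that Active Oberon hides: there the system re-evaluates pending AWAIT conditions itself.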
Processes and Threads - CSP: Bounded Buffer (I)
[bounded_buffer || producer || consumer]

producer ::
  *[ <produce item>
     bounded_buffer ! item;
  ]

consumer ::
  *[ bounded_buffer ? item;
     <consume item>
  ]

Source: Geoff Coulson, Lancaster University
Processes and Threads - CSP: Bounded Buffer (II)
bounded_buffer ::
  buffer: (0..9) item;
  in, out: integer;
  in := 0; out := 0;
  *[ in < out+10; producer ? buffer(in mod 10) -> in := in + 1;
     ||
     out < in; consumer ! buffer(out mod 10) -> out := out + 1;
  ]
Processes and Threads - Process State
Process states
1. Running: actually using the CPU
2. Ready: waiting for a CPU
3. Blocked: unable to run, waiting for an external event
Process state transitions
1. Running → Blocked: wait for an external event
2. Running → Ready: system scheduler
3. Ready → Running: system scheduler
4. Blocked → Ready: the external event happens
[Figure: state diagram with Running, Blocked and Ready and the transitions 1-4]
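The three states and four transitions can be written down as a small table-driven state machine. The event names wait, preempt, dispatch and event are hypothetical labels for transitions 1 to 4; any pair that is not in the table is rejected.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    READY = "ready"
    BLOCKED = "blocked"


# (current state, event) -> next state, numbered as in the list above
TRANSITIONS = {
    (State.RUNNING, "wait"): State.BLOCKED,      # 1. wait for external event
    (State.RUNNING, "preempt"): State.READY,     # 2. scheduler picks another process
    (State.READY, "dispatch"): State.RUNNING,    # 3. scheduler picks this process
    (State.BLOCKED, "event"): State.READY,       # 4. external event happens
}


def step(state, event):
    """Return the next state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} in state {state.name}") from None
```

Note that there is no direct edge from Blocked to Running: an unblocked process first becomes Ready and then has to be chosen by the scheduler.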
Processes and Threads - Process State (Active Oberon)
Active Oberon provides
– monitor-like object protection
– conditions
Conditions are checked by the system.
No explicit help or knowledge from the user is required (no x.Signal)
[Figure: state diagram with Running, Ready, Awaiting Object and Awaiting Condition]
Activities
Program (static concept) ≠ process (dynamic)
Processes, jobs, tasks, threads (differences later)
– program code
– context:
    program counter (PC) and registers
    stack pointer
    state: [new], running, waiting, ready, [terminated]
– stack
– data section (heap)
Processes vs. Threads
Process or job (heavyweight)
– code
– address space
– processor state
– private data (stack + registers)
– can have multiple threads
Thread (lightweight)
– shared code
– shared address space
– processor state
– private data (stack + registers)
[Figure: processes and threads above the kernel and the CPU]
Processes vs. Threads: Example
[Figure: two processes, PROC 1 and PROC 2, each with its own instructions, HEAP and STACK; next to it one process PROC whose instructions and HEAP are shared by two threads with separate stacks, STACK 1 and STACK 2]
Multitasking
Programmed events that can cause a task switch (synchronous)
– protection (locks): acquire, release
– synchronization: wait on a condition, send a signal (send-and-pass)
System events that can cause a task switch
– voluntary switch ("yield", task termination) (synchronous)
– a process with higher priority becomes available (asynchronous, task preemption)
– consumption of the allowed time quantum (asynchronous, task preemption)
Preemption
Assign each process a time quantum (normally in the order of tens of ms)
Asynchronous task switches can happen at any time!
– the task can be in the middle of a computation
– save the whole CPU state (registers, flags, ...)
Perform a switch
– on resource conflict
– on synchronization request
– on timer interrupt (time quantum is over)

Context switch
Scheduler invocation:
– preemption: interrupt
– cooperation: explicit call
Operations:
– store the process state (PC, regs, …)
– choose the next process (strategy)
– [accounting]
– restore the state of the next process (regs, SP, PC, …)
– jump to the restored PC
A context switch is usually expensive: 1–1000 µs, depending on the system and the number of processes
– hardware optimizations (e.g., multiple sets of registers: SPARC, DECSYSTEM-20)

Scheduling algorithms
Three categories of environments:
batch systems (e.g., VPP, DOS)
– usually non-preemptive (i.e., the task is not stopped by the scheduler, only synchronous switches)
interactive systems (UNIX, Windows, Mac OS)
– cooperative or preemptive
– no task allowed to have the CPU forever
real-time systems (PathWorks, RT Linux)
– timing constraints (deadlines, periodicity)

Scheduling Performance
CPU utilization
Throughput
– number of jobs per time unit
– minimize context switch penalty
Turnaround time
– = exit time - arrival time
– execution, wait, I/O
Response time
– = start time - request time
Waiting time (I/O, waiting, …)
Fairness

Scheduling algorithm goals
All systems
– Fairness: give every task a chance
– Policy enforcement
– Balance: keep all subsystems busy
Interactive systems
– Response time: respond quickly
– Proportionality: meet user’s expectations
Batch systems
– Throughput: maximize number of jobs
– Turnaround time: minimize time in system
– CPU utilization: keep CPU busy
Real-time systems
– Meet deadlines: avoid losing data
– Predictability: avoid degradation
– Hard vs. soft real-time systems

Batch Scheduling Algorithms
Choose task to run (task is usually not preempted)
First Come First Serve (FCFS)
– fair, may cause long waiting times
Shortest Job First (SJF)
– requires knowledge about job length
Longest Response Ratio
– response ratio = (time in the system / CPU time)
– depends on the waiting time
Highest Priority First
– with or without preemption
Mixed
– the priority is adjusted dynamically (time in queue, length, priority, …)
ETH-VPP is a batch system! Which algorithm does it use?

Preemptive Scheduling Algorithms
Time sharing
– Each task has a predefined time quantum
– Round-Robin: schedule the next task on the ready list
– Quantum choice: small may cause frequent switches, big may cause slow response
– Implicit assumption: all tasks have the same importance
[Figure: ready list with tasks P1 to P4 linked by next pointers]

Preemptive Scheduling Algorithms
Priority scheduling
– process with highest priority is scheduled first
Variants
– multilevel queue scheduling: one list per priority, use round-robin on each list
– dynamic priorities: proportional to time in system, inversely proportional to part of quantum used
– make time quantum proportional to priority

Real-Time Scheduling Algorithms
Task needs to meet the deadline!
Task cost is known (or should be)
Two kinds of tasks:
– aperiodic
– periodic
Reservation
– scheduler decides if the system has enough resources for the task
Algorithms:
– Rate Monotonic Scheduling: assign static priorities (priority proportional to frequency)
– Earliest Deadline First: task with closest deadline is chosen
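The Earliest Deadline First rule can be sketched in a few lines of C. This is only an illustration: the function name `edf_pick` and the representation of deadlines as an array of absolute times are assumptions, not part of the lecture.

```c
#include <stddef.h>

/* Index of the ready task with the closest (smallest) absolute deadline,
 * or -1 if there are no tasks. Ties go to the lower index.
 * Hypothetical helper for illustration only. */
int edf_pick(const long *deadline, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || deadline[i] < deadline[best])
            best = (int)i;
    return best;
}
```

A scheduler would call such a function whenever a task becomes ready or finishes, then dispatch the returned task.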

Scheduling Algorithm Example
Situation:
– Tasks P1, P2, P3, P4
– Arrive at time t = 0
– Priority: P1 highest, P4 lowest
– Time to process: 10, 2, 5, 3

Scheduling Algorithm Example: Highest Priority First
P1 runs 0-10, P2 10-12, P3 12-17, P4 17-20

Scheduling Algorithm Example: Shortest Job First
P2 runs 0-2, P4 2-5, P3 5-10, P1 10-20

Scheduling Algorithm Example: Timesharing with quantum = 2
[Gantt chart with ticks at 0, 2, 4, 6, 8, 10, 12, 13, 14, 16, 18, 20]

Scheduling Algorithm Example: Timesharing with quantum → 0
All four tasks run at 1/4 speed until 8, three tasks at 1/3 until 11, two tasks at 1/2 until 15, then P1 alone until 20

Scheduling Algorithm Example: Results
Situation:
– Tasks P1, P2, P3, P4
– Arrive at time t = 0
– Priority: P1 highest, P4 lowest
– Time to process: 10, 2, 5, 3
Results (averages over the four tasks):
                                turnaround   response time
Highest Priority First          14.75        9.75
Shortest Job First              9.25         4.25
Timesharing with quantum = 2    12.75        3.0
Timesharing with quantum → 0    13.5         0
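The two non-preemptive rows of the results can be reproduced with a small C sketch. It assumes, as in the example, that all tasks arrive at t = 0 and run to completion in a fixed order; the names `sched_stats` and `run_in_order` are made up for this illustration.

```c
#include <stddef.h>

/* Mean turnaround and response time for tasks that all arrive at t = 0
 * and run to completion, one after another, in the given order. */
typedef struct {
    double turnaround; /* mean of (exit time - arrival time)  */
    double response;   /* mean of (start time - request time) */
} sched_stats;

sched_stats run_in_order(const int *burst, const size_t *order, size_t n)
{
    sched_stats st = {0.0, 0.0};
    int clock = 0;
    if (n == 0) return st;                 /* nothing to schedule */
    for (size_t i = 0; i < n; i++) {
        st.response += clock;              /* task starts now */
        clock += burst[order[i]];
        st.turnaround += clock;            /* task exits now */
    }
    st.turnaround /= (double)n;
    st.response /= (double)n;
    return st;
}
```

With processing times {10, 2, 5, 3}, the order P1, P2, P3, P4 corresponds to Highest Priority First and the order P2, P4, P3, P1 to Shortest Job First.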

Scheduling Examples
UNIX
– preemption
– 32 priority levels (round robin)
– each second the priorities are recomputed (CPU usage, nice level, last run)
BSD
– similar
– every 4th tick priorities are recomputed (usage estimation)
Windows NT
– “real time” priorities: fixed, may run forever
– variable: dynamic priorities, preemption
– idle: last choice (swap manager)

Scheduling Examples: Quantum & Priorities
Win2K:
– quantum = 20ms (professional) 120ms (user), configurable
– depending on type (I/O bound)
BSD:
– quantum = 100ms
– priority = f(load, nice, timelast)
Linux:
– quantum = quantum / 2 + priority
– f(quantum, nice)

Scheduling Problems
Starvation
– A task is never scheduled (although ready)
– countermeasure: “fairness”
Deadlock
– No task is ready (nor will any ever become ready)
– detection + recovery, or avoidance

Deadlock Conditions
Coffman conditions for a deadlock (1971):
– Mutual exclusion
– Hold and wait
– No resource preemption
– Circular wait (cycle)
[Figure: threads T1, T2 and resources R1, R2 form a cycle. Example: A holds R and wants S, B holds S and wants R. Legend: T = thread, R = resource]

Deadlock Remedies
Coarser lock granularity:
– use a single lock for all resources (e.g., Linux 2.0-2.4 “Big Kernel Lock”)
Locking order:
– resources are ordered
– resource locking according to the resource order (ticketing)
Two-phase locking:
– try to acquire all the resources
– if successful, lock them; otherwise free them and try again

Deadlock Detection, Prevention & Recovery
Deadlock detection: the system keeps a graph of locks and tries to detect cycles.
– time consuming
– the graph has to be kept consistent with the actual state
Deadlock prevention (avoidance): remove one of the four Coffman conditions (e.g., cycles)
Recovery:
– kill processes and reclaim the resources
– rollback: requires saving the states of the processes regularly
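Cycle detection on a lock graph can be sketched as a depth-first search. The following C fragment is a minimal illustration, not the mechanism of any particular system: it assumes a wait-for graph stored as an adjacency matrix, and `wait_graph`, `MAX_TASKS` and `has_deadlock` are hypothetical names.

```c
#include <stdbool.h>

#define MAX_TASKS 16

/* waits_for[i][j] is true when task i waits for a resource held by task j. */
typedef struct {
    int n;                                  /* number of tasks, <= MAX_TASKS */
    bool waits_for[MAX_TASKS][MAX_TASKS];
} wait_graph;

/* Depth-first search; state: 0 = unvisited, 1 = on current path, 2 = done. */
static bool dfs_finds_cycle(const wait_graph *g, int t, int *state)
{
    state[t] = 1;
    for (int u = 0; u < g->n; u++) {
        if (!g->waits_for[t][u]) continue;
        if (state[u] == 1) return true;     /* back edge: cycle found */
        if (state[u] == 0 && dfs_finds_cycle(g, u, state)) return true;
    }
    state[t] = 2;
    return false;
}

/* True if the wait-for graph contains a cycle, i.e. a circular wait. */
bool has_deadlock(const wait_graph *g)
{
    int state[MAX_TASKS] = {0};
    for (int t = 0; t < g->n; t++)
        if (state[t] == 0 && dfs_finds_cycle(g, t, state)) return true;
    return false;
}
```

The search visits every edge at most once, but the graph still has to be updated on every acquire and release, which is where the cost noted above comes from.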

Simple Deadlock Scenario
Example
– Resources R, S, T
– Tasks A, B, C require { R, S }, { S, T }, { T, R } respectively
Case 1: Sequential execution, no deadlock
A: +R +S -R -S
B: +S +T -S -T
C: +T +R -T -R

Simple Deadlock Scenario
Case 2: Interleaving, deadlock
Order of requests: A +R, B +S, C +T, then A +S, B +T, C +R
[Figure: tasks A, B, C and resources R, S, T form a cycle]

Complex Deadlock Scenario
Case with 6 resources and 7 tasks
[Figure: graphical representation with tasks A, B, C, D, E, F, G and resources R, S, T, U, V, W]
Is this a case of deadlock?

Deadlock Avoidance Strategy in Bluebottle
Each kernel module has a lock to protect its data.
When multiple locks are needed, acquire them according to the module hierarchy.
[Figure: module hierarchy with one module lock each: Locks, Blocks, Modules, Configuration, Memory, Interrupts, Threads, Traps, Timers, Processors]

Priority Inversion
A high-priority task can be blocked by a lower-priority one.
Example:
[Figure: timeline of a Low, a Medium and a High priority task with the states running, ready and waiting]

Priority Inversion
Big problem for RTOS
Solutions
– priority inheritance: the low-priority task holding the resource inherits the priority of the high-priority task wanting the resource
– priority ceilings: each resource has a priority corresponding to the highest priority of its users + 1; the priority of the resource is transferred to the locking process; can be used instead of semaphores

Example: Mars Pathfinder (1996–1998)
VxWorks real-time system: preemptive, priorities
Communication bus: shared resource (mutexes)
Low priority task (short): meteorological data gathering
Medium priority task (long): communication
High priority: bus manager
Detection: watchdog on bus activity triggers a system reset
Fix: activate priority inheritance via a patch uploaded on the fly (no memory protection).

Locking on Multiprocessor Machines
Real parallelism!
Cannot “disable interrupts” like on single-processor machines (could stop every task, but not efficient)
Software solutions
– Peterson, Dekker, ...
Hardware support
– bus locking
– atomic instructions (Test And Set, Compare And Swap)

Locking on multiprocessor machines
Test And Set
  TAS s:
    IF s = 0 THEN
      s := 1
    ELSE
      CC := TRUE
    END
Compare and Swap (Intel)
  CAS R1, R2, A:
    R1: expected value
    R2: new value
    A: address
    IF R1 = M[A] THEN
      M[A] := R2; CC := TRUE
    ELSE
      R1 := M[A]; CC := FALSE
    END
These instructions are atomic even on multiprocessors! They usually achieve this by locking the data bus.
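In portable C, the same two primitives are available through the C11 header `<stdatomic.h>`. The sketch below is an illustration under that assumption; the type `spinlock` and the functions `spin_*` and `cas_add` are hypothetical names, and the compiler decides which machine instructions are actually emitted.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_flag held; } spinlock;   /* hypothetical type */

void spin_init(spinlock *l) { atomic_flag_clear(&l->held); }

/* Test-and-set returns the previous value: false means the lock was free
 * and now belongs to the caller. */
bool spin_try_lock(spinlock *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

void spin_lock(spinlock *l)
{
    while (!spin_try_lock(l)) { /* busy wait */ }
}

void spin_unlock(spinlock *l) { atomic_flag_clear(&l->held); }

/* Compare-and-swap loop: add delta to *counter and return the new value.
 * On failure, `expected` is refreshed with the current value and we retry. */
int cas_add(atomic_int *counter, int delta)
{
    int expected = atomic_load(counter);
    while (!atomic_compare_exchange_weak(counter, &expected, expected + delta)) {
        /* another thread changed the counter in between; try again */
    }
    return expected + delta;
}
```

Note how the failing branch of `atomic_compare_exchange_weak` mirrors the ELSE branch of CAS above: the expected value is overwritten with the current memory content.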

Example: Semaphores on SMP
Counter s: available resources
Binary Semaphores with TAS
Spinning (busy wait):
  Try   TAS s
        JMP Try
        CS
Blocking:
        TAS s
        JMP Queuing
        CS

Example: Semaphores on SMP
Counter s: available resources
Generic Semaphores with CAS
P(s), enter CS:
        Load R1 ← s
  TryP  MOVE R1 → R2
        DEC R2
        CAS R1, R2, s
        BNE TryP
        CMP R2, 0
        BN Queuing
        [CS]
V(s), exit CS:
        [CS]
        Load R1 ← s
  TryV  MOVE R1 → R2
        INC R2
        CAS R1, R2, s
        BNE TryV
        CMP R2, 0
        BNP Dequeuing
Meaning:
  P(S): { S := S - 1 } IF S < 0 THEN jump queuing END
  V(S): { S := S + 1 } IF S <= 0 THEN jump dequeuing END
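The P and V operations above can be sketched with C11 atomics. This is a minimal illustration that covers only the atomic counter update; the queuing and dequeuing of tasks is left to the caller, and the names `sema_p` and `sema_v` are assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* P(s): atomically decrement the counter.
 * Returns true when the caller must be queued (no resource was available). */
bool sema_p(atomic_int *s)
{
    int old = atomic_load(s);
    while (!atomic_compare_exchange_weak(s, &old, old - 1)) {
        /* old now holds the current value; retry */
    }
    return old - 1 < 0;
}

/* V(s): atomically increment the counter.
 * Returns true when a queued task must be dequeued (woken up). */
bool sema_v(atomic_int *s)
{
    int old = atomic_load(s);
    while (!atomic_compare_exchange_weak(s, &old, old + 1)) {
        /* old now holds the current value; retry */
    }
    return old + 1 <= 0;
}
```

A negative counter value then equals the number of tasks waiting on the semaphore.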

Spin-Locks: the Bluebottle/i386 way (simplified version)
PROCEDURE AcquireSpinTimeout(VAR locked: BOOLEAN);
CODE {SYSTEM.i386}
  MOV EBX, locked[EBP]   ; EBX := ADR(locked)
  MOV AL, 1              ; AL := 1
  CLI                    ; switch interrupts off before acquiring lock
test:
  XCHG [EBX], AL         ; set and read the lock atomically. LOCK prefix implicit.
  CMP AL, 1              ; was locked?
  JE test                ; retry
  ..
END AcquireSpinTimeout;

Active Objects in Active Oberon
Z = OBJECT
  VAR myT: T; i: INTEGER;              (* State *)

  PROCEDURE & NEW (t: T);              (* Initializer *)
  BEGIN myT := t
  END NEW;

  PROCEDURE P (u: U; VAR v: V);        (* Method *)
  BEGIN { EXCLUSIVE }                  (* Mutual Exclusion *)
    i := 1
  END P;

BEGIN { ACTIVE }                       (* Object Activity *)
  BEGIN { EXCLUSIVE }
    AWAIT (i > 0);                     (* Condition *)
  END
END Z;

Active Oberon Runtime Structures
[Figure: a Ready Queue of Ready processes feeds the CPUs (Running); each object has a Lock Queue (1, Awaiting Object) and a Wait Queue (2, Awaiting Assertion); queues end with NIL]

Active Oberon Implementation
[Figure: state diagram with the states Running, Ready, Awaiting Object and Awaiting Assertion and transitions numbered 0 to 7]
NEW (transition 0): Create object; Create process; Set to ready
END (transition 6): Run next ready (transition 1)
Preempt (transition 7): Set to ready; Run next ready (transition 1)

Active Oberon Implementation
Enter Monitor
  IF monitor lock set THEN
    Put me in monitor obj wait list; Run next ready
  ELSE
    set monitor lock
  END
Exit Monitor
  Find first asserted x in wait list;
  IF x found THEN
    set x to ready
  ELSE
    Find first x in obj wait list;
    IF x found THEN set x to ready
    ELSE clear monitor lock
    END
  END
  Run next ready
[Figure: state diagram with the states Running, Ready, Awaiting Object and Awaiting Assertion and transitions numbered 0 to 7]

Active Oberon Implementation
AWAIT
  Put me in monitor assn wait list; Call Exit monitor
[Figure: state diagram with the states Running, Ready, Awaiting Object and Awaiting Assertion and transitions numbered 0 to 7]

Case Study: Windows CE 3.0
Real-time constraints
– Reaction time on events
– Execution time
Threads with priorities and time quanta
– Priorities: 0 (high), …, 255 (low)
– Time quanta in ms: default 100 ms, 0 means no quantum
Single processor
[Figure: threads with priorities p and q < p, switch at end of quantum]

Case Study: Windows CE 3.0
Interrupt Handling
– ISR (Interrupt Service Routine): 1st level handling; kernel mode, uses kernel stack; installed at boot-time; creates event on demand; preempted by ISR with higher priority
– IST (Interrupt Service Thread): 2nd level handling; user mode; awaits events
[Figure: an IRQ reaches the ISR in NK.EXE (kernel mode), which raises an event for the IST (user mode)]

Case Study: Windows CE 3.0
Synchronization on common resources:
– Critical sections: enter, leave operations
– Semaphores and mutexes (binary semaphores)
Synchronization is performed with system/library calls (they are not part of a language).
Priority inversion avoidance
– priority inheritance (thread inherits priority of task wanting the resource)
[Figure: nested critical sections (CS) delimited by brackets]

Case Study: Java
Activities are mapped to threads (no processes)
Synchronization in the language
– locks
– signals
Threads provided by the library
Scheduling depends on the JVM

Case Study: Java
public class MyThread extends Thread {
    public void run() {
        System.out.println("Running");
    }

    public static void main(String[] arguments) {
        MyThread t = new MyThread();
        t.start();   // start() returns void, so it cannot be chained into the assignment
    }
}

Case Study: Java
public class MyThread implements Runnable {
    public void run() {
        System.out.println("Running");
    }

    public static void main(String[] arguments) {
        // main is static, so there is no "this"; create the Runnable explicitly
        Thread t = new Thread(new MyThread());
        t.start();
    }
}

Case Study: Java
Protection with monitor-like objects
– with method granularity: public synchronized void someMethod()
– with statement granularity: synchronized(anObject) { ... }
Synchronization with signals
– wait() (with optional time-out)
– notify() / notifyAll() (“send and continue” pattern)

Case Study: Java
private Object o;

public synchronized void consume() {
    while (o == null) {
        try { wait(); } catch (InterruptedException e) {}
    }
    use(o);
    o = null;
    notifyAll();
}

public synchronized void produce(Object p) {
    while (o != null) {
        try { wait(); } catch (InterruptedException e) {}
    }
    o = p;
    notifyAll();
}

Case Study: POSIX Threads
Standard interface for threads in C
Mostly UNIX, possible on Windows
Provided by a library (libpthread) and not part of the language.
IEEE POSIX 1003.1c standard (1995)
Various implementations (both user and kernel level)

Case Study: POSIX Threads
#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void *run(void *arg)
{
    pthread_mutex_lock(&m);
    // critical section
    pthread_mutex_unlock(&m);
    pthread_exit(NULL);
}

int main(int argc, char *argv[])
{
    pthread_t t;
    pthread_create(&t, NULL, run, NULL);
    pthread_exit(NULL);
}

File Systems

File Systems - Overview
Hardware
File abstraction
File organization
File systems
– Oberon
– Unix
– FAT
Distributed file systems
– NFS
– AFS
Special topics
– Error recovery
– ISAM
– B* Trees

Hardware: the ATA Bus
ATA / IDE (1986)
– Advanced Technology Attachment
– Integrated Drive Electronics
ATA-2 / EIDE
ATA-4 / ATAPI
– ATA Packet Interface (SCSI command set)
ATA-5
– UDMA 66
ATA-6
– UDMA 100
– SATA
ATA-7
– UDMA 133
bus with 2 devices
– master / slave
low-level interface
– head / cylinder / sector
– support for LBA (logical block addressing)
PIO mode
– read byte by byte through hardware port
DMA mode
– use DMA transfer

Hardware: the SCSI Bus
SCSI: Small Computer Systems Interface
SCSI-2
– Fast SCSI
– Wide SCSI
SCSI-3
Bus with 8 devices
– wide: 16 / 32 devices
– bus arbitration
– disconnected mode
Device kinds
– direct access
– CD-ROM
– ...
Block-oriented access
– read-block, write-block
Transfer mode selection
– asynchronous (hand-shake)
– synchronous (period / offset)

Hardware: Hard Disk
Organization
– cylinder (c)
– head (h)
– sector (s)
Addressing
– sector (c, h, s)
– block (LBA)
[Figure: disk with rotation axis, surfaces (heads), tracks (cylinders) and sectors]
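The two addressing schemes are related by the usual CHS-to-LBA conversion, sketched here in C. The geometry parameters `heads` and `sectors_per_track` and the function name are assumptions for illustration; real drives report their own (often fictitious) geometry.

```c
/* Logical block address of sector (c, h, s).
 * In CHS addressing, cylinders and heads are numbered from 0,
 * sectors from 1. */
long chs_to_lba(long c, long h, long s, long heads, long sectors_per_track)
{
    return (c * heads + h) * sectors_per_track + (s - 1);
}
```

For a geometry with 16 heads and 63 sectors per track, sector (0, 0, 1) maps to block 0 and sector (1, 0, 1) to block 1008.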

Hardware: Example
Current disk example:
– ATA-100
– 250GB
– 512 bytes per sector (488·10^6 sectors)
– 8MB cache
– 8.9ms average seek time
– 7200 rpm
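As a rough illustration of what these figures mean for a random access, the C sketch below adds the average rotational latency (half a rotation) to the average seek time. This simple model is an assumption of the sketch: transfer time, caching and controller overhead are ignored, and `avg_access_ms` is a made-up name.

```c
/* Average time in ms to reach a random sector: average seek time plus
 * half a rotation (the average rotational latency). */
double avg_access_ms(double avg_seek_ms, double rpm)
{
    double rotation_ms = 60000.0 / rpm;   /* duration of one full rotation */
    return avg_seek_ms + rotation_ms / 2.0;
}
```

For the disk above (8.9 ms seek, 7200 rpm) one rotation takes about 8.3 ms, so the estimate is roughly 13 ms per random access.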

Hardware: Hard Disk Improvements
Interleaving
– optimize sequential sector access
Read-ahead
Caching
Sector defect management
[Figure: sectors numbered 1 to 7 interleaved on a cylinder]

Hardware: Disk Scheduling
Disk controllers have a queue of pending requests:
– type: read or write
– block number: translated into the (h,c,s)-tuple
– memory address (where to copy from and to)
– amount to be transferred (byte or block count)

Hardware: Disk Scheduling
First-come, first-served (FCFS)
Shortest-seek-time-first (SSTF)
SCAN (elevator) & C-SCAN
LOOK & C-LOOK
Performance: minimize head movements, maximize throughput
Scheduling is now in the hardware

Hardware: Disk Scheduling
Example (head position, track number):
queue = 31, 72, 4, 18, 147, 193, 199, 153, 114, 72
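To compare strategies on a queue like this one, the total head movement can be computed directly. The C sketch below covers FCFS and SSTF only; the initial head position is a parameter because the transcript does not state it unambiguously, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Total head movement (in tracks) when requests are served in arrival order. */
long fcfs_distance(int head, const int *queue, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += labs((long)queue[i] - head);
        head = queue[i];
    }
    return total;
}

/* Total head movement when the closest pending request is always served next.
 * Returns -1 if the bookkeeping array cannot be allocated. */
long sstf_distance(int head, const int *queue, int n)
{
    long total = 0;
    bool *done;
    if (n <= 0) return 0;
    done = calloc((size_t)n, sizeof *done);
    if (done == NULL) return -1;
    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)      /* find the nearest pending request */
            if (!done[i] && (best < 0 ||
                labs((long)queue[i] - head) < labs((long)queue[best] - head)))
                best = i;
        total += labs((long)queue[best] - head);
        head = queue[best];
        done[best] = true;
    }
    free(done);
    return total;
}
```

SSTF shortens the total movement but can starve requests far from the head, which is what SCAN and LOOK address.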
System-Software WS 04/05236 © P. Reali / M. Corti
Hardware: Disk Scheduling

Abstractions
Disk: array of sectors
Block: array of sectors
– some systems call them “clusters”
– user configured
– reduces address space
– increases access speed
– causes internal fragmentation
File: stream of bytes
– sequential access
– random access
– stored on disk: mapping byte to block, block allocation management

Abstraction Layers
Abstraction    Operations                                            Implementations
Disk           ReadSector, WriteSector                               ATA driver, SCSI driver
Volume         ReadBlock, WriteBlock, AllocateBlock, FreeBlock
File System    OpenFile, WriteFile, ReadFile, SeekFile, CloseFile    FAT, Oberon, ISO 9660, ext3, NTFS

File Organization
How can we map groups of blocks into files?
How do we manage free space?
How can I jump to a certain location?
Operation: read n bytes at position p.

File Organization: Contiguous Allocation
File is a group of contiguous blocks
Simple management
Fast transfers
IBM MVS (mainframe)
[Figure: file described by start and length]

File Organization: Contiguous Allocation
external fragmentation
allocation
– how much space does a file need?
– first fit, best fit, …?
file growth (error? move? extensions?)
preallocation: internal fragmentation

File Organization: Linked Allocation
File is a linked list of blocks
– no external fragmentation
– no growth problems
Problems
– sequential files only (positioning requires traversal)
– space for pointers (1TB, 5B addr., 1% with 512B blocks)
– reliability (lost pointers)

File Organization: Linked Allocation
Clusters: series of contiguous blocks
– faster (fewer jumps)
– less space wasted for pointers
– internal fragmentation

File Organization: Linked Allocation
Pointer tables
– the list of pointers is stored in a separate table
– can be cached
– is usually stored twice (reliability)
– FAT (MS-DOS, OS/2, Windows, solid-state memory)

File Organization: Indexed Allocation
Index with block addresses
Fast access for random-access files
No external fragmentation
Problems
– high management overhead
– limited file size (depending on the index structure)
– pointer overhead

File Organization: Indexed Allocation
Variation:
– linked list of indexes
Advantage:
– no file size limitation
Disadvantage:
– index lookup requires sequential traversal of index list

File Organization: Indexed Allocation
multi-level indexes (index of indexes)
UNIX
Advantage:
– fast index lookup
Disadvantage:
– limited file size

File Organization: Indexed Allocation
Example: blocks 2KB, address 4B (512 addresses per index block)
First level index block: 512 entries · 2KB = 1MB
Second level index block: 512 · 512 entries · 2KB = 0.5GB
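The arithmetic of this example generalizes to any number of index levels. The following C sketch is an illustration only; `index_capacity` is a hypothetical helper, and it assumes that every index block is completely filled with block addresses.

```c
#include <stdint.h>

/* Bytes addressable through `levels` levels of index blocks
 * (levels = 1: one index block pointing directly to data blocks). */
uint64_t index_capacity(uint64_t block_size, uint64_t addr_size, unsigned levels)
{
    uint64_t entries = block_size / addr_size;   /* addresses per index block */
    uint64_t blocks = 1;
    for (unsigned i = 0; i < levels; i++)
        blocks *= entries;                       /* data blocks reachable */
    return blocks * block_size;
}
```

With 2KB blocks and 4B addresses, one level covers 1MB and two levels cover 0.5GB, as in the example.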

Free Space Management
Bitmap (e.g., HFS)
– bit vector to mark free blocks
– simple
– needs caching
Linked lists
– list of free blocks (similar to linked allocation)
Grouping
– free blocks contain n addresses of free blocks (similar to multilevel indexing)
Counting
– list of 2-tuples of series of free blocks (start, length)

Case Study: Oberon File System
Disk module: controller driver
– block management
FileDir module:
– maps files to locations
– implemented with B-trees
– garbage collection (files): the directory is the root set; anonymous (nonregistered) files are collected
Files module:
– allows user operations (read, create, write, …)
– access is performed through riders
[Figure: module layers Files, FileDir, Disk]

Case Study: Oberon File System
Characteristics
Block size = 1KB
File organization (designed to optimize small files)
– multilevel index: 64 direct, 12 1st level indirect
– 672 data bytes in file header
Block allocation
– allocation table created at boot-time (partition GC)
– no collection at run-time (partition fills up!)

Case Study: Oberon File System
Block = 1KB
[Figure: file header holding 672 data bytes, 64 direct block entries (0 to 63) and 12 index block entries (64 to 75); each of the 12 index blocks points to 256 data blocks]

Case Study: Oberon File System
Free block management: bitmap
Garbage collection at startup
Block:            0       8       16      24
initial           11111111111111111111111111111111
startup / GC      11010010011110111101110100011100
allocate 16,17    11010010011110110001110100011100
allocate 19       11010010011110110000110100011100
(a bit is cleared when its block is in use)

Case Study: Oberon File System
Internals
– “Rider”: current read or write position
– Buffer (cache) for consistency (each file sees the write operations on it)
[Figure: a file handle f with its buffers; several riders R refer to the file, each with a “hint” to a buffer]

Case Study: Oberon RAM Disk
File = POINTER TO Header;
Index = POINTER TO Sector;

Rider = RECORD
  eof: BOOLEAN;
  file: File;
  pos: LONGINT;
  adr: LONGINT;
END;

Header = RECORD
  mark: LONGINT;
  name: FileDir.Name;
  len, time, date: LONGINT;
  ext: ARRAY 12 OF Index;         (* ext table *)
  sec: ARRAY 64 OF SectorTable;   (* primary sector table *)
END;

[Figure: the primary sector table in the header points to sectors 0 - 63; through the ext table, index sector 0 points to sectors 64 - 319 and index sector 1 to sectors 320 - 575]

Case Study: Oberon RAM Disk
PROCEDURE Read(VAR r: Rider; VAR x: SYSTEM.BYTE);
  VAR m: INTEGER;
BEGIN
  IF r.pos < r.file.len THEN
    SYSTEM.GET(r.adr, x); INC(r.adr); INC(r.pos);
    IF r.adr MOD SS = 0 THEN (* end of sector *)
      m := SHORT(r.pos DIV SS);
      IF m < STS THEN
        r.adr := r.file.sec[m]
      ELSE
        r.adr := r.file.ext[(m-STS) DIV XS].x[(m-STS) MOD XS]
      END
    END
  ELSE
    x := 0X; r.eof := TRUE
  END
END Read;

SS = Sector Size, STS = Sector Table Size, XS = Index Size

Case Study: Oberon RAM Disk (overwrite)
PROCEDURE Write(VAR r: Rider; x: SYSTEM.BYTE);
  VAR k, m, n: INTEGER; ix: LONGINT;
BEGIN
  IF r.pos < r.file.len THEN
    m := SHORT(r.pos DIV SS); INC(r.pos);
    IF m < STS THEN
      r.adr := r.file.sec[m]
    ELSE
      r.adr := r.file.ext[(m-STS) DIV XS].x[(m-STS) MOD XS]
    END
  ELSE
    ....
  END;
  SYSTEM.PUT(r.adr, x); INC(r.adr);
END Write;

Case Study: Oberon RAM Disk (expand)
IF r.pos < r.file.len THEN
  ....
ELSE
  IF r.adr MOD SS = 0 THEN
    m := SHORT(r.pos DIV SS);
    IF m < STS THEN
      Kernel.AllocSector(0, r.adr); r.file.sec[m] := r.adr
    ELSE
      n := (m-STS) DIV XS; k := (m-STS) MOD XS;
      IF k = 0 THEN
        Kernel.AllocSector(0, ix); r.file.ext[n] := SYSTEM.VAL(Index, ix)
      END;
      Kernel.AllocSector(0, r.adr); r.file.ext[n].x[k] := r.adr
    END
  END;
  INC(r.pos); r.file.len := r.pos
END;
SYSTEM.PUT(r.adr, x); INC(r.adr);

Case Study: UNIX, inodes
File system: files and directories (files with a special content)
A file is represented by an inode
Inode:
– file owner
– file type (regular / directory / special)
– access permissions
– access time
– reference count (links)
– table of contents
– file size
Inode table of contents
– 10 (12) direct blocks
– 1 indirect block
– 1 double indirect block
– 1 triple indirect block
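How a logical block number is resolved through this table of contents can be sketched as follows. The C function is an illustration, not actual UNIX kernel code: it assumes 10 direct blocks and takes the number of addresses per index block (`ptrs`) as a parameter, since that depends on block and address size.

```c
/* Number of index blocks that must be read to reach logical block `lbn`:
 * 0 = direct, 1 = indirect, 2 = double indirect, 3 = triple indirect,
 * -1 = beyond the maximum file size. */
int indirection_level(unsigned long long lbn, unsigned long long ptrs)
{
    const unsigned long long direct = 10;   /* direct blocks in the inode */
    unsigned long long span = 1;
    if (lbn < direct) return 0;
    lbn -= direct;
    for (int level = 1; level <= 3; level++) {
        span *= ptrs;                       /* blocks reachable at this level */
        if (lbn < span) return level;
        lbn -= span;
    }
    return -1;
}
```

Small files are therefore reached without any extra disk access, while the cost grows by one index block read per level for large files.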
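
Case Study: UNIX, inodes
[Figure: an inode with info fields (type, access, refc) and a table of contents: entries 0, 1, ... point directly to data blocks (d); entries 10, 11, 12 point to a single (i1), double (i2) and triple (i3) indirect block, which lead through further index blocks to data blocks]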

Case Study: UNIX, directories
Directories are normal files with a special content.
The data part contains a list with
– inode
– name
Every directory has two special entries
– . the directory itself
– .. the parent directory

Case Study: UNIX, inodes
inode 2: type: dir, blocks: 132, owner: root, ref count: 1
  disk block 132 (directory /), entries (inode #, name): 2 . | 2 .. | 4 bin | 3 root
inode 3: type: dir, blocks: 406, owner: root, ref count: 1
  disk block 406 (directory /root/), entries (inode #, name): 3 . | 2 .. | 5 .tcshrc | 6 mbox
inode 6: type: file, blocks: 42, 103, owner: root, ref count: 1
  disk blocks 42 and 103: data

Case Study: UNIX, soft and hard links
Hard links:
– two directory entries with the same inode number
– each file has a reference counter
  42 file
  42 hardlink
Soft links
– the directory entry points to a special file with the path of the linked file
  42 file
  43 softlink
  (inode 43 points to a special file with the path of file)

Case Study: UNIX, hard links
inode 2: type: dir, blocks: 132, owner: root, ref count: 1
  disk block 132 (directory /): 2 . | 2 .. | 4 bin | 3 root
inode 3: type: dir, blocks: 406, owner: root, ref count: 1
  disk block 406 (directory /root/): 3 . | 2 .. | 5 mails | 5 mbox
inode 5: type: file, blocks: 42, 103, owner: root, ref count: 2
  disk blocks 42 and 103: data

Case Study: UNIX, soft links
inode 2: type: dir, blocks: 132, owner: root, ref count: 1
  disk block 132 (directory /): 2 . | 2 .. | 4 bin | 3 root
inode 3: type: dir, blocks: 406, owner: root, ref count: 1
  disk block 406 (directory /root/): 3 . | 2 .. | 5 mbox | 6 mails
inode 5: type: file, blocks: 42, owner: root, ref count: 1
  disk block 42: data
inode 6: type: file, blocks: 43, owner: root, ref count: 1
  disk block 43: /root/mbox

Case Study: UNIX, Volume Layout
A volume (partition) contains
boot block
– bootstrap code
super block
– size
– max file
– free space
– …
inodes
data blocks
Layout: boot block | super block | inode list | data blocks

Case Study: UNIX, Functions
Core functions
– bread: read block
– bwrite: write block
– iget: get inode from disk
– iput: put inode to disk
– bmap: map (inode, offset) to disk block
– namei: convert path name to inode

Case Study: UNIX, namei
namei (path)
  if (absolute path)
    inode = root;
  else
    inode = current directory inode;
  while (more path to process) {
    read directory (inode);
    if match(directory, name component) {
      inode = directory[name component];
      iget(inode);
    } else {
      return no inode;
    }
  }
  return inode;
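The namei pseudocode can be turned into a small runnable C sketch. It is a toy model under stated assumptions: the directory contents are a fixed in-memory table taken from the inode example (inode 2 is the root directory), and `dir_entry`, `lookup` and the signature of `namei` are made up for this illustration.

```c
#include <string.h>

typedef struct { int dir_inode; const char *name; int inode; } dir_entry;

/* Toy directory contents from the inode example: inode 2 is "/". */
static const dir_entry entries[] = {
    {2, ".", 2}, {2, "..", 2}, {2, "bin", 4}, {2, "root", 3},
    {3, ".", 3}, {3, "..", 2}, {3, ".tcshrc", 5}, {3, "mbox", 6},
};

/* Search directory `dir` for a name of length `len`; -1 if not found. */
static int lookup(int dir, const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof entries / sizeof entries[0]; i++)
        if (entries[i].dir_inode == dir &&
            strlen(entries[i].name) == len &&
            memcmp(entries[i].name, name, len) == 0)
            return entries[i].inode;
    return -1;
}

/* Returns the inode number for `path`, or -1 if a component is missing. */
int namei(const char *path, int root, int cwd)
{
    int inode = (path[0] == '/') ? root : cwd;
    while (*path != '\0') {
        size_t len;
        while (*path == '/') path++;        /* skip separators */
        len = strcspn(path, "/");           /* length of next component */
        if (len == 0) break;                /* trailing slash or end of path */
        inode = lookup(inode, path, len);
        if (inode < 0) return -1;
        path += len;
    }
    return inode;
}
```

With this table, `namei("/root/mbox", 2, 2)` walks from inode 2 over inode 3 to inode 6, one directory lookup per path component.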

Case Study: FAT
FATnn: nn corresponds to the FAT size in bits
FAT12, FAT16, FAT32
used by MS-DOS and Windows for disks and floppies
Volume Layout: boot block | FAT1 | FAT2 | root directory | data

Case Study: FAT, Example
Entry   Next block
0       (figure label: disk size)
1
2       EOF
3       EOF
4       12
5       FREE
6       9
7       BAD
8       3
9       11
10      EOF
11      10
12      EOF
13      FREE
…
File 1: 6 → 9 → 11 → 10
File 2: 4 → 12
File 3: 8 → 3
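Reading a file from a FAT means following its chain of entries until EOF. The C sketch below illustrates this on an in-memory table; the constants `FAT_EOF`, `FAT_FREE`, `FAT_BAD` and the function `fat_chain` are assumptions of this example and do not reflect the on-disk encoding of the markers.

```c
#define FAT_EOF  (-1)
#define FAT_FREE (-2)
#define FAT_BAD  (-3)

/* Follows the chain starting at block `start`. Stores up to `max` block
 * numbers in `out` and returns the chain length, or -1 if the chain is
 * corrupt (runs into a FREE or BAD entry, leaves the table, or loops). */
int fat_chain(const int *fat, int fat_len, int start, int *out, int max)
{
    int n = 0;
    int block = start;
    while (block != FAT_EOF) {
        if (block < 0 || block >= fat_len || n >= fat_len)
            return -1;
        if (n < max) out[n] = block;
        n++;
        block = fat[block];                 /* next block of the file */
    }
    return n;
}
```

Because positioning inside a file requires walking the chain from the start, keeping the table cached in memory matters for performance.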

Case Study: FAT, Directory
Information about files is kept in the directory
Directory entry fields (size in bytes):
– File name (8)
– Extension (3)
– Attribute flags: A D V S H R
– Reserved (10)
– Time (2)
– Date (2)
– First block (2)
– File size (4)

Case Study: FAT, Max. Partition Size
Block size   FAT-12   FAT-16    FAT-32
0.5 KB       2 MB
1 KB         4 MB
2 KB         8 MB     128 MB
4 KB         16 MB    256 MB    1 TB
8 KB                  512 MB    2 TB
16 KB                 1024 MB   2 TB
32 KB                 2048 MB   2 TB

File System Mounting
More than one volume mounted in the same directory tree.
[Figure: directory tree rooted at / with usr, mnt (floppy, dos, cd), home (corti), bin and afs (ethz.ch)]

Virtual File System
Support for several file systems
– disk based
– network
– special
VFS: unifies the system calls
Mirrors the traditional UNIX file system model
[Figure: applications call the VFS, which dispatches to ext3, FAT, NFS, AFS, proc, pts]

File System Mounting
Each file system type has a method table
System calls are indirect function calls through the method table
Common interface (open, write, readdir, lock, …)
Each file is associated with a method table
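A method table of this kind is commonly written in C as a struct of function pointers. The sketch below is a deliberately tiny illustration with a single operation; `fs_ops`, `file`, `vfs_read` and the two toy file systems are hypothetical names and not the interface of any real kernel.

```c
#include <stddef.h>

typedef struct file file;

/* Method table: one entry per operation of the common interface. */
typedef struct {
    const char *fs_name;
    long (*read)(file *f, char *buf, size_t n);
} fs_ops;

struct file {
    const fs_ops *ops;      /* method table of the file's file system */
    const char *data;       /* toy backing store */
    size_t pos, len;
};

/* Toy file system 1: reads from an in-memory buffer. */
static long mem_read(file *f, char *buf, size_t n)
{
    size_t left = f->len - f->pos;
    if (n > left) n = left;
    for (size_t i = 0; i < n; i++) buf[i] = f->data[f->pos + i];
    f->pos += n;
    return (long)n;
}

/* Toy file system 2: always at end of file. */
static long null_read(file *f, char *buf, size_t n)
{
    (void)f; (void)buf; (void)n;
    return 0;
}

const fs_ops memfs_ops  = { "memfs",  mem_read  };
const fs_ops nullfs_ops = { "nullfs", null_read };

/* The "system call": an indirect call through the file's method table. */
long vfs_read(file *f, char *buf, size_t n)
{
    return f->ops->read(f, buf, n);
}
```

The caller of `vfs_read` never needs to know which file system a file belongs to; adding a new file system only means providing another table.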

File System Mounting: Special Files
Devices
– disks
– memory
– USB devices
– serial ports
– …
Kernel communication (e.g., proc)
Uniform interface (open, close, read, write)
Uniform protection (user, groups)

File Systems: Protection
Restrict: access (who), operations (what), management
– FAT: flags in the directory (e.g., read only); execution based on name
– UNIX: restrictions in inodes, based on users and groups; operations: read, write, execute; directories: manage files; not so flexible
– VMS: access lists; list of users and rights per file

Distributed File Systems

Distributed File Systems (DFS)
Clients, servers and storage are dispersed among machines in a distributed system.
[Figure: several clients and servers connected by a network]

Overview
Naming (dynamic):
– location transparency: file name does not reveal the file location
– location independence: file name does not change when storage is moved
Caching (efficiency)
– write-through
– delayed-write
– write-on-close
Consistency
– client-initiated: poll server for changes
– server-initiated: notify clients

Naming
Simple approaches:
– file is identified by a (host, path) pair
– Ibis (host:path)
– SMB (\\host\path)
Transparent
– remote directories are mounted in the local file system
– not uniform (the mount point is not defined)
– NFS (/mnt/home, /home/)
– SMB (\\host\path mounted on Z:)
Global name structure
– uniform and transparent naming
– AFS (/afs/cell/path)

Caching
Reduces network and disk load
Consistency problems
Granularity:
– How much? Big/small chunks of data? Entire files?
– Big: +hit ratio, +hit penalty, +consistency problems
Location:
– memory: +diskless stations, +speed
– disk: +cheaper, +persistent
– hybrid
Space consumption on the clients

Caching
Policies:
write-through: +reliability, -performance (cache is effective only for read operations)
delayed-write: +write speed, +unnecessary writes eliminated, -reliability
– write when the cache is full (+performance, -long time in the cache)
– regular intervals
write-on-close

Consistency
Is my cached copy up-to-date?
Client-initiated approach:
– the client performs validity checks
– when? open / fixed intervals / every access
Server-initiated approach:
– the server keeps track of cached files (parts)
– notifies the clients when conflicts are detected
– should the server allow conflicts?
Stateless and Stateful Servers
Stateful: the server keeps track of each accessed file
session IDs (e.g., identifying an inode on the server)
fast
– simple requests
– caches
– fewer disk accesses
– read ahead
volatile
– server crash: rebuild structures (recovery protocol)
– client crash: orphan detection and elimination
Stateless and Stateful Servers
Stateless: each request is self-contained
request: file and position
complex requests
need for uniform low-level naming scheme (to avoid name translations)
need idempotent operations (same results if repeated)
– absolute byte counts
No locking possible
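Why absolute byte counts make a request idempotent can be illustrated with a small sketch. The in-memory `FILE` and both method names are assumptions made for the example; a real server would address files by a low-level identifier.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StatelessReadSketch {
    // Stand-in for a file stored on the server.
    static final byte[] FILE = "hello, world".getBytes(StandardCharsets.US_ASCII);

    // Stateless style: the request carries the absolute offset,
    // so repeating it (e.g., after a lost reply) returns the same data.
    static byte[] read(int offset, int count) {
        int end = Math.min(FILE.length, offset + count);
        int start = Math.min(offset, end);
        return Arrays.copyOfRange(FILE, start, end);
    }

    // Stateful style, for contrast: the server remembers the position,
    // so a repeated request returns different data.
    static int position = 0;
    static byte[] readNext(int count) {
        byte[] data = read(position, count);
        position += data.length;
        return data;
    }

    public static void main(String[] args) {
        System.out.println(new String(read(0, 5), StandardCharsets.US_ASCII));
        System.out.println(new String(read(0, 5), StandardCharsets.US_ASCII));
        System.out.println(new String(readNext(5), StandardCharsets.US_ASCII));
        System.out.println(new String(readNext(5), StandardCharsets.US_ASCII));
    }
}
```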
File Replication
A file can be present on failure-independent machines
Naming scheme manages the mapping
– same high-level name
– different low-level names
Transparency
Consistency
Distributed File-Systems (mainstream)
NFS: Network File System (Sun)
AFS: Andrew File System (CMU)
SMB: Server Message Block (Microsoft)
NCFS: Network Computer FS (Oberon)
Network File System (NFS)
UNIX-based (Sun)
mount file system from another machine into local directory
stateless (no open/close)
uses UDP to communicate
based on RPC and XDR (External Data Representation)
– every operation is a remote procedure call
known problems:
– no caching
– no disconnected mode
– efficiency
security: IP based
NFS: Example
server:
– directories: /home (containing corti and reali), /etc
– /etc/exports contains: /home/ client(rw)
client:
– directories before mounting: /home, /etc
– mount -t nfs server:/home /home
– after mounting, /home on the client contains corti and reali
NFS
No special servers (each machine can act as a server and as a client)
Cascading mounts are allowed
– mount -t nfs server1:/home /home
– mount -t nfs server2:/projects/corti /home/corti/projects
Limited scalability (limited number of exports)
NFS: Stateless Protocol
Each request contains a unique file identifier and an absolute offset
No concurrency control (locking has to be performed by the applications)
Committed information is assumed to be on disk (the server cannot cache writes)
Network File System (NFS)
[Figure: NFS layering.
Client: system call layer, virtual file system layer, then either the local filesystem or the NFS client, RPC / XDR, network (UDP).
Server: network (UDP), RPC / XDR, NFS server, virtual file system layer, local filesystem.]
Remote Procedure Invocation: Overview
Problem
– send structured information from A to B
– A and B may have different memory layouts
– byte order problems
How is 0x1234 (2 bytes) represented in memory?
Big-endian: MSB before LSB (address 0: 0x12, address 1: 0x34)
– IBM, Motorola, SPARC
– network byte ordering
Little-endian: LSB before MSB (address 0: 0x34, address 1: 0x12)
– VAX, Intel
– little end first
Marshalling / Serialization
Marshalling: packing one or more data items into a buffer using a standard representation
Presentation layer (OSI)
RPC + XDR (Sun)
– RFC 1014, June 1987
– RFC 1057, June 1988
IIOP / CORBA (OMG)
– V2.0, February 1997
– V3.0, August 2002
SOAP / XML (W3C)
– V1.1, May 2000
XDR Type System
– [unsigned] integer (32-bit)
– [unsigned] hyper-integer (64-bit)
– enumeration (unsigned int)
– boolean (enum)
– float / double (IEEE 32/64-bit)
– opaque
– string
– array (fix + variable size)
– structure
– union
– void
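A minimal marshalling sketch in the spirit of XDR, assuming a made-up message consisting of one integer and one string. Integers are written as big-endian 32-bit words and the string as a length word followed by its bytes, zero-padded to a multiple of four. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class MarshalSketch {
    // Packs (value, s) into a buffer using a fixed, machine-independent layout.
    static byte[] marshal(int value, String s) {
        byte[] text = s.getBytes(StandardCharsets.US_ASCII);
        int padded = (text.length + 3) / 4 * 4;          // round up to a multiple of 4
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + padded).order(ByteOrder.BIG_ENDIAN);
        buf.putInt(value);                               // 32-bit integer, MSB first
        buf.putInt(text.length);                         // string length
        buf.put(text);                                   // string bytes; the rest stays zero
        return buf.array();
    }

    // For contrast: the two bytes of 0x1234 as a little-endian machine stores them.
    static byte[] littleEndianShort(short v) {
        return ByteBuffer.allocate(2).order(ByteOrder.LITTLE_ENDIAN).putShort(v).array();
    }

    public static void main(String[] args) {
        for (byte b : marshal(0x1234, "abc")) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}
```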
RPC Protocol
Remote procedure call
Marshalling of procedure parameters
Message format
Authentication
Naming
Client: procedure P(a, b, c)
– pack parameters
– send message to server
– await response
– unpack response
Server: P(a, b, c)
– unpack parameters
– find procedure
– invoke
– pack response
– send response
NFS
[Figure: client and server exchange lookup, read and write requests over the RPC protocol]
NFS Efficiency
Stateless protocols are inherently slow
– e.g., directory lookup
Caching:
– file blocks (data)
– file attributes (inodes)
– read-ahead
– delayed write
– tradeoff between speed and consistency
It is possible that two machines see different data
NFS: Security
Exports based on IP addresses
– low security
– low granularity
Data is not encrypted
Permissions based on user and group ID
– uniform naming needed (e.g., NIS)
Andrew File System (AFS)
1983 CMU (later IBM, now open source)
Scalable (>5000 workstations):
– network divided in clusters (cells)
Client/user mobility (files are accessible from everywhere)
Security: encrypted communication (Kerberos)
Protection: access control lists
Heterogeneity: clear interface to the server
Andrew File System (AFS)
server provides a cell
world-wide addressing scheme (name cell)
client caches a whole file
server-synchronization on file open and close
AFS is efficient
low network overhead
stateful: consistency is implemented with callbacks
callback = client is in sync with server
on store, server changes the callbacks
AFS: Logical View
[Figure: directory tree rooted at / with a private space (bin, usr) and a shared space below the mount point afs, organized in volumes]
AFS: Physical View
[Figure: cells (ethz.ch, epfl.ch, cmu.edu), each with clients and servers, connected by a network]
AFS
[Figure: the client sends open and close to the server over the RPC protocol; read and write operate on the client cache]
AFS: Consistency
Interaction only when opening and closing files.
Writes are not visible on other machines before a close.
Clients assume that cached files are up-to-date.
Servers keep track of caching by the clients (callbacks)
– clients are notified in case of changes
AFS: Kerberos
Kerberos (Cerberus: the three-headed dog guarding Hades)
– authentication
– accounting
– audit
Needham-Schroeder shared key protocol
Distributed
AFS: communication is encrypted
AFS: Protection
Access lists:
    %> fs listacl thesis
    Access list for thesis is
    Normal rights:
      system:anyuser l
      trg rlidwk
      corti rlidwka
It's possible to allow (or deny) access to users or customized groups
Restriction on: read, write, lookup, insert, administer, lock and delete.
Supports UNIX control bits.
The Eight Fallacies of Distributed Computing (Peter Deutsch)
Network Fallacies
The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
The network topology doesn't change
There is one administrator
Transport cost is zero
The network is homogeneous
General Principles (Satyanarayanan)
From DFSs we learned the following lessons:
we should try to move computations to the clients
use caching whenever possible
special files (e.g., temporary) can be specially treated
make scalable systems
trust the fewest possible entities
batch work if possible
Kernel Structure
Introduction
Kernel performs "dangerous" operations
– page table mapping
– scheduling
Kernel must be protected against malicious user code
– access to other processes' data
– increasing own processes' priority
Kernel must have more rights than user code
Solution:
– distinguish between kernel mode and user mode
– access kernel through system calls
– the system calls define the interface to the kernel
Kernel Protection
[Figure: applications access the kernel (drivers, memory manager, file systems) through system calls]
Kernel Protection
Means:
hardware support
– privileged instructions
– supervisor mode
separate address spaces
– user process has no access to kernel structures
access memory / functions through symbolic names
– user has no access to hardware
Kernel Protection
Privileged instructions in user mode generate a trap
Mode switch:
– interrupts
– gated calls (user-generated software interrupt calls)
Parameters:
– stack
– registers
Examples:
– Intel x86: 4 protection levels (code/segment attribute), interrupt
– PowerPC: 2 levels (CPU attribute), special instruction
Linux System Calls (Intel)
System calls are wrapped in libraries (e.g., libc)
The library function
– writes the parameters in registers (up to 5 parameters)
– writes the parameters on the stack (more than 5 parameters)
– writes the system call number in EAX
– calls int 0x80
The kernel
– jumps to the corresponding function in sys_call_table
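The table-based dispatch can be modelled in a few lines. This is a toy model only: the `Handler` interface, the two entries and their numbers are invented for the illustration and do not correspond to the real Linux system call numbering.

```java
public class SyscallTableSketch {
    // One entry of the table: a kernel function taking the register arguments.
    interface Handler {
        int call(int[] args);
    }

    // Hypothetical table indexed by the system call number (the value in EAX).
    static final Handler[] SYS_CALL_TABLE = {
        args -> args[0] + args[1],  // 0: toy "add" call
        args -> 1234,               // 1: toy "getpid" call
    };

    // What the kernel entry point does after the software interrupt:
    // validate the number, then jump through the table.
    static int dispatch(int eax, int... registers) {
        if (eax < 0 || eax >= SYS_CALL_TABLE.length) {
            return -1;              // unknown system call
        }
        return SYS_CALL_TABLE[eax].call(registers);
    }

    public static void main(String[] args) {
        System.out.println(dispatch(0, 2, 3));
        System.out.println(dispatch(1));
        System.out.println(dispatch(99));
    }
}
```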
Linux System Calls
Examples:
pid_t fork(void): creates a child process
ssize_t write(int fd, const void *buf, size_t count): writes count bytes from buf to fd
int kill(pid_t pid, int sig): sends a signal to a process
int gettimeofday(struct timeval *tv, struct timezone *tz): gets the current time
int open(const char *pathname, int flags): opens a file
int ioctl(int d, int request, ...): manipulates special devices
…
Windows System Calls
Layered system: system call must be performed by a wrapper (NTDLL.DLL).
The system call position in the KiSystemServiceTable is not known (depends on the build)
[Figure: the application calls WriteFile() in KERNEL32.DLL, which calls NtWriteFile() in NTDLL.DLL, which executes int 0x2e; the kernel dispatches through KiSystemServiceTable]
Kernel Design: API vs. System Calls
Linux
– system calls are clearly specified (POSIX standard)
– system calls do not change
– about 100 calls
Windows
– system calls are hidden
– only Win32 API is published
– Win32 is standard
– "thousands" of API calls, still growing
– some API calls are handled in user space
– more than one API: POSIX, OS/2
Protection and SMP
What happens when two processes (on two CPUs) enter kernel mode?
– Big kernel lock: not allowed (OpenBSD, NetBSD)
– Fine grained locks in the kernel (FreeBSD 5, Linux 2.6)
[Figure: processes on CPU 1 and CPU 2 both execute int 0x80]
Kernel Structure
monolithic kernel
– big mess, no structure, one big block, fast
– MS-DOS (no protection), original UNIX
– micro-kernel (AIX, OS X)
layered system
– layer n uses functions from layer n-1
– OS/2 (some degree of layering)
virtual machine
– define artificial environment for programs
client-server
– tiny communication microkernel to access various services
Monolithic Kernels
[Figure: two diagrams, labelled Monolithic and Micro-kernel, showing the same components: terminal controllers, device drivers, memory controllers, scheduler, signal handling, file system, swapping, virtual memory and user-level applications]
Layered Systems
THE operating system
A layer uses only functions from below
What goes where?
Less efficient
Layers (top to bottom):
– user programs
– buffering I/O
– console drivers
– memory management
– CPU scheduling
– hardware
Virtual Machines
VM operating system (IBM)
slow and difficult to implement
complete protection
no sharing of resources
useful for development and research
compatibility
[Figure: processes run on virtual machines, which run on the hardware]
Design: Kernel or User Space?
Big monolithic kernel:
– fast (fewer switches)
– less protection
Examples:
– HTTP server in the Linux kernel
– graphic routines in Windows
Modular and micro-kernels:
– structured
– more separation
– move code to user space
– less efficient
– more secure
Example:
– user-level drivers
Virtual Machines
Machine specification in software
– instruction set
– memory layout
– virtual devices
– ....
JVM (Java Virtual Machine)
.NET / Mono
VMWare
– specified machine is a whole PC
– allows multiple PC environments on same machine
IBM VM/370
Case Study: JVM
Virtual Machines
What is a machine?
– does something (...useful)
– programmable
– concrete (hardware)
What is a virtual machine?
– a machine that is not concrete
– a software emulation of a physical computing environment
Reality is somewhat fuzzy! Is a Pentium-II a machine?
[Figure: instructions pass through a decoder that issues Op1, Op2, Op3 to a RISC core]
Hardware and software are logically equivalent (A. Tanenbaum)
Virtual Machine, Intermediate Language
Pascal P-Code (1975)
– stack-based processor
– strongly typed machine language
– compiler: one front end, many back ends
– UCSD Apple ][ implementation, PDP 11, Z80
Modula M-Code (1980)
– high code density
– Lilith as microprogrammed virtual processor
JVM – Java Virtual Machine (1995)
– Write Once – Run Everywhere
– interpreters, JIT compilers, Hot Spot Compiler
Microsoft .NET (2000)
– language interoperability
JVM Case Study
compiler (Java to bytecode)
interpreter, ahead-of-time compiler, JIT
dynamic loading and linking
exception handling
memory management, garbage collection
OO model with single inheritance and interfaces
system classes to provide OS-like implementation
– compiler
– class loader
– runtime
– system
JVM: Type System
Primitive types
– byte, short, int, long, float, double, char
– reference
– boolean mapped to int
Object types
– classes
– interfaces
– arrays
Single class inheritance
Multiple interface implementation
Arrays
– anonymous types
– subclasses of java.lang.Object
JVM: Java Byte-Code
Memory access: tload / tstore, ttload / ttstore, tconst, getfield / putfield, getstatic / putstatic
Operations: tadd / tsub / tmul / tdiv, tshifts
Conversions: f2i / i2f / i2l / ...., dup / dup2 / dup_x1 / ...
Control: ifeq / ifne / iflt / ...., if_icmpeq / if_acmpeq, invokestatic, invokevirtual, invokeinterface, athrow, treturn
Allocation: new / newarray
Casting: checkcast / instanceof
JVM: Java Byte-Code Example
bipush
Operation: Push byte
Format: bipush, byte
Forms: bipush = 16 (0x10)
Operand Stack: ... => ..., value
Description: The immediate byte is sign-extended to an int value. That value is pushed onto the operand stack.
JVM: Machine Organization
Virtual Processor
– stack machine
– no registers
– typed instructions
– no memory addresses, only symbolic names
Runtime Data Areas
– pc register
– stack (locals, parameters, return values)
– heap
– method area (code)
– runtime constant pool
– native method stack
JVM: Execution Example
program: iload 5, iload 6, iadd, istore 4
locals: v4, v5, v6
Operand stack over time:
– after iload 5: v5
– after iload 6: v5, v6
– after iadd: v5+v6
– after istore 4: empty (the sum is stored in local v4)
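The execution example can be replayed with a tiny interpreter. This sketch handles only the three instructions used above and represents the program as strings; it is an illustration of the stack discipline, not how a real JVM decodes bytecode.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackMachineSketch {
    // Executes the program against the given local variables and returns them.
    static int[] run(String[] program, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();  // operand stack
        for (String instr : program) {
            String[] parts = instr.split(" ");
            switch (parts[0]) {
                case "iload":   // push a local variable
                    stack.push(locals[Integer.parseInt(parts[1])]);
                    break;
                case "iadd":    // pop two operands, push their sum
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "istore":  // pop the top of the stack into a local variable
                    locals[Integer.parseInt(parts[1])] = stack.pop();
                    break;
                default:
                    throw new IllegalArgumentException("unsupported: " + instr);
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        int[] locals = new int[7];
        locals[5] = 3;
        locals[6] = 4;
        run(new String[] {"iload 5", "iload 6", "iadd", "istore 4"}, locals);
        System.out.println("v4 = " + locals[4]);
    }
}
```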
JVM: Reflection
Load and manipulate unknown classes at runtime.
java.lang.Class
– getFields
– getMethods
– getConstructors
java.lang.reflect.Field
– set / get
– setInt / getInt
– setFloat / getFloat
– .....
java.lang.reflect.Method
– getModifiers
– invoke
java.lang.reflect.Constructor
JVM: Reflection – Example
import java.lang.reflect.*;

public class ReflectionExample {
    public static void main(String args[]) {
        try {
            // load the class named on the command line and list its methods
            Class<?> c = Class.forName(args[0]);
            Method m[] = c.getDeclaredMethods();
            for (int i = 0; i < m.length; i++) {
                System.out.println(m[i].toString());
            }
        } catch (Throwable e) {
            System.err.println(e);
        }
    }
}
JVM: Java Weaknesses
Transitive closure of java.lang.Object contains (size per release):
  1.1              47
  1.2             178
  1.3             180
  1.4             248
  5 (1.5)         280
  classpath 0.03  299

class Object {
    public String toString();
    ....
}
class String {
    public String toUpperCase(Locale loc);
    ....
}
public final class Locale implements Serializable, Cloneable {
    ....
}
JVM: Java Weaknesses
Class static initialization is triggered when:
T is a class and an instance of T is created
    T tmp = new T();
T is a class and a static method of T is invoked
    T.staticMethod();
A nonconstant static field of T is used or assigned (a static field that is not a final field initialized with a compile-time constant)
    T.someField = 42;
Problem
circular dependencies in static initialization code
    class A { static { x = B.f(); } }
    class B { static { y = A.f(); } }
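A small, self-contained variant of the circular dependency shows the effect. The classes and fields below are invented for the illustration; the example uses field reads instead of the f() calls, and the comments assume that `A` is the first class to be touched.

```java
public class StaticInitCycle {
    static class A {
        // Reading B.y triggers the initialization of B.
        static int x = B.y + 1;
    }

    static class B {
        // A is already being initialized by this thread, so its initializer
        // is not run again: A.x is read with its default value 0.
        static int y = A.x + 1;
    }

    public static void main(String[] args) {
        // Touching A first starts the chain A -> B -> (A, in progress).
        System.out.println("A.x = " + A.x);
        System.out.println("B.y = " + B.y);
    }
}
```

With `A` touched first, `B.y` is computed from the not yet assigned `A.x`, so the result depends on which class happens to be used first.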
JVM: Java Weaknesses
interface Example {
    final static String labels[] = {"A", "B", "C"};
}
hidden static initializer:
    labels = new String[3];
    labels[0] = "A"; labels[1] = "B"; labels[2] = "C";
Warning:
in Java final means write-once!
interfaces may contain code
JVM: Memory Model
The JVM specs define a memory model:
– defines the relationship between variables and the underlying memory
– meant to guarantee the same behavior on every JVM
The compiler is allowed to reorder operations unless synchronized or volatile is specified.
JVM: Reordering
reads and writes to ordinary variables can be reordered.
public class Reordering {
    int x = 0, y = 0;

    public void writer() {
        x = 1;
        y = 2;
    }

    public void reader() {
        int r1 = y;
        int r2 = x;
    }
}
If writer() and reader() run in different threads, reader() may see r1 == 2 and r2 == 0.
JVM: Memory Model
synchronized: in addition to specifying a monitor, it defines a memory barrier:
– acquiring the lock implies an invalidation of the caches
– releasing the lock implies a write back of the caches
synchronized blocks on the same object are ordered.
order among accesses to volatile variables is guaranteed (but not among volatile and other variables).
JVM: Double Checked Lock
Singleton
public class SomeClass {
    private static Resource resource = null;

    public synchronized Resource getResource() {
        if (resource == null) {
            resource = new Resource();
        }
        return resource;
    }
}
JVM: Double Checked Lock
Double checked locking
public class SomeClass {
    private static Resource resource = null;

    public Resource getResource() {
        if (resource == null) {
            synchronized (this) {
                if (resource == null) {
                    resource = new Resource();
                }
            }
        }
        return resource;
    }
}
JVM: Double Checked Lock
Thread 1 and Thread 2 execute the same code:
public class SomeClass {
    private Resource resource = null;

    public Resource getResource() {
        if (resource == null) {
            synchronized (this) {
                if (resource == null) {
                    resource = new Resource();
                }
            }
        }
        return resource;
    }
}
The object is instantiated but not yet initialized! While one thread is still constructing the Resource, the other thread may already see a non-null reference and return it.
JVM: Immutable Objects are not Immutable
Immutable objects:
– all types are primitives or references to immutable objects
– all fields are final
Example (simplified): java.lang.String
– contains an array of characters, the length, an offset
– example: s = "abcd", length = 2, offset = 2, string = "cd"
    String s1 = "/usr/tmp";
    String s2 = s1.substring(4); // should contain "/tmp"
Sequence: s2 is instantiated, the fields are initialized (to 0), the array is copied, the fields are written by the constructor.
What happens if instructions are reordered?
JVM: Reordering Volatile and Nonvolatile Stores
volatile reads and writes are totally ordered among threads
but not among normal variables
example
    volatile boolean initialized = false;
    SomeObject o = null;
Thread 1:
    o = new SomeObject();
    initialized = true;
Thread 2:
    while (!initialized) {
        sleep();
    }
    o.field = 42;   // ?
JVM: JSR 133
Java Community Process
Java memory model revision
Final means final
Volatile fields cannot be reordered
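Under the revised memory model (Java 5 and later), the double checked locking idiom is commonly repaired by declaring the shared field volatile. The sketch below assumes a trivial `Resource` class invented for the example; it shows the pattern, not code from the lecture.

```java
public class LazyResource {
    // Hypothetical resource; the field stands for state set by the constructor.
    static class Resource {
        final int value = 42;
    }

    // volatile: the write of the reference cannot be reordered with the
    // writes performed by the constructor (JSR 133 semantics).
    private static volatile Resource resource = null;

    static Resource getResource() {
        Resource r = resource;          // first (unsynchronized) check
        if (r == null) {
            synchronized (LazyResource.class) {
                r = resource;           // second check, under the lock
                if (r == null) {
                    r = new Resource();
                    resource = r;
                }
            }
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(getResource() == getResource());
        System.out.println(getResource().value);
    }
}
```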
Java JVM: Execution
Interpreted (e.g., Sun JVM)
– bytecode instructions are interpreted sequentially
– the VM emulates the Java Virtual Machine
– slower
– quick startup
Just-in-time compilers (e.g., Sun JVM, IBM Jikes RVM)
– bytecode is compiled to native code at load time (or later)
– code can be optimized (at compile time or later)
– quicker
– slow startup
Ahead-of-time compilers (e.g., GCJ)
– bytecode is compiled to native code offline
– quick startup
– quick execution
– static compilation
JVM: Loader – The Classfile Format
ClassFile {
    version
    constant pool
    flags
    super class
    interfaces
    fields
    methods
    attributes
}
Constants:
– Values: String / Integer / Float / ...
– References: Field / Method / Class / ...
Attributes:
– ConstantValue
– Code
– Exceptions
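The first fields of a class file can be inspected with a few lines of Java. The sketch relies on the layout defined by the JVM specification (a 4-byte magic number 0xCAFEBABE, then minor version, major version and constant pool count, each 2 bytes); the class and method names are made up, and reading `java/lang/Object.class` through the system class loader is an assumption about the runtime environment.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassFileHeader {
    // Returns {magic, minor version, major version, constant pool count}.
    static int[] readHeader(String resource) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("resource not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in); // class files are big-endian
            int magic = data.readInt();
            int minor = data.readUnsignedShort();
            int major = data.readUnsignedShort();
            int constantPoolCount = data.readUnsignedShort();
            return new int[] {magic, minor, major, constantPoolCount};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] h = readHeader("java/lang/Object.class");
        System.out.printf("magic=%08X version=%d.%d constants=%d%n", h[0], h[2], h[1], h[3]);
    }
}
```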
JVM: Class File Format
class HelloWorld {
    public static void printHello() {
        System.out.println("hello, world");
    }

    public static void main(String[] args) {
        HelloWorld myHello = new HelloWorld();
        myHello.printHello();
    }
}
JVM: Class File (Constant Pool)
1. String hello, world
2. Class HelloWorld
3. Class java/io/PrintStream
4. Class java/lang/Object
5. Class java/lang/System
6. Methodref HelloWorld.<init>()
7. Methodref java/lang/Object.<init>()
8. Fieldref java/io/PrintStream java/lang/System.out
9. Methodref HelloWorld.printHello()
10. Methodref java/io/PrintStream.println(java/lang/String )
11. NameAndType <init> ()V
12. NameAndType out Ljava/io/PrintStream;
13. NameAndType printHello ()V
14. NameAndType println (Ljava/lang/String;)V
15. Unicode ()V
16. Unicode (Ljava/lang/String;)V
17. Unicode ([Ljava/lang/String;)V
18. Unicode <init>
19. Unicode Code
20. Unicode ConstantValue
21. Unicode Exceptions
22. Unicode HelloWorld
23. Unicode HelloWorld.java
24. Unicode LineNumberTable
25. Unicode Ljava/io/PrintStream;
26. Unicode LocalVariables
27. Unicode SourceFile
28. Unicode hello, world
29. Unicode java/io/PrintStream
30. Unicode java/lang/Object
31. Unicode java/lang/System
32. Unicode main
33. Unicode out
34. Unicode printHello
JVM: Class File (Code)
Methods
0 <init>()
    0 ALOAD0
    1 INVOKESPECIAL [7] java/lang/Object.<init>()
    4 RETURN
1 PUBLIC STATIC main(java/lang/String [])
    0 NEW [2] HelloWorld
    3 DUP
    4 INVOKESPECIAL [6] HelloWorld.<init>()
    7 ASTORE1
    8 INVOKESTATIC [9] HelloWorld.printHello()
    11 RETURN
2 PUBLIC STATIC printHello()
    0 GETSTATIC [8] java/io/PrintStream java/lang/System.out
    3 LDC1 hello, world
    5 INVOKEVIRTUAL [10] java/io/PrintStream.println(java/lang/String )
    8 RETURN
JVM: Compilation – Pattern Expansion
Each byte code is translated according to fixed patterns
+ easy
- limited knowledge
Example (pseudocode)
    switch (o) {
    case ICONST<n>:
        generate("push n"); PC++; break;
    case ILOAD<n>:
        generate("push off_n[FP]"); PC++; break;
    case IADD:
        generate("pop -> R1");
        generate("pop -> R2");
        generate("add R1, R2 -> R1");
        generate("push R1");
        PC++; break;
    …
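The pseudocode can be turned into a runnable sketch. The textual bytecode format, the emitted pseudo-assembly strings and the `istore` pattern are assumptions made for the illustration; a real compiler would emit machine code.

```java
import java.util.ArrayList;
import java.util.List;

public class PatternExpansionSketch {
    // Translates each bytecode with a fixed pattern, without looking at its neighbours.
    static List<String> expand(String[] bytecode) {
        List<String> out = new ArrayList<>();
        for (String op : bytecode) {
            String[] p = op.split(" ");
            switch (p[0]) {
                case "iconst":
                    out.add("push " + p[1]);
                    break;
                case "iload":
                    out.add("push off" + p[1] + "[FP]");
                    break;
                case "istore":  // assumed pattern, by analogy with iload
                    out.add("pop off" + p[1] + "[FP]");
                    break;
                case "iadd":
                    out.add("pop R1");
                    out.add("pop R2");
                    out.add("add R1, R2 -> R1");
                    out.add("push R1");
                    break;
                default:
                    throw new IllegalArgumentException("unsupported: " + op);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : expand(new String[] {"iload 4", "iload 5", "iadd", "istore 6"})) {
            System.out.println(line);
        }
    }
}
```

Because every pattern communicates through the machine stack, the generated code contains many redundant pushes and pops.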
JVM: Optimizing Pattern Expansion
Main Idea:
use internal virtual stack
stack values are consts / fields / locals / array fields / registers / ...
flush stack as late as possible
Example: iload 4, iload 5, iadd, istore 6
  bytecode   virtual stack (after)   emitted code
  iload 4    local4                  (none)
  iload 5    local4, local5          (none)
  iadd       EAX                     MOV EAX, off4[FP]; ADD EAX, off5[FP]
  istore 6   (empty)                 MOV off6[FP], EAX
JVM: Compiler Comparison
bytecode: iload_4, iload_5, iadd, istore_6
pattern expansion (5 instructions, 9 memory accesses):
    push off4[FP]
    push off5[FP]
    pop EAX
    add 0[SP], EAX
    pop off6[FP]
optimized (3 instructions, 3 memory accesses):
    mov EAX, off4[FP]
    add EAX, off5[FP]
    mov off6[FP], EAX
Linking (General)
A compiled program contains references to external code (libraries)
After loading the code the system needs to link the code to the library
– identify the calls to external code
– locate the callees (and load them if necessary)
– patch the loaded code
Two options:
– the code contains a list of sites for each callee
– the calls to external code are jumps to a procedure linkage table which is then patched (double indirection)
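The two options can be compared with a toy model in which the code is an array of words and a call site holds a jump target. The site positions (2, 5, 7) and table slots (100, 101) follow the figure on the next slides; the load addresses `P0` and `P1` are hypothetical.

```java
public class LinkageTableSketch {
    static final int P0 = 500, P1 = 600;  // hypothetical addresses of the loaded callees

    // Option 1: the object file lists the call sites of each callee;
    // the linker patches every site (3 patched words here).
    static int[] linkBySiteList() {
        int[] code = new int[11];
        int[][] sites = { {5}, {2, 7} };  // sites calling proc 0 and proc 1
        int[] address = { P0, P1 };
        for (int callee = 0; callee < sites.length; callee++) {
            for (int site : sites[callee]) {
                code[site] = address[callee];
            }
        }
        return code;
    }

    // Option 2: call sites jump to fixed slots of a linkage table;
    // the linker patches only the table (2 patched words here).
    static int[] linkByTable() {
        int[] code = new int[102];
        code[2] = 101; code[5] = 100; code[7] = 101;  // fixed at compile time
        code[100] = P0;                               // patched at link time
        code[101] = P1;
        return code;
    }

    static int targetDirect(int[] code, int site) {
        return code[site];
    }

    static int targetViaTable(int[] code, int site) {
        return code[code[site]];                      // double indirection
    }

    public static void main(String[] args) {
        System.out.println(targetDirect(linkBySiteList(), 7));
        System.out.println(targetViaTable(linkByTable(), 7));
    }
}
```

The table costs one extra jump per call but leaves the code itself untouched, so fewer words need to be patched when a callee is called from many sites.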
Linking (General)
Before linking:
  addr   list of sites   linkage table
  0      instr           instr
  1      instr           instr
  2      jump -          jump 101
  3      instr           instr
  4      instr           instr
  5      jump -          jump 100
  6      instr           instr
  7      jump 2          jump 101
  9      instr           instr
  10     instr           instr
  100                    jump
  101                    jump
list of sites: proc 0: 5, proc 1: 7
Linking (General)
After linking:
  addr   list of sites   linkage table
  0      instr           instr
  1      instr           instr
  2      jump &p1        jump 101
  3      instr           instr
  4      instr           instr
  5      jump &p0        jump 100
  6      instr           instr
  7      jump &p1        jump 101
  9      instr           instr
  10     instr           instr
  100                    jump &p0
  101                    jump &p1
list of sites: proc 0: 5, proc 1: 7
JVM: Linking
Bytecode interpreter
– references to other objects are made through the JVM (e.g., invokevirtual, getfield, …)
Native code (ahead of time compiler)
– static linking
– classic native linking
JIT compiler
– only some classes are compiled
– calls could reference classes that are not yet loaded or compiled (delayed compilation)
– code instrumentation
JVM: Methods and Fields Resolution
methods and fields are accessed through special VM functions (e.g., invokevirtual, getfield, …)
the parameters of the special call define the target
the parameters are indexes in the constant pool
the VM checks if the call is legal and if the target is present
JVM: JIT – Linking and Instrumentation
Use code instrumentation to detect first access of static fields and methods
    class A {
        ....
        ...B.x
    }
    class B {
        int x;
    }
The access B.x in A is compiled to:
    CheckClass(B);
    B.x
where CheckClass(B) performs:
    IF ~B.initialized THEN
        Initialize(B)
    END;
Compilation and Linking Overview
[Figure (C): C headers and C source are processed by the Compiler into object files; the Linker combines the object files; the Loader produces the loaded code]
Compilation and Linking Overview
[Figure (Oberon): Oberon source is processed by the Compiler into object & symbol files; the Loader/Linker turns the object files into loaded modules]
Compilation and Linking Overview
[Figure (Java): Java source is processed by the Compiler into class files; the Class Loader (Loader/Linker and JIT Compiler) turns class files into classes; the Reflection API gives access to the loaded classes]
Jaos
Jaos (Java on Active Object System) is a Java virtual machine for the Bluebottle system
goals:
– implement a JVM for the Bluebottle system
– show that the Bluebottle kernel is generic enough to support more than one system
– interoperability between the Active Oberon and Java languages
– interoperability between the Oberon System and the Java APIs
Jaos (Interoperability Framework)
[Figure: the Oberon tool chain (Oberon source, Compiler, Object & Symbol files, Oberon Loader/Linker, Oberon Metadata Loader, Oberon Browser, loaded modules) and the Java tool chain (Class file, Java Metadata Loader, JIT Compiler, Linker, Java Reflection API, loaded classes) are connected through common Metadata and a common Loader/Linker]
JVM: Verification
Compiler generates "good" code....
.... that could be changed before reaching the JVM
need for verification
Verification makes the VM simpler (fewer run-time checks):
– no operand stack overflow
– load / stores are valid
– VM types are correct
– no pointer forging
– no violation of access restrictions
– access objects as they are (type)
– local variable initialized before load
– …
JVM: Verification
Pass 1 (Loading):
– class file version check
– class file format check
– class file complete
Pass 2 (Linking):
– final classes are not subclassed
– every class has a superclass (except Object)
– constant pool references
– constant pool names
JVM: Verification
Pass 3 (Linking): byte-code verification
For each operation in code (independent of the path):
– operation stack size is the same
– accessed variable types are correct
– method parameters are appropriate
– field assignment with correct types
– opcode arguments are appropriate
Pass 4 (RunTime): delayed for performance reasons
First time a type is referenced:
– load types when referenced
– check access visibility
– class initialization
First member access:
– member exists
– member type same as declared
– current method has right to access member
JVM: Byte-Code Verification
Verification:
– branch destination must exist
– opcodes must be legal
– access only existing locals
– code does not end in the middle of an instruction
– types in byte-code must be respected
– execution cannot fall off the end of the code
– exception handler begin and end are sound
Addendum: Security
Security
internal protection
– memory protection
– file system accesses
external protection
– accessibility
problems:
– program threats
Security: Program Threats
Trojan horses: a code segment that misuses its environment
– mail attachments
– web downloads (e.g., SEXY.EXE which formats your hard disk)
– programs with the same name as common utilities
– misleading names (e.g., README.TXT.EXE)
Trap door (in programs or compilers): an intentional hole in the software
Security: System Threats
worms: a standalone program that spawns other processes (copies of itself) to reduce system performance
– example: Morris worm (1988)
    exploited holes in rsh, finger and sendmail to gain access to other machines
    once on the other machine it was able to replicate itself
– used by spammers to spread and distribute spamming applications
viruses: similar to worms but embedded in other programs
– they usually infect other programs and the boot sector
Security: System Threats
Denial of service
– perform many requests to steal all the available resources
– often distributed (using worms)
Example: SYN flooding attacks
– the attacker tries to connect
– the victim answers with a synchronize and acknowledge packet
– and waits for acknowledgment
Countermeasures
– active filtering
– request dropping
– cookie based protocols (requests must be authenticated)
– stateless protocols
Security: System Threats
badly implemented and designed software:
– lpr (setuid) with an option to delete the printed file
– mkdir (first create the inode then change the owner): it was possible to change the inode before the chown …
– buffer overflows
– password in memory or swap files
– insecure protocols (FTP, SMTP)
– missing sanity checks (syscalls, command in input, …)
– short keys and passwords
– proprietary protocols
Bad design: A very recent example
Texas Instruments produces RFID tags offering cryptographic functionalities.
used for cars and electronic payments
40 bit keys
proprietary protocol
Attack from Johns Hopkins University and RSA Labs
– less than 2 hours for 5 keys
– less than $3500
Security: Buffer Overflows
Overwrite a function's return address
    void foo(int p1, int p2) {
        char array[10];
        strcpy(array, someinput);   /* no length check: a long input overflows array */
    }
Stack layout: array, FP, RET, p1 & p2 (writing past the end of array overwrites FP and RET)
Avoid strcpy and check the length, e.g., strncpy
Security: Monitoring
check for suspicious patterns
– login times
audit logs
periodic scans for security holes (bad passwords, set-uid programs, changes to system programs)
– system integrity checks (checksums for executable files) [tripwire]
network services
– monitor network activity
Example: Firewalling
Many applications use network sockets to communicate (even on a single machine)
Many applications are not protected
Solution: filter all the incoming connections by default and allow only the trusted ones
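The "filter by default, allow only the trusted ones" policy can be sketched as a lookup in a table of allow rules, where falling off the end of the table means deny. The rule layout (source network, mask, destination port) and the names allow_rule and connection_allowed are assumptions for this sketch, not the format of any particular firewall.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One allow rule: a trusted source network and a destination port. */
struct allow_rule {
    uint32_t net;   /* network address, host byte order */
    uint32_t mask;  /* network mask, host byte order */
    uint16_t port;  /* destination port */
};

/* Return 1 if some rule explicitly allows the connection, else 0.
 * Reaching the end of the table is the default case: deny. */
int connection_allowed(const struct allow_rule *rules, size_t n_rules,
                       uint32_t src_ip, uint16_t dst_port)
{
    assert(rules != NULL || n_rules == 0);
    for (size_t i = 0; i < n_rules; i++) {
        if ((src_ip & rules[i].mask) == rules[i].net &&
            dst_port == rules[i].port)
            return 1;
    }
    return 0;
}
```

An empty rule table therefore rejects every incoming connection, which is the safe starting point; each trusted source has to be added deliberately.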
Security: (some) Design Principles
Open systems (programs and protocols)
Default is deny access
Check for current authority (timeouts, …)
Give the least privilege possible
Simple protection mechanisms
Do not ask too much of the users (or they will avoid protecting themselves)
Security and Systems: Some Examples
Enhancements to memory management: Intel XD bit, AMD NX bit
mark pages according to the content (data or code)
an exception is generated if the PC is moved to a data address
prevents some buffer overflow attacks
dynamically generated code has to be generated through special system calls
Windows XP SP2, Linux, BSD …
Security and Systems: Some Examples
SELinux
National Security Agency (USA)
patches to the Linux kernel to enforce mandatory access control
open source
independent from the traditional UNIX roles (users and groups)
configurable policies restricting what a program is able to do
Security and Systems: Some Examples
OpenBSD
audit process (proactive bug search)
random gaps in the stack
ProPolice: gcc puts a random integer on the stack in a call prologue and checks it when returning
W^X: pages are writable xor executable
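The ProPolice check can be illustrated with a small model in C. This is not how the compiler implements it: the "frame" below is an ordinary byte array with assumed offsets (array, then the canary, then the return address), the canary is passed in by the caller instead of being chosen at random, and all names are hypothetical. The model only shows why an overflow that reaches the return address must first destroy the canary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the modelled frame: array | canary | RET. */
enum {
    ARRAY_SIZE = 10,
    CANARY_OFF = ARRAY_SIZE,
    RET_OFF    = CANARY_OFF + 4,
    FRAME_SIZE = RET_OFF + 4
};

/* "Prologue": clear the array, place the canary below the return address. */
void frame_enter(unsigned char frame[FRAME_SIZE], uint32_t canary, uint32_t ret)
{
    memset(frame, 0, ARRAY_SIZE);
    memcpy(frame + CANARY_OFF, &canary, sizeof canary);
    memcpy(frame + RET_OFF, &ret, sizeof ret);
}

/* Copy into the array without checking ARRAY_SIZE, like strcpy would.
 * The length is clamped to the model's size so the demo stays memory-safe. */
void unchecked_copy(unsigned char frame[FRAME_SIZE], const void *input, size_t len)
{
    assert(input != NULL || len == 0);
    if (len > FRAME_SIZE)
        len = FRAME_SIZE;
    if (len > 0)
        memcpy(frame, input, len);
}

/* "Epilogue": returning is only safe if the canary is still intact. */
int frame_leave_ok(const unsigned char frame[FRAME_SIZE], uint32_t canary)
{
    return memcmp(frame + CANARY_OFF, &canary, sizeof canary) == 0;
}
```

A short input leaves the canary untouched and frame_leave_ok returns 1. An input long enough to reach the return address overwrites the canary on the way, so the check fails before the corrupted address would be used.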
Security and Systems: Some Examples
OpenBSD
randomized shared library order and addresses
mmap() and malloc() return randomized addresses
guard pages between objects
privilege separation and revocation
Privilege Separation
unprivileged child process to contain and restrict the effects of programming errors
e.g., openssh
[Figure: openssh privilege separation over time. The monitor listens on *:22; on a network connection it forks an unprivileged child for network processing (key exchange, authentication), which sends "request auth" and receives "auth result". After the state export the monitor forks a user child for user request processing ("request PTY" / "pass PTY", user network data).]