Improving IPC by Kernel Design
Jochen Liedtke

Shane Matthews
Portland State University

3/12/2004 Portland State University

Summary

• Review
• Performance improved at the
  – Architecture Level
  – Algorithmic Level
  – Interface Level
  – Coding Level

Micro-kernels

• Minimal OS, providing a set of primitives used to implement thread/address space management and IPC [1]
• Everything else is moved to user-space (servers)

Terminology (L3)

• Dataspace
  – Memory object, mapped into an address space
• Task
  – Composed of threads, dataspaces, and an address space
• Message
  – String/memory object

L3 Architecture & IPC

• Active components communicate via messages
• Applies to:
  – Device drivers
    • Implemented as user-level tasks
  – Hardware interrupts
    • Interrupt message from micro-kernel to thread

L3 Redesign Principles

• IPC performance is the master
  – Security and performance must not be affected
• Synergetic effects taken into consideration
  – (Think combined effects)
  – May lead to reinforcement or diminution
• Design must aim at a performance goal
  – Per short message transfer
  – 350 cycles (7 microseconds)

Architectural Level

• Messages
• Process Structure
• Control Blocks

Compound Messages

• Multiple send/receive -> 1 send/receive
• A message consists of direct/indirect strings and memory objects

Twofold message copy

• [A space] -> [kernel] -> [B space]

• Costs about 20 + 0.75n cycles, where n is the message size in bytes

• Good for small messages

• Need something better as n grows
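To make the growth with n concrete, here is a minimal sketch that simply evaluates the cycle estimate quoted on this slide. The function name `copy_cycles` and the constant `GOAL_CYCLES` are hypothetical names chosen for illustration; only the formula and the 350-cycle goal come from the slides.

```python
GOAL_CYCLES = 350  # per-short-message goal stated in the redesign principles


def copy_cycles(n_bytes):
    """Cycle estimate from the slide: about 20 + 0.75 * n."""
    return 20 + 0.75 * n_bytes


for n in (8, 128, 4096):
    cycles = copy_cycles(n)
    # Show how the copy cost compares with the whole 350-cycle budget.
    print(f"{n:5d} bytes: {cycles:7.1f} cycles "
          f"({cycles / GOAL_CYCLES:.2f}x the short-message goal)")
```

For an 8-byte message the estimate is a small fraction of the goal, while for a 4096-byte message it is several times larger, which is why a cheaper transfer path matters as n grows.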

LRPC and SRC RPC

• Client/server share user-level memory
  – sender -> shared buffer
• Problems
  – When server to client is 1 to many, shared regions of address space become critical resources
  – Shared regions require explicit opens (unlike L3)
  – Message may change during or after checking

Direct Message Copy Via Windows

• L3's method
  – Destination mapped into window
  – Message copied to window
• Window
  – Per address space
  – Accessed exclusively by kernel

Communication Windows

• Problems
  – Must be fast
  – Different threads coexisting within an address space
• L3 implementation
  – Copy one word, a page directory entry, from B to A

Process Structure

• Threads running in kernel mode have 1 kernel stack per thread
  – Efficient, since interrupts, page faults, and IPC already save state on the kernel stack
• Continuations
  – Pro:
    • Reduce kernel stack space
  – Cons:
    • Require additional copies between kernel stack and continuation
    • Interfere with other optimizations

Thread Control Blocks

• Implemented as a large array in the kernel
  – Fast TCB access
    • Array base + TCB # × TCB size
  – Saves TLB misses (IPC)
    • Kernel stacks of sender and receiver are located in the TCB
  – Locking done by unmapping the TCB

Algorithmic Level

• Thread Identifier
• Lazy Scheduling
• Short Messages Via Registers

Thread Identifier

• Thread addressed by 64-bit UID in user-mode

• Thread number in lower 32 bits of UID
  – AND with bit mask, add to TCB array base
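The lookup described above can be sketched as follows. This is an illustration under stated assumptions, not L3's actual code: `TCB_ARRAY_BASE` and `TCB_SIZE` are hypothetical values, and the sketch assumes the masked field is a plain thread number that still has to be scaled by the TCB size.

```python
TCB_ARRAY_BASE = 0xE000_0000   # hypothetical virtual address of the TCB array
TCB_SIZE = 0x800               # hypothetical size of one TCB in bytes
THREAD_NO_MASK = 0xFFFF_FFFF   # lower 32 bits of the 64-bit UID


def tcb_address(uid):
    """Map a 64-bit thread UID to the address of its TCB."""
    thread_no = uid & THREAD_NO_MASK          # AND with bit mask
    return TCB_ARRAY_BASE + thread_no * TCB_SIZE  # add to array base


# Usage: the upper 32 bits of the UID do not influence the lookup.
uid = (0xABCD << 32) | 3
print(hex(tcb_address(uid)))
```

The point of the design is that no table search or hash lookup is needed: the TCB address follows from the UID with a mask and an add.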

Lazy Scheduling

• IPC operation call or reply & receive next
  – Delete sending thread from ready queue
  – Insert into waiting queue
  – Delete receiving thread from waiting queue
  – Insert into ready queue
• Too many queue operations!

Lazy Scheduling cont.

• L3 queue invariants
  – Ready queue contains all ready threads
  – Waiting queue contains at least all waiting threads
• TCB contains the thread's state (ready/waiting)
• Scheduler removes all threads that do not belong in a queue while parsing that queue
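A minimal sketch of the idea, assuming a single ready queue and a simplified scheduler: the IPC path only flips the state stored in each TCB, and stale queue entries are dropped later when the scheduler parses the queue. The class and method names (`TCB`, `LazyScheduler`, `ipc_call`, `pick_next`) are hypothetical and do not come from L3.

```python
from collections import deque
from dataclasses import dataclass

READY, WAITING = "ready", "waiting"


@dataclass
class TCB:
    name: str
    state: str = READY
    in_ready_queue: bool = False


class LazyScheduler:
    def __init__(self):
        self.ready_queue = deque()
        self.queue_ops = 0  # counts inserts and removals, for illustration

    def enqueue_ready(self, tcb):
        """Insert a thread into the ready queue unless it is already there."""
        if not tcb.in_ready_queue:
            self.ready_queue.append(tcb)
            tcb.in_ready_queue = True
            self.queue_ops += 1

    def ipc_call(self, sender, receiver):
        """IPC fast path: update TCB states only, touch no queue."""
        sender.state = WAITING
        receiver.state = READY

    def pick_next(self):
        """Parse the ready queue, discarding entries that are no longer ready."""
        while self.ready_queue:
            tcb = self.ready_queue.popleft()
            tcb.in_ready_queue = False
            self.queue_ops += 1
            if tcb.state == READY:
                return tcb
        return None


client, server = TCB("client"), TCB("server", state=WAITING)
sched = LazyScheduler()
sched.enqueue_ready(client)
sched.ipc_call(client, server)   # no queue operation happens here
print(sched.queue_ops)           # still only the initial insert
```

In this sketch the receiver is assumed to run immediately after the IPC, so it is only inserted into the ready queue (via `enqueue_ready`) if it is later preempted while still ready.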

Short Messages Via Registers

• High proportion of messages are short
  – E.g. driver ack/error, hardware interrupts
• 486
  – 7 general registers
  – 3 needed for sender ID and result code
  – 4 available
• 8-byte messages using a coding scheme
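The slide does not spell out the coding scheme, so the following is only an illustration of the general idea: an 8-byte payload fits into two 32-bit register values, so no memory buffer has to be copied. The helper names and the little-endian, zero-padded layout are assumptions made for this sketch.

```python
import struct


def pack_short_message(payload):
    """Split a payload of up to 8 bytes into two 32-bit register values."""
    if len(payload) > 8:
        raise ValueError("short messages carry at most 8 bytes")
    padded = payload.ljust(8, b"\0")      # zero-pad to exactly 8 bytes
    return struct.unpack("<II", padded)   # two little-endian 32-bit words


def unpack_short_message(reg0, reg1, length=8):
    """Rebuild the payload from the two register values."""
    return struct.pack("<II", reg0, reg1)[:length]


regs = pack_short_message(b"diskdone")
print(unpack_short_message(*regs))
```

A longer message would fall back to the memory-based transfer path, which this sketch signals with a `ValueError`.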

Interface Level

• Simple RPC stubs
  – Load registers, system call, check success
  – Compiler generates stubs inline
• Parameter passing
  – Use registers when possible

Coding Level

• Reduce cache and TLB misses
  – Short kernel code
    • Short jumps, use registers, short address displacements
  – IPC kernel code in one page
  – Handle save/restore of coprocessor lazily
    • Delayed until a different thread needs to use it

Results

• An increase of 100% means the IPC time doubles
• Removing all optimizations increases IPC time by 134% for an 8-byte message

Results

• L3 vs. Mach
• System
  – Intel 486 DX-50
  – 256 KB external cache
  – 16 MB memory

Results cont.

Conclusions

• IPC improved by applying
  – Performance-based reasoning
  – Synergetic effects
  – Architecture -> coding

References

• [1] http://en.wikipedia.org/wiki/Micro_kernel
• [2] Jochen Liedtke, Improving IPC by Kernel Design
