ipc in microkernel systems, capabilities

30
http://d3s.mff.cuni.cz http://d3s.mff.cuni.cz/aosy Marn Děcký [email protected] IPC in Microkernel Systems, Capabilities IPC in Microkernel Systems, Capabilities

Upload: martin-decky

Post on 12-Apr-2017

78 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: IPC in Microkernel Systems, Capabilities

http://d3s.mff.cuni.czhttp://d3s.mff.cuni.cz/aosy

Martin Děcký

[email protected]

IPC in Microkernel Systems, Capabilities

IPC in Microkernel Systems, Capabilities

Page 2: IPC in Microkernel Systems, Capabilities

2Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Inter-Process CommunicationInter-Process Communication

Sharing data between processes (tasks)

Crossing the process isolation in a managed and predictable way

Technically, any means of sharing data can be considered IPC (e.g. files, networking, middleware)

In monolithic systems, this usually works without usinga dedicated IPC mechanism

Crucial for microkernel systems

In microkernel systems, even files and networking cannot be implemented without an IPC mechanism

Page 3: IPC in Microkernel Systems, Capabilities

3Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Classical IPCClassical IPC

POSIX signals

Since UNIX Version 4

Asynchronous notification sent to a process (thread)

Similar to level-triggered interrupts (including masking)

Sender uses the kill(2) syscallRun-time exceptions and state changes also cause signals (SIGFPE, SIGSEGV; SIGPIPE, SIGINT, SIGSTOP/SIGTSTP, SIGCONT, SIGTRAP)

Receiver thread is interrupted and a signal handler is executed (installed using signal(2) or sigaction(2))

Race conditions due to nested signals

Calling non-reentrant functions (e.g. malloc(), printf()) is undefined behavior

Interruption of some syscalls

Real-time signalsQueued, guaranteed sending order

Page 4: IPC in Microkernel Systems, Capabilities

4Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Classical IPC (2)Classical IPC (2)

Anonymous pipes

Named pipes

Persistent uni-directional pipes

Same API as files (anonymous pipes)

Pipe identification: File system i-node (bound to a directory entry)

No identification of senders on the receiver endWrites of data larger than PIPE_BUF bytes can be interleaved

Windows named pipes

Dedicated namespace (Named Pipe File System \\.\pipe\)

Non-persistent (removed when all clients close the pipe)

Anonymous pipes are named pipes with random names

Page 5: IPC in Microkernel Systems, Capabilities

5Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Classical IPC (3)Classical IPC (3)

UNIX domain sockets

Reliable bi-directional stream of bytes (akin to TCP), or

Unordered unreliable datagrams (akin to UDP), or

Reliable ordered stream of datagrams ...

... between local processes

Same API as BSD sockets

Socket identification: File system i-node (bound to a directory entry or to an abstract socket namespace)

Sending file descriptors (sendmsg(), rescvmsg())Rudimentary capabilities

Page 6: IPC in Microkernel Systems, Capabilities

6Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Classical IPC (4)Classical IPC (4)

Software shared memory

POSIX Shared Memory, System V Shared Memory

Persistent shared memory objects in dedicated namespaceIn Linux, objects created as tmpfs files (usually /dev/shm)

shm_open(3), mmap(2), munmap(2), shm_unlink(3) vs. shmget(2), shmat(2), shmdt(2)

Memory mapped files

Shared memory backed by a file (or anonymous memory)

mmap(2), munmap(2)

memfd_create(2)Removed when no longer referenced

File sealing

Page 7: IPC in Microkernel Systems, Capabilities

7Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Classical IPC (5)Classical IPC (5)

Message passing

Synchronous/asynchronous, blocking/non-blocking, symmetrical/asymmetrical/indirect addressing

POSIX message queues, System V Message Passing

Indirect addressing using a message queue (key for msgget(2), i-node for mq_open(3))

msgsnd(2), mq_send(3) asynchronous, non-blocking (unless the queue is full), msgrcv(2), mq_receive(3) blocking by default

Windows Messages

Symmetrical addressing using window/thread handles

SendMessage() synchronous, SendMessageCallback(), SendNotifyMessage(), PostMessage() asynchronous

GetMessage() blocking, PeekMessage() non-blocking

Page 8: IPC in Microkernel Systems, Capabilities

8Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Classical IPC (6)Classical IPC (6)

IPC abstractions

D-Bus

Single-node middleware replacing CORBA (GNOME) and DCOP (KDE)

Software bus abstraction (end-points communicating over a shared virtual channel)

System bus vs. session bus

End-points identified by a component string and unique connection (instance) name

Method calls and signals implemented on top of message passing– Synchronous one-to-one request-response (libdbus)

● UNIX domain sockets, TCP sockets, kdbus (abandoned), BUS1– Asynchronous publish/subscribe (dbus-daemon)

Page 9: IPC in Microkernel Systems, Capabilities

9Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Classical IPC (7)Classical IPC (7)

IPC abstractions

Doors

Synchronous remote procedure call

Originally implemented for Spring, later ported to Solaris

Request and reply buffer, request and reply list of file descriptors

Binder (OpenBinder)

Middleware and component framework, originally implemented for BeOS, now used by Android (similar to Microsoft COM)

Synchronous remote method invocationCustom kernel IPC mechanism

– Method invocation implemented as thread migration● BINDER_WRITE_READ ioctl with a request and reply buffer

– Object reference tracking

Windows Dynamic Data Exchange, Object Linking and Embedding, Component Object Model

Based on [Advanced] Local Procedure Call ([A]LPC)

Page 10: IPC in Microkernel Systems, Capabilities

10Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

CapabilitiesCapabilities

Capability

Object identifying an OS resource

Logical objects (open files, connections), typed memory areas (physical memory regions)

Capability reference

Local user space identification of a capability (file handles, virtual memory regions)

Operations with capabilities

Invoking a method with a capability reference

Permissible methods defined by the capability itself

Page 11: IPC in Microkernel Systems, Capabilities

11Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Trivial Capability ExampleTrivial Capability Example

kernel space

user space

read(0, ...);

0 1 2 3 file descriptor table(capabilities)

file descriptor(capability reference)

vfs_file_t operating system resource(open file)

Page 12: IPC in Microkernel Systems, Capabilities

12Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Trivial Capability Example (2)Trivial Capability Example (2)

kernel space

user space

struct msghdr msg;struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);// ...

memmove(CMSG_DATA(cmsg), &fd, sizeof(fd));sendmsg(socket, &msg, 0);

0 1 2 3

vfs_file_t

0 1 2 3

Page 13: IPC in Microkernel Systems, Capabilities

13Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Trivial Capability Example (2)Trivial Capability Example (2)

kernel space

user space

struct msghdr msg;struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);// ...

memmove(CMSG_DATA(cmsg), &fd, sizeof(fd));sendmsg(socket, &msg, 0);

0 1 2 3

vfs_file_t

struct msghdr msg;struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);// ...

recvmsg(socket, &msg, 0);

int fd;memmove(&fd, CMSG_DATA(cmsg), sizeof(fd));

0 1 2 3 4

Page 14: IPC in Microkernel Systems, Capabilities

14Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Capabilities as a Directed GraphCapabilities as a Directed Graph

kernel space

user space

00 01 11

cnodecap

cnode_t

cnodecap

untypedcap

cnode_t

untypedcap

endpointcap

pagecap

untypedcap cnode_t

mem_region_t

cspace

cref_t

Page 15: IPC in Microkernel Systems, Capabilities

15Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Capabilities as a Directed Graph (2)Capabilities as a Directed Graph (2)

Capability space (cspace)

Directed graph of capability nodes (cnodes) belonging to a task (defines the root cnode)

Capability nodes (cnodes)

Fixed array of capabilities (power-of-two sized)

Given number of bits from the capability reference identifies the capability slot

cnode capability points to another cnodeThe node contains a radix value to define how many bits should be skipped

Other capability types point to other kernel objects

Similar to hierarchical page tables

Page 16: IPC in Microkernel Systems, Capabilities

16Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Operations with CapabilitiesOperations with Capabilities

Cloning

Creating a new capability pointing to the same resource as the original capability

Minting

Creating a new capability with a subset of permissions to the original resource

Deriving

Creating a new capability based on a hierarchy of capability types

Untyped memory (representing a region of physical memory)Can be split into several untyped memory regions

Can be retyped to a different capability (thus creating a different type of kernel object)

Capability derivation tree

Maintaining sole authority on the cloned/minted/derived capabilities by the original owner of the capability

Page 17: IPC in Microkernel Systems, Capabilities

17Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

User Space-managed Capability SpaceUser Space-managed Capability Space

Microkernel provides capability mechanisms

User space is responsible for managing the cspace by deriving capabilities from the untyped memory capability and managing cnodes

untypedcap

untypedcap

untypedcap

untypedcap

untypedcap

untypedcap

untypedcap

cnodecap

TCBcap

L1 PTcap

L2 PTcap

Page 18: IPC in Microkernel Systems, Capabilities

18Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

User Space-managed Capability Space (2)User Space-managed Capability Space (2)

History of capability-based microkernels

HYDRA (Carnegie-Mellon, William Wulf, 1971)

Targeting early MIMD architecture (16 PDP-11 computing nodes)

GNOSIS, KeyKOS (Tymshare Inc., Key Logic, 1979)

IBM S/370, emulating VM, MVS and UNIX environments

EROS, CapROS, Coyotos (Johns Hopinks University, University of Pennsylvania, Jonathan Shapiro, 1991)

386, 486

seL4 (NICTA, General Dynamics, 2006)

Capability-based mechanisms inspired by Jonathan Shapiro

Page 19: IPC in Microkernel Systems, Capabilities

19Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Mach IPCMach IPC

Prototypical asynchronous message passing

Ports

Receive end-points and associated message queues

Port rights

Client capabilities for accessing a port (send, receive, send-once)Only a single server can have a receive right

Each task has an initial set of port rightsCommunicating with the kernel, etc.

Tagged message structure

Kernel enforces type correctness

Port rights can be also passed

Timeouts

Page 20: IPC in Microkernel Systems, Capabilities

20Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Mach IPC (2)Mach IPC (2)

Prototypical performance issues

50% IPC overhead compared to monolithic UNIX

With a single UNIX server

Complex kernel-side code

Tagged data type evaluation, handling of timeouts, etc.

Dynamic data structuresBut unfortunately only linked lists

Excessive cache footprint

Asynchronicity rarely used in practice

User space tasks (originally ported from UNIX) usually written for sequential performance and blocking I/O

Page 21: IPC in Microkernel Systems, Capabilities

21Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

The Era of Synchronous IPCThe Era of Synchronous IPC

L3 (1988), L4 (1993) by Jochen Liedke [1]

3% IPC overhead compared to monolithic UNIX

With a single UNIX server

Single IPC call overhead comparable to single syscall overhead in UNIX (approx. 20 times faster than on Mach)

Synchronous IPC

Explicit client/server rendez-vous and thread migrationNo need for full context switch (just address space switch)

No buffering, no scheduling, data passed directly in registers

Highly target-optimized implementationSmall working set, cache-friendly code

No complex algorithms (capabilities, etc.)

Page 22: IPC in Microkernel Systems, Capabilities

22Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

The Era of Synchronous IPC (2)The Era of Synchronous IPC (2)

[2]

Page 23: IPC in Microkernel Systems, Capabilities

23Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

The Era of Synchronous IPC (3)The Era of Synchronous IPC (3)

Drawbacks

The microkernel is unportable (by design)

The readability and maintainability of the IPC code is very poor

Preoccupation with single-threaded performance conflicted with other goals

Design issues of synchronous IPC

Unresponsive server blocks the client indefinitivelyInitial response: Synchronous IPC with timeouts

– In hindsight a bad solution [3]

Asynchronous communication on top of synchronous IPCAbstraction inversion anti-pattern (e.g. multithreading)

Paradoxically, on massive multicore machines the scalability suffers

Page 24: IPC in Microkernel Systems, Capabilities

24Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

The Return of Asynchronous IPCThe Return of Asynchronous IPC

The best of both worlds

Reasonably simple, cache-friendly, fast-path kernel code

Bounded kernel buffers (additional buffering possible on the client user space side)

Intelligent bookkeeping data structures (hash tables, trees)

Simple IPC message structure (only integer payload that fits into registers)

Additional semantics for memory copying and memory sharing

Possibility to build rich abstractions in user space

Actors, agents, continuations, futures, promises

Page 25: IPC in Microkernel Systems, Capabilities

25Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

HelenOS IPCHelenOS IPC

Basic design

Asynchronous messages over uni-directional connections

6-integer payload (1st integer interpreted as method ID)

Bounded kernel buffers

Every message needs to be answered (6-integer return value)

Connections identified by capabilities

The receiver can forward a message via its connection to a different receiver (even recursively)

The responsibility to reply is passed on

Kernel events and hardware interrupts generate IPC messages (no reply needed)

Initially, every task has just the connection to the naming service

Page 26: IPC in Microkernel Systems, Capabilities

26Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

HelenOS IPC (2)HelenOS IPC (2)

Kernel API

Global method IDs with special semantics

IPC_M_CONNECTION_CLONE (clone a connection capability from the client to the server)

IPC_M_CONNECT_TO_ME (establish a callback connection)

IPC_M_CONNECT_ME_TO (establish a new connection)When forwarded, the connection is established to the next receiver

– A broker (naming service, location service, device manager, VFS) connects a client to the target server

IPC_M_SHARE_IN / IPC_M_SHARE_OUT (receive/send a shared virtual address space area)

IPC_M_DATA_READ / IPC_M_DATA_WRITE (receive/send data)

IPC_M_STATE_CHANGE_AUTHORIZE (update a server state on behalf of a different client)

Three-way handshake

Page 27: IPC in Microkernel Systems, Capabilities

27Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

HelenOS IPC (3)HelenOS IPC (3)

Async framework

Writing single-threaded IPC code that can make use of the parallelism

User space-scheduler cooperative threads (fibrils)Preempted only when blocking on IPC calls

Abstracting the low-level IPC connections into sessions

Each session can have a different threading model

Grouping of atomic IPC messages into logical exchanges

Page 28: IPC in Microkernel Systems, Capabilities

28Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

HelenOS IPC (4)HelenOS IPC (4)

async_exch_t *ns_exch = async_exchange_begin(session_ns);

async_sess_t *sess =async_connect_me_to_iface(ns_exch, INTERFACE_VFS, SERVICE_VFS, 0);

async_exchange_end(ns_exch);

async_exch_t *exch = async_exchange_begin(sess);

ipc_call_t answer;aid_t req =

async_send_3(exch, VFS_IN_OPEN, lflags, oflags, 0, &answer);

async_data_write_start(exch, path, path_size);

async_exchange_end(exch);

// Do some other useful work in the meantime

sysarg_t rc;async_wait_for(req, &rc);

if (rc == EOK)fd = (int) IPC_GET_ARG1(answer);

Page 29: IPC in Microkernel Systems, Capabilities

29Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

Q&A

Page 30: IPC in Microkernel Systems, Capabilities

30Martin Děcký, Advanced Operating Systems, March 10th 2017 IPC, Capabilities

ReferencesReferences

[1] Härtig H., Hohmuth M., Liedtke J., Schönberg S., Wolter J.: The Performance of μ-Kernel-Based Systems, Proceedings of 16th ACM Symposium on Operating Systems Principles (SOSP), ACM, 1997 [2] Gernot Heiser: https://en.wikipedia.org/wiki/File:L4_family_tree.png[3] Heiser G., Elphinstone K.: L4 Microkernels: The Lessons from 20 Years of Research and Deployment, ACM Transactions on Computer Systems (TOCS), Volume 34, Issue 1, 2016