Slides on cross-domain call and …chase/cps510/slides/rpc.pdf · network communication...
This classic paper is a good example of a microbenchmarking study. It also explains the RPC abstraction and serves as a case study of the nuts and bolts of I/O and the related performance issues. Or is it “just hacking”?
Messaging: examples and variations
• Details vary!
  – Supercomputing: MPI over a fast interconnect
  – High-level messages (e.g., HTTP) over sockets and network communication
  – Microkernel / Mach / MacOS: high-speed local cross-domain messaging ports (also Windows NT)
  – Android: binder, and per-thread message queues
• Common abstraction: “Remote Procedure Call”
  – RPC for clients/servers talking over a network.
  – For local processes it is often called cross-domain call or “Local Procedure Call” (LPC, in Windows).
Cross-domain call: the basics
Request: block A, wake up B. Reply: block B, wake up A.
A: syscall to post a message to B (e.g., a message queue). Wait for reply.
B: syscalls to receive an incoming message. Wait for request.
Copy data from A to B, or use a shared memory region.
Transfer control through the kernel: block A, wake up B. Note: this could use a socket, or fast IPC for processes on the same host.
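A minimal sketch of this exchange, assuming a Unix-like host: two pipes carry the request and the reply between a parent process (A) and a forked child (B). The kernel copies the data and does the blocking and wakeup inside read() and write(). The names `cross_domain_call` and `serve_b` are hypothetical, and the "procedure" just adds one.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read exactly n bytes; read() blocks the caller until data arrives. */
static int read_full(int fd, void *buf, size_t n) {
    char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r <= 0) return -1;            /* error, or peer closed early */
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Domain B: block until a request arrives, run the procedure, post the reply. */
static void serve_b(int req_fd, int rep_fd) {
    int32_t arg, result;
    if (read_full(req_fd, &arg, sizeof arg) == 0) {
        result = arg + 1;                 /* the hypothetical remote "procedure" */
        if (write(rep_fd, &result, sizeof result) < 0) _exit(1);
    }
    _exit(0);                             /* B's process ends after one call */
}

/* Domain A: post a request to B, then block until B's reply arrives.
   Returns 0 on success, -1 on failure. */
int cross_domain_call(int32_t arg, int32_t *result) {
    int req[2], rep[2];                   /* one pipe per direction */
    if (pipe(req) < 0) return -1;
    if (pipe(rep) < 0) { close(req[0]); close(req[1]); return -1; }
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                       /* child process plays domain B */
        close(req[1]);
        close(rep[0]);
        serve_b(req[0], rep[1]);          /* never returns */
    }
    close(req[0]);
    close(rep[1]);
    int rc = -1;
    if (write(req[1], &arg, sizeof arg) == (ssize_t)sizeof arg &&
        read_full(rep[0], result, sizeof *result) == 0)   /* A sleeps here */
        rc = 0;
    close(req[1]);
    close(rep[0]);
    waitpid(pid, NULL, 0);
    return rc;
}
```

A real cross-domain call system would keep B running and serving many calls; forking a fresh B per call only keeps the sketch short.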
“Marshalling” (“serializing”)
What if the data is a complex linked structure? Must “pack” it as a sequence of bytes into a message, and reconstitute it on the other side.
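A sketch of packing a linked list, assuming a hypothetical `struct node` holding 32-bit integers and a simple wire format of our own choosing: a big-endian count followed by the big-endian values. Pointers are not sent, because an address in A means nothing in B; the receiver rebuilds the links instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct node {                     /* hypothetical linked structure to send */
    int32_t value;
    struct node *next;
};

static void put_u32(uint8_t *p, uint32_t v) {    /* big-endian ("network order") */
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

static uint32_t get_u32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Flatten: a 4-byte count, then each value. Returns bytes written,
   or 0 if buf is too small. */
size_t marshal_list(const struct node *head, uint8_t *buf, size_t cap) {
    uint32_t count = 0;
    for (const struct node *n = head; n; n = n->next) count++;
    size_t need = 4 + 4 * (size_t)count;
    if (need > cap) return 0;
    put_u32(buf, count);
    size_t off = 4;
    for (const struct node *n = head; n; n = n->next, off += 4)
        put_u32(buf + off, (uint32_t)n->value);
    return need;
}

/* Reconstitute: allocate fresh nodes and relink them in the original order. */
struct node *unmarshal_list(const uint8_t *buf, size_t len) {
    if (len < 4) return NULL;
    uint32_t count = get_u32(buf);
    if (len < 4 + 4 * (size_t)count) return NULL;   /* truncated message */
    struct node *head = NULL, **tail = &head;
    for (uint32_t i = 0; i < count; i++) {
        struct node *n = malloc(sizeof *n);
        if (!n) break;            /* sketch: a real decoder would clean up and report */
        n->value = (int32_t)get_u32(buf + 4 + 4 * (size_t)i);
        n->next = NULL;
        *tail = n;
        tail = &n->next;
    }
    return head;
}
```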
Concept: RPC
Remote Procedure Call (RPC) is a request/response interaction through a published API, using IPC messaging to cross an inter-process boundary.
RPC is used in many standard Internet services. It is also the basis for component frameworks like DCOM, CORBA, and Android. Software is packaged into named “objects” or components. Components may publish interfaces and/or invoke published interfaces of other components. Components may execute in different processes and/or on different nodes.
[Figure: API stubs generated from an Interface Description Language (IDL)]
Establishing an RPC connection to a named remote interface is often called binding.
RPC Execution
• In general, RPC enables request/response exchanges (e.g., by messaging over a network) that “look like” a local procedure call.
• In Android, RPC allows flexible interaction among apps running in different processes, across the kernel boundary.
• How is this different from a local procedure call?
• How is it different from a system call?
RPC: Language integration
• Stubs link with the client/server code to “hide” the boundary crossing.
  – They “marshal” args/results, i.e., translate them to/from some standard network stream format. (Also known as linearize, serialize, …or “flatten”.)
  – They propagate PL-level exceptions.
  – Stubs are auto-generated from an Interface Description Language (IDL) file by a stub compiler tool at software build time, and linked in.
  – Client and server must agree on the protocol signatures in the IDL file.
Stubs
• RPC stubs are procedures linked into the client and server.
  – RPC stubs are similar to system call stubs, but they do more than just trap to the kernel.
  – The RPC stubs construct/deconstruct a message transmitted through a messaging system.
  – Binder is an example of such a messaging system, implemented as a Linux kernel plug-in module (a driver) and some user-space libraries.
• The stubs are generated by a tool that takes a description of the application’s RPC API written in an Interface Description Language.
  – The description looks like any interface definition: a list of method names and argument/result types and signatures.
  – Stub code marshals arguments into a request message, and marshals results into a reply message.
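A hand-written sketch of what a stub compiler might emit for a hypothetical two-procedure interface (`add` as procedure 1, `neg` as procedure 2). It is not the output of any real tool such as rpcgen. To keep it short, the message is a fixed-layout C struct rather than a byte stream in a standard network format, and the hypothetical `transport_call` stands in for the messaging system by calling the server stub directly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical interface an IDL file might describe:
       int32 add(int32 a, int32 b);   procedure 1
       int32 neg(int32 a);            procedure 2                    */
enum { PROC_ADD = 1, PROC_NEG = 2 };
enum { RPC_OK = 0, RPC_BAD_PROC = 1 };

struct msg { uint32_t proc; uint32_t status; int32_t args[2]; int32_t result; };

/* ---- server side: the real procedures plus the server stub ---- */
static int32_t add_impl(int32_t a, int32_t b) { return a + b; }
static int32_t neg_impl(int32_t a) { return -a; }

/* Server stub: unpack args, call the procedure by number, pack the result. */
static void server_dispatch(const struct msg *req, struct msg *rep) {
    memset(rep, 0, sizeof *rep);
    rep->proc = req->proc;
    switch (req->proc) {
    case PROC_ADD: rep->result = add_impl(req->args[0], req->args[1]); break;
    case PROC_NEG: rep->result = neg_impl(req->args[0]); break;
    default:       rep->status = RPC_BAD_PROC; break;   /* unknown procedure */
    }
}

/* Hypothetical transport: a real one would copy the message across the
   boundary (socket, binder, ...) and block until the reply comes back. */
static void transport_call(const struct msg *req, struct msg *rep) {
    server_dispatch(req, rep);
}

/* ---- client stubs: look like local calls to the caller ---- */
int rpc_add(int32_t a, int32_t b, int32_t *out) {
    struct msg req = { PROC_ADD, 0, { a, b }, 0 }, rep;
    transport_call(&req, &rep);
    if (rep.status != RPC_OK) return -1;    /* propagate the remote error */
    *out = rep.result;
    return 0;
}

int rpc_neg(int32_t a, int32_t *out) {
    struct msg req = { PROC_NEG, 0, { a, 0 }, 0 }, rep;
    transport_call(&req, &rep);
    if (rep.status != RPC_OK) return -1;
    *out = rep.result;
    return 0;
}
```

The client code just calls `rpc_add(2, 3, &v)`. The status field is how the server stub reports a failure (here, an unknown procedure number) back through the client stub, which is the C analogue of propagating a PL-level exception.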
Stubs and IDL
This picture illustrates the stub generation and build process for an RPC system based on the C language (e.g., ONC or Sun RPC, used in NFS).
Threads and RPC
[OpenGroup, late 1980s]
Q: How do we manage these “call threads”? A: Create them as needed, and keep idle threads in a thread pool. When an RPC call arrives, wake up an idle thread from the pool to handle it. On the client, the client thread blocks until the server thread returns a response.
Thread pool: idealized
[Figure: incoming request (event) queue, dispatch to idle workers, and handlers.]
• Magic elastic worker pool: resize the worker pool to match the incoming request load, creating/destroying workers as needed.
• Idle workers wait here for the next request dispatch. (Workers are threads.)
• Worker loop: handle one event, blocking as necessary. When the handler is complete, return to the worker pool.
Event/request queue
[Figure: incoming event queue, dispatch to threads waiting on the CV, and handlers.]
• We can synchronize an event queue with a monitor: a mutex/CV pair. Protect the event queue data structure itself with the mutex.
• Workers wait on the CV for the next event if the event queue is empty. Signal the CV when a new event arrives. This is a producer/consumer problem.
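A minimal sketch of such a monitor using POSIX threads: one mutex protects the queue, and one condition variable holds the idle workers. Assumptions to note: the pool size is fixed (not the "magic elastic" pool), the capacity of 16 and the `closed` shutdown flag are choices made for this sketch, and the handler just adds each event's value to a shared total.

```c
#include <pthread.h>
#include <stdbool.h>

#define QCAP 16

/* Event queue protected by a monitor: one mutex plus one condition variable. */
struct event_queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;     /* idle workers wait here */
    int  items[QCAP];
    int  head, count;
    bool closed;                  /* hypothetical shutdown flag */
};

void eq_init(struct event_queue *q) {
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->count = 0;
    q->closed = false;
}

/* Producer: enqueue an event and signal one waiting worker.
   Returns false if the queue is full (producers do not block here). */
bool eq_put(struct event_queue *q, int ev) {
    pthread_mutex_lock(&q->lock);
    bool ok = q->count < QCAP;
    if (ok) {
        q->items[(q->head + q->count) % QCAP] = ev;
        q->count++;
        pthread_cond_signal(&q->nonempty);
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Consumer: wait on the CV while the queue is empty.
   Returns false once the queue is closed and drained. */
bool eq_get(struct event_queue *q, int *ev) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)         /* re-check after every wakeup */
        pthread_cond_wait(&q->nonempty, &q->lock);
    bool ok = q->count > 0;
    if (ok) {
        *ev = q->items[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

void eq_close(struct event_queue *q) {
    pthread_mutex_lock(&q->lock);
    q->closed = true;
    pthread_cond_broadcast(&q->nonempty);       /* wake every idle worker */
    pthread_mutex_unlock(&q->lock);
}

static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;
static long handled_sum = 0;

/* Worker loop: take one event, run its handler, go back for the next. */
void *worker(void *arg) {
    struct event_queue *q = arg;
    int ev;
    while (eq_get(q, &ev)) {
        pthread_mutex_lock(&sum_lock);          /* the "handler" */
        handled_sum += ev;
        pthread_mutex_unlock(&sum_lock);
    }
    return NULL;
}
```

Workers re-check the queue in a `while` loop after every wakeup because a woken worker may find that another worker already took the event.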
Some details
• How is incoming data delivered to the correct process?
• On the return, how does the receiver know which thread to wake up?
• How does the wakeup happen?
• What if a request/reply is dropped in the net?
• What if a request/reply is duplicated?
• How does the client find the server? (binding)
• What if the server fails?
• How to go faster if client/server are on the same host? (“LRPC” or “LPC”)
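One common answer to the dropped and duplicated cases, sketched under stated assumptions: each client has at most one call outstanding and tags it with an increasing sequence number, and the client retransmits a request until a reply arrives. The server remembers the last sequence number and reply per client, so a retransmitted request gets the cached reply instead of running the procedure twice (at-most-once execution). All names and the fixed-size client table are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CLIENTS 8

/* Per-client reply cache: the last call executed and the reply sent for it. */
struct client_state {
    bool     seen;
    uint32_t last_seq;
    int32_t  last_reply;
};

static struct client_state clients[MAX_CLIENTS];
static int executions = 0;          /* counts real executions, for the demo */

static int32_t execute(int32_t arg) {   /* the procedure itself */
    executions++;
    return arg * 2;
}

/* Returns false if the request is stale and should be dropped silently. */
bool handle_request(unsigned client, uint32_t seq, int32_t arg, int32_t *reply) {
    if (client >= MAX_CLIENTS) return false;
    struct client_state *c = &clients[client];
    if (c->seen && seq == c->last_seq) {    /* duplicate: resend cached reply */
        *reply = c->last_reply;
        return true;
    }
    if (c->seen && seq < c->last_seq)       /* older than the cached call */
        return false;
    *reply = execute(arg);                  /* new call: run it exactly once */
    c->seen = true;
    c->last_seq = seq;
    c->last_reply = *reply;
    return true;
}
```

This sketch ignores sequence-number wraparound and server restarts, both of which a real protocol has to handle.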
Firefly vs. Web/HTTP etc.
• Firefly does not use TCP/IP.
• Instead, it has a custom packet protocol. Tradeoffs?
• But some of the basics of network communication are similar/identical.
• How is (say) HTTP different from RPC?
Networked services: big picture
[Figure: a client host (client applications, kernel network software, NIC device) connected through the Internet “cloud” to server hosts with server applications.]
Data is sent on the network as messages called packets.
A simple, familiar example
Request: “GET /images/fish.gif HTTP/1.1”
Client (initiator): sd = socket(…); connect(sd, name); write(sd, request…); read(sd, reply…); close(sd);
Server: s = socket(…); bind(s, name); listen(s, …); sd = accept(s); read(sd, request…); write(sd, reply…); close(sd);
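The client and server sequences can be made runnable as a sketch over the loopback interface on a POSIX system. The server binds to port 0 so the kernel picks a free port, then forks a child to play the client. The function name `serve_one` and the reply text are hypothetical, and for brevity the server does a single read(), whereas a real HTTP server would keep reading until the full request (ending in a blank line) has arrived.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Client (initiator): socket, connect, write request, read reply, close. */
static void run_client(struct sockaddr_in addr, const char *request) {
    char reply[64];
    int sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0 || connect(sd, (struct sockaddr *)&addr, sizeof addr) < 0) _exit(1);
    if (write(sd, request, strlen(request)) < 0) _exit(1);
    if (read(sd, reply, sizeof reply) < 0) _exit(1);      /* block for the reply */
    close(sd);
    _exit(0);
}

/* Server: socket, bind, listen, accept, read request, write reply, close.
   Stores up to cap-1 bytes of the request in got. Returns 0 on success. */
int serve_one(const char *request, const char *reply, char *got, size_t cap) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                 /* port 0: kernel picks a free port */

    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) return -1;
    if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(s, 1) < 0 ||
        getsockname(s, (struct sockaddr *)&addr, &len) < 0) {
        close(s);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) { close(s); return -1; }
    if (pid == 0) { close(s); run_client(addr, request); }   /* child never returns */

    ssize_t n = -1;
    int sd = accept(s, NULL, NULL);
    if (sd >= 0) {
        n = read(sd, got, cap - 1);   /* single read: assumes a small request arrives whole */
        if (n >= 0) {
            got[n] = '\0';
            if (write(sd, reply, strlen(reply)) < 0) n = -1;
        }
        close(sd);
    }
    close(s);
    int status = 0;
    waitpid(pid, &status, 0);
    return (n >= 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```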
End-to-end data transfer
[Figure: sender: move data from application to system buffer; TCP/IP protocol (compute checksum); network driver; transmit packet to network interface. Receiver: deposit packet in host memory; network driver; TCP/IP protocol (compare checksum); move data from system buffer to application. Both sides use DMA + interrupt, buffer queues (mbufs, skbufs), and packet queues.]
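The "compute checksum" and "compare checksum" steps in TCP/IP use the Internet checksum defined in RFC 1071: the one's complement of the one's-complement sum of 16-bit words. A sketch in C follows; note that every addition is paired with a load from memory. The receiver can run the same function over the data plus the transmitted checksum and expect zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071) over big-endian 16-bit words.
   Assumes len is well under 128 KiB, so the 32-bit accumulator cannot overflow. */
uint16_t inet_checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];   /* one load, one add */
        data += 2;
        len -= 2;
    }
    if (len == 1)                        /* pad an odd trailing byte with zero */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)                    /* fold carries back into the low 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```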
Ports and packet demultiplexing
Data is sent on the network in messages called packets, addressed to a destination node and port. The kernel network stack demultiplexes incoming network traffic: it chooses the process/socket to receive each packet based on its destination port.
[Figure: incoming network packets arrive at the network adapter hardware, aka network interface controller (“NIC”), and are delivered to apps with open sockets.]
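A toy version of the demultiplexing step, with hypothetical names: a table maps each bound destination port to a socket's receive queue, and an arriving packet either goes to the matching socket or is dropped. Real stacks use hash tables, key TCP connections on the full address/port tuple rather than the port alone, and wake any thread sleeping on the socket; this sketch only counts queued packets.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BINDINGS 16

struct socket_buf { int queued; };          /* stand-in for a socket's receive queue */
struct binding { uint16_t port; struct socket_buf *sock; };

static struct binding table[MAX_BINDINGS];  /* hypothetical demux table */
static size_t nbindings = 0;

/* bind(): register a socket for a local port. Fails if the port is taken. */
int demux_bind(uint16_t port, struct socket_buf *sock) {
    for (size_t i = 0; i < nbindings; i++)
        if (table[i].port == port) return -1;
    if (nbindings == MAX_BINDINGS) return -1;
    table[nbindings++] = (struct binding){ port, sock };
    return 0;
}

/* Packet arrival: choose the socket by destination port, or drop the packet. */
int demux_deliver(uint16_t dst_port) {
    for (size_t i = 0; i < nbindings; i++) {
        if (table[i].port == dst_port) {
            table[i].sock->queued++;        /* enqueue for the owning socket */
            return 0;
        }
    }
    return -1;                              /* no socket bound to this port */
}
```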
Wakeup from interrupt handler
[Figure: thread state transitions: trap or fault, sleep (onto a sleep queue), interrupt, wakeup (onto the ready queue), switch, and return to user mode.]
• Example 1: NIC interrupt wakes a thread to receive incoming packets.
• Example 2: disk interrupt wakes a thread when disk I/O completes.
• Example 3: clock interrupt wakes a thread after N ms have elapsed.
Note: it isn’t actually the interrupt itself that wakes the thread, but the interrupt handler (software). The awakened thread must have registered for the wakeup before sleeping (e.g., by placing its TCB on some sleep queue for the event).
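A user-space analogy for "register before sleeping", assuming POSIX threads: the `happened` flag is checked and set under the same mutex the sleeper waits with, so a wakeup posted before the thread goes to sleep is not lost. The names are hypothetical, and a real interrupt handler cannot block on a mutex; kernels typically protect sleep queues with a spinlock and mask interrupts instead.

```c
#include <pthread.h>
#include <stdbool.h>

/* A one-shot event, in the spirit of a kernel sleep queue. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t  cv;           /* plays the role of the sleep queue */
    bool            happened;     /* the recorded wakeup */
};

void event_init(struct event *e) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cv, NULL);
    e->happened = false;
}

/* Thread side: sleep until the event has been posted. */
void event_wait(struct event *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->happened)                  /* checked under the lock: no lost wakeup */
        pthread_cond_wait(&e->cv, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

/* "Interrupt handler" side: record the event and wake every sleeper. */
void event_post(struct event *e) {
    pthread_mutex_lock(&e->lock);
    e->happened = true;
    pthread_cond_broadcast(&e->cv);
    pthread_mutex_unlock(&e->lock);
}
```

If the waiter simply slept on the condition variable without checking the flag, a post that happened first would be missed and the thread would sleep forever.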
Process, kernel, and syscalls
[Figure: process user space (syscall stub, user buffers) and kernel (trap, syscall dispatch table, read() {…}, write() {…}, I/O descriptor table, I/O objects, copyin/copyout, return to user mode).]
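A toy model of the diagram, with hypothetical names throughout: `trap()` stands in for the trap handler, which checks the syscall number and indexes a dispatch table of function pointers. The handlers look up the descriptor in a per-process table to find the I/O object, and `memcpy` stands in for copyin/copyout. A real kernel would also check that the user buffer addresses are legal before copying.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct io_object { char data[32]; size_t len; };   /* e.g., a tiny file */

#define NFDS 4
static struct io_object *fd_table[NFDS];            /* I/O descriptor table */

typedef ssize_t (*syscall_fn)(int fd, void *ubuf, size_t n);

static ssize_t sys_read(int fd, void *ubuf, size_t n) {
    if (fd < 0 || fd >= NFDS || !fd_table[fd]) return -1;   /* bad descriptor */
    struct io_object *o = fd_table[fd];
    if (n > o->len) n = o->len;
    memcpy(ubuf, o->data, n);                       /* "copyout": kernel -> user */
    return (ssize_t)n;
}

static ssize_t sys_write(int fd, void *ubuf, size_t n) {
    if (fd < 0 || fd >= NFDS || !fd_table[fd]) return -1;
    struct io_object *o = fd_table[fd];
    if (n > sizeof o->data) n = sizeof o->data;
    memcpy(o->data, ubuf, n);                       /* "copyin": user -> kernel */
    o->len = n;
    return (ssize_t)n;
}

enum { SYS_READ = 0, SYS_WRITE = 1, NSYSCALLS = 2 };
static const syscall_fn syscall_table[NSYSCALLS] = { sys_read, sys_write };

/* Stand-in for the trap handler: validate the number, then dispatch. */
ssize_t trap(int num, int fd, void *ubuf, size_t n) {
    if (num < 0 || num >= NSYSCALLS) return -1;
    return syscall_table[num](fd, ubuf, n);
}
```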
Optimize for the common case
Performance of Firefly RPC, Michael Schroeder and Michael Burrows
The slower path through the operating-system address space is used when the interrupt routine cannot find the appropriate RPC thread in the call table, when it encounters a lock conflict in the call table, or when it handles a non-RPC packet.
Several of the structural features used to improve RPC performance collapse layers of abstraction. Programming a fast RPC is not for the squeamish.
Schroeder and Burrows suggest that tripling CPU speed would reduce SRC RPC latency for a small packet by about 50%, on the expectation that the 83% of the time not spent on the wire will decrease by a factor of 3. Looking at Table 3, however, we see that much of the RPC time goes to functions that may not benefit proportionally from modern architectures. …… The only real “computation” in RPC, in the traditional sense, is the checksum processing, and this in fact is memory-intensive and not compute-intensive; each checksum addition is paired with a load …. Thus, Ousterhout found in the Sprite operating system [Ousterhout et al. 88] that kernel-to-kernel null RPC time was reduced by only half when moving from a Sun-3/75 to a SPARCstation-1, even though integer performance increased by a factor of five [Ousterhout 90a].
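The arithmetic behind the 50% estimate, assuming RPC time T splits into a wire portion that does not change (17%) and the rest (83%), which shrinks in proportion to CPU speed:

```latex
T_{\text{new}} = 0.17\,T + \frac{0.83\,T}{3} \approx 0.17\,T + 0.28\,T = 0.45\,T
```

That is a reduction of about 55%, consistent with "about 50%". Applying the same simple two-part model to the Sprite numbers, if a fraction f of the null RPC time scaled with the fivefold integer speedup and the rest did not speed up at all, halving the time requires

```latex
(1 - f) + \frac{f}{5} = \frac{1}{2} \quad\Longrightarrow\quad f = \frac{5}{8} = 0.625
```

so under this model only about 62% of the time behaved like CPU-bound work.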
Android: object-based RPC channels
[Figure: apps and services such as the Activity Manager Service, each running in a JVM+lib above the Linux kernel, connected through Android binder: an add-on kernel driver for /dev/binder object RPC.]
Android services and libraries communicate by sending messages through shared-memory channels set up by binder.
• A client binds to a service.
• Bindings are reference-counted.
• Services register to advertise for clients.
Binder is an add-on driver module that runs in the kernel. Unix drivers can define arbitrary “I/O control” APIs invoked through the ioctl system call. The ioctl syscall was designed for device control, but it serves as a general mechanism to extend the kernel and syscall interface (“kitchen sink”).
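A small illustration of ioctl() as a general-purpose extension point, assuming Linux or another Unix that supports the FIONREAD request on pipes: it asks the driver behind a descriptor how many bytes are ready to read. Binder's own operations travel through this same syscall with their own request codes, which this sketch does not use. `bytes_pending` is a hypothetical helper name.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask the driver behind fd how many bytes are waiting to be read.
   Returns -1 if the descriptor's driver does not support FIONREAD. */
int bytes_pending(int fd) {
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) < 0) return -1;
    return n;
}
```

The request code selects the operation and the third argument's type depends on that code, which is exactly what makes ioctl a "kitchen sink": the syscall signature stays fixed while each driver defines its own operations.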
Binder: thread pool details
“The system maintains a pool of transaction threads in each process that it runs in. These threads are used to dispatch all IPCs coming in from other processes. For example, when an IPC is made from process A to process B, the calling thread in A blocks in transact() as it sends the transaction to process B. The next available pool thread in B receives the incoming transaction, calls Binder.onTransact() on the target object, and replies with the result Parcel. Upon receiving its result, the thread in process A returns to allow its execution to continue. …”
[http://developer.android.com/reference/android/os/IBinder.html] Note: in this setting, a “transaction” is just an RPC request/response exchange.
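A single-process simulation of this flow, assuming POSIX threads. The names `transact`, `on_transact`, and `pool_thread` are hypothetical and only mirror the description; this is not the Android API. The caller links its transaction onto a shared list and then sleeps on a condition variable of its own, so the pool thread that runs the handler can wake exactly that caller with the result.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct txn {
    int code, arg, result;
    bool done;
    pthread_cond_t done_cv;        /* this caller sleeps here until its reply */
    struct txn *next;
};

static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_cv = PTHREAD_COND_INITIALIZER;
static struct txn *pending = NULL; /* incoming transactions (LIFO for brevity) */
static bool shutting_down = false;

static int on_transact(int code, int arg) {   /* the target object's handler */
    return code == 1 ? arg * arg : -1;
}

/* Pool thread: take the next transaction, run the handler, reply. */
void *pool_thread(void *unused) {
    (void)unused;
    pthread_mutex_lock(&mu);
    for (;;) {
        while (!pending && !shutting_down)
            pthread_cond_wait(&work_cv, &mu);
        if (!pending) break;                  /* shutting down, nothing left */
        struct txn *t = pending;
        pending = t->next;
        pthread_mutex_unlock(&mu);
        int r = on_transact(t->code, t->arg); /* run the handler unlocked */
        pthread_mutex_lock(&mu);
        t->result = r;
        t->done = true;
        pthread_cond_signal(&t->done_cv);     /* wake exactly this caller */
    }
    pthread_mutex_unlock(&mu);
    return NULL;
}

/* Caller side: post the transaction, then block until a pool thread replies. */
int transact(int code, int arg) {
    struct txn t = { .code = code, .arg = arg };
    pthread_cond_init(&t.done_cv, NULL);
    pthread_mutex_lock(&mu);
    t.next = pending;
    pending = &t;
    pthread_cond_signal(&work_cv);
    while (!t.done)
        pthread_cond_wait(&t.done_cv, &mu);
    int r = t.result;
    pthread_mutex_unlock(&mu);
    pthread_cond_destroy(&t.done_cv);
    return r;
}

void pool_shutdown(void) {
    pthread_mutex_lock(&mu);
    shutting_down = true;
    pthread_cond_broadcast(&work_cv);
    pthread_mutex_unlock(&mu);
}
```

For brevity the pending list is LIFO; a real dispatcher would preserve arrival order.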