Performance evaluation of Linux CAN-related system calls

M. Sojka, P. Píša
Czech Technical University in Prague

October 29, 2014, Version 1.0





Contents

1 Introduction
2 Linux networking stack
3 Methodology
  3.1 Virtual CAN interface method
  3.2 Gateway-based method
    3.2.1 Test bed
    3.2.2 Gateway latency measurements
    3.2.3 Traffic generator
    3.2.4 Gateway implementations
4 Results
  4.1 Advantages of *mmsg() system calls
  4.2 Comparison of gateway implementations
  4.3 Busy polling of CAN sockets
5 Conclusion


Document history

Version Date Description

0 2014-01-14 First draft

1 2014-10-29 Final version


Abstract

The Linux kernel contains a full-featured CAN bus networking subsystem that can be accessed from applications via several different interfaces. This paper compares the performance of those interfaces and tries to answer the question of which interface is most suitable for capturing traffic from a large number of CAN buses. The motivation for this work is the development of various CAN traffic analyzers and intrusion detection systems. Besides traditional UNIX interfaces, we also investigate the applicability of the recently introduced "low-latency sockets" to the Linux CAN subsystem. Although the overhead of Linux in general is quite large, some interfaces offer significantly better performance than others.


1 Introduction

Controller Area Network (CAN) is still the most widespread automotive networking standard today, even in the most recent vehicle designs. Although more modern solutions are available on the market [1, 2], CAN represents a reliable, cheap and proven solution, making it the preferred choice in the industry. Even though newer technologies such as FlexRay or Ethernet are used more and more, it seems unlikely that CAN is going to be phased out in the foreseeable future.

The Linux kernel contains a full-featured CAN bus networking subsystem. Together with the can-utils package, it represents a versatile and user-friendly way of working with CAN networks. The Linux CAN subsystem is based on the Linux networking subsystem and shares a lot of code with it. This is a source of significant overhead [3], because the networking subsystem is tuned for high-throughput Ethernet networks with big packets, not for CAN frames of up to eight bytes. For example, one sk_buff structure, which represents one CAN frame in the Linux kernel, is 448 bytes in size on our system. Nevertheless, due to the user friendliness and widespread availability of Linux, its CAN subsystem is used a lot.

Recently, after the publication of CAN bus-related attacks on automotive electronic control units [4, 5], automobile manufacturers started taking cyber security more seriously. Not only have people started proposing message authentication techniques for the CAN bus [6, 7, 8], but various other techniques to increase automobile security are being discussed. One possibility is to use intrusion detection systems (IDS) that analyze CAN bus traffic and try to detect anomalies. Before developing an IDS, one needs a way to efficiently receive CAN frames from the whole car so that the IDS can process them. It is natural to use Linux for prototyping the IDS, and the goal of this paper is to evaluate which Linux system calls have the lowest overhead and are thus suitable for logging a large number of CAN interfaces.

The contribution of this paper is threefold. First, we present a comprehensive performance evaluation of multiple Linux system calls that allow an application to interact with CAN networks. Second, we investigate the applicability of the recently introduced "low-latency sockets" to the Linux CAN subsystem. Third, we compare the overhead of the whole Linux CAN subsystem with the raw hardware performance represented by the lightweight RTEMS executive.

This paper is structured as follows. Section 2 provides a brief introduction to the internals of the Linux networking stack. We describe our methodology for system call evaluation in Section 3, followed by results and discussion in Section 4. The paper is concluded in Section 5.


2 Linux networking stack

The architecture of the Linux networking stack is depicted in Figure 2.1. We describe the main components of the stack, which are necessary for understanding the remainder of this paper.

First, we describe the transmission path (green arrows on the left). An application initiates the transmission process by invoking the send() or write() system call on a socket. Both calls end up in the socket layer; in the case of CAN raw sockets this is the raw_sendmsg() function in can/raw.c. This function allocates an sk_buff structure, copies the CAN frame into it and passes the result to the lower layer. The protocol layer (af_can.c) immediately forwards the received sk_buff to the queuing discipline. If the device driver can accept a CAN frame for transmission, the queuing discipline just passes the frame to the driver, which instructs the CAN controller (hardware) to transmit the frame. Then, send() returns. If, on the other hand, the device is busy, the queuing discipline stores the frame and send() returns at that point. Later, when the device is ready, the device driver asks the queuing discipline to supply a frame for transmission.

The reception path is shown with red arrows on the right. When a CAN frame is received by the CAN controller, it signals an interrupt request (IRQ) to the CPU, which is handled by the device driver in the interrupt service routine (ISR). The ISR runs in the so-called hard-IRQ context of the Linux kernel. It checks the reason for the interrupt and, in the case of a newly received frame, asks Linux to schedule the so-called NET_RX soft-IRQ for some time later. In a normal situation, the soft-IRQ is executed just after exiting the ISR. It calls the device driver's poll() function, which reads the CAN frames out of the hardware, stores them into sk_buffs and passes them to the protocol layer. The protocol layer clones the received sk_buff in order to deliver it to every socket interested in receiving this frame. Then, the cloned sk_buff is passed to the socket layer, which stores the frame in the socket queue. When an application invokes the read() or recv() system call, it receives the frame from the socket queue.

Figure 2.1: Linux networking stack architecture. Solid green arrows represent transmit data flow, red ones receive data flow. Dotted arrows represent notifications.


3 Methodology

We developed two methods for system call performance evaluation. One serves mainly for comparison of system call overhead and does not require any special hardware setup. The second method takes into account all components involved in communication, such as device drivers, the protocol layer and system calls, but requires a complex hardware setup.

3.1 Virtual CAN interface method

This method can be used for a performance comparison of the read() system call with recvmmsg() and of write() with sendmmsg(). The difference between these system calls is that read() and write() operate on a single message (CAN frame) whereas their mmsg counterparts can operate on multiple messages (hence the name). This method requires the use of the virtual CAN interface (vcan). Therefore, it evaluates only the overhead of the system calls and the socket and protocol layers. It does not take into account the overhead of device drivers and queuing disciplines.

The method works as depicted in Figure 3.1. Two raw CAN sockets are created, one for transmission and one for reception. Both are bound to the same virtual CAN interface. For the reception socket we set the maximum size of the reception socket queue so that it can hold all frames that we plan to send. This is accomplished by calling setsockopt() with the SO_RCVBUFFORCE option. Then we send the planned number of CAN frames to the transmission socket and measure the time of this operation. Every send operation involves copying the CAN frame from the application to the kernel, creating an sk_buff structure representing the sent frame and storing it in the reception queue of the receiving socket (green arrow in the figure). The latter step is performed by af_can.c when it detects that the socket is bound to the virtual CAN interface. Once all frames are sent, we start reading them out of the reception socket and measure how long the reading takes (red arrow).

The source code of the application implementing this method is available in our repository [1].

Figure 3.1: Virtual CAN method.

Figure 3.2: Testbed configuration.

3.2 Gateway-based method

This method evaluates CAN system call performance by creating a Linux-based CAN gateway and measuring how long it takes the gateway to route a frame from one CAN bus to another. We use the same hardware for the gateway but change the software that implements it. This method evaluates the performance of all network stack layers, i.e. device driver, queuing discipline, protocol layer and sockets.

3.2.1 Test bed

The test bed used for this method is depicted in Figure 3.2. There is a PC with a Kvaser PCI quad-CAN SJA1000-based card, and the CAN gateway running on an MPC5200 (PowerPC) system. The PC and the gateway (GW) are connected via a dedicated Ethernet network ("crossed" cable) which is used only for booting the GW over the network; it is not used when running the experiments. The gateway is implemented either in Linux or in RTEMS. In the case of Linux, only the gateway application and the Linux kernel are running on the system; no other user processes are run.

The PC is an old Pentium 4 box running at 2.4 GHz with hyper-threading and 2 GB RAM. Its second Ethernet interface is connected to the local LAN; this interface is also disabled while running the experiments. The application that runs on the PC generates the test traffic and measures the gateway latency (as detailed below). It is assigned a real-time priority (SCHED_FIFO) and all its memory is locked, i.e. it cannot be swapped out to disk.

[1] https://rtime.felk.cvut.cz/gitweb/can-benchmark.git/blob/HEAD:/recvmmsg/can_recvmmsg.c


Figure 3.3: Definition of latency.

3.2.2 Gateway latency measurements

The PC sends frames from the can0 interface and receives them on the can1 interface as well as on can2, after passing through the gateway. We measure the latency of the frame, which is the time interval between the receptions of the frame on can1 and can2. The time of frame reception is determined from timestamps taken by the kernel at interrupt time and accessed from user space via the SO_TIMESTAMPNS socket option. Figure 3.3 shows the measured latency in a time chart. The final latency of the gateway is calculated as the difference of the two timestamps decreased by the duration (including bit stuffing) of the frame on the bus.
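In code, the measurement boils down to enabling SO_TIMESTAMPNS on the receiving sockets and subtracting the frame duration from the timestamp difference. A sketch (the helper names are ours, not from the paper; the option constant is guarded in case the libc headers do not expose it):

```c
#include <stdint.h>
#include <time.h>
#include <sys/socket.h>

#ifndef SO_TIMESTAMPNS
#define SO_TIMESTAMPNS 35   /* value from asm-generic/socket.h */
#endif

/* Enable nanosecond RX timestamping: each received frame is then
 * accompanied by a control message carrying a struct timespec. */
static int enable_rx_timestamps(int sock)
{
    int on = 1;
    return setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, &on, sizeof(on));
}

/* Gateway latency in microseconds: difference of the two kernel RX
 * timestamps minus the frame's duration on the bus. */
static int64_t gw_latency_us(struct timespec rx1, struct timespec rx2,
                             int64_t frame_duration_us)
{
    int64_t diff_us = (int64_t)(rx2.tv_sec - rx1.tv_sec) * 1000000
                    + (rx2.tv_nsec - rx1.tv_nsec) / 1000;
    return diff_us - frame_duration_us;
}
```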

Since the frames are generated and received on the same computer, the TX and RX timestamps are measured with the same clock (TSC/HPET) and thus the measured latencies are very accurate. We evaluated the accuracy of this method in [9] by comparing our results with an independent measurement with a CAN analyzer. The worst-case error of our method was 10µs, and this error occurred in less than 0.1% of measurements. Since then, we have improved the accuracy even more by disabling all unrelated Ethernet traffic during experiments.

Whenever a frame is received on interface can2, it is necessary to find the other timestamp associated with that frame in order to calculate the frame latency. This procedure must be immune to frame losses and changes in the reception order. Therefore, we use the first two bytes of every frame to store a unique number which allows the frame to be tracked.

3.2.3 Traffic generator

In our experiments, we generated the traffic by periodically sending CAN frames. The period differed depending on the experiment, but in general it was between 120 and 500µs. We used frames in the standard (SFF) format with two bytes of data, and the CAN bus bit rate was 1 Mbps. The average length of such frames is 65 bits, which corresponds to 65µs of transmission time (i.e. "duration" in Fig. 3.3).


3.2.4 Gateway implementations

We used several implementations of the gateway in order to compare various Linux system calls. In addition to the user space gateways, we compare our results with a gateway implemented in kernel space and with a non-Linux gateway implemented on the RTEMS executive. All implementations are described below and summarized in Table 3.1.

rtems This gateway is not based on Linux. Instead, it uses the RTEMS [10] executive, which is more lightweight than Linux. We believe that the performance of this gateway is very close to raw hardware performance. We include it in our experiments to see the overhead of Linux in general. The sources of the gateway are available in our repository [2].

kernel The gateway that is part of the Linux kernel [3], configured with the following command:

cangw -A -s can0 -d can1

read-write The simplest possible user space gateway. It uses the read() system call to receive CAN frames and the write() system call to forward them to another CAN interface.

readnb-write The same as read-write, but with the O_NONBLOCK flag set for the receiving socket. This flag makes the socket non-blocking: whenever there is no frame in the socket queue, the call returns immediately with an error instead of blocking the caller until a frame is received. This means that the application busy-waits in a loop around read() in user space.
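The readnb receive loop can be sketched as follows (illustrative only; sock is assumed to be a bound raw CAN socket, though the loop works on any file descriptor):

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <linux/can.h>

/* Busy-wait in user space: switch the descriptor to non-blocking mode
 * and spin on read() until a full frame arrives. Returns 0 on success,
 * -1 on a real error. */
static int read_busywait(int sock, struct can_frame *frame)
{
    int flags = fcntl(sock, F_GETFL, 0);
    fcntl(sock, F_SETFL, flags | O_NONBLOCK);

    for (;;) {
        ssize_t n = read(sock, frame, sizeof(*frame));
        if (n == sizeof(*frame))
            return 0;               /* got a frame */
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;              /* real error */
        /* queue empty: spin -- this is the busy waiting */
    }
}
```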

readbusy-write The same as read-write, but with the SO_BUSY_POLL (a.k.a. low-latency sockets) option set for the receiving socket. This option was added to the Linux kernel recently (in version 3.11) and its effect is similar to readnb-write, except that the busy waiting happens transparently in the kernel. In order for this to work, it was necessary to patch the Linux kernel; for more details see the discussion in Section 4.3. The timeout for polling was set to 300µs.
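Enabling kernel-side busy polling is a single socket option. A sketch (on CAN sockets this has no effect without the kernel patches discussed in Section 4.3; the constant's value is taken from asm-generic/socket.h and guarded in case the libc headers predate 3.11):

```c
#include <sys/socket.h>

#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46     /* since Linux 3.11 */
#endif

/* Ask the kernel to busy-poll the device for up to 'usec' microseconds
 * inside read()/poll() before blocking; e.g. 300 as in the experiments. */
static int enable_busy_poll(int sock, int usec)
{
    return setsockopt(sock, SOL_SOCKET, SO_BUSY_POLL, &usec, sizeof(usec));
}
```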

mmap-mmap This gateway uses PF_PACKET sockets, which store or read packets (CAN frames) to/from a ring buffer memory that is shared between the kernel and user space (via the mmap() system call). With these sockets, extra copying of the frames between the kernel and the application is avoided, and the application can receive or send multiple packets in one system call or, in the case of reception, without any system call at all.

Whenever the kernel receives a frame, it stores its content together with some metadata in the ring buffer, and the user space application can immediately access it without any system call.

[2] https://rtime.felk.cvut.cz/gitweb/can-benchmark.git/tree/HEAD:/rtems/gw/cangw
[3] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/can/gw.c


Implementation       Environment   Receive side            Busy         Send side
                                   socket     syscall      waiting      socket     syscall
-------------------  ------------  ---------  -----------  -----------  ---------  ----------
rtems                RTEMS         --         --           No           --         --
kernel               Kernel space  --         --           No           --         --
read-write           User space    AF_CAN     read()       No           AF_CAN     write()
readnb-write         User space    AF_CAN     read()       Yes (nb)     AF_CAN     write()
readbusy-write       User space    AF_CAN     read()       Yes (bp)     AF_CAN     write()
mmap-mmap            User space    AF_PACKET  poll()       No           AF_PACKET  send()
mmap-write           User space    AF_PACKET  poll()       No           AF_CAN     write()
mmapbusy-mmap        User space    AF_PACKET  --           Yes          AF_PACKET  send()
mmapbusy-write       User space    AF_PACKET  --           Yes          AF_CAN     write()
readnb-mmap          User space    AF_CAN     read()       Yes (nb)     AF_PACKET  send()
readbusynoirq-write  User space    AF_CAN     read()       Yes (bp, i)  AF_CAN     write()
mmsg-mmsg            User space    AF_CAN     recvmmsg()   No           AF_CAN     sendmmsg()

Table 3.1: Summary of gateway implementations. Notes: nb – socket with the O_NONBLOCK option enabled, bp – socket with the SO_BUSY_POLL option set, i – intrusive changes to device drivers and the network stack needed.

If there is no frame available in the ring buffer, the application can call the poll() system call to wait for a frame to arrive. Transmission happens similarly: the application writes one or more frames into the ring buffer and calls send() on the socket. The kernel then sends all the frames at once. Details of this type of socket are provided in [11].

Our mmap-mmap gateway works as follows. It reads frames from the reception ring buffer and copies them to the transmission ring buffer. Once the reception ring buffer is empty or there is no space in the transmission ring buffer, the send() system call is invoked and the process starts anew. Then, if the reception ring buffer is still empty, we call poll() to wait for a new frame.
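The receive side of this scheme can be sketched against the classic TPACKET_V1 ring layout. This is our simplified helper, not the gateway's code; ring setup (the PACKET_RX_RING socket option plus mmap()) is omitted, and the caller is expected to poll() when the ring is empty:

```c
#include <stdint.h>
#include <stddef.h>
#include <linux/if_packet.h>

/* State for a mapped PACKET_RX_RING (TPACKET_V1). */
struct rx_ring {
    uint8_t *map;          /* mmap()ed ring memory */
    unsigned frame_size;   /* tp_frame_size from the ring request */
    unsigned frame_nr;     /* tp_frame_nr from the ring request */
    unsigned next;         /* next slot to inspect */
};

/* Consume one frame from the ring without any system call.
 * Returns 1 if a frame was handled, 0 if the ring is empty
 * (the caller may then poll() on the socket). */
static int consume_one(struct rx_ring *r,
                       void (*handle)(const uint8_t *data, unsigned len))
{
    struct tpacket_hdr *hdr = (struct tpacket_hdr *)
        (r->map + (size_t)r->next * r->frame_size);

    if (!(hdr->tp_status & TP_STATUS_USER))
        return 0;                        /* slot still owned by the kernel */

    /* Frame data sits tp_mac bytes into the slot. */
    if (handle)
        handle((uint8_t *)hdr + hdr->tp_mac, hdr->tp_len);

    hdr->tp_status = TP_STATUS_KERNEL;   /* hand the slot back */
    r->next = (r->next + 1) % r->frame_nr;
    return 1;
}
```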

mmap-write The same as mmap-mmap, but frame transmission is carried out with the write() system call on a raw CAN socket instead of the PF_PACKET socket.

mmapbusy-mmap The same as mmap-mmap, but when the reception ring buffer is empty, it does not call poll(); instead it busy-waits in a loop until a frame appears in the ring buffer.

mmapbusy-write Combination of mmapbusy-mmap and mmap-write.

readnb-mmap Non-blocking read for reception together with the mmap() PF_PACKET socket for transmission. send() is called only when read() indicates that there are no more frames.

readbusynoirq-write The same as readbusy-write, but the kernel was patched to disable receive interrupts during busy waiting. See Section 4.3 for details.

mmsg-mmsg This gateway uses a combination of recvmmsg() and sendmmsg() to perform its job. Similarly to mmap-mmap, it can handle multiple CAN frames with one system call.

All Linux-based gateways used kernel 3.12.3 patched with board support patches for our MPC5200-based board [4]. The kernel configuration is available in our repository [5]. The source code for all above-mentioned user space gateways is also in our repository [6]. It is a single program, and the different gateway variants are selected with command line switches.

[4] https://rtime.felk.cvut.cz/gitweb/shark/linux.git/shortlog/refs/heads/shark/3.12.x
[5] https://rtime.felk.cvut.cz/gitweb/can-benchmark.git/blob/HEAD:/kernel/build/shark/3.12/.config
[6] https://rtime.felk.cvut.cz/gitweb/can-benchmark.git/blob/HEAD:/ugw/ugw.c


4 Results

In this section we present and discuss the results obtained from the experiments described above.

4.1 Advantages of *mmsg() system calls

We applied the method described in Section 3.1 to a varying number of frames and plotted the measured times in the graphs in Figure 4.1.

The left graph shows the performance of a PC, the right graph that of the MPC5200-based (embedded) system. On the PC, reading 50 thousand frames with read() takes 15 ms, whereas using a single recvmmsg() call takes about 14 ms. Sending the same number of frames with write() takes 40 ms and with sendmmsg() 35 ms. For all system calls the time depends almost linearly on the number of frames. The recvmmsg() system call is 7% faster than read(); sendmmsg() is 12% faster than write().

On the MPC5200, the situation is similar but the differences are bigger: recvmmsg() is 19% faster and sendmmsg() even 35% faster.

Note that this method measures only a fraction of the full RX and TX paths; in particular, the measured RX latency comprises only the very last step of the full RX path. See Figures 2.1 and 3.1 to compare the measured and full paths.

Furthermore, we examined whether the *mmsg() system calls can have higher overhead than read()/write() for a small number of frames. The result is that the *mmsg() system calls are always slightly faster, even for a single frame.

4.2 Comparison of gateway implementations

We measured the latency of the gateway as described in Section 3.2. In each experiment we sent 3 200 frames. The measured latency represents the time needed by the gateway for the reception and transmission of a single frame. In the graphs below, we show medians of the measured latencies.

The results for low CAN traffic, i.e. when a new frame was sent to the gateway only after the previous one was received from it, are depicted in Figure 4.2.

The lowest latency of 15µs is achieved by the RTEMS-based gateway. The second lowest value (49µs) belongs to the kernel-based gateway. These two values are provided only for comparison; the properties of these gateways are outside the scope of this paper.

The classical read-write gateway performs rather badly at 180µs, similarly to all implementations that use some form of blocking, i.e. mmap-write, mmap-mmap and

Figure 4.1: Comparison of read()/write() and recvmmsg()/sendmmsg() performance on a virtual CAN (vcan) interface, (a) on a PC (x86_64) and (b) on the MPC5200. The straight lines show a linear fit of the measured data. Note that reception times correspond to the left axis and send times to the right one.

mmsg-mmsg. A slightly better latency, around 100µs, can be achieved if some form of busy waiting is employed.

The simplest form of busy waiting is to use non-blocking sockets, as in the readnb-write implementation. The increased performance is caused by the fact that the application thread never blocks: when a new frame arrives, there is no overhead of switching the context from the idle thread back to the application, and perhaps also no overhead of waking the CPU from some power saving mode.

Other busy waiting implementations perform similarly. The readbusy-write implementation (a.k.a. low-latency sockets) is discussed separately in Section 4.3.

It was determined that having ftrace compiled into the kernel (even though disabled at runtime) caused about a 20% performance degradation. Therefore, ftrace was not compiled into the kernels used in this paper.

The situation is different for heavier traffic (see Figure 4.3), when frames were sent every 120µs. All write()-based implementations have very long latencies (caused by queueing delays in overflowing socket queues) and also lose up to one third of the frames. Implementations that use mmap() or sendmmsg() perform much better, and when mmsg- or mmap-based operations are used for both reception and transmission, no packet is lost. Surprisingly, no frame is lost even for the readnb-mmap gateway.

The reason for the good performance of mmsg- and mmap-based operations is that they do not require a system call for every frame. The receiving side can be implemented completely without system calls, as in the mmapbusy-* implementations. The transmission side requires a system call, but one send*() can (and in our implementations does) transmit several frames.

To evaluate how various system calls scale to higher bandwidth, we measured how

Figure 4.2: Latencies of various gateway implementations on MPC5200. A new frame was sent to the gateway only after receiving the previous one from the gateway. Bars show the median; error bars span from the minimum value to the 90th percentile.

Figure 4.3: Latencies of various gateway implementations on MPC5200 when frames are sent with a period of 120µs. Bars show the median; error bars span from the minimum value to the 90th percentile. The exact height of the small bars can be seen at the bottom of Figure 4.4. The high latencies (e.g. 23 ms) are caused by queueing delays when the gateway was overloaded.

Figure 4.4: Latency and packet loss depending on the frame rate, measured on MPC5200. The latency is the median of 3 200 measurements. Packet loss is drawn as error bars. For the sake of readability, error bars are magnified ten times.

the gateway latency and the number of lost frames depend on the frame rate. The result is shown in Figure 4.4.

As already discussed, the read*-write implementations cannot sustain higher frame rates (lower frame periods). The gateway throughput saturates for periods between 150 and 155µs. Only the kernel-based gateway and the mmap/mmsg-based gateways can sustain higher frame rates.

Both mmap*-write gateways drop off around 127µs. By comparing mmap-write with readnb-mmap, one can conclude that write() has a bigger overhead than read(). This matches the results in Section 4.1, where writing takes 3 to 6 times longer than reading.

The mmap*-mmap and mmsg-mmsg gateways perform almost exactly the same. They all benefit from the fact that they can handle multiple frames per system call. Interestingly, readnb-mmap survives the highest frame rate (our test bed cannot generate higher frame rates reliably) even though it needs one system call for each received frame. Probably frame coalescing on the transmission side is sufficient here.

Although the kernel gateway is not evaluated in this paper, it is interesting to see the increase in latencies around the period of 135µs. This is caused by a TX interrupt on the output CAN interface (see RX timestamp 2 in Figure 3.3) that happens to interrupt the processing of the next received frame on the input CAN interface.

One may also wonder why the mmapbusy-write curve jumps up and down around 130µs. We do not have an exact explanation, but in general, close to the point where the gateway throughput gets saturated, the measurements are quite unstable. The reason is that at this point the main factor determining the measured latency changes from processing delay to queuing delay. We tried to find a high enough number of measurements to average out this instability, but we had to compromise between the number of measurements and the duration of the experiment.

4.3 Busy polling of CAN sockets

Busy polling, previously known as low-latency sockets, is a feature that was added to Linux recently, in version 3.11. It was designed for latency-sensitive applications communicating over Ethernet. The question we try to answer in this section is how beneficial this feature is for CAN bus networking. We show that, for most CAN applications, it brings no big advantage.

When busy polling is enabled for a socket (e.g. with setsockopt(SO_BUSY_POLL)), the behavior of the read() and poll() system calls changes when the socket receive queue is empty. Instead of blocking the caller, the device driver is invoked directly from the application context to poll for new packets (without busy polling enabled, such polling is only invoked from hard- or soft-IRQ context). This happens in a busy loop until either a packet is received or a specified timeout elapses. When a new packet arrives, it is pushed to the network stack (the protocol layer in Figure 2.1) as in the standard situation, and the application finds new data in the socket queue when leaving the busy loop. The device-independent functionality of busy polling is implemented in include/net/busy_poll.h [2].

For CAN networking, busy polling does not work out of the box. It is necessary to add support for it to the network stack as well as to the device drivers. We implemented support for CAN raw sockets³ and for the mscan device driver⁴.

As can be seen from Figure 4.2, the performance of the readbusy-write implementation is not very different from the other non-blocking implementations, namely from readnb-write, which also uses busy polling, but in user space instead of in the kernel. The reason is that when the application busy-polls for frames in the read() call, receive IRQs are not disabled and the system still needs to handle an IRQ when a frame arrives. Soft-IRQs are disabled while busy-polling, but since every MSCAN interrupt service routine (ISR) schedules a soft-IRQ, the soft-IRQ is always invoked after the busy-polling exits and a frame was received. Usually, this soft-IRQ has no work to do because all the work was already performed during the busy-polling. The difference in the sequence of operations between blocking and busy-polling read is depicted in Figure 4.5 a) and b).
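The user-space variant of this pattern (as used by readnb-write) can be sketched as follows; busy_read() is a hypothetical helper name introduced here for illustration, not a function from the report's gateway implementations:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* User-space busy polling: read() on an O_NONBLOCK descriptor returns
 * -1 with errno == EAGAIN while the receive queue is empty, and the
 * application simply spins until data arrives.  This burns CPU much
 * like the kernel-side SO_BUSY_POLL loop, but receive IRQs still fire
 * on every frame, which is why the two variants perform similarly. */
ssize_t busy_read(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));
    return n;
}

/* Usage on a CAN raw socket s:
 *   fcntl(s, F_SETFL, fcntl(s, F_GETFL) | O_NONBLOCK);
 *   struct can_frame f;
 *   busy_read(s, &f, sizeof(f));
 */
```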

    read() {
        block()
        ...                          /* task sleeps until woken up */
        hard IRQ {}
        soft IRQ {
            poll_for_frames() {
                push_to_stack() {
                    wake_up()
                }
            }
        }
        copy_to_user()
    }

(a) Blocking read

    read() {
        poll_for_frames()
        poll_for_frames()
        poll_for_frames()
        hard IRQ {}
        poll_for_frames() {
            push_to_stack()
        }
        soft IRQ {
            poll_for_frames()
        }
        copy_to_user()
    }

(b) Read with busy-poll

    read() {
        irq_disable()
        poll_for_frames()
        poll_for_frames()
        poll_for_frames()
        poll_for_frames() {
            push_to_stack()
        }
        irq_enable()
        copy_to_user()
    }

(c) Read with busy-poll, no IRQ

Figure 4.5: Comparison of the blocking and busy-poll reception paths.

Since it is not necessary to handle receive interrupts while polling, we were interested in whether eliminating the interrupt overhead brings a significant advantage. We implemented a patch⁵ that disables hard-IRQs during polling. The results on the gateway with this patch applied are named readbusynoirq-write. Detailed examination of Figure 4.2 reveals that the latency is 7 µs smaller when compared to readbusy-write, which is not a significant improvement. Also in Figure 4.4, although the readbusynoirq-write line drops off at the highest frame rate (150 µs) among all read*-write implementations, the difference is not significant.

²http://lxr.free-electrons.com/source/include/net/busy_poll.h?v=3.12
³https://rtime.felk.cvut.cz/gitweb/shark/linux.git/commitdiff/154a6ad59e506835b6b812a3128f16e5786291e5?hp=f6d51c347e2b35680619906000622401bf5f6cc5
⁴https://rtime.felk.cvut.cz/gitweb/shark/linux.git/commitdiff/d40550253b8ea0434100b496528b35759eff07e5?hp=154a6ad59e506835b6b812a3128f16e5786291e5

The reason why busy-polling sockets are useful for Ethernet but not for CAN (or at least not for the MSCAN device) is that Ethernet devices implement so-called interrupt coalescing. In this mode, the device does not interrupt the host for every received packet but enforces a minimum inter-IRQ time. For example, Intel's ixgbe driver supports 125, 50 and 10 µs as the minimum inter-IRQ time. When a packet arrives at the beginning of the no-IRQ interval, the application might end up waiting for it until the end of the interval. With busy-polling sockets, this extra latency can be eliminated at the price of higher CPU utilization.

⁵https://rtime.felk.cvut.cz/gitweb/shark/linux.git/commitdiff/busy_poll_noirq


5 Conclusion

From our results it can be clearly seen that kernel interfaces that can handle multiple CAN frames per invocation are superior to the simple read()/write() interface. Those interfaces are PF_PACKET sockets and recvmmsg()/sendmmsg(), and both perform almost the same. The shiny new "low-latency sockets" are not very interesting for the CAN bus. CAN traffic processing in the Linux kernel is about 3 times slower than in RTEMS, and when the processing is done in Linux user space, it can be even 13 times slower.
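As an illustration of the batching interface favored here (a sketch, not the report's benchmark code; VLEN and recv_batch() are names introduced for this example), a single recvmmsg() call can drain up to VLEN queued CAN frames:

```c
#define _GNU_SOURCE                /* recvmmsg() and struct mmsghdr */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/can.h>

#define VLEN 16

/* Receive up to VLEN CAN frames with one system call.  Returns the
 * number of frames read, or -1 on error.  MSG_DONTWAIT makes the call
 * return immediately with whatever is already queued, which is the
 * mode a busy gateway loop would typically use. */
int recv_batch(int s, struct can_frame frames[VLEN])
{
    struct iovec iov[VLEN];
    struct mmsghdr msgs[VLEN];

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < VLEN; i++) {
        iov[i].iov_base            = &frames[i];
        iov[i].iov_len             = sizeof(frames[i]);
        msgs[i].msg_hdr.msg_iov    = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    return recvmmsg(s, msgs, VLEN, MSG_DONTWAIT, NULL);
}
```

Compared with calling read() once per frame, the per-frame system-call overhead is amortized over the whole batch.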

As for future work, it might be interesting to propose a new interface that would combine SO_BUSY_POLL (i.e. polling the driver directly from the task context) with memory-mapped sockets. With such an interface, it might be possible to bypass the Linux networking stack completely and achieve very good receive performance. Such an interface would be suitable for logging traffic from a large number of CAN interfaces even on low-end hardware.

Acknowledgment. This work was financially supported by Volkswagen AG. The authors would like to thank Oliver Hartkopp for his support.
