ACM Tech Talk: Linux in Enterprise Realtime

© 2006 IBM Corporation

IBM Linux Technology Center

Linux as an Enterprise Real-Time OS

Ankita Garg, Realtime Team, IBM Linux Technology Center

July 25, 2008



Agenda

Trends

What is Enterprise Real-Time?

How to achieve it?

Demo



Production Systems and Realtime Response

System administrators must:

1960: Keep system running

1970: Control user access to system

1980: Keep network running

1990: Keep system performing and scaling

2000: Keep cluster/datacenter running

2010: Keep system responding in real-time

2020: Keep Internet responding in real-time?

– Or maybe just cluster/datacenter...



Web-Based Retail Business



Latencies Accumulate!!



Latencies Accumulate with Appliances Also!



Latencies Accumulate Even More With Firewalls!!



Software Complexity vs. Time Domain

[Figure: time domains from 10 μs to 10 s, with layers STRATEGY, TACTICS, COORDINATION, ACTUATION, SENSING, MODULATION, and SIGNALING/CUSTOM HARDWARE, grouped into PERCEPTION, REACTION, and COGNITION, and an arrow labeled "Increasing software complexity"]



But Overall Latency Goals Will Decrease...

Traditional response-time limits: on the order of 1-2 seconds

In contrast, 100 ms is perceived as ideal, 1 second as just barely acceptable, and 10 seconds as unacceptable

http://www.bohmann.dk/articles/response_time_still_matters.html

Improved response times gain business:

http://www.zend.com/products/zend_platform

http://www-306.ibm.com/software/tivoli/products/composite-application-mgr-rtt/

Numerous other products and services measure/improve web response times

Improving response time from 1 s to 100 ms saves about an hour per month for employees who use the web heavily (one page view every two minutes)

Gameset generation moving into positions with IT purchasing authority: this group has grown up with sub-reflex response from computers

Endgame: a 100 ms end-to-end response time translates into even smaller per-machine response times!
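As a rough check of the hour-per-month figure, assume about 160 working hours per month (an assumption, not from the source) and one page view every two minutes, i.e. 30 views per hour, each saving 0.9 s:

```latex
\[
\underbrace{160\,\tfrac{\text{h}}{\text{month}} \times 30\,\tfrac{\text{views}}{\text{h}}}_{4800\ \text{views/month}}
\times (1\,\text{s} - 0.1\,\text{s})
= 4800 \times 0.9\,\text{s} = 4320\,\text{s} \approx 72\,\text{min}
\]
```

This is in line with the roughly one hour per month quoted above.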




What is Real-Time?



Hard Realtime: Definition #1



Hard Realtime: Problem With Definition #1

If you have a realtime system... I have a hammer that will make it miss its deadlines!




"Rest assured, sir, that if your life support fails, your death will most certainly not have been due to a software problem!!!"




If I have a “hard realtime” system... It simply always fails!



So What is a Good Definition?

Real-time == predictability, guaranteed determinism

What operations must provide real-time response?

What is the deadline? Constraints vary in magnitude (from microseconds to seconds)

What happens in case of hardware failure? Consequences range from a missed service-level agreement (stock trading) to loss of life (airplanes)

What is the probability of meeting deadlines?



“Hard” Realtime?



Enterprise Real-Time



But Isn't Realtime only for Single-CPU Systems???



Some Use Case Scenarios

Military: In today's computer-intensive military, the ability to react rapidly and predictably to situations can be a matter of life and death. Not only do weapons and tracking systems need to be fast, but they also must be completely reliable all the time, any time, on any systems platform

Financial Services: A trading desk at a brokerage firm cannot ensure the integrity of its transactions if some are slowed by a transaction-processing bottleneck. Financial services organizations are under pressure to ensure that both front-office and middle-office transactions not only are executed at blazing speed, but also are consistently fast across the board



How?



Previous Real-Time Solutions

Specialized hardware: black box, not flexible

Specialized real-time operating systems (RTOS): closed source and/or proprietary

Linux not widely used for real-time systems due to

– Unpredictable scheduling

– Low timer resolution (10 ms granularity)

– Non-preemptible kernel

Applications written in C, C++, Ada: required specialized skills, and applications were not reusable or portable

Java not widely used for real-time systems due to

– Regular Java Threads

– Garbage Collection

– Class Loading

– Just-in-time (JIT) Compiling



Why Linux Was Not Suitable for Real-Time

Non-preemptible kernel

– Critical sections

– Interrupt handling

Unpredictable scheduling

– No strict priority-based scheduling

– No preemptive scheduling

Low timer resolution

– Only 10 ms granularity



GOAL



Remove Sources of Non-Determinism



Basis for Comparing Different Real-Time Solutions

Quality of service (beyond "hard"/"soft")

– Services supported

– Probability of meeting deadline absent HW failure

– Deadlines supported

Performance/Scalability for RT & non-RT Code

Amount of code inspection required

APIs provided

Complexity

Fault Isolation

HW/SW Configurations Supported



Examples of Approaches

CONFIG_PREEMPT_RT

– Mainline real-time Linux kernel approach

Nested OS: Linux instance runs as user process in enclosing RTOS

– RTLinux, L4Linux, I-pipe (latency from RTLinux)

Dual-OS/Dual-Core: Linux and RTOS instances run side by side on different CPUs

– Huge numbers of real products, e.g., cell phones

Migration between OSes: Linux and RTOS instances run side by side on different CPUs, with tasks migrating between them transparently

– RTAI-Fusion

Migration within OS: some CPUs are tagged as realtime CPUs

– ARTiS (Asymmetric Real-Time Scheduling)



Linux Realtime Approaches



CONFIG_PREEMPT_RT



Realtime Linux: a deterministic, low-latency SMP Linux kernel

Real-time priorities and scheduling policies

Preemptible critical sections

Preemptible interrupt handling

Priority inheritance

High-resolution timers

Fast user-space mutexes and robust mutexes



Real-Time Priorities and Scheduling Policies

Goal: System-Wide Strict Realtime Priority Scheduling (SWSRPS)

Ensure that only the highest-priority tasks are executing

Defines real-time priorities 0-99 (exposed to user space as 1-99 via sched_setscheduler())

SCHED_FIFO scheduling policy for real-time threads: no time slice

High-priority tasks are always at the head of the runqueue

Push and pull algorithm

– Lower-priority task being woken up: push

– Lower-priority task being preempted: push

– Runqueue lowering its priority: pull
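A minimal sketch of putting the current thread under SCHED_FIFO with the standard POSIX calls, assuming Linux and glibc. The helper names fifo_priority_range() and make_current_thread_fifo() are hypothetical; the calls inside them (sched_get_priority_min(), sched_get_priority_max(), pthread_setschedparam()) are the real APIs. Unprivileged processes usually get EPERM unless they have CAP_SYS_NICE or a nonzero RLIMIT_RTPRIO.

```c
#include <errno.h>      /* error numbers such as EPERM/EINVAL returned below */
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Report the priority range the kernel accepts for SCHED_FIFO. */
void fifo_priority_range(int *min, int *max)
{
        *min = sched_get_priority_min(SCHED_FIFO);
        *max = sched_get_priority_max(SCHED_FIFO);
}

/*
 * Try to switch the calling thread to SCHED_FIFO at priority prio.
 * Returns 0 on success or an error number (typically EPERM when the
 * process lacks CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO).
 */
int make_current_thread_fifo(int prio)
{
        struct sched_param sp;

        memset(&sp, 0, sizeof(sp));
        sp.sched_priority = prio;
        return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}
```

On Linux the range reported for SCHED_FIFO is 1 to 99. Once switched, the thread keeps its CPU until it blocks, yields, or is preempted by a higher-priority task, since SCHED_FIFO has no time slice.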



Push and Pull – An Example



Scheduling Example



What Happens?

while (1) {
    if (check_something())
        break;
    sched_yield();
}
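What happens: under SCHED_FIFO, sched_yield() only hands the CPU to other runnable tasks of the same priority. If check_something() depends on work done by a lower-priority task on the same CPU, that task never runs and the loop spins forever. One common remedy is to block instead of polling. The sketch below assumes POSIX threads; wait_for_something(), signal_something(), signal_something_thread() and the something_ready flag are hypothetical names standing in for check_something().

```c
#include <pthread.h>
#include <stdbool.h>

/* Shared state: the condition the original loop polls for. */
static pthread_mutex_t something_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  something_cond = PTHREAD_COND_INITIALIZER;
static bool something_ready = false;

/* Called by the waiting (high-priority) thread: sleeps instead of spinning. */
void wait_for_something(void)
{
        pthread_mutex_lock(&something_lock);
        while (!something_ready)        /* re-check: guards against spurious wakeups */
                pthread_cond_wait(&something_cond, &something_lock);
        pthread_mutex_unlock(&something_lock);
}

/* Called by whichever (possibly lower-priority) thread makes it true. */
void signal_something(void)
{
        pthread_mutex_lock(&something_lock);
        something_ready = true;
        pthread_cond_signal(&something_cond);
        pthread_mutex_unlock(&something_lock);
}

/* Thread entry point wrapper so signal_something() can run in its own thread. */
void *signal_something_thread(void *arg)
{
        (void)arg;
        signal_something();
        return NULL;
}
```

Because the waiter now sleeps in pthread_cond_wait(), the lower-priority thread gets the CPU and can call signal_something().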



Preemptible Critical Sections

Spinlocks are now preemptible, i.e., a task can block while holding a spinlock

– So a spinlock_t must not be acquired with interrupts disabled, since the acquisition may block

spinlock_t data type defines a (preemptible) spinlock

Using spinlocks with interrupts disabled? The raw_spinlock_t data type is used

The same spin_lock()/spin_unlock() APIs are overloaded for both types

Per-CPU variables must now be explicitly protected: use get_cpu_var(), DEFINE_PER_CPU_LOCKED()



Preempting Interrupt Handlers: IRQ Threads



Preemptible Interrupt Handling

Interrupt handlers run in kernel threads (process context) and are fully preemptible

– Their priorities and scheduling policies can be adjusted

A few still run in interrupt context (SA_NO_DELAY flag)

When accessing data shared between SA_NO_DELAY interrupt handlers and threads, interrupts must be disabled

Each softirq is run in a separate thread



Interrupt Threads

# ps -eLo pid,rtprio,comm | grep IRQ
   65   95 [IRQ 11]
  284   95 [IRQ 8]
  304   95 [IRQ 12]
  364   95 [IRQ 16]
  375   95 [IRQ 1]
  379   95 [IRQ 3]
 1078   95 [IRQ 6]
 1315   95 [IRQ 19]
 1830   95 [IRQ 17]
 2245   95 [IRQ 4]

# ps -eLo pid,rtprio,comm | grep softirq
    4   50 [softirq-high/0]
    5   90 [softirq-timer/0]
    6   90 [softirq-net-tx/]
    7   90 [softirq-net-rx/]
    8   50 [softirq-block/0]
    9   50 [softirq-tasklet]
   10    1 [softirq-hrtreal]
   11    1 [softirq-hrtmono]

... one set per CPU
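The rtprio column above can also be read programmatically for any PID. A minimal sketch using sched_getscheduler() and sched_getparam(); get_task_sched() and policy_name() are hypothetical helper names. Tools such as chrt(1) use the corresponding set call, sched_setscheduler(), to change an IRQ thread's policy and priority.

```c
#include <sched.h>
#include <sys/types.h>

/*
 * Look up the scheduling policy and real-time priority of task pid
 * (0 means the caller). Non-real-time tasks report priority 0.
 * Returns 0 on success, -1 with errno set on failure.
 */
int get_task_sched(pid_t pid, int *policy, int *rtprio)
{
        struct sched_param sp;
        int pol = sched_getscheduler(pid);

        if (pol == -1)
                return -1;
        if (sched_getparam(pid, &sp) == -1)
                return -1;
        *policy = pol;
        *rtprio = sp.sched_priority;
        return 0;
}

/* Human-readable name for the policies discussed in these slides. */
const char *policy_name(int policy)
{
        switch (policy) {
        case SCHED_FIFO:  return "FIFO";
        case SCHED_RR:    return "RR";
        case SCHED_OTHER: return "OTHER";
        default:          return "unknown";
        }
}
```

For one of the IRQ threads above, get_task_sched() would be expected to report SCHED_FIFO and priority 95.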



Priority Inversion

Process P1 needs Lock L1, held by P2

Process P2 has been preempted by medium-priority processes consuming all available CPUs

Process P1 is blocked by lower-priority processes



Preventing Priority Inversion

Trivial solution: prohibit preemption while holding locks

– But this degrades latency!!! Especially for sleeplocks!!!!

Simple solution: "priority inheritance": P2 "inherits" P1's priority

– But only while holding a lock that P1 is attempting to acquire

– Standard solution, very heavily used

Either way, prevent the low-priority process from being preempted by medium-priority processes while it holds the lock
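The inheritance rule fits in a few lines: a lock holder runs at the maximum of its own priority and the effective priorities of the tasks blocked on locks it holds, which also covers chains such as P1 waiting on P2 while P2 waits on an even lower-priority task. The sketch below is a toy model with hypothetical names (struct pi_model, effective_prio()) that recomputes priorities from scratch; it is not the kernel's rt-mutex code, which adjusts priorities incrementally as tasks block and release locks.

```c
#include <stddef.h>

#define NO_LOCK  (-1)
#define NO_OWNER (-1)

/*
 * Hypothetical model: task i has base priority base[i] (higher = more
 * urgent, as with SCHED_FIFO) and may be blocked on lock blocked_on[i];
 * lock l is held by task owner[l].
 */
struct pi_model {
        size_t ntasks;
        const int *base;        /* base priority per task */
        const int *blocked_on;  /* lock each task waits for, or NO_LOCK */
        const int *owner;       /* owning task per lock, or NO_OWNER */
};

/*
 * Effective priority under priority inheritance: the maximum of the task's
 * own priority and the effective priorities of all tasks blocked on locks
 * it holds. Assumes the wait-for graph has no cycles (no deadlock).
 */
int effective_prio(const struct pi_model *m, int task)
{
        int prio = m->base[task];

        for (size_t w = 0; w < m->ntasks; w++) {
                int l = m->blocked_on[w];

                if (l != NO_LOCK && m->owner[l] == task) {
                        int p = effective_prio(m, (int)w);
                        if (p > prio)
                                prio = p;
                }
        }
        return prio;
}
```

With P1 at 90 blocked on a lock held by P2 at 10, P2's effective priority becomes 90, so a medium-priority process at 50 can no longer preempt it.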



Priority Inversion and Reader-Writer Locking

Process P1 needs lock L1, held by P2, P3, and P4, each of which is waiting on yet another lock

– read-held by yet more low-priority processes

– preempted by medium-priority processes

Process P1 will have a long wait despite its high priority, even given priority inheritance: many processes to boost!

– Further degrading P1's realtime response latency



Priority Inheritance and Reader-Writer Lock

Real-time operating systems have taken the following approaches to writer-reader priority boosting:

Boost only one reader at a time

– Reasonable on a single-CPU machine, except in presence of readers that can block for other reasons

– Extremely ineffective on an SMP machine, as the writer must wait for readers to complete serially rather than in parallel

Boost a number of readers equal to the number of CPUs

– Works well even on SMP, except in presence of readers that can block for other reasons (e.g., acquiring other locks)

Permit only one task at a time to read-hold a lock (PREEMPT_RT)

– Very fast priority boosting, but severe read-side locking bottlenecks
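A back-of-the-envelope comparison of the three strategies, assuming R readers currently hold the lock, each needing CPU time t to finish its read-side critical section once boosted, N CPUs, and no reader blocking for other reasons (all assumptions for illustration), gives the writer's approximate wait W:

```latex
\[
W_{\text{one reader at a time}} \approx R\,t, \qquad
W_{N\ \text{readers boosted}} \approx \left\lceil \frac{R}{N} \right\rceil t, \qquad
W_{\text{single read-holder}} \approx t
\]
```

For example, with R = 8, N = 4 and t = 100 μs these come to about 800 μs, 200 μs and 100 μs. The last figure is why the PREEMPT_RT approach boosts quickly; its cost shows up instead as readers serializing on the lock.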



High Resolution Timers

Offer higher resolution for timers

Traditionally tied to the timekeeping subsystem's periodic tick, resulting in poor resolution (10 ms)

Distinguish timers from timeouts

Timers

– precise event timing

– likely to expire

– time-sorted red-black tree of timers

Timeouts

– error cases

– unlikely to expire

– timer wheel



Original Time and Timer System

image courtesy tglx



Timer Wheel



Realtime Time and Timer System

image courtesy tglx



Futexes

Fast user-space mutexes

Fast path acquires uncontended locks entirely in user space

Kernel intervention only in the slow path (on contention)

Robust mutexes
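The fast-path/slow-path split can be sketched with the well-known three-state futex mutex from Ulrich Drepper's paper "Futexes Are Tricky" (lock word 0 = free, 1 = locked, 2 = locked with possible waiters), written here with C11 atomics and the raw futex(2) system call. The fmutex_* names and the demo counter are hypothetical. This is not glibc's actual pthread_mutex_t implementation, and it does no priority inheritance (PI futexes instead use FUTEX_LOCK_PI/FUTEX_UNLOCK_PI with the owner's TID in the lock word).

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <pthread.h>     /* demo_worker() is meant to be started with pthread_create() */
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word: 0 = unlocked, 1 = locked, 2 = locked with (possible) waiters. */
typedef struct { atomic_int val; } fmutex_t;
#define FMUTEX_INITIALIZER { 0 }

static long futex_call(atomic_int *uaddr, int op, int val)
{
        return syscall(SYS_futex, (void *)uaddr, op, val, NULL, NULL, 0);
}

void fmutex_lock(fmutex_t *m)
{
        int c = 0;

        /* Fast path: uncontended 0 -> 1, entirely in user space. */
        if (atomic_compare_exchange_strong(&m->val, &c, 1))
                return;

        /* Slow path: mark the lock contended (2) and sleep in the kernel;
         * we own the lock once our exchange of 2 replaces a 0. */
        if (c != 2)
                c = atomic_exchange(&m->val, 2);
        while (c != 0) {
                futex_call(&m->val, FUTEX_WAIT_PRIVATE, 2);  /* returns at once if val != 2 */
                c = atomic_exchange(&m->val, 2);
        }
}

void fmutex_unlock(fmutex_t *m)
{
        /* Fast path: 1 -> 0 means nobody was waiting, so no system call. */
        if (atomic_fetch_sub(&m->val, 1) != 1) {
                /* Was 2: release fully and let the kernel wake one waiter. */
                atomic_store(&m->val, 0);
                futex_call(&m->val, FUTEX_WAKE_PRIVATE, 1);
        }
}

/* Demo: each worker adds *(long *)arg to a shared counter under the lock. */
static fmutex_t demo_lock = FMUTEX_INITIALIZER;
static long demo_counter;

void *demo_worker(void *arg)
{
        long iters = *(const long *)arg;

        for (long i = 0; i < iters; i++) {
                fmutex_lock(&demo_lock);
                demo_counter++;
                fmutex_unlock(&demo_lock);
        }
        return NULL;
}
```

In the common uncontended case, lock and unlock are a single atomic instruction each; the kernel is entered only when the lock word shows contention.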



What Changed in the C Library

pthread_mutex_t has kernel support for PTHREAD_PRIO_INHERIT

– The RT kernels implement priority inheritance (PI) in the futexes (fast user-space mutexes) used by pthreads

pthread_cond_* APIs cause threads to be woken up in priority order

POSIX interfaces to the scheduler: sched_*() APIs

Timer interfaces

Note that you don't have to have an RT kernel for most of these APIs to work
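From an application's point of view, the PI support above is requested per mutex through the POSIX mutex attributes. A minimal sketch, assuming glibc on Linux; init_pi_mutex() is a hypothetical helper, and it also marks the mutex robust to tie in the robust-mutex support listed under futexes. Code using a robust mutex must additionally handle EOWNERDEAD from pthread_mutex_lock() (via pthread_mutex_consistent()), which is not shown.

```c
#define _GNU_SOURCE
#include <pthread.h>

/*
 * Initialize m as a priority-inheritance mutex (PTHREAD_PRIO_INHERIT):
 * while a thread holds it, that thread runs at no less than the priority
 * of the highest-priority thread blocked on it. The mutex is also robust:
 * if the owner dies, the next locker gets EOWNERDEAD instead of hanging.
 * Returns 0 on success or an error number.
 */
int init_pi_mutex(pthread_mutex_t *m)
{
        pthread_mutexattr_t attr;
        int rc = pthread_mutexattr_init(&attr);

        if (rc != 0)
                return rc;
        rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        if (rc == 0)
                rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
        if (rc == 0)
                rc = pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
        return rc;
}
```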



Realtime Benchmarking

LTP realtime tests

– sched_latency

– sched_football

– pi_tests

– async_handler

– pthread_kill_latency

More tests

– cyclictest

– signaltest

– pi_stress
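The core idea behind cyclictest can be sketched as: sleep until an absolute deadline with clock_nanosleep(TIMER_ABSTIME), read the clock on wakeup, and record how late the wakeup was. This is a simplified illustration with a hypothetical helper name (max_wakeup_latency_ns()); the real cyclictest adds options such as a configurable SCHED_FIFO priority and multiple measurement threads, which matter for meaningful numbers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

static void ts_add_ns(struct timespec *t, long ns)
{
        t->tv_nsec += ns;
        while (t->tv_nsec >= NSEC_PER_SEC) {
                t->tv_nsec -= NSEC_PER_SEC;
                t->tv_sec++;
        }
}

static long ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
        return (a->tv_sec - b->tv_sec) * NSEC_PER_SEC + (a->tv_nsec - b->tv_nsec);
}

/*
 * Sleep until each of `loops` absolute deadlines spaced `interval_ns`
 * apart and return the worst wakeup latency seen (wakeup time minus
 * deadline), in nanoseconds. Returns -1 on error.
 */
long max_wakeup_latency_ns(int loops, long interval_ns)
{
        struct timespec next, now;
        long worst = 0;

        if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
                return -1;
        for (int i = 0; i < loops; i++) {
                int rc;

                ts_add_ns(&next, interval_ns);
                /* Absolute deadlines keep the period from drifting. */
                do {
                        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
                } while (rc == EINTR);
                if (rc != 0)
                        return -1;
                if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
                        return -1;
                long lat = ts_diff_ns(&now, &next);
                if (lat > worst)
                        worst = lat;
        }
        return worst;
}
```

Running the same loop as a SCHED_FIFO thread on a stock kernel and on an RT kernel is one way to see the effect of the changes described in these slides.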



How To Get Started with RT?

Obtain the source

– Vanilla kernel from: http://kernel.org

– RT patches from: http://rt.et.redhat.com/download/

Configure and build

Boot into your kernel

And there you are!!!
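After booting, a quick heuristic check that the RT kernel is actually running. This sketch assumes (it is not a documented interface guarantee) that RT kernels mention "PREEMPT RT" or "PREEMPT_RT" in the uname version string, or expose /sys/kernel/realtime containing 1, as RT patch sets commonly do; running_rt_kernel() is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/*
 * Heuristic: returns 1 if this looks like a PREEMPT_RT kernel, 0 if not,
 * or -1 if uname() fails.
 */
int running_rt_kernel(void)
{
        struct utsname u;
        FILE *f;
        int val = 0;

        if (uname(&u) != 0)
                return -1;
        /* Assumed marker in the build version string (as in `uname -v`). */
        if (strstr(u.version, "PREEMPT_RT") || strstr(u.version, "PREEMPT RT"))
                return 1;
        /* Assumed RT-patch sysfs flag; absent on non-RT kernels. */
        f = fopen("/sys/kernel/realtime", "r");
        if (f != NULL) {
                if (fscanf(f, "%d", &val) != 1)
                        val = 0;
                fclose(f);
        }
        return val == 1;
}
```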




Impact of the changes



gtod_latency



sched_latency



async_handler



Tools for Tuning and Debugging

SystemTap: custom data probes for kernel space

ftrace: built-in RT kernel mechanism to trace events

oprofile: system-wide profiler

tuna

– adjust thread priorities

– set interrupt affinities

– save/restore tunings



Challenges



IBM’s Real-Time Offering: The Power of Java and Linux Combined to Deliver Real-Time Capabilities

Select IBM Hardware

Real-Time Linux

WebSphere® Real-Time (WRT)



Summary

Right Time to move to Real-Time

CONFIG_PREEMPT_RT is an acceptable approach for making Linux a viable real-time OS, yielding latencies as low as 10-20 μs

Makes efficient and cost-effective real-time solutions possible!!



Questions



Legal Statement

IBM is a registered trademark of International Business Machines Corporation in the United States and/or other countries.

Linux is a registered trademark of Linus Torvalds. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Other company, product, and service names may be trademarks or service marks of others.

The information contained in this documentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this documentation, it is provided “as is” without warranty of any kind, express or implied. In addition, this information is based on IBM's current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this documentation or any other documentation. Nothing contained in this documentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM (or its suppliers or licensors), or altering the terms and conditions of the applicable license agreement governing the use of IBM software.



Thank You!!



Backup Slides



Priority Inversion and RCU: What is RCU?

Analogous to reader-writer lock, but readers acquire no locks

– Readers therefore cannot block writers

– Reader-to-writer priority inversion is therefore impossible

Writers break updates into "removal" and "reclamation" phases

– Removals do not interfere with readers

– Reclamations deferred until all readers drop references

– Readers cannot obtain references to removed items
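The removal/reclamation split can be illustrated in user space. This is a deliberately crude sketch with hypothetical names: readers take no lock, but they do bump a shared counter that the writer polls as a stand-in for a grace period, which real RCU implementations avoid. rcu_dereference(), rcu_assign_pointer() and synchronize_rcu() are the Linux kernel primitives the comments refer to by analogy.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared data item, read far more often than it is updated. */
struct config { int value; };

static _Atomic(struct config *) current_cfg;   /* published version, NULL initially */
static atomic_int active_readers;               /* crude grace-period tracking */

/* Reader: acquires no lock; returns -1 if nothing has been published yet. */
int read_value(void)
{
        atomic_fetch_add(&active_readers, 1);
        struct config *c = atomic_load(&current_cfg);   /* cf. rcu_dereference() */
        int v = c ? c->value : -1;
        atomic_fetch_sub(&active_readers, 1);
        return v;
}

/* Writer: removal phase, then deferred reclamation phase. */
void update_value(int value)
{
        struct config *fresh = malloc(sizeof(*fresh));

        if (fresh == NULL)
                abort();
        fresh->value = value;

        /* Removal: after this swap, new readers can only find the new version. */
        struct config *old = atomic_exchange(&current_cfg, fresh);  /* cf. rcu_assign_pointer() */

        /* Reclamation: wait until no reader is active, so no one can still
         * be using the old version (cf. synchronize_rcu()), then free it. */
        while (atomic_load(&active_readers) != 0)
                ;
        free(old);
}
```

A writer that must not wait could instead queue the old version to be freed later, which is what the kernel's call_rcu() does.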



What is RCU?



Priority Inversion and RCU

Process P1 needs lock L1, but P2, P3, and P4 now use RCU

– P2, P3, and P4 therefore need not hold L1

Process P1 thus immediately acquires this lock

– Even though P2, P3, and P4 are preempted by the per-CPU medium-priority processes

No priority inheritance required

– Except if low on memory: permit reclaimer to free up memory

Excellent realtime latencies: medium-priority processes can run

– High-priority process proceeds despite low-priority process preemption