ACM Tech Talk: Linux in Enterprise Real-Time
© 2006 IBM Corporation
IBM Linux Technology Center
Linux as an Enterprise Real-Time OS
Ankita Garg ([email protected])
Realtime Team, IBM Linux Technology Center
July 25, 2008
Agenda
Trends
What is Enterprise Real-Time?
How to achieve it?
Demo
Production Systems and Realtime Response
System Administrators must:
1960: Keep system running
1970: Control user access to system
1980: Keep network running
1990: Keep system performing and scaling
2000: Keep cluster/datacenter running
2010: Keep system responding in real-time
2020: Keep Internet responding in real-time?
– Or maybe just cluster/datacenter...
Web-Based Retail Business
Latencies Accumulate!!
Latencies Accumulate with Appliances Also!
Latencies Accumulate Even More With Firewalls!!
Software Complexity vs. Time Domain
[Figure: functional layers (custom hardware, signaling, modulation, sensing, actuation, coordination, tactics, strategy) placed on a time axis from 10 μs to 10 s, grouped into reaction, perception, and cognition, with an arrow labeled "Increasing software complexity"]
But Overall Latency Goals Will Decrease...
Traditional response-time limits are on the order of 1-2 seconds
In contrast, 100 ms is perceived as ideal, 1 second as just barely acceptable, and 10 seconds as unacceptable
http://www.bohmann.dk/articles/response_time_still_matters.html
Improved response times gain business: http://www.zend.com/products/zend_platform
http://www-306.ibm.com/software/tivoli/products/composite-application-mgr-rtt/
Numerous other products and services exist to measure/improve web response times
Improving response time from 1 s to 100 ms saves about an hour per month for employees who use the web heavily (one page view every two minutes)
Gamer generation moving into positions with IT purchasing authority
– This group has grown up with sub-reflex response from computers
Endgame: a 100 ms end-to-end response time translates into even smaller per-machine response times, because latencies accumulate across every machine in the request path!
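The hour-per-month figure is easy to sanity-check. The sketch below is a back-of-the-envelope calculation that assumes (these numbers are not from the talk) an 8-hour working day and 20 working days per month; `monthly_savings_minutes` is a hypothetical helper.

```c
#include <assert.h>

/* Minutes saved per month by a faster response time.
 * hours_per_day and days_per_month are assumptions supplied by the caller. */
double monthly_savings_minutes(double old_response_s, double new_response_s,
                               double views_per_hour, double hours_per_day,
                               double days_per_month)
{
    assert(old_response_s >= new_response_s);
    double saved_per_view_s = old_response_s - new_response_s;
    double views_per_month = views_per_hour * hours_per_day * days_per_month;
    return saved_per_view_s * views_per_month / 60.0;
}
```

With one page view every two minutes (30 per hour), `monthly_savings_minutes(1.0, 0.1, 30.0, 8.0, 20.0)` works out to 0.9 s × 4800 views = 4320 s, or 72 minutes, consistent with the slide's "about an hour".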
What is Real-Time?
Hard Realtime: Definition #1
Hard Realtime: Problem With Definition #1
If you have a realtime system... I have a hammer that will make it miss its deadlines!
Hard Realtime: Definition #2
Hard Realtime: Problem With Definition #2
“Rest assured, sir, that if your life support fails, your death will most certainly not have been due to a software problem!!!”
Hard Realtime: Definition #3
Hard Realtime: Problem With Definition #3
If I have a “hard realtime” system... It simply always fails!
So What is a Good Definition?
Real-time == predictability, guaranteed determinism
What operations must provide real-time response?
What is the deadline?
– Constraints vary in magnitude (from microseconds to seconds)
What happens in case of hardware failure?
– Consequences range from a missed service-level agreement (stock trading)
– to loss of life (airplanes)
What is the probability of meeting deadlines?
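For a given workload, the last question can be answered empirically: record many response latencies and count how many met the deadline. This is a minimal sketch; `deadline_hit_ratio` and its inputs are hypothetical, and the estimate only holds for the conditions under which the samples were collected.

```c
#include <assert.h>
#include <stddef.h>

/* Empirical fraction of samples that met the deadline.
 * latencies_us: measured response times in microseconds (hypothetical input). */
double deadline_hit_ratio(const long *latencies_us, size_t n, long deadline_us)
{
    assert(latencies_us != NULL && n > 0);
    size_t met = 0;
    for (size_t i = 0; i < n; i++) {
        if (latencies_us[i] <= deadline_us)
            met++;
    }
    return (double)met / (double)n;
}
```

A high ratio says nothing about how late the misses were. That is why real-time measurements also report the worst-case latency.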
“Hard” Realtime?
Enterprise Real-Time
But Isn't Realtime only for Single-CPU Systems???
Some Use Case Scenarios
Military: In today's computer-intensive military, the ability to react rapidly and predictably to situations can be a matter of life and death. Not only do weapons and tracking systems need to be fast, but they also must be completely reliable all the time, any time, from any systems platform.
Financial Services: A trading desk at a brokerage firm cannot ensure the integrity of its transactions if some are slowed because of a transaction processing bottleneck. Financial services organizations are under pressure to assure that both front-office and middle-office transactions not only are executed at blazing speed, but also are consistently fast across the board the
How?
Previous Real Time Solutions
Specialized hardware
– Black box, not flexible
Specialized Real-Time Operating Systems (RTOS)
– Closed source and/or proprietary
Linux not widely used for real-time systems due to
– Unpredictable scheduling
– Low timer resolution (10 ms granularity)
– Non-preemptible kernel
Applications written in C, C++, Ada
– Required specialized skills, and applications were not reusable or portable
Java not widely used for real-time systems due to
– Regular Java Threads
– Garbage Collection
– Class Loading
– Just-in-time (JIT) Compiling
Why Linux was not suitable for Real-Time
Non-preemptible kernel
– Critical sections
– Interrupt handling
Unpredictable scheduling
– No strict priority-based scheduling
– No preemptive scheduling
Low timer resolution
– Granularity as coarse as 10 ms
GOAL
Remove Sources of Non-Determinism
Bases for Comparing Different Real-Time Solutions
Quality of Service (beyond “hard”/“soft”)
– Services supported
– Probability of meeting deadline absent HW failure
– Deadlines supported
Performance/Scalability for RT & non-RT Code
Amount of code inspection required
APIs provided
Complexity
Fault Isolation
HW/SW Configurations Supported
Examples of Approaches
CONFIG_PREEMPT_RT
– Mainline real-time Linux kernel approach
Nested OS
– Linux instance runs as a user process in an enclosing RTOS
– RTLinux, L4Linux, I-pipe (latency from RTLinux)
Dual-OS/Dual-Core
– Linux and RTOS instances run side-by-side on different CPUs
– Huge numbers of real products, e.g., cell phones
Migration Between OSes
– Linux and RTOS instances run side-by-side on different CPUs, with transparent migration between them
– RTAI-Fusion
Migration Within OS
– Some CPUs are tagged as realtime CPUs
– ARTiS (Asymmetric Real-Time Scheduling)
Linux Realtime Approaches
CONFIG_PREEMPT_RT
Realtime Linux
Deterministic, low-latency SMP Linux kernel
RealTime Priorities and Scheduling Policies
Preemptible Critical Sections
Preemptible Interrupt Handling
Priority Inheritance
High Resolution Timers
Fast user-space mutexes (futexes) and robust mutexes
Real-Time Priorities and Scheduling Policies
Goal: System Wide Strict Realtime Priority Scheduling (SWSRPS)
Ensure only the highest priority tasks are executing
Defines real-time priorities (user space sets sched_priority 1-99 for SCHED_FIFO)
FIFO scheduling policy (SCHED_FIFO) for real-time threads
– No time slice
High-priority tasks are always at the head of the runqueue
Push and pull algorithm
– Lower-priority task being woken up: push
– Lower-priority task being preempted: push
– Runqueue lowering its priority: pull
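From user space, a thread opts into this policy through the POSIX scheduler calls. This is a minimal sketch that assumes Linux, where `sched_setscheduler()` with pid 0 applies to the calling thread and the SCHED_FIFO range is 1-99. `make_self_fifo` is a hypothetical helper name. Without CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO, the call fails with EPERM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Switch the calling thread to SCHED_FIFO at the given priority.
 * Returns 0 on success or an errno value (EINVAL for an out-of-range
 * priority, EPERM without the required privileges). */
int make_self_fifo(int priority)
{
    int lo = sched_get_priority_min(SCHED_FIFO); /* 1 on Linux */
    int hi = sched_get_priority_max(SCHED_FIFO); /* 99 on Linux */
    if (priority < lo || priority > hi)
        return EINVAL;

    struct sched_param param = { .sched_priority = priority };
    /* On Linux, pid 0 means "the calling thread". */
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1)
        return errno;
    return 0;
}
```

Once the thread is SCHED_FIFO, it runs until it blocks, yields, or is preempted by a higher-priority task. No time slice applies.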
Push and Pull – An Example
Scheduling Example
What Happens?
while (1) {
    if (check_something())
        break;
    sched_yield();
}
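Under SCHED_FIFO, `sched_yield()` only hands the CPU to other runnable threads of the same priority. If `check_something()` waits for work done by a lower-priority task on the same CPU, the loop above can therefore spin forever. A common alternative is to block until the condition is signalled. The sketch below uses a POSIX condition variable. The flag, the producer thread, and all function names are hypothetical stand-ins for `check_something()` and whatever makes it true.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state standing in for check_something(). */
static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;
static bool ready = false;

/* Hypothetical producer: makes the condition true, then wakes the waiter. */
static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&ready_lock);
    ready = true;
    pthread_cond_signal(&ready_cond);
    pthread_mutex_unlock(&ready_lock);
    return NULL;
}

/* Sleeps instead of spinning, so the CPU is free for other tasks. */
static void wait_until_ready(void)
{
    pthread_mutex_lock(&ready_lock);
    while (!ready)  /* re-check: wakeups can be spurious */
        pthread_cond_wait(&ready_cond, &ready_lock);
    pthread_mutex_unlock(&ready_lock);
}

/* One producer/waiter round; returns 1 once the condition was observed. */
int run_blocking_wait_demo(void)
{
    pthread_t tid;
    ready = false;  /* safe: no other thread is running yet */
    if (pthread_create(&tid, NULL, producer, NULL) != 0)
        return -1;
    wait_until_ready();
    int rc = pthread_join(tid, NULL);
    assert(rc == 0);
    return ready ? 1 : 0;
}
```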
Preemptible Critical Sections
Spinlocks are now preemptible, i.e., a task can block while holding a spinlock
– i.e., it is illegal to disable interrupts while holding a spinlock
spinlock_t data type defines a spinlock
Need a spinlock with interrupts disabled? Use the raw_spinlock_t data type
spin_lock()/spin_unlock() APIs are overloaded to handle both types
Per-CPU variables must now be explicitly protected
– Use get_cpu_var(), DEFINE_PER_CPU_LOCKED()
Preempting Interrupt Handlers: IRQ Threads
Preemptible Interrupt Handling
Threaded interrupt handlers run in process context and are fully preemptible
– Their priorities and scheduling policies can be adjusted
A few still run in interrupt context (SA_NO_DELAY flag)
When accessing data shared between SA_NO_DELAY interrupt handlers and threads, interrupts must be disabled
Each softirq is run in a separate thread
Interrupt Threads
# ps -eLo pid,rtprio,comm | grep IRQ
   65   95 [IRQ 11]
  284   95 [IRQ 8]
  304   95 [IRQ 12]
  364   95 [IRQ 16]
  375   95 [IRQ 1]
  379   95 [IRQ 3]
 1078   95 [IRQ 6]
 1315   95 [IRQ 19]
 1830   95 [IRQ 17]
 2245   95 [IRQ 4]
# ps -eLo pid,rtprio,comm | grep softirq
    4   50 [softirq-high/0]
    5   90 [softirq-timer/0]
    6   90 [softirq-net-tx/]
    7   90 [softirq-net-rx/]
    8   50 [softirq-block/0]
    9   50 [softirq-tasklet]
   10    1 [softirq-hrtreal]
   11    1 [softirq-hrtmono]
... one set per CPU
Priority Inversion
High-priority process P1 needs lock L1, held by low-priority process P2
Process P2 has been preempted by medium-priority processes
– consuming all available CPUs
Process P1 is therefore blocked by lower-priority processes
Preventing Priority Inversion
Trivial solution: prohibit preemption while holding locks
– But this degrades latency!!! Especially for sleeplocks!!!!
Simple solution: “priority inheritance”: P2 “inherits” P1's priority
– But only while holding a lock that P1 is attempting to acquire
– Standard solution, very heavily used
Either way, prevent the low-priority process from being preempted
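In user space, a pthread mutex requests priority inheritance through its attributes. This is a minimal sketch that assumes a POSIX system supporting PTHREAD_PRIO_INHERIT (Linux does, via PI futexes). `init_pi_mutex` is a hypothetical helper. If P2 holds such a mutex when P1 blocks on it, P2 runs at P1's priority until it unlocks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>

/* Initialise a mutex whose owner inherits the priority of the highest-
 * priority thread blocked on it. Returns 0 or a pthread error number. */
int init_pi_mutex(pthread_mutex_t *m)
{
    assert(m != NULL);
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

The mutex is then locked and unlocked with the ordinary `pthread_mutex_lock()`/`pthread_mutex_unlock()` calls. Only the initialisation differs.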
Priority Inversion and Reader-Writer Locking
Process P1 needs lock L1, held by P2, P3, and P4
Each of which is waiting on yet another lock
– read-held by yet more low-priority processes
– preempted by medium-priority processes
Process P1 will have a long wait, despite its high priority
Even given priority inheritance: many processes to boost!
– Further degrading P1's realtime response latency
Priority Inheritance and Reader-Writer Lock
Real-time operating systems have taken the following approaches to writer-reader priority boosting:
Boost only one reader at a time
– Reasonable on a single-CPU machine, except in presence of readers that can block for other reasons
– Extremely ineffective on an SMP machine, as the writer must wait for readers to complete serially rather than in parallel
Boost a number of readers equal to the number of CPUs
– Works well even on SMP, except in presence of readers that can block for other reasons (e.g., acquiring other locks)
Permit only one task at a time to read-hold a lock (PREEMPT_RT)
– Very fast priority boosting, but severe read-side locking bottlenecks
High Resolution Timers
Offer higher resolution for timers
Traditionally tied to the timekeeping subsystem, resulting in poor resolution (10 ms)
Distinguish timers from timeouts
Timers
– precise event timing
– likely to expire
– kept in a time-ordered red-black tree
Timeouts
– error cases
– unlikely to expire
– timer wheel
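The effect of high-resolution timers is visible from user space. This is a minimal sketch that assumes Linux and POSIX clocks. Both helper names are hypothetical. It reports the resolution the kernel advertises for CLOCK_MONOTONIC and measures how long a short `clock_nanosleep()` actually takes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Resolution reported for CLOCK_MONOTONIC in ns, or -1 on error.
 * With high-resolution timers active, Linux reports 1 ns; otherwise it
 * reports the tick period (e.g. 10 ms at HZ=100). */
long long monotonic_resolution_ns(void)
{
    struct timespec res;
    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return -1;
    return res.tv_sec * NSEC_PER_SEC + res.tv_nsec;
}

/* Sleeps for request_ns and returns the elapsed time actually observed. */
long long measured_sleep_ns(long long request_ns)
{
    assert(request_ns >= 0);
    struct timespec start, end;
    struct timespec req = {
        .tv_sec = (time_t)(request_ns / NSEC_PER_SEC),
        .tv_nsec = (long)(request_ns % NSEC_PER_SEC),
    };
    clock_gettime(CLOCK_MONOTONIC, &start);
    clock_nanosleep(CLOCK_MONOTONIC, 0, &req, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (end.tv_sec - start.tv_sec) * NSEC_PER_SEC
         + (end.tv_nsec - start.tv_nsec);
}
```

POSIX guarantees only that the sleep lasts at least the requested time. How far it overshoots depends on the timer hardware, the kernel configuration, and system load.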
Original Time and Timer System
image courtesy tglx
Timer Wheel
Realtime Time and Timer System
image courtesy tglx
Futexes
Fast user-space mutexes
Fast path to acquire uncontested locks
Kernel intervention only in the slowpath
Robust mutexes
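The fast-path/slow-path split can be illustrated with the classic three-state futex mutex described in Ulrich Drepper's "Futexes Are Tricky". This is a sketch of that technique that assumes Linux and C11 atomics. It is not glibc's implementation, and it does not use the priority-inheriting FUTEX_LOCK_PI operations that PI mutexes rely on. All function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked with no waiters, 2 = locked, waiters possible. */
typedef struct { atomic_int state; } futex_mutex;

static long futex_call(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

void futex_mutex_lock(futex_mutex *m)
{
    int c = 0;
    /* Fast path: uncontended 0 -> 1, entirely in user space. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the lock contended, then sleep in the kernel
     * for as long as it stays contended. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex_call(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

void futex_mutex_unlock(futex_mutex *m)
{
    /* Fast path: 1 -> 0 means nobody was waiting, so no system call. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        futex_call(&m->state, FUTEX_WAKE_PRIVATE, 1);
    }
}

/* Demo: 4 threads each increment a shared counter `iters` times. */
static futex_mutex demo_lock;
static long demo_counter;

static void *demo_worker(void *arg)
{
    int iters = *(const int *)arg;
    for (int i = 0; i < iters; i++) {
        futex_mutex_lock(&demo_lock);
        demo_counter++;
        futex_mutex_unlock(&demo_lock);
    }
    return NULL;
}

long futex_mutex_demo(int iters)
{
    pthread_t threads[4];
    demo_counter = 0;
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&threads[i], NULL, demo_worker, &iters);
        assert(rc == 0);
    }
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    return demo_counter;
}
```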
What Changed in the C Library
pthread_mutex_t has kernel support for PTHREAD_PRIO_INHERIT
– The RT kernels implement priority inheritance (PI) in futexes (fast user-space mutexes) used by pthreads
pthread_cond_* APIs wake threads in priority order
POSIX interfaces to scheduler APIs sched_*
Timer Interfaces
Note that you don't have to have an RT kernel for most of these APIs to work
Realtime Benchmarking
LTP realtime tests
– sched_latency
– sched_football
– pi_tests
– async_handler
– pthread_kill_latency
More tests
– cyclictest
– signaltest
– pi_stress
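Tools such as cyclictest measure how late a periodic thread wakes up relative to its programmed deadline. The sketch below shows only that core idea and is not cyclictest's code. It sleeps until absolute deadlines with `clock_nanosleep(TIMER_ABSTIME)` and records the worst lateness. The helper names are hypothetical. A meaningful measurement would also run the thread under SCHED_FIFO and lock its memory with `mlockall()`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advances t by ns nanoseconds, keeping tv_nsec normalised. */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Returns a - b in nanoseconds. */
static long long timespec_diff_ns(const struct timespec *a,
                                  const struct timespec *b)
{
    return (long long)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
         + (a->tv_nsec - b->tv_nsec);
}

/* Wakes up every interval_ns for `loops` periods and returns the worst
 * observed lateness (actual wakeup minus programmed deadline) in ns. */
long long max_wakeup_latency_ns(long interval_ns, int loops)
{
    assert(interval_ns > 0 && loops > 0);
    struct timespec next, now;
    long long worst = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < loops; i++) {
        timespec_add_ns(&next, interval_ns);
        /* Absolute deadlines keep the period from drifting over time. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
        clock_gettime(CLOCK_MONOTONIC, &now);
        long long late = timespec_diff_ns(&now, &next);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```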
How To Get Started with RT?
Obtain the source
– Vanilla kernel from: http://kernel.org
– RT patches from: http://rt.et.redhat.com/download/
Configure (enabling CONFIG_PREEMPT_RT) and build
Boot into your kernel
And there you are!!!
Impact of the changes
gtod_latency
sched_latency
async_handler
Tools for Tuning and Debugging
SystemTap: custom data probes for kernel space
ftrace: the RT kernel's built-in mechanism to trace events
oprofile: system-wide profiler
tuna:
– adjust thread priorities
– set interrupt affinities
– save/restore tunings
Challenges
IBM’s Real Time Offering
The Power of Java and Linux Combined to Deliver Real Time Capabilities
Select IBM Hardware
Real-Time Linux
WebSphere® Real-Time (WRT)
Summary
Right Time to move to Real-Time
CONFIG_PREEMPT_RT is an acceptable approach to make Linux a viable option as a Real-Time OS, yielding latencies as low as 10-20 μs
Makes efficient and cost-effective real-time solutions possible!!
Questions
Legal Statement
IBM is a registered trademark of International Business Machines Corporation in the United States and/or other countries.
Linux is a registered trademark of Linus Torvalds. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Other company, product, and service names may be trademarks or service marks of others.
The information contained in this documentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this documentation, it is provided “as is” without warranty of any kind, express or implied. In addition, this information is based on IBM's current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this documentation or any other documentation. Nothing contained in this documentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM (or its suppliers or licensors), or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
Thank You!!
Backup Slides
Priority Inversion and RCU: What is RCU?
Analogous to reader-writer lock, but readers acquire no locks
– Readers therefore cannot block writers
– Reader-to-writer priority inversion is therefore impossible
Writers break updates into “removal” and “reclamation” phases
– Removals do not interfere with readers
– Reclamations deferred until all readers drop references
– Readers cannot obtain references to removed items
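The removal/reclamation split can be sketched in user space with C11 atomics. This is a toy, not an RCU implementation. Readers do a single acquire load and take no locks. The writer (assumed here to be the only writer) publishes a new version and retires the old one. Reclamation is left to the caller, who must know that no reader still holds an old pointer; real RCU detects that point, the grace period, on its own. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct config { int value; };

/* Readers load this pointer without taking any lock. */
static _Atomic(struct config *) current_config;

/* Old versions awaiting reclamation (hypothetical fixed-size list). */
#define MAX_RETIRED 16
static struct config *retired[MAX_RETIRED];
static int n_retired;

/* Reader side: one acquire load, no locks, no writes to shared memory. */
int read_config_value(void)
{
    struct config *c = atomic_load_explicit(&current_config,
                                            memory_order_acquire);
    return c ? c->value : -1;
}

/* Removal phase (single writer assumed): publish a new version. The old
 * one is unlinked, so new readers cannot find it, but readers that
 * already loaded it may keep using it. */
int update_config_value(int value)
{
    struct config *fresh = malloc(sizeof *fresh);
    if (fresh == NULL || n_retired == MAX_RETIRED) {
        free(fresh);
        return -1;
    }
    fresh->value = value;
    struct config *old = atomic_exchange_explicit(&current_config, fresh,
                                                  memory_order_acq_rel);
    if (old != NULL)
        retired[n_retired++] = old;
    return 0;
}

/* Reclamation phase: free retired versions. The caller must ensure that
 * no reader can still hold one of them (real RCU waits for a grace period). */
void reclaim_retired(void)
{
    for (int i = 0; i < n_retired; i++)
        free(retired[i]);
    n_retired = 0;
}
```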
What is RCU?
Priority Inversion and RCU
Process P1 needs lock L1, but P2, P3, and P4 now use RCU
– P2, P3, and P4 therefore need not hold L1
Process P1 thus immediately acquires this lock
– Even though P2, P3, and P4 are preempted by the per-CPU medium-priority processes
No priority inheritance required
– Except if low on memory: permit reclaimer to free up memory
Excellent realtime latencies: medium-priority processes can run
– High-priority process proceeds despite low-priority process preemption