
Open Source Development Labs

Carrier Grade Linux

Performance Requirements Definition

Version 3.0

Prepared by the Carrier Grade Linux Working Group

Open Source Development Labs, Inc.
12725 SW Millikan Way, Suite 400
Beaverton, OR 97005 USA
Phone: +1-503-626-2455

Copyright (c) 2005 by The Open Source Development Labs, Inc. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is available at http://www.opencontent.org/opl.shtml/). Distribution of substantively modified versions of this document is prohibited without the explicit permission of the copyright holder.

Other company, product, or service names may be the trademarks of others.

Linux is a Registered Trademark of Linus Torvalds.

Contributors to the Performance Requirements Definition include (in alphabetical order):

Badovinatz, Peter (IBM) *
Chacron, Eric (Alcatel)
Chen, Terence (Intel)
Cherry, John (OSDL)
Dake, Steven (Montavista)
Fleisher, Julie (OSDL)
Gross, Mark (Intel)
Haddad, Ibrahim (Ericsson)
Howell, David P. (Intel)
Ikebe, Takashi (NTT)
Johnson, Christopher P. (Sun)
Krauska, Joel (Cisco)
Kukkonen, Mika (Nokia)
La Monte.H.P , Yarrol (Timesys)
Peter-Gonzalez, Inaky (Intel)
Rossi, Frederic (Ericsson)
Saskena, Manas (Timesys)
Witham, Timothy D. (OSDL)

* Specification editor

Comments on the contents of this document should be sent to [email protected] .



1 Introduction to CGL Performance Requirements
  1.1 Real-time Processing
    1.1.1 Scope
    1.1.2 Maximum scheduling and interrupt latency
    1.1.3 Configurable scheduler quantum for a round robin policy
    1.1.4 High resolution timers and 1 ms tick support
    1.1.5 POSIX real-time features
    1.1.6 Protection against priority inversion
    1.1.7 Replacing kernel semaphores by mutex
    1.1.8 Message queues with priority promotion
    1.1.9 Handling interrupts as kernel threads
    1.1.10 Tuning internal scheduling policies and priorities
  1.2 Symmetric Multiprocessing (SMP)
    1.2.1 Reducing SMP contention
    1.2.2 Process affinity
    1.2.3 Interrupt handler affinity
    1.2.4 Hyper-Threading support
  1.3 Memory Usage
    1.3.1 Introduction
    1.3.2 Support of more than 4G physical memory
    1.3.3 Support of more than 64G physical memory
  1.4 Communication Services
    1.4.1 Introduction
    1.4.2 Low software overhead for message latency
    1.4.3 IPv4, IPv6, MIPv6 forwarding tables fast access and compact memory
    1.4.4 Gigabit Ethernet jumbo MTU support
    1.4.5 ARP cache immediate flush on client side
    1.4.6 Optimizing protocol stacks on SMP
    1.4.7 Cluster communication service
    1.4.8 Diffserv support
    1.4.9 Prioritized protocol processing
  1.5 I/O and File Systems
    1.5.1 Network storage replication
    1.5.2 NFS performance
  1.6 Availability and Initialization
    1.6.1 Application pre-loading
  1.7 Measurements and benchmarks
    1.7.1 Standard benchmark
    1.7.2 On-line resource control
2 Document Organization
3 Requirements and Roadmap Definitions


4 Performance Requirements
  PRF.1 Real-Time Support Enhancements
    PRF.1.1 Low Scheduling Latency
    PRF.1.2 Configurable Scheduler Quantum For Round Robin Scheduling Policy
    PRF.1.3 1 ms Tick Support
    PRF.1.4 High-Resolution Timers
    PRF.1.5 POSIX Real-Time Features
    PRF.1.6 Protecting Against Priority Inversion On Mutex
    PRF.1.7 Handling Interrupts As Threads
  PRF.2 SMP Performances
    PRF.2.1 Enabling Process Affinity
    PRF.2.2 Enabling Interrupt Top-Half Affinity
    PRF.2.3 Hyper-Threading Support
  PRF.3 Memory usage
    PRF.3.1 Dynamic allocation with low space loss
    PRF.3.2 Support More Than 4 Gigabyte Physical Memory
  PRF.4 Communication Services Performances
    PRF.4.1 IP Forwarding Tables Fast Access And Compact Memory
    PRF.4.2 Support of Gigabit Ethernet Jumbo MTU
  PRF.5.0 Efficient Low-Level Asynchronous Events
  PRF.6.0 Managing Transient Data
  PRF.7.0 Interruptless Ethernet Delivery

5 Performance Roadmap
  PRF.1 Real-Time Support Enhancements
    PRF.1.8 Use Kernel Mutexes
    PRF.1.9 Message Queues With Priority Promotion
    PRF.1.10 Configurable Scheduling Policies And Priorities
    PRF.1.11 Implementing Priority Inheritance Inside API
    PRF.1.12 Reducing Virtual Memory Access Latency
  PRF.3 Memory Usage
    PRF.3.3 Support More Than 64 Gigabyte Physical Memory
  PRF.4 Communication Services Performances
    PRF.4.3 Prioritized Protocol Processing
    PRF.4.4 Low Software Overhead For Message Latency
    PRF.4.5 ARP Cache Immediate Flush On Client
  PRF.8.0 Network Storage Replication Performances
  PRF.9.0 NFS Performance
  PRF.10.0 CGL Benchmark
  PRF.11 Application (Pre)loading Capability
    PRF.11.1 Application (Pre)loading Non-Root
    PRF.11.2 Application (Pre)loading Limits
  PRF.12.0 Flexible Process Scheduling Policy Framework
  PRF.13.0 Page Flushing


Appendices
  A.1. Performance References


1 Introduction to CGL Performance Requirements

The CGL Performance section contains requirements essential to a carrier-grade system that apply to the Linux kernel, core libraries, and tools. Performance requirements have a significant bearing on application performance. Carrier grade applications have some unique requirements, although they also share many needs with more general applications.

1.1 Real-time Processing

1.1.1 Scope

The telecommunications application market faces new technical challenges with the introduction of architectures such as Next Generation Networks and IP multimedia services for mobile networks.

Real-time behavior is a major issue for new applications and protocol classes based on IP services such as VoIP, SIGTRAN, and RTP, where real-time behavior drives the quality of service for end-users. Enhancements in real-time behavior would allow Linux to be used for some applications that are currently run on other real-time operating systems.

This document does not make a distinction between hard real-time and soft real-time support in the Linux kernel. Real-time capabilities are defined in terms such as maximum scheduling latency.

1.1.2 Maximum scheduling and interrupt latency

The CGL 3.0 performance subgroup defines scheduling latency as the time between the moment an event marks a process runnable and the moment the process starts to run. For example, when a real-time task responds to an interrupt event, the scheduling latency is the time from when the interrupt handler makes the process runnable to when the first instruction of that process begins to execute.

For an interrupt event, the total process dispatch latency time (defined as PDLT in [8] in Appendix A.1) includes both the interrupt latency and the scheduling latency.

Interrupt latency includes:

• The delay caused by hardware

• The time to execute sections of code in which the interrupt is disabled

• Delays due to handling higher priority interrupts

• The time to execute the interrupt handler top half

• The time to execute the interrupt handler bottom half

Scheduling latency includes:

• Execution of critical sections where preemption is disabled

• Context switching time

See [8] in Appendix A.1 for more details.

The maximum scheduling latency for a user process is a direct metric of the real-time behavior of Linux from the application perspective. The maximum scheduling latency depends on several factors including the user’s process scheduling policy and priorities.

For the kernel, the maximum latency depends on the following items:

• Scheduler response time for task context switching.

• Interrupt handler overhead and delay (see Section 1.1.9).

• Length of kernel critical sections that disable preemption. Although the Linux 2.6 kernel is preemptible, Linux was originally designed as a non-preemptible kernel. Therefore, in many sections preemption is disabled, or lock-breaking preemption points have been introduced to reduce the related latencies. An SMP spin_lock is an example of such a critical section (see Section 1.1.7).

• Occurrences of priority inversions (see Section 1.1.6).

The Linux kernel now provides the following capabilities that enhance predictable process scheduling:

• Preemptible kernel

• Preemption points in kernel functions such as process fork and exec and virtual memory page cache management.

• The O(1) constant order scheduler

1.1.3 Configurable scheduler quantum for a round robin policy

Real-time applications expect fine control of process scheduling so that CPU resource usage can be fairly distributed among processes. The latency of context switching between real-time processes can be controlled by a configurable quantum value.

In the current stable kernel, the quantum value is neither configurable nor easy to determine from a design perspective.

The POSIX specification recommends configuration of a quantum value at the machine level with a minimum tick value of 1 millisecond (ms).
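As an illustration only, the quantum that the kernel currently applies can be queried through the POSIX sched_rr_get_interval() call, which Python exposes as os.sched_rr_get_interval on platforms that provide it. The helper name rr_quantum_ms is hypothetical, and the value reported for a process that is not running under the round robin policy is implementation-defined.

```python
import os


def rr_quantum_ms(pid=0):
    """Return the round-robin quantum reported for `pid` in milliseconds.

    pid 0 means the calling process. Returns None when the platform does
    not expose the POSIX call or the kernel refuses the query.
    """
    if not hasattr(os, "sched_rr_get_interval"):
        return None  # platform lacks sched_rr_get_interval()
    try:
        # The call returns the quantum in seconds as a float.
        return os.sched_rr_get_interval(pid) * 1000.0
    except OSError:
        return None


if __name__ == "__main__":
    print("round-robin quantum (ms):", rr_quantum_ms())
```

This only reads the quantum; it does not make the quantum configurable, which is the capability the requirement asks for.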

1.1.4 High resolution timers and 1 ms tick support

Incorporating high-resolution timers based on a 1 ms tick, rather than the currently supported 10 ms tick, will enhance the real-time task scheduling capabilities of Linux. If hardware platform support is provided for a 1 ms tick, the kernel will no longer be required to program a specific timer to elapse after 1 ms, eliminating overhead.

This feature enables:

• A 1 ms quantum to be managed for task scheduling (as described in Section 1.1.3, with maximum latency as defined in Section 1.1.2).

• A 1 ms timer to be managed without requiring the kernel to program a specific clock. Configuring the kernel with a 1 ms tick value rather than the current 10 ms tick value allows rescheduling to occur every 1 ms in response to a periodic clock timer interrupt.
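The effect of the tick value on timer granularity can be sketched with simple arithmetic. The model below assumes that a tick-driven timer can only expire on a tick boundary; the function names and the HZ values (100 for a 10 ms tick, 1000 for a 1 ms tick) are illustrative assumptions, not part of this specification.

```python
import math


def tick_ms(hz):
    """Tick period in milliseconds for a kernel tick rate of `hz` per second."""
    return 1000.0 / hz


def timer_expiry_ms(requested_ms, hz):
    """Earliest tick boundary at or after the requested delay.

    Simplified model: timers fire only when the periodic tick interrupt runs.
    """
    period = tick_ms(hz)
    return math.ceil(requested_ms / period) * period


if __name__ == "__main__":
    for hz in (100, 1000):
        print(f"HZ={hz}: a 2 ms timer expires after {timer_expiry_ms(2, hz)} ms")
```

Under this model a 2 ms request is stretched to 10 ms with a 10 ms tick, but is honored at 2 ms with a 1 ms tick.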

1.1.5 POSIX real-time features

POSIX real-time and advanced real-time features enable better support for real-time, portable applications at the API level. See STD.2.1 in the CGL Standards Requirements Definition – Version 3 for a list of POSIX tags that apply to real-time features.

1.1.6 Protection against priority inversion

Priority inversion is an issue for real-time application programming because scheduling priorities defined by design may be inverted, causing unexpected latencies. Priority inversion happens when a lower priority thread blocks a higher priority one. The most general case is when a lower priority thread holds a resource needed by the higher priority thread.

Priority inversion protection can be provided in the Linux kernel by dynamically modifying the thread scheduling priority when lower priority threads are holding resources.

Transitive priority inheritance is required to deal with cases where several mutexes are used by several threads (see [8] in Appendix A.1).
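Transitive inheritance can be illustrated with a small model. The sketch below is not a kernel implementation: the data layout (dictionaries of tasks and mutexes) and the function name effective_priorities are assumptions made for illustration, and a larger number means a more urgent priority.

```python
def effective_priorities(base, holds, waits_for):
    """Compute priorities after transitive priority inheritance.

    base:      task -> base priority (larger is more urgent)
    holds:     mutex -> task that currently owns it
    waits_for: task -> mutex the task is blocked on
    """
    effective = dict(base)
    changed = True
    # Repeat until no owner can be boosted any further, so that a boost
    # propagates along chains of mutexes (the transitive case).
    while changed:
        changed = False
        for task, mutex in waits_for.items():
            owner = holds.get(mutex)
            if owner is not None and effective[owner] < effective[task]:
                effective[owner] = effective[task]
                changed = True
    return effective


if __name__ == "__main__":
    base = {"high": 30, "mid": 20, "low": 10}
    holds = {"m1": "mid", "m2": "low"}          # mid owns m1, low owns m2
    waits_for = {"high": "m1", "mid": "m2"}     # high -> m1 -> mid -> m2 -> low
    print(effective_priorities(base, holds, waits_for))
```

In the usage example the lowest priority task ends up boosted to the priority of the highest one, because the blocking chain passes through two mutexes.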

Scheduling policy can also be dynamically modified by the protection mechanism. For example, time-sharing threads can be promoted to real-time FIFO threads. This can have undesired consequences, however, as timesharing processes are generally not coded with FIFO policy in mind. A means should be provided for the client application to specify priority inheritance or priority protection capabilities for the internal mutexes that they use.

APIs providing this capability should be implemented in such a way that they will perform correctly if they are promoted to real-time policies.

1.1.7 Replacing kernel semaphores by mutex

The replacement of semaphores used as mutexes inside the kernel with robust mutexes may help prevent priority inversion.

1.1.8 Message queues with priority promotion

The priority inheritance protection mechanism can be extended by using a dynamic priority promotion system for message queues. In such a system, the priority of the receiver thread is promoted by the scheduler according to the message priority, enabling processing of urgent messages with high scheduling priority.
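A user-space model of this behavior is sketched below. The class name PromotingQueue and its methods are hypothetical; the model only computes the priority at which the receiver would run, and a real implementation would have the scheduler apply it.

```python
import heapq


class PromotingQueue:
    """Priority message queue that reports the promoted receiver priority."""

    def __init__(self, receiver_base_priority):
        self.base = receiver_base_priority
        self._heap = []   # entries: (-priority, sequence, payload)
        self._seq = 0     # keeps FIFO order among equal priorities

    def send(self, priority, payload):
        heapq.heappush(self._heap, (-priority, self._seq, payload))
        self._seq += 1

    def receiver_priority(self):
        """Receiver runs at the highest pending message priority, never below base."""
        if not self._heap:
            return self.base
        return max(self.base, -self._heap[0][0])

    def receive(self):
        """Pop and return (priority, payload) of the most urgent message."""
        neg_priority, _, payload = heapq.heappop(self._heap)
        return -neg_priority, payload


if __name__ == "__main__":
    q = PromotingQueue(receiver_base_priority=10)
    q.send(5, "routine")
    q.send(40, "urgent")
    print(q.receiver_priority(), q.receive())
```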

1.1.9 Handling interrupts as kernel threads

Since interrupt service routines are not allowed to sleep, preemption locks in interrupt handlers normally can’t be changed to mutexes. To change preemption locks that are placed in interrupt service routines, interrupt service routines (aside from the timer interrupt routines) could be handled by kernel threads.

Mapping interrupt service routines onto real-time kernel threads enables interrupt handlers to be assigned priorities and soft real-time processes to be given higher priorities than interrupt handlers, allowing better designs. An additional benefit is the reduction of critical sections in interrupt handlers.

1.1.10 Tuning internal scheduling policies and priorities

Applications need to be able to modify the default scheduling policies and priorities of Linux internal threads depending on their requirements. For instance, disk-oriented applications should be able to upgrade file system daemons to real-time policy. The mechanism for modifying policies and priorities should be persistent from reboot to reboot and accessible through /proc.
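For comparison, the existing POSIX interface already allows the policy and priority of a given process to be inspected and, with sufficient privileges, changed. The sketch below uses that interface through Python's os module; it is not the persistent /proc mechanism described above, and the helper names are hypothetical.

```python
import os

# Map the policy constants available on this platform to their names.
_POLICY_NAMES = {}
for _name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR", "SCHED_BATCH", "SCHED_IDLE"):
    if hasattr(os, _name):
        _POLICY_NAMES[getattr(os, _name)] = _name


def describe_scheduling(pid=0):
    """Return (policy_name, priority) for `pid`, or None if unavailable."""
    if not hasattr(os, "sched_getscheduler"):
        return None
    try:
        policy = os.sched_getscheduler(pid)
        priority = os.sched_getparam(pid).sched_priority
    except OSError:
        return None
    return _POLICY_NAMES.get(policy, str(policy)), priority


def make_realtime_fifo(pid, priority):
    """Switch `pid` to SCHED_FIFO; normally requires root or CAP_SYS_NICE."""
    os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))


if __name__ == "__main__":
    print(describe_scheduling())
```

A change made this way lasts only for the lifetime of the process, which is why the text asks for a mechanism that persists across reboots.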

1.2 Symmetric Multiprocessing (SMP)

1.2.1 Reducing SMP contention

Improving performance and scalability in an SMP system can be accomplished by reducing resource contention through process affinity, interrupt affinity, and Hyper-Threading support.

SMP kernel critical sections can be handled by:

• A spin-lock

• A mutex, if not used in an interrupt handler

Generally, the spin-lock option is faster in terms of CPU time, but it requires that preemption be disabled and introduces processor-level latency when the resource is already locked. The mutex option adds mutex and context switching costs, but latency remains at the process level.

Using a spin-lock with a high number of processors can lead to high latency, depending on the critical section length.

Quality of service must be taken into account for the following cases:

• When timers are armed in parallel on several processors

• When concurrent file accesses occur

• When shared memory is accessed by several processors

1.2.2 Process affinity

Process affinity provides for load balancing at the application level. When process affinity is used, it provides more efficient caching. For example, it must be possible to bind real-time processes to specified processors. Other processes in the system do not need to be assigned to specified processors.
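On Linux, binding a process to a processor can be sketched with the sched_setaffinity() interface, available in Python as os.sched_setaffinity. The helper name pin_to_cpu is hypothetical, and the example assumes the chosen CPU is one the process is already allowed to run on.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Bind `pid` (0 = calling process) to a single CPU and return the new mask."""
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)      # CPUs this process may currently use
    target = min(allowed)                  # pick one of the allowed CPUs
    print("now bound to:", pin_to_cpu(target))
    os.sched_setaffinity(0, allowed)       # restore the original mask
```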

1.2.3 Interrupt handler affinity

Assigning the top half of interrupt handlers to a single processor enables load balancing of interrupt handlers. The bottom half and top half of each interrupt handler should be assigned to the same CPU to reduce inter-processor contention.

1.2.4 Hyper-Threading support

Because the logical Hyper-Threaded processors share a cache, the scheduler only needs to keep threads attached to one of the adjacent logical processors. The scheduler can move threads between adjacent logical processors with no performance degradation because the cache is stable between the two logical processors.

1.3 Memory Usage

1.3.1 Introduction

As CPU capabilities increase, memory demands also increase as more communication contexts can be handled per system. Memory-related requirements are oriented toward high physical memory (HIGHMEM) and virtual memory.

1.3.2 Support of more than 4G physical memory

Support for more than 4 GB of physical memory is a requirement for 32-bit and 64-bit processor architectures.

1.3.3 Support of more than 64G physical memory

The previous requirement should be extended to include support for more than 64 GB of physical memory.

1.4 Communication Services

1.4.1 Introduction

Communication services have a major impact on performance of telecommunications applications. Performance of Linux stacks should be evaluated as follows:

• Message delivery latency and throughput

• Resource usage including CPU and memory usage

• Load balancing capability on an SMP system

1.4.2 Low software overhead for message latency

As the hardware transmission latency decreases with a new generation of physical layer hardware, software overhead has become a more significant issue. Enhancements could include the following:

• A zero-copy mechanism based on memory mapping (as processor/memory performance increases).

• A shorter path from the user to the physical layer than is provided by the BSD socket interface and abstract layers currently used.

New communication services such as GAMMA, MPI, and VIA must be supported.

1.4.3 IPv4, IPv6, MIPv6 forwarding tables fast access and compact memory

The speed at which packets can be routed is limited by the time it takes to perform the forwarding table lookup for each packet.

When a basic lookup method such as the BSD binary trie is used, a single lookup may traverse as many forwarding table nodes as there are bits in the address, generating an equivalent number of memory accesses. The current Linux implementation is not highly scalable.

Methods faster than those currently available should be implemented to support 2000 routes updated per second and up to 500,000 routes with low lookup latency. The tradeoff between memory and access latency should also be addressed.
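The cost of the basic method can be made concrete with a minimal binary trie that performs longest-prefix matching one address bit at a time. This is an illustrative sketch, not the BSD or Linux implementation; the class names are hypothetical, and one trie instance is assumed to hold prefixes of a single address family.

```python
import ipaddress


class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]   # child for bit 0 and bit 1
        self.next_hop = None           # set when a prefix ends at this node


class BinaryTrie:
    """Bit-by-bit forwarding table for one address family (IPv4 or IPv6)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        bits, width = int(net.network_address), net.max_prefixlen
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (width - 1 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address):
        """Return (next_hop, nodes_visited) for the longest matching prefix."""
        addr = ipaddress.ip_address(address)
        bits, width = int(addr), addr.max_prefixlen
        node, best, visited = self.root, self.root.next_hop, 0
        for i in range(width):
            node = node.children[(bits >> (width - 1 - i)) & 1]
            if node is None:
                break
            visited += 1               # one memory access per address bit
            if node.next_hop is not None:
                best = node.next_hop
        return best, visited


if __name__ == "__main__":
    table = BinaryTrie()
    table.insert("10.0.0.0/8", "A")
    table.insert("10.1.0.0/16", "B")
    print(table.lookup("10.1.2.3"))
```

The visited count returned by lookup grows with the prefix length, which is the per-packet cost that faster lookup structures aim to reduce.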

See “Survey and taxonomy of IP address lookup algorithms” at http://mia.ece.uic.edu/~papers/Surveys/pdf00000.pdf.

1.4.4 Gigabit Ethernet jumbo MTU support

Support for an increase of MTU size to 9000 bytes will reduce the number of frames exchanged and associated CPU overhead. This should be configurable because some applications will prefer a smaller message size for lower latencies. For an Ethernet message transmission, message size and low latency are tradeoffs.

See NetPIPE study at http://www.scl.ameslab.gov/netpipe/np_euro.pdf.
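The reduction in frame count can be estimated with a short calculation. The sketch assumes 40 bytes of IPv4 and TCP headers per frame with no options, and a 1,000,000 byte message; both figures are illustrative assumptions rather than values from this specification.

```python
import math


def frames_needed(message_bytes, mtu, header_bytes=40):
    """Number of frames required to carry a message at a given MTU.

    header_bytes: assumed IPv4 (20) + TCP (20) header size without options.
    """
    payload_per_frame = mtu - header_bytes
    return math.ceil(message_bytes / payload_per_frame)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {frames_needed(1_000_000, mtu)} frames")
```

Under these assumptions the same message needs 685 frames at an MTU of 1500 bytes and 112 frames at 9000 bytes, so per-frame processing is incurred roughly six times less often.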

1.4.5 ARP cache immediate flush on client side

When migrant IP addresses are used, client ARP tables need to be updated as soon as the IP address has migrated. A gratuitous ARP message must be sent immediately from the server to the clients to refresh the ARP caches.
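The layout of such a message can be sketched as follows. The function builds a broadcast Ethernet frame carrying an ARP request in which the sender and target protocol addresses are both the migrated address; it does not transmit anything, and the MAC and IP values in the usage example are placeholders.

```python
import socket
import struct


def gratuitous_arp(mac, ip):
    """Build a 42-byte gratuitous ARP frame announcing `ip` at `mac`."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our source, EtherType 0x0806 (ARP).
    ethernet = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, zeroed target MAC, target IP equal to sender IP.
    arp += hw + addr + b"\x00" * 6 + addr
    return ethernet + arp


if __name__ == "__main__":
    frame = gratuitous_arp("02:00:00:00:00:01", "192.0.2.10")
    print(len(frame), frame.hex())
```

Sending the frame would require a raw packet socket and the corresponding privileges, which are outside the scope of this sketch.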

1.4.6 Optimizing protocol stacks on SMP

Linux protocol stacks should be optimized by load balancing the protocol on multiple processors to obtain maximum benefits from SMP.

Load balancing shall be enabled by configuration of the stack itself rather than by the interrupt (IRQ) configuration, since by default, IRQs are balanced if they are not assigned to a specific CPU.

1.4.7 Cluster communication service

A cluster benefits from a cluster-specific communication service that addresses specific issues such as latency, ordering, and recovery. A cluster communication service can achieve better performance than a general communication service when used in a cluster, because it has knowledge of the local topology, including the cluster membership.

1.4.8 Diffserv support

Support should be provided for Differentiated Services (RFCs 2474 and 2475) for IPv4 to enable quality of service and traffic control.
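From the application side, marking IPv4 traffic with a Differentiated Services code point can be sketched with the standard IP_TOS socket option. RFC 2474 places the 6-bit DSCP in the upper bits of the former TOS byte, so the value is shifted left by two. The helper name is hypothetical, and the Expedited Forwarding code point (46) is used only as an example.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding code point


def udp_socket_with_dscp(dscp):
    """Create a UDP/IPv4 socket whose outgoing packets carry the given DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0..63)")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # DSCP occupies the upper six bits of the DS field (former TOS byte).
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
    return sock


if __name__ == "__main__":
    s = udp_socket_with_dscp(DSCP_EF)
    print(hex(s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)))
    s.close()
```

Marking packets is only one half of Diffserv; the classification and queuing behavior on the forwarding path is configured separately.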

1.4.9 Prioritized protocol processing

A prioritized protocol processing mechanism enables a high-priority process to quickly obtain data from the network even if a massive number of packets arrive for multiple processes. It is based on a protocol priority assignment mechanism that allows a higher scheduling priority to be given to the protocol with higher priority.

1.5 I/O and File Systems

1.5.1 Network storage replication

A network storage replication service uses local network and device resources. Performance depends on the local network and storage devices used.

A network storage replication service provides a lower performance level compared to local storage access. The relative difference must be less than 30% in terms of user throughput under normal conditions, when mirrored devices are synchronized.

Upon device resynchronization, the user throughput should not be reduced by more than 25% compared to normal conditions.
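The two thresholds can be checked against measured throughput figures as in the sketch below. The function names and the sample figures are hypothetical; throughput may be expressed in any unit as long as all three measurements use the same one.

```python
def relative_loss(reference, measured):
    """Fraction of throughput lost relative to the reference measurement."""
    return (reference - measured) / reference


def replication_within_limits(local, replicated, resync):
    """Check both limits stated above.

    local:      throughput of local storage access
    replicated: throughput with synchronized mirrors (normal conditions)
    resync:     throughput while devices are resynchronizing
    """
    normal_ok = relative_loss(local, replicated) < 0.30
    resync_ok = relative_loss(replicated, resync) <= 0.25
    return normal_ok and resync_ok


if __name__ == "__main__":
    print(replication_within_limits(local=100, replicated=75, resync=60))
```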

1.5.2 NFS performance

The Network File System (NFS) has been the standard distributed file system for *NIX systems for almost two decades. Carrier grade systems require a full featured and scalable implementation of NFS. Maximizing NFS performance and functionality will require the development of test tools, iterative testing and analysis of open and industry benchmarks, analysis of bottlenecks and failures, and refining improvements.

1.6 Availability and Initialization

1.6.1 Application pre-loading

The CGL 2.0 requirement for application pre-loading should be extended to enhance dynamic loading performance. Often, several seconds are spent in the dynamic ELF loader for symbol relocation.

1.7 Measurements and benchmarks

1.7.1 Standard benchmark

A standard benchmark tool to survey Linux performance from release to release is strongly required. The benchmark should provide applications with different metrics based on application profiles. For example, a gateway server would apply a high weighting to a communication performance subset, whereas a billing server would apply a high weighting to the storage access subset.

1.7.2 On-line resource control

Embedded measurement services would provide the user with periodic statistics on performance.

2 Document Organization

This document is a section of the OSDL Carrier Grade Linux Requirements Definition Version 3.0, which is organized into the separately published sections listed below:

Overview of Requirements Version 3.0

Availability Requirements Definition Version 3.0

Clustering Requirements Definition Version 3.0

Hardware Requirements Definition Version 3.0

Performance Requirements Definition Version 3.0

Security Requirements Definition Version 3.0 (to be released mid-2005)

Serviceability Requirements Definition Version 3.0

Standards Requirements Definition Version 3.0

Released versions of these sections can be found at http://www.osdl.org/lab_activities/carrier_grade_linux/documents.html/document_view.

3 Requirements and Roadmap Definitions

Two types of requirements are included in each section of the OSDL Carrier Grade Linux Requirements Definition Version 3.0:

• Requirements – Describes requirements necessary for a CGL system

• Roadmap – Highlights possible future requirements

Each requirement or roadmap item is described as follows:

ID A unique identification number including:

• An acronym identifying a category for the requirement (first field).

• An ID number for the requirement (second field).

• An ID number for a sub-requirement (third field). A “0” in this field indicates the requirement is a stand-alone requirement. An empty field indicates the requirement is a summary requirement with sub-requirements. A number in this field indicates this requirement is a sequentially numbered sub-requirement.

A summary requirement is also indicated by bolding the header of the requirement.

Name Short description of the requirement

Category The category to which the requirement is assigned. The category for this section is:

PRF.x.x Performance

Description Detailed description of the requirement.

4 Performance Requirements

ID: PRF.1
Name: Real-Time Support Enhancements
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide enhancements to real-time support capabilities.

ID: PRF.1.1
Name: Low Scheduling Latency
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide the ability to configure the kernel to provide real time support so the scheduling latency of a given task will not exceed a target defined by the vendor. Based on commodity hardware commonly supported by Linux, latency responses of less than 1 millisecond should be considered a reasonable and likely target.

Notes: See general information at:

• http://inf3-www.informatik.unibw-muenchen.de/research/linux/hannover/automation_conf04.pdf

• http://www.linuxdevices.com/files/article027/rh-rtpaper.pdf

ID: PRF.1.2
Name: Configurable Scheduler Quantum For Round Robin Scheduling Policy
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide a configurable quantum value for round robin real-time scheduling policy. This quantum value shall be configurable at the machine level as recommended in the POSIX specification. The minimum value of the range for the quantum value shall be the tick value (for example, 1 ms on the Intel x86 architecture).

Notes: Robert Love project reference: http://www.kernel.org/pub/linux/kernel/people/rml/sched/sched-tunables/README

ID: PRF.1.3
Name: 1 ms Tick Support
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall support a 1 ms tick value on all compatible architectures rather than the 10 ms tick value currently supported. This feature enables quantum management at a 1 ms resolution for scheduling and 1 ms timer support without the added overhead of hardware programming.

The base overhead of the timer interrupt handler should remain less than 0.1% of CPU time.
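The 0.1% limit translates into a time budget per tick, which the following arithmetic sketch makes explicit. The function name is hypothetical; the limit is expressed as 1000 parts per million so that the calculation stays in integers.

```python
def per_tick_budget_ns(hz, overhead_ppm=1000):
    """Time the timer interrupt handler may consume per tick, in nanoseconds.

    hz:           ticks per second (1000 for a 1 ms tick)
    overhead_ppm: allowed CPU share in parts per million (0.1% = 1000 ppm)
    """
    ns_per_second = 1_000_000_000
    return ns_per_second * overhead_ppm // (1_000_000 * hz)


if __name__ == "__main__":
    for hz in (100, 1000):
        print(f"HZ={hz}: {per_tick_budget_ns(hz)} ns per tick")
```

With a 1 ms tick the handler has about 1 microsecond per tick; with a 10 ms tick the same 0.1% share allows about 10 microseconds.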

ID: PRF.1.4
Name: High-Resolution Timers
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide high-resolution timer support. To work without data loss, communications applications, such as VoIP, require processes to wake up every 2 ms with a jitter of less than 0.2 ms.
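A rough user-space way to observe wake-up lateness against such a target is sketched below. It is a measurement aid only, under the assumption that time.perf_counter and time.sleep are precise enough for an indicative result; the function name and the default of 50 cycles are hypothetical, and interpreter overhead is included in the figures.

```python
import time


def measure_wakeup_lateness(period_s=0.002, cycles=50):
    """Wake up every `period_s` seconds and record how late each wake-up was.

    Returns a list of lateness values in seconds, one per cycle.
    """
    lateness = []
    next_wake = time.perf_counter() + period_s
    for _ in range(cycles):
        delay = next_wake - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        lateness.append(time.perf_counter() - next_wake)
        next_wake += period_s   # absolute schedule, so errors do not accumulate
    return lateness


if __name__ == "__main__":
    samples = measure_wakeup_lateness()
    print(f"worst-case lateness: {max(samples) * 1000:.3f} ms")
```

The worst-case value can be compared with the 0.2 ms jitter figure; results depend heavily on the kernel configuration, the scheduling policy of the process, and system load.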

Notes: Reference: http://sourceforge.net/projects/high-res-timers/

ID: PRF.1.5
Name: POSIX Real-Time Features
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall enable POSIX real-time and advanced real-time features at the API level.

Note: Defined in STD.2.1.

ID: PRF.1.6
Name: Protecting Against Priority Inversion On Mutex
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall support a mechanism for protecting against priority inversion when using a mutex to synchronize tasks. This mechanism shall support transitive priority inheritance and resolve cases where several mutexes are owned by the same task. It shall be supported in UP and SMP contexts.

ID: PRF.1.7
Name: Handling Interrupts As Threads
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall enable interrupt handlers (top half and bottom half) to be run as task-based processes rather than through the interrupt processing routine mechanism, to allow:

• A mutex-based critical section inside an interrupt handler.

• The ability for an interrupt handler to sleep.

• Prioritization of an interrupt handler based on real-time scheduling priorities.

• Affinity and load-balancing in an SMP.

Context switching overhead should be considered case by case in the application design.

Notes: The worker thread mechanism introduced in Linux 2.6 enables bottom half handling by worker threads. See [8] in Appendix A.1 for a related paper.

ID: PRF.2
Name: SMP Performances
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall enable best usage of SMP by reducing sources of internal contention.



ID: PRF.2.1
Name: Enabling Process Affinity
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall enable process affinity.

ID: PRF.2.2
Name: Enabling Interrupt Top-Half Affinity
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall enable interrupt top half handler affinity.

Note: The latest stable kernel enables interrupt affinity based on the /proc configuration interface.
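On Linux the /proc interface for this is commonly the per-interrupt file /proc/irq/<number>/smp_affinity, which holds a hexadecimal bitmask of the CPUs allowed to service that interrupt. The sketch below converts between CPU lists and that mask format. The helper names are illustrative, the IRQ number is system specific, writing the file requires root, and the simple writer assumes a mask that fits in 32 bits.

```python
def cpus_to_mask(cpus) -> str:
    """Encode an iterable of CPU numbers as a hexadecimal affinity mask."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def mask_to_cpus(mask_text: str) -> list:
    """Decode an affinity mask (optionally split into comma-separated
    32-bit groups, most significant first) into a sorted CPU list."""
    value = int(mask_text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def set_irq_affinity(irq: int, cpus) -> None:
    """Write the mask for the given IRQ. Requires root privileges and an
    IRQ number that exists on this machine."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(cpus_to_mask(cpus))


if __name__ == "__main__":
    print(cpus_to_mask([0, 1]))               # mask for CPU 0 and CPU 1
    print(mask_to_cpus("00000000,0000000a"))  # CPUs encoded by the mask
```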

ID: PRF.2.3
Name: Hyper-Threading Support
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall enable process and interrupt migration between logical processors.

Note: The latest stable kernel enables this feature.
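From user space, process affinity and migration between logical processors can be exercised through the scheduler affinity calls. The sketch below is one possible illustration, assuming a Linux system where Python exposes os.sched_getaffinity and os.sched_setaffinity; the helper name is hypothetical.

```python
import os


def pin_to_cpu(cpu: int) -> set:
    """Restrict the calling process to one logical CPU and return the
    CPU set it was allowed to run on before the change."""
    previous = os.sched_getaffinity(0)  # 0 means the calling process
    os.sched_setaffinity(0, {cpu})
    return previous


if __name__ == "__main__":
    target = min(os.sched_getaffinity(0))
    old = pin_to_cpu(target)
    print("pinned to CPU", target, "was allowed on", sorted(old))
    os.sched_setaffinity(0, old)        # restore the original CPU set
    print("restored to", sorted(os.sched_getaffinity(0)))
```

Changing the allowed set makes the kernel migrate the process if it is currently running on a CPU that is no longer permitted.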

ID: PRF.3
Name: Memory usage
Category: Performance

Description: OSDL CGL specifies that the memory management in carrier grade Linux shall be efficient and that large amounts of physical memory shall be supported, even on 32-bit architectures.

ID: PRF.3.1
Name: Dynamic allocation with low space loss
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall lose less than 10% of memory space to internal memory usage by the system and to fragmentation during periods of intense dynamic memory allocation.

Note: The latest stable kernel enables this feature. Performance comparisons of several operating systems can be found at http://www.dent.med.uni-muenchen.de/~wmglo/malloc-slides.html .


ID: PRF.3.2
Name: Support More Than 4 Gigabyte Physical Memory
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall enable support of more than 4 gigabytes of physical memory, even on 32-bit architectures. Hardware memory-management unit (MMU) support is required.
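A short calculation shows why hardware support is needed: 32 address bits reach exactly 4 gigabytes, so anything beyond that requires a wider physical address. The 36-bit width used below is an assumed example of such an extension (as offered by some 32-bit processors) and is not mandated by this requirement.

```python
GIB = 1 << 30  # bytes in one gigabyte (binary)


def addressable_gib(address_bits: int) -> int:
    """Return how many gigabytes can be addressed with the given width."""
    return (1 << address_bits) // GIB


if __name__ == "__main__":
    print(addressable_gib(32))  # 4
    print(addressable_gib(36))  # 64
```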



ID: PRF.4
Name: Communication Services Performances
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall enable enhancements to communication service performance to reduce software overhead.

ID: PRF.4.1
Name: IP Forwarding Tables Fast Access And Compact Memory
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall enable support of a fast IP forwarding algorithm with predictable performance.

The worst-case update lookup time and update order should be predictable and better than O(log2(n)) with n prefixes in the forwarding information table.

ID: PRF.4.2
Name: Support of Gigabit Ethernet Jumbo MTU
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall enable support for a 9000 byte Maximum Transmission Unit (MTU) for the Gigabit Ethernet protocol to enable lower CPU overhead and better throughput.

This shall be a configurable option as some applications may prefer low latency to large message sizes.

ID: PRF.5.0
Name: Efficient Low-Level Asynchronous Events
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide an efficient capability for handling a large number of essentially simultaneous asynchronous events arriving on multiple channels, such as multiple sockets or other similar paths.

This mechanism is needed to ensure system scalability and soft real-time responsiveness by reducing contention at the kernel level, especially under high load.

Notes: See additional information at:

• Asynchronous Event Mechanism (AEM): http://sourceforge.net/projects/aem/

• epoll() (being added to kernel versions beginning with 2.5.46) in combination with libevent from http://monkey.org/~provos/libevent/.
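To illustrate the epoll() approach named above, the following sketch registers several sockets with a single epoll instance and reads only from those that report an event. It assumes a Linux system (Python exposes epoll through select.epoll); the function name is illustrative, and two local socket pairs stand in for many network channels.

```python
import select
import socket


def wait_for_events(sockets, timeout_s: float = 1.0) -> dict:
    """Register every socket with one epoll instance and return
    {fileno: data} for each socket that became readable in time."""
    by_fd = {s.fileno(): s for s in sockets}
    received = {}
    epoll = select.epoll()
    try:
        for fd in by_fd:
            epoll.register(fd, select.EPOLLIN)
        # poll() returns only the descriptors that have pending events,
        # so the cost does not grow with the number of idle channels.
        for fd, events in epoll.poll(timeout_s):
            if events & select.EPOLLIN:
                received[fd] = by_fd[fd].recv(4096)
    finally:
        epoll.close()
    return received


if __name__ == "__main__":
    a_tx, a_rx = socket.socketpair()
    b_tx, b_rx = socket.socketpair()
    a_tx.send(b"event on channel A")
    print(wait_for_events([a_rx, b_rx]))  # only channel A is reported
    for s in (a_tx, a_rx, b_tx, b_rx):
        s.close()
```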



ID: PRF.6.0
Name: Managing Transient Data
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide support for a self-resizing file system for transient data that can be limited to a maximum size.

Notes: See additional information at:

• tmpfs implementation in the kernel

• RAMFS: http://www.linuxhq.com/kernel/file/fs/ramfs/

ID: PRF.7.0
Name: Interruptless Ethernet Delivery
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide the capability for Ethernet drivers to operate in a pure polling mode in which they do not generate interrupts for arriving frames. This prevents interrupt storms from consuming too many CPU cycles, which is primarily an issue for Gigabit Ethernet.

Notes: See information from /pub/Linux/net-development/NAPI:

• ftp://robur.slu.se/pub/Linux/net-development/NAPI/

• ftp://robur.slu.se/pub/Linux/net-development/NAPI/NAPI_HOWTO.txt



5 Performance Roadmap

ID: PRF.1
Name: Real-Time Support Enhancements
Category: Performance

Description: See description in Performance Requirements section above.

ID: PRF.1.8
Name: Use Kernel Mutexes
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall replace internal semaphores when used as mutexes with robust mutexes to, for example, protect against priority inversion.

ID: PRF.1.9
Name: Message Queues With Priority Promotion
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall support thread priority promotion based on message queue priorities. The priority of a receiver thread should be promoted based on the priority of the delivered message, reducing the latency of urgent messages.

Note: Timesys project and general information can be found at: http://tree.celinuxforum.org/pubwiki/moin.cgi/RealTimeWorkingGroup

ID: PRF.1.10
Name: Configurable Scheduling Policies And Priorities
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide a means for applications to modify default scheduling policies and priorities of system threads to meet specific requirements related to the application design.

It should also be configurable on an SMP system.

The configuration should be persistent from reboot to reboot and accessible from /proc.

ID: PRF.1.11
Name: Implementing Priority Inheritance Inside API
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide a means for an application to initialize libraries specifying what kind (if any) of priority inheritance or priority protection capabilities to use for internal mutexes. The ability for an application to have control over the priority capabilities gives an application using these libraries fine-grained control over how mutex contention is handled when processes with differing priorities contend for a resource.



ID: PRF.1.12
Name: Reducing Virtual Memory Access Latency
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall enable deterministic latency when memory is accessed. Deterministic memory access time shall be provided to the application, reducing copy-on-write overhead.

A new configuration option shall be provided by the memory management system that will allow priority to be given to the time needed to perform the access rather than to limiting physical memory. When this option is used, physical pages with write access should not be shared among different processes. The copy-on-write option should be enabled by default.

ID: PRF.3
Name: Memory Usage
Category: Performance

Description: See description in Performance Requirements section above.

ID: PRF.3.3
Name: Support More Than 64 Gigabyte Physical Memory
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall enable support of more than 64 gigabytes of physical memory, even on 32-bit architectures. Hardware MMU support is required.

ID: PRF.4
Name: Communication Services Performances
Category: Performance

Description: See description in Performance Requirements section above.

ID: PRF.4.3
Name: Prioritized Protocol Processing
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide a prioritized protocol-processing mechanism. The mechanism shall enable a high-priority process to quickly receive data from the network even when a large volume of packets is being transmitted for other processes.

ID: PRF.4.4
Name: Low Software Overhead For Message Latency
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide a new communication service scheme to support low software latency based on a zero-copy architecture that uses memory mapping and a simplified path between the user and physical layer to reduce abstract layer overhead.

ID: PRF.4.5
Name: ARP Cache Immediate Flush On Client
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall enable an immediate update to the address resolution protocol (ARP) cache on client nodes when an IP or interface configuration changes to enable fast IP address migration on the client side.




ID: PRF.8.0
Name: Network Storage Replication Performances
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide a network storage replication service with the following performance levels:

• Less than 30% decrease in user throughput compared to local storage access using a fast network interface and with full available network bandwidth.

• Less than 25% decrease in user throughput during resynchronization of mirroring devices compared with normal throughput when devices are synchronized.
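The two thresholds above can be checked mechanically once throughput has been measured. The sketch below is only an illustration of the comparison: the function names are hypothetical, and the throughput figures in the usage example are made-up inputs, not measurements.

```python
def throughput_decrease_pct(baseline: float, measured: float) -> float:
    """Return the percentage drop of measured throughput relative to baseline."""
    return (baseline - measured) / baseline * 100.0


def meets_replication_targets(local: float, replicated: float, resync: float) -> bool:
    """Check both limits: replicated versus local storage access, and
    resynchronizing versus normal (synchronized) replicated throughput."""
    return (throughput_decrease_pct(local, replicated) < 30.0
            and throughput_decrease_pct(replicated, resync) < 25.0)


if __name__ == "__main__":
    # Hypothetical throughput values in MB/s, chosen only for illustration.
    print(meets_replication_targets(local=100.0, replicated=75.0, resync=60.0))
    print(meets_replication_targets(local=100.0, replicated=75.0, resync=50.0))
```

Note that the second limit uses the synchronized replicated throughput as its baseline, not the local storage throughput.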


ID: PRF.9.0
Name: NFS Performance
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall enable a fast network file system (NFS) implementation on a local network. The NFS implementation shall reduce message fragmentation and memory copies, enabling fast I/O from client to server.

Note: See http://nfs.sourceforge.net/nfs-howto/performance.html.



ID: PRF.10.0
Name: CGL Benchmark
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall be delivered with a standard benchmark tool allowing measurement of target product performance metrics at a glance. Both hardware and Linux system software capacities should be reflected.

The benchmark tool shall provide applications with a variety of metrics based on application profiles, including the following:

• Processing – CPU usage, memory access, thread context switching time, system call overhead, available memory.

• Real-time – Interrupt and scheduling latency, timer resolution and jiffy granularity.

• Communication/network – Local socket communication, IP forwarding, physical network latency, communication service, latency and bandwidth, and CPU overhead.

• Storage/file system/mirroring – Local disk access, file system local access, file system NFS access, mirroring overhead.

The benchmark tool should provide metrics for both UP and SMP configurations.

It shall be possible to compare at least the following protocols:

• Physical layers – Ethernet 100 BT and Gigabit Ethernet

• Network – IP and IPsec

• Transport – TCP and SCTP

UP and SMP configurations should be used for the analysis.

The analysis should include the latency of message transfers and the CPU load generated. It should also address message size and the location of local and remote addresses. The IP performance analysis should take into account the forwarding route table size.
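As a rough illustration of the kind of measurement described, the sketch below times message round trips over a local socket pair for a chosen message size and reports the CPU time consumed. It is not the CGL benchmark tool: all names are illustrative, and an interpreted measurement mainly shows the method rather than the kernel's true latency.

```python
import socket
import threading
import time


def recv_exact(conn: socket.socket, size: int) -> bytes:
    """Read exactly size bytes from a stream socket."""
    chunks = []
    remaining = size
    while remaining:
        chunk = conn.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_server(conn: socket.socket, message_size: int, rounds: int) -> None:
    """Echo each fixed-size message back to the sender."""
    for _ in range(rounds):
        conn.sendall(recv_exact(conn, message_size))


def measure_round_trip(message_size: int = 64, rounds: int = 1000):
    """Return (mean round-trip time in microseconds, CPU seconds used)."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=echo_server, args=(server, message_size, rounds))
    worker.start()
    payload = b"x" * message_size
    cpu_start = time.process_time()
    wall_start = time.perf_counter_ns()
    for _ in range(rounds):
        client.sendall(payload)
        recv_exact(client, message_size)
    elapsed_ns = time.perf_counter_ns() - wall_start
    cpu_seconds = time.process_time() - cpu_start
    worker.join()
    client.close()
    server.close()
    return elapsed_ns / rounds / 1000.0, cpu_seconds


if __name__ == "__main__":
    for size in (64, 1024, 8192):
        rtt_us, cpu_s = measure_round_trip(size)
        print(f"{size:5d} bytes: {rtt_us:8.1f} us round trip, {cpu_s:.3f} s CPU")
```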

Notes:

LMBENCH http://sourceforge.net/projects/lmbench provides simple metrics like context switch, null system call, and UDP/TCP message latencies.

http://ltp.sourceforge.net/tooltable.php identifies various benchmark and test tools.

ID: PRF.11
Name: Application (Pre)loading Capability
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide support for fully loading an application before beginning execution. mlock() alone does not meet this requirement because it requires superuser privileges.
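On systems that expose the RLIMIT_MEMLOCK resource limit, the amount of memory a process may lock is bounded by that limit, which is relevant both to the privilege question above and to the preloading limits in PRF.11.2. The sketch below only reads the limit for the calling process; the function name is illustrative, and whether this mechanism satisfies the requirement is not claimed here.

```python
import resource


def locked_memory_limit():
    """Return the (soft, hard) limits, in bytes, on memory this process
    may lock. None means the corresponding limit is unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def to_bytes(value):
        return None if value == resource.RLIM_INFINITY else value

    return to_bytes(soft), to_bytes(hard)


if __name__ == "__main__":
    soft, hard = locked_memory_limit()
    print("soft limit:", "unlimited" if soft is None else f"{soft} bytes")
    print("hard limit:", "unlimited" if hard is None else f"{hard} bytes")
```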



ID: PRF.11.1
Name: Application (Pre)loading Non-Root
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide support for the preloading of an application even when the application is not executing as root. A configuration capability must exist to allow the system loader to determine whether an application is eligible for preloading.

The action of preloading an application must not overload the system memory. The configuration capability must provide a control that allows the application to specify what is to be done if it cannot be preloaded. Options are:

• Load anyway as a normal (pageable) application.

• Fail and do not load the application.

Regardless of the option used, any failure to preload the application must be logged.

Note: Application preloading glibc patch: http://sources.redhat.com/ml/libc-alpha/2002-05/msg00010.html

ID: PRF.11.2
Name: Application (Pre)loading Limits
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide mechanisms to avoid overloading a system when preloading applications. Specifically, it shall be possible to specify the total amount of memory reserved (pinned) by preloading applications.

ID: PRF.12.0
Name: Flexible Process Scheduling Policy Framework
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide the user with a choice of process scheduling policies. A framework shall be provided that:

• Allows multiple process scheduling policies to be easily plugged into the kernel.

• Allows a user to select one of the available policies when configuring the kernel for compilation.

• Provides a developer's guide to help developers implement new scheduling policies.




ID: PRF.13.0
Name: Page Flushing
Category: Performance

Description: OSDL CGL specifies that carrier grade Linux shall provide mechanisms to allow either application- or operator-controllable parameters to modify page-flushing operations. This capability must be configurable on a per-process or per-application basis and also as a global setting. Note that this requirement may have security implications.

Existing functions, such as fsync() and fdatasync(), are possible starting points for a solution for this requirement. These functions apply to files and need to be executed by the application itself rather than by an administrator or a manager program that monitors the system and adjusts for different requirements. For this requirement, the system also needs to be able to flush application memory pages into swap space.

From a functional standpoint, this requirement is meant to allow the system to directly control system memory usage on a very granular basis. During different periods, different applications will be pinned into memory, and the system can be reconfigured to force out some applications and pin others.

Note: fsync() and related functions are possible starting points
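For reference, this is how an application uses the existing calls today: it must issue fsync() or fdatasync() itself on its own file descriptor, which is exactly the limitation the requirement describes. The sketch below assumes a Linux system (os.fdatasync is not available everywhere); the function name and file name are illustrative.

```python
import os
import tempfile


def write_durably(path: str, data: bytes, metadata_too: bool = True) -> None:
    """Write data to path and block until the kernel reports that it has
    been flushed from the page cache to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()                      # push Python's buffer to the kernel
        if metadata_too:
            os.fsync(f.fileno())       # flush file data and metadata
        else:
            os.fdatasync(f.fileno())   # flush data, skip non-essential metadata


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "transient.dat")
        write_durably(target, b"flushed by the application itself")
        print(os.path.getsize(target), "bytes written and synced")
```

Neither call lets an administrator or a monitoring program trigger the flush from outside the process, and neither moves anonymous application memory to swap space.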



Appendices

A.1. Performance References

[1] Linux Scheduler Latency, Clark Williams, Red Hat, Inc., March 2002. http://www.linuxdevices.com/files/article027/rh-rtpaper.pdf

[2] The Linux Scalability Project. http://www.citi.umich.edu/techreports/reports/citi-tr-99-4.pdf

[3] Scalable statistic counter project. http://lse.sourceforge.net/counters/statctr.html

[4] Linux 2.5 timer scalability study from Andy Pfiffer. http://developer.osdl.org/andyp/timers/

[5] LK SCTP / TCP performance comparison. http://datatag.web.cern.ch/datatag/WP3/sctp/tests.htm

[6] Kernel 2.6 includes some scalability enhancements that are referenced in http://www.kernelnewbies.org/status/Status-08-Aug-2003.html

[7] lmbench: Portable Tools for Performance Analysis. http://www.usenix.org/publications/library/proceedings/sd96/full_papers/mcvoy.pdf

[8] Time-critical tasks in Linux 2.6. Concept to increase the preemptability of the Linux kernel. http://inf3-www.informatik.unibw-muenchen.de/research/linux/hannover/automation_conf04.pdf

[9] CELF-RT working group. http://tree.celinuxforum.org/pubwiki/moin.cgi/RealTimeWorkingGroup

[10] Integrating New Capabilities into NetPIPE. http://www.scl.ameslab.gov/netpipe/np_euro.pdf
