
Open Source Development Labs

Carrier Grade Linux

Availability Requirements Definition

Version 3.1

Prepared by the Carrier Grade Linux Working Group

Open Source Development Labs, Inc. 12725 SW Millikan Way, Suite 400 Beaverton, OR 97005 USA

Phone: +1-503-626-2455


Copyright (c) 2005 by The Open Source Development Labs, Inc. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is available at http://www.opencontent.org/opl.shtml/). Distribution of substantively modified versions of this document is prohibited without the explicit permission of the copyright holder.

Other company, product, or service names may be the trademarks of others.

Linux is a Registered Trademark of Linus Torvalds.

Contributors to the Availability Requirements Definition include (in alphabetical order):

Aziz, Khalid (HP)
Badovinatz, Peter (IBM)
Chacron, Eric (Alcatel)
Cherry, John (OSDL)
Christopher, Johnson (Sun)
Cress, Andrew (Intel)
Dake, Steven (Monta Vista)
Flaxa, Ralf (Novell)
Fleischer, Julie (Intel)
Haddad, Ibrahim (Ericsson) *
Ikebe, Takashi (NTT)
Ishitsuka, Seiichi (NEC)
Kevin, Fox (Sun)
Kimura, Masato (NTT Comware)
Kukkonen, Mika (Nokia)
Liu, Bing Wei (Intel)
Manas, Saksena (Timesys)
Nakayama, Mitsuo (NEC)
Sakuma, Junichi (OSDL)
Witham, Timothy (OSDL)

*Specification editor

Comments on the contents of this document should be sent to [email protected] .


Carrier Grade Linux Availability Requirements Definition Version 3.1


1 Introduction to CGL Availability Requirements
2 Document Organization
3 Requirements and Roadmap Definitions
4 Availability Requirements
    AVL.1.0 Robust Mutexes
    AVL.2.0 Software ECC Support
    AVL.3 Forced Device Removal
    AVL.3.2 Forced Unmount
    AVL.4 Memory Overcommit Actions
    AVL.4.1 VM Strict Over-Commit
    AVL.5 Non-Intrusive Monitoring of Processes
    AVL.5.1 Kernel-Level Non-Intrusive Application Monitor Without Modifying Application Code
    AVL.5.2 Kernel-Level Non-Intrusive Application Monitor Using a Defined API
    AVL.6.0 Disk Predictive Analysis
    AVL.7 Redundant Paths to Resources
    AVL.7.1 Multi-Path Access to Storage
    AVL.8 Fast System Startup Within Kernel Space
    AVL.8.1 Fast Linux Restart Bypassing System Firmware
    AVL.9.0 Boot Image Fallback Mechanism
    AVL.10.0 Live Patching
    AVL.21.0 Ethernet link bonding
    AVL.22.0 Software RAID 1 support
    AVL.23.0 Watchdog Timer Pre-Timeout Interrupt
    AVL.24.0 Watchdog Timer Interface Requirements
    AVL.25.0 Application Heartbeat Monitor
    AVL.26.0 Resilient File System Support
5 Availability Roadmap
    AVL.3 Forced Device Removal
    AVL.3.1 Block Device Removal
    AVL.3.3 Forced Unmount Application Notification
    AVL.4 Memory Overcommit Actions
    AVL.4.2 Replaceable OOM Killer
    AVL.4.3 Low Memory Condition Monitor
    AVL.4.4 Out Of Memory Notification Mechanism
    AVL.5 Non-Intrusive Monitoring of Processes
    AVL.5.3 Process-level Non-intrusive Application Monitor
    AVL.7 Redundant Paths to Resources
    AVL.7.2 Advanced Multi-Path Access to Storage
    AVL.7.3 Redundant Communication Paths
    AVL.8 Fast System Startup Within Kernel Space
    AVL.8.2 Fast Linux Start Using Known-Devices Database
    AVL.8.3 Parallel Driver Initialization During Startup
    AVL.11.0 Fault Isolation Enabling
    AVL.12.0 NFS Client Protection Across Server Failures
    AVL.13.0 Fast System Startup Within User Space
    AVL.13.1 Parallel User Initialization During Startup
    AVL.14.0 Excessive CPU Cycle Usage Detection
    AVL.15.0 Fast Application Restart Mechanism
    AVL.16.0 Fallback Operation Mechanism
    AVL.17.0 Multiple FIB Support
    AVL.18.0 iSCSI Error Handling Support
    AVL.19.0 Application Profiler
    AVL.20.0 Kernel Resources Expansion for Threads
Appendices
    A.1 General Systems References


1 Introduction to CGL Availability Requirements

This section contains requirements that apply to the Linux kernel, core libraries, and tools essential to a carrier-grade system. These Availability requirements are related to single system availability, such as support for memory failure detection. Requirements related to clustered availability, such as heartbeat monitoring and failover, are in the Clustering requirements section.

2 Document Organization

This document is a section of the OSDL Carrier Grade Linux Requirements Definition Version 3.1, which is organized into the separately published sections listed below:

Overview of Requirements Version 3.1

Availability Requirements Definition Version 3.1

Clustering Requirements Definition Version 3.1

Hardware Requirements Definition Version 3.1

Performance Requirements Definition Version 3.1

Security Requirements Definition Version 3.0 (to be released mid-2005)

Serviceability Requirements Definition Version 3.1

Standards Requirements Definition Version 3.1

Released versions of these sections can be found at http://www.osdl.org/lab_activities/carrier_grade_linux/documents.html/document_view.

3 Requirements and Roadmap Definitions

Two types of requirements are included in each section of the OSDL Carrier Grade Linux Requirements Definition Version 3.1:

• Requirements – Describes requirements necessary for a CGL system

• Roadmap – Highlights possible future requirements

Each requirement or roadmap item is described as follows:

ID A unique identification number including:

• An acronym identifying a category for the requirement (first field)

• An ID number for the requirement (second field)

• An ID number for a sub-requirement (third field). A “0” in this field indicates the requirement is a stand-alone requirement. An empty field indicates the requirement is a summary requirement with sub-requirements. A number in this field indicates this requirement is a sequentially numbered sub-requirement.

A summary requirement is also indicated by bolding the header of the requirement.
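The three-field ID scheme described above can be illustrated with a small parser. This is only a sketch: the names `RequirementId` and `parse_requirement_id` are hypothetical and do not come from the specification.

```python
from typing import NamedTuple, Optional


class RequirementId(NamedTuple):
    """Hypothetical container for the three ID fields described above."""
    category: str        # first field: category acronym, e.g. "AVL"
    number: int          # second field: requirement number
    sub: Optional[int]   # third field: None (empty), 0, or a sub-requirement number

    @property
    def kind(self) -> str:
        # Empty third field = summary requirement; "0" = stand-alone requirement.
        if self.sub is None:
            return "summary"
        return "stand-alone" if self.sub == 0 else "sub-requirement"


def parse_requirement_id(text: str) -> RequirementId:
    """Split an ID such as 'AVL.3.2' into its dot-separated fields."""
    fields = text.strip().split(".")
    if len(fields) not in (2, 3):
        raise ValueError(f"expected 2 or 3 dot-separated fields: {text!r}")
    sub = int(fields[2]) if len(fields) == 3 else None
    return RequirementId(fields[0], int(fields[1]), sub)
```

For example, `parse_requirement_id("AVL.3").kind` evaluates to `"summary"`, while `"AVL.1.0"` is classified as stand-alone.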

Name Short description of the requirement

Category The category to which the requirement is assigned. The categories for Availability are:

AVL.x.x Availability

Description Detailed description of the requirement.


4 Availability Requirements

ID: AVL.1.0
Name: Robust Mutexes
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide an enhancement to the POSIX Thread implementation that provides support for robust mutexes. Robust mutex support shall permit a mutex to synchronize threads, either in the same process or in different processes, even when processes or threads exit or abort unexpectedly.

Applications using a robust mutex shall be able to see various return codes that indicate whether the previous holder of the mutex terminated, and also the recovery status of the state of the mutex. The new holder of the robust mutex shall be able to detect a failure, perform cleanup actions, and re-initialize the mutex for continued use.

If a cleanup of the state protected by the mutex can't be completed, the mutex shall be marked “inconsistent” so that any future attempts to lock it will generate a status indicating that it is inconsistent. The following two modes for setting the mutex to an inconsistent state shall be provided:

• Automatically mark the mutex “inconsistent” when the owner dies and the subsequent owner fails to explicitly mark it healthy.

• Provide an advisory to subsequent owners that the mutex needs to be explicitly marked inconsistent.

For further details, refer to http://www.humanfactor.com/pthreads/posix-threads.html .

ID: AVL.2.0
Name: Software ECC Support
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism for reporting when hardware error checking and correcting (ECC) detects and/or recovers from a single-bit ECC error, and a panic trigger mechanism that is activated whenever hardware ECC detects multi-bit ECC errors.

ID: AVL.3
Name: Forced Device Removal
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide support for forced un-mounting of a file system and block device removal. When a file system is un-mounted, processes shall not be able to access or open files on the file system. When a block device is removed, a hot swap signal shall be sent to the storage controller.


ID: AVL.3.2
Name: Forced Un-mount
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide support for forced un-mounting of a file system. The un-mount shall work even if there are open files in the file system. Pending requests shall be ended with the return of an error value when the file system is un-mounted.

ID: AVL.4
Name: Memory Over-commit Actions
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide the ability to configure a global limit on RAM utilization. This limit is a combination of physical memory and swap space. In addition, adequate information and an interface must be provided to allow a middleware component to take action before the system runs out of memory. This requirement is in addition to or a replacement for the kernel out-of-memory killer.

ID: AVL.4.1
Name: VM Strict Over-Commit
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide the ability to control kernel virtual memory allocation adjustments based on the specific needs of the system. Control of virtual memory shall include but not be limited to the following:

• Strict over-commit – The total address space committed for the system is not permitted to exceed swap + a configurable percentage of physical RAM (the default is 50%).

• Heuristic over-commit – Obvious over-commits of address space are refused. Limited to free physical memory + free swap.
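The strict over-commit rule above amounts to a simple limit calculation. The following is a minimal sketch of that arithmetic only; the function names are hypothetical, and a real kernel's accounting may include further adjustments that this document does not describe.

```python
def strict_commit_limit_kib(ram_kib: int, swap_kib: int, ratio_percent: int = 50) -> int:
    """Commit limit under strict over-commit: swap + ratio% of physical RAM.

    The 50% default mirrors the default stated in the requirement text.
    """
    if ratio_percent < 0:
        raise ValueError("ratio_percent must be non-negative")
    return swap_kib + ram_kib * ratio_percent // 100


def allocation_allowed(committed_kib: int, request_kib: int, limit_kib: int) -> bool:
    """A request is refused if it would push committed address space past the limit."""
    return committed_kib + request_kib <= limit_kib
```

With 8,000,000 KiB of RAM and 2,000,000 KiB of swap, the default ratio gives a limit of 6,000,000 KiB. On Linux kernels that implement this scheme, the mode and ratio are usually exposed as the `vm.overcommit_memory` and `vm.overcommit_ratio` sysctls; that mapping is background knowledge, not something this requirement states.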


ID: AVL.5
Name: Non-Intrusive Monitoring of Processes
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a range of capabilities to enable non-intrusive monitoring of processes. To enable monitoring, some configuration actions may have to be taken to specify which processes are to be monitored. Capabilities may be limited in certain cases, as long as the limitations are known. Capabilities to be provided include the following:

• Processes must be manageable and controllable even if they are not a direct child process of the tools and mechanisms provided to enable these capabilities. A carrier system consists of middleware and processes from many sources, which may be difficult to run from a single parent process, as they will usually require different userids, capabilities, permissions, etc.

• The latency of event detection while processes are being monitored must be as low as possible, preferably occurring immediately upon complete failure of a process.

• The overhead of monitoring the processes should be as low as possible.

• Since inittab does not provide sufficient capabilities to meet this requirement, enhancements to inittab must be provided to address the following limitations:

o Monitors only processes inittab starts

o Limited reactions to process death

o No health check capabilities for non-terminating processes

o No controls on re-spawn loops of processes

ID: AVL.5.1
Name: Kernel-Level Non-Intrusive Application Monitor Without Modifying Application Code
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a service to enable non-intrusive monitoring of processes at the kernel level. To enable monitoring, the following capabilities shall be provided:

• Communication between the monitoring process and the kernel.

• Registering a list of processes.

• Ability to define policy based on process events including process/thread creation and exit.

• Ability to take action whenever an event occurs.

ID: AVL.5.2
Name: Kernel-Level Non-Intrusive Application Monitor Using a Defined API
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a service to enable non-intrusive monitoring of processes at the kernel level through a defined API. Any application to be monitored will need to use this API.


ID: AVL.6.0
Name: Disk Predictive Analysis
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide capabilities to assist in predictive analysis of disks. The aim of this support is to assist in predicting situations likely to lead to failure of disks. This allows preventive action to be taken to avoid the failure and resulting disruption of service.

Note that this could be considered a subset of the requirement SMM.7 Diagnostics and Monitoring Framework, but since isolated mechanisms to support this requirement currently exist, it is listed as a separate requirement.

ID: AVL.7
Name: Redundant Paths to Resources
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism to enable redundant access paths to system resources.

The software shall handle sending and receiving data via redundant paths without conflicts, and provide high-availability access to resources even if an error occurs in one of the redundant paths.

ID: AVL.7.1
Name: Multi-Path Access to Storage
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism to enable multiple access paths from a single cluster node to storage devices. The software shall determine if multiple paths exist to the same port of the I/O device, and, with configurable controls, balance I/O requests across multiple host bus adapters. If multiple paths exist to the same device over two separate device ports on the same host bus adapter, those I/Os will not be balanced.

Handling a path failure must be automatic. A mechanism must be provided for the reactivation of failed paths, allowing them to be placed back in service. It must be possible to automatically determine and configure multiple paths. Automatic configuration shall allow automatic multi-path configuration of complete disks and of partitions located on those disks.

A multipath device feature that allows multipath detection and mapping early in the boot process must be provided so that the root file system can exist on a multipath device.
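The balancing, automatic failure handling, and path reactivation described above can be pictured with a toy path selector. This is a user-space sketch under simplifying assumptions (round-robin balancing, paths identified by plain strings); the class `MultipathDevice` and its methods are hypothetical and do not correspond to any particular multipath implementation.

```python
from typing import Iterable, List, Set


class MultipathDevice:
    """Toy model: round-robin I/O across active paths, skipping failed ones."""

    def __init__(self, paths: Iterable[str]) -> None:
        self._paths: List[str] = list(paths)
        self._failed: Set[str] = set()
        self._next = 0  # index of the next path to try

    def select_path(self) -> str:
        """Return the next active path, or raise if every path has failed."""
        for _ in range(len(self._paths)):
            path = self._paths[self._next]
            self._next = (self._next + 1) % len(self._paths)
            if path not in self._failed:
                return path
        raise OSError("no active path to device")

    def fail_path(self, path: str) -> None:
        """Take a path out of service after an I/O error."""
        self._failed.add(path)

    def reinstate_path(self, path: str) -> None:
        """Place a previously failed path back in service."""
        self._failed.discard(path)
```

In this model, I/O keeps flowing over the remaining paths while one is failed, and a reinstated path rejoins the rotation on the next selection.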


ID: AVL.8
Name: Fast System Startup Within Kernel Space
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide capabilities to allow a single system to move from power-on to ready in as short a time as possible.

The normal startup sequence includes:

1. Power on and boot (includes BIOS initialization)

2. Load the Linux image

3. Start and initialize Linux

A cold start (BIOS to operating system handoff) comprises steps 1 through 3. A warm start (operating system to operating system handoff) comprises steps 2 and 3.

Fast system startup capabilities include the ability to:

• Bypass BIOS initialization by beginning the startup sequence at step 2 (see AVL.8.1).

• Bypass initialization of the Linux image in step 3 (see AVL.8.2).

• Complete a parallel initialization of device drivers in step 3 (see AVL.8.3).

ID: AVL.8.1
Name: Fast Linux Restart Bypassing System Firmware
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism to speed up operating system initialization by bypassing the system firmware when one instance of Linux reboots to another instance of Linux.

ID: AVL.9.0
Name: Boot Image Fallback Mechanism
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism that enables a system to fall back to a previous "known good" boot image in the event of a catastrophic boot failure (i.e., failure to boot, panic on boot, or failure to initialize HW/SW). System images are captured from the "known good" system, and the system reboots to the latest good image. This mechanism would provide automatic fallback to protect against problems resulting from system changes, such as program updates, installations, kernel changes, and configuration changes.


ID: AVL.10.0
Name: Live Patching
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism and framework by which a custom application can be built so that it can be upgraded by replacing symbols in its live process. Dynamic replacement of symbols allows a process to access upgraded functions or values without requiring a process restart and in many circumstances can lead to improved process availability and uptime. The mechanism should be applied only to user applications. Patching an underlying distribution software component may result in loss of distribution support.

ID: AVL.21.0
Name: Ethernet link bonding
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall support bonding of multiple Ethernet NICs within a single node. Bonding shall support the following functions:

1. Ethernet link aggregation.

Support bonding of multiple Ethernet cards for bandwidth aggregation.

2. Ethernet link failover.

Support automatic failover of an IP address from one Ethernet NIC to another within a single node using Ethernet bonding.

• Some bonding modes require IEEE 802.3ad support on switches; other modes do not require special protocol support.

ID: AVL.22.0
Name: Software RAID 1 support
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide RAID 1 (mirroring) support so that the OS maintains duplicate sets of all data on separate disk drives.

RAID 1 support shall allow booting from a selected mirror disk drive even if the other drive has failed.

RAID 1 implementation shall provide a user-controllable parameter to throttle the syncing operation. Support can be configured out if desired.

ID: AVL.23.0
Name: Watchdog Timer Pre-Timeout Interrupt
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide support for a watchdog timer pre-timeout interrupt. Where the hardware supports such a capability, an interrupt handler routine will be called before the real timeout occurs.


ID: AVL.24.0
Name: Watchdog Timer Interface Requirements
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide the ability to use the Linux /dev/watchdog interface to reset the hardware watchdog timer. The watchdog timeout value shall be configurable. A configurable action can be performed when a timeout occurs.
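As an illustration of resetting the timer through /dev/watchdog, the sketch below relies on two conventions of the Linux watchdog driver interface: any write to the device resets the timer, and writing the character "V" immediately before closing asks drivers that support "magic close" to disarm it. The function name `feed_watchdog` and its parameters are hypothetical. The device is passed in as a file-like object so the logic can be exercised without real hardware.

```python
import time
from typing import BinaryIO, Callable


def feed_watchdog(dev: BinaryIO, beats: int, interval_s: float,
                  sleep: Callable[[float], None] = time.sleep,
                  disarm_on_exit: bool = True) -> None:
    """Reset the watchdog `beats` times, pausing `interval_s` between resets."""
    for _ in range(beats):
        dev.write(b"\0")   # any write counts as a keepalive
        dev.flush()
        sleep(interval_s)
    if disarm_on_exit:
        dev.write(b"V")    # "magic close": request disarm before the device is closed
        dev.flush()


# Hypothetical usage on a real system (requires privileges, and opening the
# device arms the timer, so it is left commented out here):
#
#     with open("/dev/watchdog", "wb", buffering=0) as dev:
#         feed_watchdog(dev, beats=10, interval_s=5.0)
```

The interval must be shorter than the configured timeout, otherwise the configured timeout action will fire. Whether a driver honours the disarm request depends on its configuration.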

ID: AVL.25.0
Name: Application Heartbeat Monitor
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide an application heartbeat service that allows applications to register to be monitored via specified APIs. The mechanism shall use periodic synchronized events (heartbeats) between an application and the monitor. If a registered application fails to provide a heartbeat, the monitor shall report the events.

The application heartbeat service shall be available to any process or sub-process (thread) entity on the system. A process or thread may register for multiple heartbeats.
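The registration and missed-heartbeat reporting described above might look roughly like the following. This is a sketch only: the class `HeartbeatMonitor`, its method names, and the use of an (owner id, heartbeat name) key to allow multiple heartbeats per process or thread are all assumptions, not APIs defined by this requirement.

```python
import time
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, str]  # (process or thread id, heartbeat name), both hypothetical


class HeartbeatMonitor:
    """Tracks registered heartbeats and reports the ones that are overdue."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Key, Tuple[float, float]] = {}  # key -> (interval, last beat)

    def register(self, owner_id: int, name: str, interval_s: float) -> None:
        """Register a heartbeat; one owner may register several names."""
        self._entries[(owner_id, name)] = (interval_s, self._clock())

    def beat(self, owner_id: int, name: str) -> None:
        """Record a heartbeat from a registered application."""
        interval, _ = self._entries[(owner_id, name)]
        self._entries[(owner_id, name)] = (interval, self._clock())

    def missed(self) -> List[Key]:
        """Return every registration whose last beat is older than its interval."""
        now = self._clock()
        return sorted(key for key, (interval, last) in self._entries.items()
                      if now - last > interval)
```

A monitor process would call `missed()` periodically and report each returned entry as an event. The injectable `clock` exists only so the logic can be tested deterministically.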

ID: AVL.26.0
Name: Resilient File System Support
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide support for the installation of a file system that is resilient against system failures in terms of recovering rapidly upon reboot without requiring a full, traditional fsck. This is normally achieved using logging or journaling techniques.


5 Availability Roadmap

ID: AVL.3
Name: Forced Device Removal
Category: Availability

Description: See description in Availability Requirements section above.

ID: AVL.3.1
Name: Block Device Removal
Category: Availability

Description: OSDL CGL specifies that Linux shall allow removal of a block device while it is in use without degrading the reliability of the system. The block device shall be removable even if it has been opened by a command such as fdisk /dev/sda, is a member of a RAID-1 volume, has a file system mounted on it, or any combination of these. If a file is in use and it cannot be serviced by a mirrored disk, the operating system shall return an error to the system calls referencing that file.

ID: AVL.3.3
Name: Forced Un-mount Application Notification
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a notification mechanism when a forced un-mount of a file system occurs. The notification mechanism should send a signal or other message to a process that attempts to access a file on an un-mounted volume.

ID: AVL.4
Name: Memory Over-commit Actions
Category: Availability

Description: See description in Availability Requirements section above.

ID: AVL.4.2
Name: Replaceable OOM Killer
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide mechanisms to allow the replacement of the out-of-memory (OOM) killer algorithm within the kernel. In an environment in which an application is made up of many processes, the act of killing any single process may prevent the application from continuing to provide service while leaving its remaining processes running and preventing proper recovery. Hence it must be possible to provide a replacement algorithm that can take the relationships between processes into account when determining which ones to slay. By default the current algorithm in the kernel is used. The new algorithm can be activated by loading the relevant kernel module.


ID: AVL.4.3
Name: Low Memory Condition Monitor
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a low memory condition monitor. To avoid encountering a true out-of-memory (OOM) condition in the Linux kernel, a user-space facility should be provided to monitor memory usage and take action based on a configurable low-memory threshold. This threshold would be set to predict an OOM condition before it becomes critical. The threshold would apply to both physical memory and swap area.

The application should record the top N memory-consuming processes, so that when the threshold is reached, processes that are trending up in memory use and are not on the user-defined do-not-kill list can be killed. This capability would allow the application to tell the kernel to stop allocating memory to user-space processes. When applications run out of pre-allocated memory, the system could remain nominally in service until more memory becomes available.
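A user-space monitor of this kind could be sketched as follows. The sketch assumes memory figures in the format of Linux's /proc/meminfo (fields `MemTotal`, `MemFree`, `SwapTotal`, `SwapFree`), and assumes that "trending up" simply means the latest sample exceeds the oldest one. The function names and the shape of the `history` mapping are hypothetical.

```python
from typing import Dict, List, Set


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse 'Key:   value kB' lines into a dict of integer values."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            values[key.strip()] = int(rest.split()[0])
    return values


def is_low_memory(info: Dict[str, int], threshold_percent: int) -> bool:
    """True when free RAM plus free swap falls below the configured threshold."""
    total = info["MemTotal"] + info["SwapTotal"]
    free = info["MemFree"] + info["SwapFree"]
    return free * 100 < total * threshold_percent


def kill_candidates(history: Dict[str, List[int]], do_not_kill: Set[str],
                    top_n: int) -> List[str]:
    """From the top N consumers, keep unprotected processes whose usage is rising.

    `history` maps a process name to its memory samples, oldest first.
    """
    top = sorted(history, key=lambda name: history[name][-1], reverse=True)[:top_n]
    return [name for name in top
            if name not in do_not_kill and history[name][-1] > history[name][0]]
```

The threshold covers physical memory and swap together, as the requirement states; a real monitor might weight them differently or use a more robust trend estimate.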

ID: AVL.4.4
Name: Low Memory Notification Mechanism
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a low memory notification mechanism.

Whenever a low memory condition is detected, the mechanism shall generate a remote notification. Notification methods shall support enterprise-level notification protocols such as SNMP or CIM. See:

• STD.7 SNMP (for IPv4 and IPv6)

• STD.12.0 CIM

ID: AVL.5
Name: Non-Intrusive Monitoring of Processes
Category: Availability

Description: See description in Availability Requirements section above.


ID: AVL.5.3
Name: Process-Level Non-Intrusive Application Monitor
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide control and management capabilities for processes that cannot be altered to incorporate a monitoring API. Such capabilities are known as non-intrusive monitoring. These capabilities must be implemented programmatically using commands or scripts.

Another issue for many such processes is that the start script itself may spawn an application process that is not under the control of the management process. This sub-requirement assumes that this does not happen, and the child process remains under the control of the management entity.

Capabilities required:

• The following capabilities must be enabled for controlling processes:

o The ability to start a process (or a list of processes)

o The ability to stop a process (or a list of processes)

• The following capabilities must be enabled for monitoring processes:

o The ability to detect the unexpected exit of a process

o The ability to configure a set of actions in response to an unexpected exit of a process

• The following services must be provided beyond those currently provided by inittab:

o The ability to configure whether to restart the application if the process dies

o A configurable amount of time to wait before restarting the application

o A limit on the number of times to restart the application
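The three restart services listed above (whether to restart, how long to wait, and how many times) can be captured in a small policy object. The sketch below is illustrative only: `RestartPolicy`, `next_restart_delay`, and `supervise` are hypothetical names, and a non-zero exit status is assumed to mean an unexpected exit.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RestartPolicy:
    restart_on_exit: bool = True   # whether to restart the application if it dies
    delay_s: float = 1.0           # time to wait before restarting
    max_restarts: int = 3          # limit on the number of restarts


def next_restart_delay(policy: RestartPolicy, restarts_so_far: int) -> Optional[float]:
    """Return the delay before the next restart, or None if no restart is allowed."""
    if not policy.restart_on_exit or restarts_so_far >= policy.max_restarts:
        return None
    return policy.delay_s


def supervise(run: Callable[[], int], policy: RestartPolicy,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Run the application until it exits cleanly or the policy is exhausted.

    `run` starts the application, waits for it, and returns its exit status.
    The return value is the number of restarts performed.
    """
    restarts = 0
    while True:
        if run() == 0:
            return restarts
        delay = next_restart_delay(policy, restarts)
        if delay is None:
            return restarts
        sleep(delay)
        restarts += 1
```

In practice `run` could wrap something like `subprocess.call(argv)`. Bounding the restart count is what prevents the re-spawn loops noted as a limitation of inittab.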

ID: AVL.7
Name: Redundant Paths to Resources
Category: Availability

Description: See description in Availability Requirements section above.

ID: AVL.7.2
Name: Advanced Multi-Path Access to Storage
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism to enable multiple access paths from a node to storage devices. The mechanism should implement the following features:

• Ability to boot from SAN storage using the multipath mechanism.

• Ability to use a swap partition on a multipath disk.

• Kernel support for a path-switching policy.

• Error logs must provide easy device identification.
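As an illustration of a path-switching policy, the sketch below models simple failover in user space: I/O uses the first healthy path in priority order and moves to the next one when a path is marked failed. The class name, method names, and log format are hypothetical, and a real implementation would live in the kernel's multipath layer rather than in Python.

```python
class MultipathDevice:
    """Hypothetical model of one storage device reachable over several paths."""

    def __init__(self, name, paths):
        self.name = name
        self.paths = list(paths)   # priority order, highest first
        self.failed = set()

    def mark_failed(self, path):
        """Record a path failure and return a log line naming device and path."""
        self.failed.add(path)
        return f"multipath {self.name}: path {path} failed"

    def mark_restored(self, path):
        self.failed.discard(path)

    def active_path(self):
        """Failover policy: the first path in priority order that has not failed."""
        for path in self.paths:
            if path not in self.failed:
                return path
        raise OSError(f"multipath {self.name}: all paths failed")


if __name__ == "__main__":
    disk = MultipathDevice("mpath0", ["sda", "sdb"])
    print(disk.active_path())       # sda
    print(disk.mark_failed("sda"))  # multipath mpath0: path sda failed
    print(disk.active_path())       # sdb
```

Including both the device name and the path name in each log line is one way to meet the device identification item in the list above.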

ID Name Category AVL.7.3 Redundant Communication Paths Availability

Description: OSDL CGL specifies that Linux shall provide support for redundant communication paths between nodes to improve network availability. The system should handle sending and receiving data between nodes via redundant communication paths without any conflicts. The paths should be logically or physically redundant end to end.

ID Name Category AVL.8 Fast System Startup Within Kernel Space Availability

Description: See description in Availability Requirements section above.

ID Name Category AVL.8.2 Fast Linux Start Using Known-Devices Database Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism to speed up operating system initialization. The improvement in boot speed could be achieved by having the boot loader inform the operating system of previously connected devices, or the known devices could be derived from a previously running instance of the operating system.

ID Name Category AVL.8.3 Parallel Driver Initialization During Startup Availability

Description: OSDL CGL specifies that, if multiple drivers are compiled into the Linux kernel, the initialization or probing routines of those drivers shall execute in parallel. CGL further specifies that, if multiple drivers are to be loaded as modules, the driver modules shall be loaded in parallel. CGL further specifies that in either of these two cases, a driver is initialized only after the drivers it depends on have initialized.
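The ordering constraint can be made concrete by grouping drivers into "waves": every driver in a wave has all of its dependencies satisfied by earlier waves, so the members of one wave can be initialized in parallel. The following is a minimal planning sketch only; the function name and the driver names in the example are hypothetical, and it does not model how the kernel actually schedules initcalls.

```python
def initialization_waves(depends_on):
    """Group drivers into waves; all drivers within one wave can start in parallel.

    depends_on maps each driver name to the names of the drivers it depends on.
    """
    remaining = {name: set(deps) for name, deps in depends_on.items()}
    done = set()
    waves = []
    while remaining:
        # A driver is ready once everything it depends on has initialized.
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves


if __name__ == "__main__":
    drivers = {
        "pci": [],
        "usb": ["pci"],
        "nic": ["pci"],
        "storage": ["pci"],
        "net_fs": ["nic", "storage"],
    }
    print(initialization_waves(drivers))
    # [['pci'], ['nic', 'storage', 'usb'], ['net_fs']]
```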

ID Name Category AVL.11.0 Fault Isolation Enabling Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide support to report anomalies detected on a compute node. The objective in reporting these anomalies is to provide data for fault isolation mechanisms. Software-related failures may require actions like the restart or termination of a process or the unloading and reinstallation of a kernel module. Hardware-related failures may require actions to restart, turn off, or isolate a failing device.

OSDL CGL specifies that carrier grade Linux shall provide mechanisms to isolate faulty software or hardware components. These mechanisms can be activated by management middleware fault isolation policies.

ID Name Category AVL.12.0 NFS Client Protection Across Server Failures Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide mechanisms that allow an NFS server to have failover capability to provide service continuity upon a node failure. The NFS service has to be resumed on another node without any impact on NFS clients other than the retransmission of pending requests (open files must remain open). Clients authenticated on the old server must remain authenticated on the new server.

ID Name Category AVL.13 Fast System Startup Within User Space Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a variety of capabilities to allow a single system to move from a power-on state to an application-ready state in as short a time as possible.

The normal startup sequence includes:

1. Power on and boot (includes BIOS initialization)

2. Load the Linux image

3. Start and initialize Linux

4. Start application

ID Name Category AVL.13.1 Parallel User Initialization During Startup Availability

Description: OSDL CGL specifies that the user initialization procedure executed by the program /sbin/init shall provide a mechanism to allow multiple init scripts to run in parallel. CGL further specifies that a service is started only after the services it depends on have started.
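One way to picture such a mechanism is a scheduler that submits every service whose dependencies have completed to a worker pool, and releases further services as each one finishes. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later) and a thread pool. The function `start_services`, the `start` callback, and the service names are hypothetical and stand in for running real init scripts.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def start_services(depends_on, start, max_workers=4):
    """Start services in parallel while honoring dependencies.

    depends_on maps a service to the services it depends on; start(name) runs
    one service's init step. Returns the names in completion order.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()                      # raises graphlib.CycleError on a cycle
    completed = []
    running = {}                          # future -> service name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit everything whose dependencies are already done.
            for name in sorter.get_ready():
                running[pool.submit(start, name)] = name
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                future.result()           # propagate any start-up failure
                name = running.pop(future)
                completed.append(name)
                sorter.done(name)         # unblocks services waiting on this one
    return completed


if __name__ == "__main__":
    services = {
        "syslog": [],
        "network": [],
        "sshd": ["network", "syslog"],
        "nfs": ["network"],
    }
    print(start_services(services, lambda name: None))
```

The exact completion order varies from run to run, but a service never appears before any service it depends on.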

ID Name Category AVL.14.0 Excessive CPU Cycle Usage Detection Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism that detects excessive CPU cycle usage by any process or thread. To enable detection, the following capabilities shall be provided:

• Communication between the monitoring process and the kernel.

• Registering a list of processes or threads and their allowed CPU cycle thresholds.

• Ability to define policy based on process events including process/thread creation and exit.

• Ability to take action whenever an event occurs.

• Ability to set the CPU cycle threshold to a resolution of one millisecond.
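The registration and detection capabilities can be sketched as a user-space monitor that samples per-process CPU time and invokes a callback when the usage since the previous sample exceeds a registered threshold. This is an illustration under stated assumptions, not the mechanism the requirement mandates: the names `cpu_time_ms` and `CpuWatch` are hypothetical, and reading `/proc/<pid>/stat` yields CPU time in clock ticks (commonly 10 ms each), which is coarser than the one-millisecond resolution required above.

```python
import os


def cpu_time_ms(pid):
    """User plus system CPU time of pid in milliseconds, from /proc (Linux only)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        # Skip past the command name, which is parenthesized and may contain spaces.
        fields = stat_file.read().rsplit(")", 1)[1].split()
    ticks = int(fields[11]) + int(fields[12])    # utime + stime, in clock ticks
    return ticks * 1000 // os.sysconf("SC_CLK_TCK")


class CpuWatch:
    """Hypothetical registry of processes and their allowed CPU time per sample."""

    def __init__(self, read_ms=cpu_time_ms):
        self._read_ms = read_ms
        self._limits = {}   # pid -> allowed CPU milliseconds between samples
        self._last = {}     # pid -> CPU milliseconds at the previous sample

    def register(self, pid, limit_ms):
        self._limits[pid] = limit_ms
        self._last[pid] = self._read_ms(pid)

    def unregister(self, pid):
        self._limits.pop(pid, None)
        self._last.pop(pid, None)

    def sample(self, on_excess):
        """Call on_excess(pid, used_ms, limit_ms) for each process over its limit."""
        for pid, limit in list(self._limits.items()):
            now = self._read_ms(pid)
            used = now - self._last[pid]
            self._last[pid] = now
            if used > limit:
                on_excess(pid, used, limit)
```

Calling `sample` from a periodic timer and passing a callback that logs, renices, or terminates the offending process would cover the "take action whenever an event occurs" item.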

ID Name Category AVL.15.0 Fast Application Restart Mechanism Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism that enables a quick application restart. Typical applications in a carrier environment use multiple processes with inter-process communications. As applications become more complex, application initialization times become longer.

To speed up application initialization, the mechanism shall provide the functionality to simultaneously save memory images of multiple processes (including the kernel resources used by each process) and to restore the images.

When the application completes initialization, including making connections between processes and setting up kernel resources for inter-process communication, the application invokes a save function that makes a copy of the memory images of the process and kernel resources. If the application hangs, the mechanism restores the memory images and kernel resources and restarts the application.

ID Name Category AVL.16.0 Fallback Operation Mechanism Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism that enables or disables specific functions that allow system fallback mode operation when an overload condition is detected. It is desirable that the mechanism provide the functions below:

• A softirq-based interrupt handler.

• Temporary roll-in/roll-out.

• Temporary suspension of low-priority daemon execution.

ID Name Category AVL.17.0 Multiple FIB Support Availability

Description: OSDL CGL specifies that Linux shall support multiple Forwarding Information Base (FIB) quick look-up tables with forwarding addresses to allow better server virtualization of overlapping addresses.

An FIB is a table that contains a copy of the forwarding information in the IP routing table. All hooks/changes required to support multiple FIBs shall be added.
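To show why separate tables help with overlapping addresses, the sketch below keeps one forwarding table per virtual server and resolves the same destination differently in each. It is a minimal user-space model: the class `Fib`, the table names, and the next-hop labels are hypothetical, and the linear longest-prefix search is chosen for clarity rather than for the quick look-up the requirement calls for.

```python
import ipaddress


class Fib:
    """Hypothetical forwarding table: prefixes mapped to next hops."""

    def __init__(self):
        self._routes = []   # list of (network, next_hop)

    def add(self, prefix, next_hop):
        self._routes.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, address):
        """Return the next hop of the longest matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [(net.prefixlen, hop) for net, hop in self._routes if addr in net]
        if not matches:
            return None
        return max(matches, key=lambda match: match[0])[1]


if __name__ == "__main__":
    # Two virtual servers reuse 10.0.0.0/8; each has its own FIB.
    fibs = {"server_a": Fib(), "server_b": Fib()}
    fibs["server_a"].add("10.0.0.0/8", "gw-a")
    fibs["server_a"].add("10.1.0.0/16", "gw-a-site1")
    fibs["server_b"].add("10.0.0.0/8", "gw-b")

    print(fibs["server_a"].lookup("10.1.2.3"))   # gw-a-site1
    print(fibs["server_b"].lookup("10.1.2.3"))   # gw-b
```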

ID Name Category AVL.18.0 iSCSI Error Handling Support Availability

Description: OSDL CGL specifies that the iSCSI Initiators implemented by carrier grade Linux should support the following iSCSI options:

• Header and Data Digests

• Error recovery level 1 as specified by RFC 3720
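The iSCSI specification defines the header and data digests as CRC32C (Castagnoli) checksums. As an illustration of what an initiator computes over a PDU header or data segment, here is a bit-at-a-time sketch; the function name `crc32c` is chosen for this example, and real initiators use table-driven or hardware-assisted implementations instead.

```python
def crc32c(data: bytes) -> int:
    """CRC32C (Castagnoli) of data, computed one bit at a time."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x82F63B78 is the bit-reflected form of the polynomial 0x1EDC6F41.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    # Standard CRC check string; the well-known CRC32C result is 0xE3069283.
    print(hex(crc32c(b"123456789")))
```

A receiver recomputes the digest over the received bytes and compares it with the transmitted value; a mismatch is what triggers the error recovery named in the list above.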

ID Name Category AVL.19.0 Application Profiler Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism to profile critical resources of the kernel and applications. The critical resources that are profiled by this mechanism shall include (but are not limited to):

• Time used

• Memory used

• Number of semaphores, mutexes, sockets, and threads/child processes in use

• Number of open files

Monitoring shall happen at configurable, periodic intervals or as initiated by the user.
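Several of the listed resources can be sampled for a single process with standard facilities. The sketch below is an assumption-laden illustration rather than the profiler the requirement describes: the function names `snapshot` and `profile` are hypothetical, it profiles only the calling process, it counts only Python-level threads, and it relies on the Linux `/proc/self/fd` directory for the open-file count.

```python
import os
import resource
import threading
import time


def snapshot():
    """Sample a few critical resources of the current process (Linux)."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    fd_dir = "/proc/self/fd"
    return {
        "cpu_seconds": usage.ru_utime + usage.ru_stime,   # time used
        "max_rss_kb": usage.ru_maxrss,                    # peak memory (KB on Linux)
        "threads": threading.active_count(),              # Python threads only
        # Approximate: the listing itself briefly holds one descriptor open.
        "open_files": len(os.listdir(fd_dir)) if os.path.isdir(fd_dir) else None,
    }


def profile(interval_seconds, samples):
    """Collect snapshots at a configurable, periodic interval."""
    collected = []
    for index in range(samples):
        collected.append(snapshot())
        if index < samples - 1:
            time.sleep(interval_seconds)
    return collected


if __name__ == "__main__":
    for sample in profile(interval_seconds=0.1, samples=3):
        print(sample)
```

Calling `snapshot()` directly corresponds to user-initiated monitoring, while `profile` corresponds to the periodic case.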

ID Name Category AVL.20.0 Kernel Resources Expansion for Threads Availability

Description: OSDL CGL specifies that carrier grade Linux shall expand available kernel resources to provide additional support for threads. The existing thread model is defined as a lightweight process model; therefore some thread kernel resources are missing. Threads are widely used in carrier grade level applications, so at least the following additional kernel resource functionality shall be provided to support threads:

1. Full SIGNAL support – The SIGNAL should be sent to each thread.

2. Full rlimit support – The rlimit parameter should be supported for each thread.

Appendices

A.1 General Systems References

POSIX:

• http://www.opengroup.org/
• http://www.unix.org/online.html
• http://www.opengroup.org/onlinepubs/007908799/
• http://posixtest.sf.net for more POSIX conformance data on Linux.
• POSIX Technical Corrigendum 1 text: http://www.opengroup.org/pubs/catalog/u057.htm
• POSIX Specification with current Technical Corrigendum: http://www.unix.org/version3/

Linux Standard Base, Free Standards Group:

• http://www.linuxbase.org/
• http://www.freestandards.org/

Service Availability Forum:

• http://www.saforum.org/

IETF:

• http://www.ietf.org/rfc.html