RAC Background Processes



A cluster comprises multiple interconnected computers or servers that appear as if they are one server to end users and applications.  Oracle Clusterware is a portable cluster management solution that is integrated with the Oracle database.  Oracle Clusterware enables you to create a clustered pool of storage to be used by any combination of single-instance and Oracle RAC databases.

Oracle Clusterware is the only clusterware that you need for most platforms on which Oracle RAC operates. You can also use clusterware from other vendors if the clusterware is certified for Oracle RAC.

Single-instance Oracle databases have a one-to-one relationship between the Oracle database and the instance. Oracle RAC environments, however, have a one-to-many relationship between the database and instances.  An Oracle RAC database can have up to 100 instances, all of which access one database. All database instances must use the same interconnect, which can also be used by Oracle Clusterware.

Each Oracle RAC database instance also has:

At least one additional thread of redo for each instance
An instance-specific undo tablespace

Figure 1-1 shows how Oracle RAC is the Oracle Database option that provides a single system image for multiple servers to access one Oracle database. In Oracle RAC, each Oracle instance usually runs on a separate server.

Oracle Clusterware for Oracle Real Application Clusters

Oracle Clusterware provides a complete, integrated clusterware management solution on all Oracle Database platforms. This clusterware functionality provides all of the features required to manage your cluster database including node membership, group services, global resource management, and high availability functions. You can install Oracle Clusterware independently or as a prerequisite to the Oracle RAC installation process. Oracle database features such as services use the underlying Oracle Clusterware mechanisms to provide their capabilities. Oracle also continues to support select third-party clusterware products on specified platforms.

Oracle Clusterware is designed for, and tightly integrated with, Oracle RAC.  When you create an Oracle RAC database using any of the management tools, the database is registered with and managed by Oracle Clusterware, along with the other Oracle processes such as Virtual Internet Protocol (VIP) address, Global Services Daemon (GSD), the Oracle Notification Service (ONS), and the Oracle Net listeners. These resources are automatically started when Oracle Clusterware starts the node and automatically restarted if they fail. The Oracle Clusterware daemons run on each node.

You can use Oracle Clusterware to manage high-availability operations in a cluster. Anything that Oracle Clusterware manages is known as a CRS resource, which could be a database, an instance, a service, a listener, a VIP address, an application process, and so on. Oracle Clusterware manages CRS resources based on the resource’s configuration information that is stored in the Oracle Cluster Registry (OCR). You can use SRVCTL commands to administer other node resources.

Oracle Real Application Clusters Architecture and Processing

At a minimum, Oracle RAC requires a cluster software infrastructure that can provide concurrent access to the same storage and the same set of data files from all nodes in the cluster, a communications protocol for enabling interprocess communication (IPC) across the nodes in the cluster, a means of enabling multiple database instances to process data as if the data resided on a logically combined, single cache, and a mechanism for monitoring and communicating the status of the nodes in the cluster.

Understanding Cluster-Aware Storage Solutions

An Oracle RAC database is a shared everything database. All data files, control files, SPFILEs, and redo log files in Oracle RAC environments must reside on cluster-aware shared disks so that all of the cluster database instances can access these storage components. All database instances must use the same interconnect, which can also be used by Oracle Clusterware. Because Oracle RAC databases use a shared everything architecture, Oracle RAC requires cluster-aware storage for all database files.

In Oracle RAC, the Oracle Database software manages disk access and the Oracle software is certified for use on a variety of storage architectures. It is your choice as to how to configure your disk, but you must use a supported cluster-aware storage solution. Oracle Database provides the following file storage options for Oracle RAC:

Automatic Storage Management (ASM)

This is the recommended solution to manage your disk.

OCFS2 and Oracle Cluster File System (OCFS)

OCFS2 is available for Linux and OCFS is available for Windows platforms. However, you may optionally use a third-party cluster file system or cluster-aware volume manager that is certified for Oracle RAC.

A network file system
Raw devices

Overview of Connecting to the Oracle Database Using Services and VIP Addresses

All nodes in an Oracle RAC environment must connect to a Local Area Network (LAN) to enable users and applications to access the database. Applications should use the Oracle Database services feature to connect to an Oracle database. Services enable you to define rules and characteristics to control how users and applications connect to database instances. These characteristics include a unique name, workload balancing and failover options, and high availability characteristics. Oracle Net Services enable the load balancing of application connections across all of the instances in an Oracle RAC database.

Users can access an Oracle RAC database using a client/server configuration or through one or more middle tiers, with or without connection pooling. Users can be database administrators, developers, application users, power users, such as data miners who create their own searches, and so on.

Most public networks typically use TCP/IP, but you can use any supported hardware and software combination. Oracle RAC database instances can be accessed through a database’s default IP address and through VIP addresses.

The interconnect network is a private network that connects all of the servers in the cluster. The interconnect network uses a switch (or multiple switches) that only the nodes in the cluster can access. Configure User Datagram Protocol (UDP) on a Gigabit Ethernet for your cluster interconnect. On Linux and Unix systems, you can configure Oracle Clusterware to use either the UDP or Reliable Data Socket (RDS) protocols. Windows clusters use the TCP protocol. Crossover cables are not supported for use with Oracle Clusterware interconnects.

In addition to the node’s host name and IP address, you must also assign a virtual host name and an IP address to each node. You should use the virtual host name or VIP address to connect to the database instance. For example, you might enter the virtual host name CRM in the address list of the tnsnames.ora file.

A virtual IP address is an alternate public address that client connections use instead of the standard public IP address. To configure VIP addresses, you need to reserve a spare IP address for each node, and the IP addresses must use the same subnet as the public network.
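For illustration, a tnsnames.ora entry for the CRM example mentioned in this article might look like the following sketch. The host names crm-vip1 and crm-vip2 and the service name crm.example.com are hypothetical placeholders, not values from this document:

```
CRM =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = crm-vip1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = crm-vip2)(PORT = 1521))
      (LOAD_BALANCE = yes)
      (FAILOVER = on)
    )
    (CONNECT_DATA =
      (SERVICE_NAME = crm.example.com)
    )
  )
```

Because both addresses point at VIPs, a failed node's address refuses connections quickly and the client moves on to the next address in the list rather than waiting for a TCP timeout.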

If a node fails, then the node's VIP address fails over to another node, where the VIP address can accept TCP connections but cannot accept Oracle connections. Generally, VIP addresses fail over when:

The node on which a VIP address runs fails
All interfaces for the VIP address fail
All interfaces for the VIP address are disconnected from the network

Clients that attempt to connect to the VIP address receive a rapid connection refused error instead of waiting for TCP connect timeout messages. You configure VIP addresses in the address list for your database connection definition to enable connectivity.

If you use Network Attached Storage (NAS), then you are required to configure a second private network. Access to this network is typically controlled by the vendor’s software. The private network uses static IP addresses.

About Oracle Real Application Clusters Software Components

Oracle RAC databases have two or more database instances that each contain memory structures and background processes. An Oracle RAC database has the same processes and memory structures as a single-instance Oracle database as well as additional process and memory structures that are specific to Oracle RAC. Any one instance’s database view is nearly identical to any other instance’s view in the same Oracle RAC database; the view is a single system image of the environment.

Each instance has a buffer cache in its System Global Area (SGA). Using Cache Fusion, Oracle RAC environments logically combine each instance’s buffer cache to enable the instances to process data as if the data resided on a logically combined, single cache.

To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances, which effectively increases the size of the SGA for an Oracle RAC instance.

After one instance caches data, any other instance within the same cluster database can acquire a block image from another instance in the same database faster than by reading the block from disk. Therefore, Cache Fusion moves current blocks between instances rather than re-reading the blocks from disk. When a consistent block is needed or a changed block is required on another instance, Cache Fusion transfers the block image directly between the affected instances. Oracle RAC uses the private interconnect for interinstance communication and block transfers. The GES Monitor and the Instance Enqueue Process manage access to Cache Fusion resources and enqueue recovery processing.
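The lookup logic described above can be sketched as a toy model. This is a deliberately simplified illustration, not Oracle's implementation: the GRD is modeled as a plain dict mapping a block ID to the instance currently holding a cached image of that block, and the three possible outcomes mirror the paragraph above (local hit, interconnect transfer, disk read).

```python
# Toy model of the Cache Fusion block-lookup decision described above.
# Illustrative sketch only, not Oracle's actual implementation.

def locate_block(block_id, local_cache, grd, local_instance):
    """Decide where a requesting instance obtains a block from."""
    if block_id in local_cache:
        return "local cache"                    # already buffered locally
    holder = grd.get(block_id)                  # consult the GRD
    if holder is not None and holder != local_instance:
        return f"interconnect transfer from {holder}"  # Cache Fusion ship
    return "disk read"                          # no instance has it cached

# Hypothetical example: instance 1 asks for three blocks.
grd = {"block-42": 2}          # instance 2 holds block-42 in its cache
local_cache = {"block-7"}      # instance 1 already caches block-7

print(locate_block("block-7", local_cache, grd, 1))   # local cache
print(locate_block("block-42", local_cache, grd, 1))  # interconnect transfer from 2
print(locate_block("block-99", local_cache, grd, 1))  # disk read
```

The point of the sketch is the ordering: the interconnect transfer is preferred over a disk read whenever any other instance already holds the block.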

About Oracle Real Application Clusters Background Processes

The GCS and GES processes, together with the GRD, collaborate to enable Cache Fusion. The Oracle RAC processes and their identifiers are as follows:

ACMS—Atomic Controlfile to Memory Service (ACMS)

In an Oracle RAC environment, the atomic controlfile to memory service (ACMS) per-instance process is an agent that contributes to ensuring a distributed SGA memory update is either globally committed on success or globally aborted in the event of a failure.

GTX0-j—Global Transaction Process

The GTX0-j process provides transparent support for XA global transactions in a RAC environment. The database autotunes the number of these processes based on the workload of XA global transactions.

LMON—Global Enqueue Service Monitor

The LMON process monitors global enqueues and resources across the cluster and performs global enqueue recovery operations.

LMD—Global Enqueue Service Daemon

The LMD process manages incoming remote resource requests within each instance.

LMS—Global Cache Service Process

The LMS process maintains records of the datafile statuses and each cached block by recording information in a Global Resource Directory (GRD). The LMS process also controls the flow of messages to remote instances and manages global data block access and transmits block images between the buffer caches of different instances. This processing is part of the Cache Fusion feature.

LCK0—Instance Enqueue Process

The LCK0 process manages non-Cache Fusion resource requests such as library and row cache requests.

RMSn—Oracle RAC Management Processes (RMSn)

The RMSn processes perform manageability tasks for Oracle RAC. Tasks accomplished by an RMSn process include the creation of resources related to Oracle RAC when new instances are added to the cluster.

RSMN—Remote Slave Monitor manages background slave process creation and communication on remote instances. These background slave processes perform tasks on behalf of a coordinating process running in another instance.

TOM BURLESON

The Background Processes

Figure 4.6 shows various background processes (as listed following the figure) that are spawned to execute the database processing.

Figure 4.6 The Background Processes

SMON - System Monitor process recovers after instance failure and monitors temporary segments and extents. SMON in a non-failed instance can also perform failed instance recovery for another failed RAC instance.

PMON - Process Monitor process recovers failed process resources. If MTS (also called Shared Server Architecture) is being utilized, PMON monitors and restarts any failed dispatcher or server processes. In RAC, PMON's role as service registration agent is particularly important.

DBWR - Database Writer or Dirty Buffer Writer process is responsible for writing dirty buffers from the database block cache to the database data files. Generally, DBWR writes blocks back to the data files at checkpoints, or when the cache is full and space has to be made for more blocks; a commit does not itself force DBWR to write. The possible multiple DBWR processes in RAC must be coordinated through the locking and global cache processes to ensure efficient processing is accomplished.

LGWR - Log Writer process is responsible for writing the log buffers out to the redo logs. In RAC, each RAC instance has its own LGWR process that maintains that instance's thread of redo logs.

ARCH - (Optional) Archive process writes filled redo logs to the archive log location(s). In RAC, the various ARCH processes can be utilized to ensure that copies of the archived redo logs for each instance are available to the other instances in the RAC setup should they be needed for recovery.

CKPT - Checkpoint process writes checkpoint information to control files and data file headers.

Pnnn - (Optional) Parallel Query Slaves are started and stopped as needed to participate in parallel query operations.

CJQ0 - Job queue controller process wakes up periodically and checks the job queue. If a job is due, it spawns Jnnn processes to handle jobs.

Jnnn - (Optional) Job processes used by the Oracle9i job queues to process internal Oracle9i jobs. The CJQ0 process controls them automatically.

QMN - (Optional) Advanced Queuing process is used to control the advanced queuing jobs.

Snnn - (Optional) Pre-spawned shared server processes are used by the multi-threaded server (MTS) process to handle connection requests from users, and act as connection pools for user processes.  These user processes also handle disk reads from database datafiles into the database block buffers.

Dnnn - (Optional) Dispatcher process for shared server (MTS) - It accepts connection requests and portions them out to the pre-spawned server processes.

MMON – This process performs various manageability-related background tasks, for example:

Issuing alerts whenever a given metric violates its threshold value

Capturing statistics values for SQL objects that have been recently modified

MMNL - This process performs frequent and lightweight manageability-related tasks, such as session history capture and metrics computation.

MMAN - This process is used for internal database tasks that manage automatic shared memory. MMAN serves as the SGA Memory Broker and coordinates the sizing of the memory components.

RBAL - This process coordinates rebalance activity for disk groups in an Automatic Storage Management instance.

ORBn - performs the actual rebalance data extent movements in an Automatic Storage Management instance.  There can be many of these at a time, called ORB0, ORB1, and so forth.

OSMB - is present in a database instance using an Automatic Storage Management disk group.  It communicates with the Automatic Storage Management instance.

FMON - The database communicates with the mapping libraries provided by storage vendors through an external non-Oracle Database process that is spawned by a background process called FMON. FMON is responsible for managing the mapping information.  When you specify the FILE_MAPPING initialization parameter for mapping data files to physical devices on a storage subsystem, then the FMON process is spawned.

LMON - The Global Enqueue Service Monitor (LMON) monitors the entire cluster to manage the global enqueues and the resources. LMON manages instance and process failures and the associated recovery for the Global Cache Service (GCS) and Global Enqueue Service (GES). In particular, LMON handles the part of recovery associated with global resources. LMON-provided services are also known as cluster group services (CGS).

LMDx - The Global Enqueue Service Daemon (LMD) is the lock agent process that manages enqueue manager service requests for Global Cache Service enqueues to control access to global enqueues and resources.  The LMD process also handles deadlock detection and remote enqueue requests.  Remote resource requests are the requests originating from another instance.
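The deadlock detection that LMD performs can be pictured as a cycle search in a wait-for graph: each session points at the session holding the resource it is waiting on, and a cycle means a deadlock. The following is a toy sketch of that idea under that simplified model, not Oracle's actual algorithm; the session names are hypothetical.

```python
# Toy wait-for-graph cycle check illustrating the idea behind deadlock
# detection (not Oracle's actual LMD algorithm). Edges map a waiting
# session to the session holding the resource it wants.

def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle."""
    def walk(node, seen):
        if node in seen:
            return True                 # revisited a node: cycle found
        nxt = wait_for.get(node)
        if nxt is None:
            return False                # chain ends: no deadlock here
        return walk(nxt, seen | {node})
    return any(walk(start, set()) for start in wait_for)

# Hypothetical sessions: A waits on B and B waits on A -> deadlock.
print(has_deadlock({"A": "B", "B": "A"}))  # True
print(has_deadlock({"A": "B", "B": "C"}))  # False
```

In the real system the graph spans instances, which is why remote enqueue requests and deadlock detection are handled by the same daemon.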

RAC Specific Background Processes

LMSx - The Global Cache Service Processes (LMSx) are the processes that handle remote Global Cache Service (GCS) messages. Real Application Clusters software provides for up to 10 Global Cache Service Processes.  The number of LMSx varies depending on the amount of messaging traffic among nodes in the cluster.

The LMSx handles the acquisition interrupt and blocking interrupt requests from the remote instances for Global Cache Service resources. For cross-instance consistent read requests, the LMSx will create a consistent read version of the block and send it to the requesting instance. The LMSx also controls the flow of messages to remote instances.

The LMSn processes handle the blocking interrupts from the remote instance for the Global Cache Service resources by:

Managing the resource requests and cross-instance call operations for the shared resources.

Building a list of invalid lock elements and validating the lock elements during recovery.

Handling global lock deadlock detection and monitoring for lock conversion timeouts.

RAC Specific Processes

LCKx - This process manages the global enqueue requests and the cross-instance broadcast. Workload is automatically shared and balanced when there are multiple Global Cache Service Processes (LMSx).

DIAG - Diagnosability Daemon monitors the health of the instance and captures data for instance process failures.

The following shows typical background processes of the RAC instance named NYDB1.

$ rac-1a:NYDB1:/app/home/oracle >ps -ef | grep ora_
oracle   31136     1  0 08:45 ?        00:00:00 ora_pmon_NYDB1
oracle   31138     1  0 08:45 ?        00:00:00 ora_diag_NYDB1
oracle   31141     1  0 08:45 ?        00:00:00 ora_lmon_NYDB1
oracle   31143     1  0 08:45 ?        00:00:04 ora_lmd0_NYDB1
oracle   31145     1  0 08:45 ?        00:00:03 ora_lms0_NYDB1
oracle   31147     1  0 08:45 ?        00:00:03 ora_lms1_NYDB1
oracle   31149     1  0 08:45 ?        00:00:00 ora_mman_NYDB1
oracle   31151     1  0 08:45 ?        00:00:01 ora_dbw0_NYDB1
oracle   31153     1  0 08:45 ?        00:00:01 ora_lgwr_NYDB1
oracle   31155     1  0 08:45 ?        00:00:05 ora_ckpt_NYDB1
oracle   31157     1  0 08:45 ?        00:00:05 ora_smon_NYDB1
oracle   31159     1  0 08:45 ?        00:00:00 ora_reco_NYDB1
oracle   31161     1  0 08:45 ?        00:00:00 ora_cjq0_NYDB1
oracle   31163     1  0 08:45 ?        00:00:00 ora_d000_NYDB1
oracle   31165     1  0 08:45 ?        00:00:00 ora_s000_NYDB1
oracle   31168     1  0 08:45 ?        00:00:02 ora_lck0_NYDB1
oracle   31190     1  0 08:46 ?        00:00:00 ora_arc0_NYDB1
oracle   31193     1  0 08:46 ?        00:00:02 ora_arc1_NYDB1
oracle   31207     1  0 08:46 ?        00:00:00 ora_qmnc_NYDB1
oracle   31210     1  0 08:46 ?        00:00:07 ora_mmon_NYDB1
oracle   31213     1  0 08:46 ?        00:00:00 ora_mmnl_NYDB1
oracle   31286     1  0 08:46 ?        00:00:00 ora_q000_NYDB1
oracle   31288     1  0 08:46 ?        00:00:00 ora_q001_NYDB1
oracle   31290     1  0 08:46 ?        00:00:00 ora_q002_NYDB1
oracle   18041     1  0 20:41 ?        00:00:06 ora_j000_NYDB1
oracle   25579     1  0 23:19 ?        00:00:00 ora_pz99_NYDB1
oracle   25581     1  0 23:19 ?        00:00:00 ora_pz98_NYDB1
oracle   26703 19731  0 23:23 pts/5    00:00:00 grep ora_
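A listing like the one above can be filtered programmatically to pick out the RAC-specific background processes. The sketch below is illustrative only; the prefix list follows the process descriptions in this article, and the sample lines are taken from the listing above.

```python
# Sketch: extract the RAC-specific background processes from a
# "ps -ef | grep ora_" listing. The prefixes follow the RAC-specific
# processes described in this article (LMON, LMD, LMS, LCK, DIAG).

RAC_PREFIXES = ("lmon", "lmd", "lms", "lck", "diag")

def rac_processes(ps_output):
    """Return the ora_* process names that are RAC-specific."""
    names = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        cmd = line.split()[-1]              # last column is the command
        if cmd.startswith("ora_"):
            short = cmd.split("_")[1]       # e.g. "lms0" from ora_lms0_NYDB1
            if short.rstrip("0123456789") in RAC_PREFIXES:
                names.append(cmd)
    return names

sample = """\
oracle   31136     1  0 08:45 ?        00:00:00 ora_pmon_NYDB1
oracle   31141     1  0 08:45 ?        00:00:00 ora_lmon_NYDB1
oracle   31143     1  0 08:45 ?        00:00:04 ora_lmd0_NYDB1
oracle   31145     1  0 08:45 ?        00:00:03 ora_lms0_NYDB1
oracle   31168     1  0 08:45 ?        00:00:02 ora_lck0_NYDB1"""

print(rac_processes(sample))
# ['ora_lmon_NYDB1', 'ora_lmd0_NYDB1', 'ora_lms0_NYDB1', 'ora_lck0_NYDB1']
```

Processes such as PMON, SMON, and DBWR fall through the filter because they exist in single-instance databases as well.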