Maximizing Oracle RAC Uptime

Upload: markus-michalewicz

Post on 18-Dec-2014

Category:

Software


DESCRIPTION

Oracle Open World 2014 presentation [CON8127] on Maximizing Oracle RAC Uptime. This presentation discusses tools integrated into the Oracle RAC Stack and shows which tools to use in the various stages of the system's lifecycle to ensure smooth operation.

TRANSCRIPT

Page 1: Maximizing Oracle RAC Uptime
Page 2: Maximizing Oracle RAC Uptime

Maximizing Oracle RAC Uptime

Ian Cookson, Markus Michalewicz
Oracle Real Application Clusters (RAC) Product Management / Development
September 29, 2014

Copyright © 2014, Oracle and/or its affiliates. All rights reserved.

Page 3: Maximizing Oracle RAC Uptime

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 4: Maximizing Oracle RAC Uptime

The System Lifecycle

• Installation
• Implementation
• Operation
• Monitoring
• Diagnosis

Page 5: Maximizing Oracle RAC Uptime

The System Lifecycle – current stage: Installation

Page 6: Maximizing Oracle RAC Uptime

Installation – System assumed for this presentation

• Server OS:
– OL 6.5 UEK (other kernels are supported)
– HUBs: 4GB+ memory recommended
  • One HUB at a time will host the GIMR database.
  • Only HUBs will host (Flex) ASM instances.
  • Leafs can have less memory, dependent on the use case.
  • The installer enforces the HUB minimum memory requirement.

Cluster diagram:
• Oracle GI | HUB: brazil, argentina, germany
• Oracle GI | Leaf: spain, italy
• Oracle RAC: labelled three times in the diagram

Page 7: Maximizing Oracle RAC Uptime

Installation

• Installation is an infrequent task
• It should be standardized
– Follow: http://www.slideshare.net/MarkusMichalewicz/oracle-rac-12c-collaborate-best-practices-ioug-2014-version
– and come to the Oracle RAC demo booth (3787)
• Tools to use:
1. Linux: pre-install package
2. Cluster Verification Utility (CVU)
3. Oracle Universal Installer (OUI)

[root@germany ~]# uname -a
3.8.13-16.2.1.el6uek.x86_64 #1 SMP Thu Nov 7 17:01:44 PST 2013
x86_64 x86_64 x86_64 GNU/Linux

# Get the pre-install package
[root@germany Desktop]# yum list oracle-*
oracle-rdbms-server-11gR2-preinstall.x86_64 1.0-7.el6 ol6_latest
oracle-rdbms-server-12cR1-preinstall.x86_64 1.0-8.el6 ol6_latest

Page 8: Maximizing Oracle RAC Uptime

Oracle Universal Installer (OUI)

• OUI provides a simple GUI for:
– Installation and configuration
– Upgrades
• OUI calls cluvfy for:
– Verification checks
– Generating ‘fixup’ scripts

Page 9: Maximizing Oracle RAC Uptime

The System Lifecycle – current stage: Implementation

Page 10: Maximizing Oracle RAC Uptime

Implementation

• Implementation is a recurring task
– Initial implementation
– Change implementation(s) as required
• Implementation tasks are system-specific
• Tools to use:
1. CVU
2. OraChk

Page 11: Maximizing Oracle RAC Uptime

Cluster Verification Utility (CVU) – Introduction

• Purpose:
– Verification of pre-install & post-install cluster setup
– Run manually (command: cluvfy) or as part of the OUI
– Available from OTN and included in Oracle Grid Infrastructure
– Supports the Oracle RAC stack since version 10g Rel. 1
• What does it do?
– Runs specified verification tests and optionally generates a ‘fixup’ script (to be run as root)
– Uses a ‘stage’ concept, enabling users to run the necessary tests for a ‘pre’ or ‘post’ installation stage
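The ‘stage’ concept maps onto the cluvfy command line as `cluvfy stage -pre|-post <stage> -n <nodes>`. The following is a minimal sketch of a helper that assembles such a command; the helper name `cluvfy_stage_command` is hypothetical, and the `crsinst` stage is an assumption added for illustration (only `-post hwos` appears in this presentation).

```python
from typing import List

# (when, stage) pairs this sketch accepts. Only ("post", "hwos") is shown in the
# slides; the "crsinst" entries are an assumption added for illustration.
SUPPORTED_STAGES = {("post", "hwos"), ("pre", "crsinst"), ("post", "crsinst")}


def cluvfy_stage_command(when: str, stage: str, nodes: List[str],
                         verbose: bool = False) -> List[str]:
    """Build the argument list for a cluvfy stage check (hypothetical helper)."""
    if (when, stage) not in SUPPORTED_STAGES:
        raise ValueError(f"unsupported stage check: -{when} {stage}")
    if not nodes:
        raise ValueError("at least one node name is required")
    # cluvfy takes the node list as a single comma-separated argument.
    command = ["cluvfy", "stage", f"-{when}", stage, "-n", ",".join(nodes)]
    if verbose:
        command.append("-verbose")
    return command


if __name__ == "__main__":
    print(" ".join(cluvfy_stage_command("post", "hwos",
                                        ["germany", "argentina"], verbose=True)))
```

Returning a list rather than a single string keeps the result usable with `subprocess.run` without shell quoting.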

Page 12: Maximizing Oracle RAC Uptime

What does CVU Check?

• System requirements
– Are the installation requirements met for Clusterware, or RAC?
• Network and connectivity
• Cluster Time Synchronization (CTSS or NTP)
• Existence of required OS users and permissions
• Prerequisites for adding nodes
• etc.

Page 13: Maximizing Oracle RAC Uptime

CVU for Pre-Implementation Checks

• Purpose:
– Verification of the configuration after installation, prior to implementation (is the system ready?)
• Which checks should be made?
– Use ‘post’ checks to verify that the system is indeed ready, and
– Confirm that post-installation changes made to the system will not cause problems
• Examples:
– cluvfy comp healthcheck -collect cluster -mandatory -deviations -save

Page 14: Maximizing Oracle RAC Uptime

CVU for Pre-Implementation Checks - Example

$ cluvfy stage -post hwos -n germany,argentina -verbose

Performing post-checks for hardware and operating system setup

Checking node reachability...

Check: Node reachability from node "germany"
  Destination Node                      Reachable?
  ------------------------------------  ------------------------
  germany                               yes
  argentina                             yes
Result: Node reachability check passed from node "germany"

Checking user equivalence...

Check: User equivalence for user "grid"
  Node Name                             Status
  ------------------------------------  ------------------------
  argentina                             passed
  germany                               passed
Result: User equivalence check passed for user "grid"
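When cluvfy runs are scripted, the two-column result tables above are the part worth extracting. The sketch below parses one such table from captured text; `parse_status_table` is a hypothetical helper, and the layout it expects (a dashed separator line followed by two-column rows) is assumed from the example output shown here rather than from a documented format.

```python
from typing import Dict, List

SAMPLE_OUTPUT = """\
Check: Node reachability from node "germany"
  Destination Node                      Reachable?
  ------------------------------------  ------------------------
  germany                               yes
  argentina                             yes
Result: Node reachability check passed from node "germany"
"""


def parse_status_table(text: str) -> Dict[str, str]:
    """Return {node: status} for rows that follow a dashed separator line."""
    rows: Dict[str, str] = {}
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("---"):
            in_table = True          # rows start after the separator
            continue
        if in_table:
            parts = stripped.split()
            if len(parts) != 2:      # anything else (e.g. "Result: ...") ends the table
                in_table = False
                continue
            rows[parts[0]] = parts[1]
    return rows


def nodes_not_ok(rows: Dict[str, str], ok_values=("yes", "passed")) -> List[str]:
    """List the nodes whose status is not one of the accepted values."""
    return [node for node, status in rows.items() if status not in ok_values]


if __name__ == "__main__":
    table = parse_status_table(SAMPLE_OUTPUT)
    print(table, nodes_not_ok(table))
```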

Page 15: Maximizing Oracle RAC Uptime

OraChk

• OraChk
– Formerly RACchk or RACcheck
– aka ExaChk
• RAC Configuration Audit Tool
– For details see MOS note ID 1268927.1
• Checks Oracle Stack:
– Standalone Database
– Grid Infrastructure & RAC
– Maximum Availability Architecture (MAA) Validation
– Oracle Hardware

Callout: Engineered Systems require less initial testing

Page 16: Maximizing Oracle RAC Uptime

OraChk – Installation and Configuration

• Installation:
– Download the latest version of orachk (90 day reminder…)
– Unzip it in a local directory as the oracle user
– Check that the permissions on orachk are 755
• Configuration:
– Run manually or in silent mode (via daemon)
– Implementation: run once, manually, to validate the system setup etc. prior to going live

Page 17: Maximizing Oracle RAC Uptime

OraChk – Usage

• Usage: ./orachk [-abvhpfmsuSo:c]
  -a        all checks
  -b        best practices only
  -p        patch recommendations only
  -f        offline (reports from existing data only)
  -u        pre-upgrade checks
  -S or -s  for silent installs, with or without SUDO capabilities
  -c        check individual components (e.g. orachk -a -c ASM)
  -o        invoke optional functionality (e.g. display only non-passing audit checks, verbose format, etc.)
  -m        exclude MAA checks
  -v        show the tool version

Page 18: Maximizing Oracle RAC Uptime

OraChk – Example

OraChk report in HTML format: summary with links to content

Oracle orachk Assessment Report
System Health Score is 75 out of 100 (detail)

Database Server

Check Id | Status | Type | Message | Status On | Details
E960DB20CA5A634FE04312C0E50A62E0 | FAIL | SQL Check | Table containing SecureFiles LOB storage belongs to a tablespace with extent allocation type that is not SYSTEM managed (not AUTOALLOCATE) | All Databases | View
6580DCAAE8A28F5BE0401490CACF6186 | WARNING | OS Check | The number of async IO descriptors is too low (/proc/sys/fs/aio-max-nr) | All Database Servers | View
5ADD88EC8E0AFF2EE0401490CACF0C10 | WARNING | OS Check | net.core.wmem_max Is NOT Configured According to Recommendation | All Database Servers | View
84BE4DE1F00AD833E040E50A1EC07771 | INFO | OS Check | Kernel Parameter fs.file-max Is Lower Than The Recommended Value | All Database Servers | View
66E70B43167837ABE040E50A1EC02FEA | INFO | OS Check | ORA-00600 errors found in alert log | All Database Servers | View
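A report like this is easier to act on when findings are grouped by status. The sketch below is illustrative only: the rows are transcribed by hand from the example report, the `Finding` type and helper names are hypothetical, and the severity ordering (FAIL before WARNING before INFO) is an assumption rather than something orachk defines here.

```python
from collections import Counter
from typing import List, NamedTuple


class Finding(NamedTuple):
    status: str
    check_type: str
    message: str
    status_on: str


# Rows transcribed from the example report (messages abbreviated).
FINDINGS = [
    Finding("FAIL", "SQL Check", "SecureFiles LOB tablespace is not AUTOALLOCATE", "All Databases"),
    Finding("WARNING", "OS Check", "Number of async IO descriptors is too low", "All Database Servers"),
    Finding("WARNING", "OS Check", "net.core.wmem_max is not configured as recommended", "All Database Servers"),
    Finding("INFO", "OS Check", "fs.file-max is lower than the recommended value", "All Database Servers"),
    Finding("INFO", "OS Check", "ORA-00600 errors found in alert log", "All Database Servers"),
]

# Assumed triage order: lower number means look at it first.
SEVERITY_ORDER = {"FAIL": 0, "WARNING": 1, "INFO": 2}


def count_by_status(findings: List[Finding]) -> Counter:
    """Count findings per status value."""
    return Counter(f.status for f in findings)


def most_severe_first(findings: List[Finding]) -> List[Finding]:
    """Sort findings by the assumed severity order (stable within a status)."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f.status])


if __name__ == "__main__":
    print(dict(count_by_status(FINDINGS)))
    for finding in most_severe_first(FINDINGS):
        print(finding.status, "-", finding.message)
```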

Page 19: Maximizing Oracle RAC Uptime

OraChk – Example

OraChk highlights failures. Here: Data Guard is not set up.

Oracle orachk Assessment Report
System Health Score is 75 out of 100 (detail)

MAA Scorecard: DATA CORRUPTION PREVENTION BEST PRACTICES

Status | Type | Message | Status On | Details
FAIL | OS Check | Active Data Guard is not configured | All Database Servers | View
FAIL | SQL Parameter Check | Database parameter DB_BLOCK_CHECKSUM is NOT set to recommended value | All Instances | View

Page 20: Maximizing Oracle RAC Uptime

The System Lifecycle – current stage: Operation

Page 21: Maximizing Oracle RAC Uptime

Operation

• Operation is an ongoing task
– Oracle Grid Infrastructure provides all necessary tools for normal operation.
• Operation should not create extra tasks
– Automation is the key
• Tools to use:
1. CVU (periodic runs)
2. OraChk (interval runs via daemon)
3. Cluster Health Monitor (CHM/OS)

Page 22: Maximizing Oracle RAC Uptime

Operations – Periodic CVU Checks are the Default

[GRID]> crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       argentina                STABLE
               ONLINE  ONLINE       brazil                   STABLE
               ONLINE  ONLINE       germany                  STABLE
...
ora.cvu
      1        ONLINE  ONLINE       brazil                   STABLE
ora.germany.vip
      1        ONLINE  ONLINE       germany                  ...

[GRID]> crsctl status res ora.cvu -p
NAME=ora.cvu
TYPE=ora.cvu.type
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTIONS=
ACTION_SCRIPT=
ACTION_TIMEOUT=60
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=60
CHECK_RESULTS=PRVF-4090 : Node connectivity failed for interface "*",PRVF-4090 : Node connectivity failed for interface "*",PRVF-4090 : Node connectivity failed for interface "*",PRVF-4090 : Node connectivity failed for interface "*",PRVG-1101 : SCAN name "cupscan.cupgnsdom.localdomain" failed to resolve,PRVF-4657 : Name resolution setup check for "cupscan.cupgnsdom.localdomain" (IP address: 10.1.1.55) failed,PRVF-4090 : Node connectivity failed for interface "*",PRVG-11050 : No matching interfaces "*" for subnet "172.149.0.0" on nodes "argentina,brazil,germany",PRVG-11050 : No matching interfaces "*" for subnet "172.149.0.0" on nodes "argentina,brazil,germany",PRVF-7530 : Sufficient physical memory is not available on node "germany" [Required physical memory = 4GB (4194304.0KB)],PRVF-4354 : Proper hard limit for resource "maximum open file descriptors" not found on node "germany" [Expected = "65536" ; Found = "4096"…
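The CHECK_RESULTS attribute of ora.cvu packs many messages into one line, and the messages themselves contain commas, so splitting on "," is unreliable. A minimal sketch of an alternative is to count the PRVF-/PRVG- message codes with a regular expression. The sample string below is a shortened copy of the output above, and `count_message_codes` is a hypothetical helper.

```python
import re
from collections import Counter

# Shortened copy of the CHECK_RESULTS value shown in the crsctl output.
CHECK_RESULTS = (
    'PRVF-4090 : Node connectivity failed for interface "*",'
    'PRVF-4090 : Node connectivity failed for interface "*",'
    'PRVG-1101 : SCAN name "cupscan.cupgnsdom.localdomain" failed to resolve,'
    'PRVG-11050 : No matching interfaces "*" for subnet "172.149.0.0" '
    'on nodes "argentina,brazil,germany",'
    'PRVF-7530 : Sufficient physical memory is not available on node "germany" '
    '[Required physical memory = 4GB (4194304.0KB)]'
)

# Matches the two message prefixes that occur in the example: PRVF-nnnn and PRVG-nnnn.
MESSAGE_CODE = re.compile(r"\bPRV[FG]-\d+\b")


def count_message_codes(check_results: str) -> Counter:
    """Count how often each PRVF-/PRVG- code occurs in a CHECK_RESULTS string."""
    return Counter(MESSAGE_CODE.findall(check_results))


if __name__ == "__main__":
    for code, count in count_message_codes(CHECK_RESULTS).most_common():
        print(f"{code}: {count}")
```

Counting codes rather than whole messages makes repeated failures, such as the recurring PRVF-4090 entries, stand out immediately.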

Page 23: Maximizing Oracle RAC Uptime

Operations – Setup Periodic OraChk System Checks

<<< Configure & start orachk daemon for scheduled interval runs >>>

$ ./orachk -id DBA -set \
> "[email protected];\
> AUTORUN_SCHEDULE = 4,8,12,16,20 * * *;\
> AUTORUN_FLAGS=-profile dba; COLLECTION_RETENTION=30"
$ ./orachk -d start
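The string passed to -set is a semicolon-separated list of KEY=VALUE pairs. The sketch below splits such a string and reads the hours out of AUTORUN_SCHEDULE. It assumes that the first schedule field is the hour of day (so the example would run at 04:00, 08:00, 12:00, 16:00 and 20:00); that reading is an assumption and should be checked against the orachk documentation. Both helper names are hypothetical, and the e-mail entry is left out of the sample because it is obscured in the transcript.

```python
from typing import Dict, List

# Settings from the slide, without the obscured e-mail entry.
SET_STRING = ("AUTORUN_SCHEDULE = 4,8,12,16,20 * * *;"
              "AUTORUN_FLAGS=-profile dba; COLLECTION_RETENTION=30")


def parse_settings(set_string: str) -> Dict[str, str]:
    """Split 'KEY=VALUE;KEY=VALUE' into a dict, trimming surrounding spaces."""
    settings: Dict[str, str] = {}
    for item in set_string.split(";"):
        if "=" not in item:
            continue                      # skip empty or malformed entries
        key, _, value = item.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def scheduled_hours(schedule: str) -> List[int]:
    """Return the hours named in the first schedule field (assumed to be the hour)."""
    hour_field = schedule.split()[0]
    if hour_field == "*":
        return list(range(24))
    return [int(hour) for hour in hour_field.split(",")]


if __name__ == "__main__":
    settings = parse_settings(SET_STRING)
    print(settings)
    print(scheduled_hours(settings["AUTORUN_SCHEDULE"]))
```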

Page 24: Maximizing Oracle RAC Uptime

Cluster Health Monitor (CHM/OS)

• Service integrated with the Oracle Clusterware stack
• Introduced in 11.2.0.2 (Linux, Solaris, Windows), 11.2.0.3 (AIX)
• Gathers OS level metrics to monitor resource degradation and failure
• Stores data in a central repository (GIMR)
• Runs in real time with locked-down memory for last gasp analysis
• Integration with QoS (Memory Guard) and CRS (server pool categorization)
• Integrated into EM Cloud Control

Diagram: osysmond runs with Oracle GI on each of germany, argentina, italy and brazil; a single ologgerd is shown.

Page 25: Maximizing Oracle RAC Uptime

Cluster Health Monitor – Daemons / Processes

osysmond
• Function: collect OS metrics; process raw data for a subset of processes; compress and send data to ologgerd; store/forward in case of network failures
• Managed by: ohasd
• Instances and location: every node of the cluster (including leaf nodes)

ologgerd
• Function: consume data from all active osysmonds; store data in the repository; service requests from clients
• Managed by: osysmond
• Instances and location: one per cluster (replica for 11.2.x)

oclumon
• Function: display OS level metrics in historic / real time mode; perform repository management operations
• Managed by: n/a (command line utility)
• Instances and location: can be invoked from any hub node in the cluster

Page 26: Maximizing Oracle RAC Uptime

Cluster Health Monitor in EM Cloud Control

Page 27: Maximizing Oracle RAC Uptime

Cluster Health Monitor in EM Cloud Control

Page 28: Maximizing Oracle RAC Uptime

Cluster Health Monitor – command line reporting

• Command line reporting of current and historic OS metrics (oclumon)
– from any hub node in the cluster
• Example:
[germany]: > oclumon dumpnodeview -process

Page 29: Maximizing Oracle RAC Uptime

The System Lifecycle – current stage: Monitoring

Page 30: Maximizing Oracle RAC Uptime

Monitoring

• Monitoring is an ongoing task
– Optional monitoring is available for an Oracle RAC cluster via QoS and Oracle EM
– Quality of Service Management (QoS) comes with a monitoring-only feature
• Monitoring is a proactive task.
• Tools to use:
1. Oracle Enterprise Manager 12c CC
2. Oracle Quality of Service Management (Memory Guard)

Page 31: Maximizing Oracle RAC Uptime

Monitoring the RAC Cluster with EM Cloud Control

Page 32: Maximizing Oracle RAC Uptime

Quality of Service Management – Memory Guard

• QoS feature externalized for general use
• Memory Guard protects resources
– Receives a stream of OS memory metrics from CHM/OS
• Issues an alert should any server be at risk
• Protects existing work and applications by automatically closing the server to new connections (i.e. stops services on the at-risk node)
• Automatically re-opens the server to connections once the memory pressure has subsided
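The close / re-open behaviour described above can be pictured as a small state machine driven by the memory metric stream. The sketch below is purely illustrative and is not Oracle's algorithm: the class name, the use of a "fraction of memory used" metric, and both thresholds (close at 90%, re-open at 75%) are hypothetical values chosen to show why two separate thresholds keep the server from flapping between states.

```python
class MemoryGuardSketch:
    """Toy model: stop accepting new connections under memory pressure.

    All thresholds are hypothetical; the real Memory Guard decides based on
    CHM/OS metrics in ways this presentation does not specify.
    """

    def __init__(self, close_above: float = 0.90, reopen_below: float = 0.75):
        if reopen_below >= close_above:
            raise ValueError("reopen_below must be lower than close_above")
        self.close_above = close_above
        self.reopen_below = reopen_below
        self.accepting_new_connections = True

    def observe(self, memory_used_fraction: float) -> bool:
        """Feed one metric sample; return whether new connections are accepted."""
        if self.accepting_new_connections and memory_used_fraction >= self.close_above:
            # Server is at risk: protect existing work by closing to new connections.
            self.accepting_new_connections = False
        elif not self.accepting_new_connections and memory_used_fraction <= self.reopen_below:
            # Pressure has subsided: re-open the server.
            self.accepting_new_connections = True
        return self.accepting_new_connections


if __name__ == "__main__":
    guard = MemoryGuardSketch()
    for sample in (0.50, 0.92, 0.85, 0.70):
        print(sample, guard.observe(sample))
```

In this toy model the sample at 0.85 leaves the server closed, because it sits between the two thresholds; only a drop to 0.75 or below re-opens it.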

Page 33: Maximizing Oracle RAC Uptime

Autonomous Computing

Diagram labels:
• Components: QoS, CHM, CHA, HngMgr, Policy
• Properties: Self-Optimizing, Self-Protecting, Self-Configuring, Self-Healing

Page 34: Maximizing Oracle RAC Uptime

Enabling Autonomous Computing

Cluster Health Monitor (CHM)/OS & QoS 11.2+
– CHM/OS (diagram labels: ologgerd, osysmond)

Further QoS & CHM Enhancements in 12.1.0.2
• QoS Support for Measure only with Performance Objectives and Alerts
• QoS Support for Measuring and Monitoring Admin-Managed Databases

Cluster Health Advisor: coming soon…

Page 35: Maximizing Oracle RAC Uptime

The System Lifecycle – current stage: Diagnosis

Page 36: Maximizing Oracle RAC Uptime

Diagnosis

• Diagnosis is a recurring task
– Ideally, there will be no incidents on the system.
– Realistically, there will be more than one.
• Diagnosis is a reactive task.
– It should be performed as efficiently as possible.
• Tools to use:
1. Trace File Analyzer (TFA)

Page 37: Maximizing Oracle RAC Uptime

Trace File Analyzer (TFA) – log collection in action

• Trace File Analyzer
– Improved comprehensive first failure diagnostics collection
– Efficient collection, packaging and transfer of data
– Collects for all relevant components (OS, Grid Infra., ASM, RDBMS), including Exadata cell nodes
– One command to collect all information, from all nodes (or single-instance, single-node)
• More information: MOS note ID 1513912.1

Page 38: Maximizing Oracle RAC Uptime

Trace File Analyzer (TFA) – intelligent log collection

Sending diagcollect request to host : argentina

Getting list of files satisfying time range [Tue Sep 03 14:17:43 PDT 2014, Tue Sep 03 18:17:43 PDT 2014]

germany: Zipping File: /opt/oracle/oak/oswbb/archive/oswiostat/germany_iostat_14.09.03.1500.dat.gz

germany: Zipping File: /u01/app/oracle/diag/rdbms/bill/bill1/trace/alert_bill1.log

Trimming file : /u01/app/oracle/diag/rdbms/bill/bill1/trace/alert_bill1.log with original file size : 109kB

germany: Zipping File: /opt/oracle/oak/oswbb/archive/oswtop/germany_top_14.09.03.1500.dat.gz

germany: Zipping File: /opt/oracle/oak/log/germany/oak/oakd.log

Trimming file : /opt/oracle/oak/log/germany/oak/oakd.log with original file size : 9.2MB

germany: Zipping File: /u01/app/12.1.0.2/grid/log/germany/gipcd/gipcd.log

germany: Zipping File: /u01/app/12.1.0.2/grid/log/germany/agent/ohasd/oraagent_grid/oraagent_grid.log

Trimming file : /u01/app/12.1.0.2/grid/log/germany/agent/ohasd/oraagent_grid/oraagent_grid.log with original filesize 4.3MB

germany: Zipping File: /var/log/messages

germany: Zipping File: /opt/oracle/oak/oswbb/archive/oswslabinfo/germany_slabinfo_14.09.03.1800.dat

Collecting ADR incident files...

Total Number of Files checked : 10543

Total Size of all Files Checked : 3.9GB

Number of files containing required range : 68

Total Size of Files containing required range : 129MB

Number of files trimmed : 10

Total Size of data prior to zip : 144MB

Saved 63MB by trimming files

Zip file size : 8.6MB

Total time taken : 47s.

Logs are collected to:

/opt/oracle/tfa/tfa_home/repository/collection_Tue_Sep_3_18_17_24_PDT_2014_node_all/germany.tfa_Tue_Sep_3_18_17_24_PDT_2014.zip

/opt/oracle/tfa/tfa_home/repository/collection_Tue_Sep_3_18_17_24_PDT_2014_node_all/argentina.tfa_Tue_Sep_3_18_17_24_PDT_2014.zip

Callouts on the output above:
• $ ./tfactl diagcollect: one simple command
• OS Watcher files
• Pruning
• Relevant files only
• ADR incident files
• 144MB pruned and compressed down to 8.6MB
• 47 seconds! – 1 command, 2 nodes, 4 databases, ASM, Clusterware, OS
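The size figures in the collection summary can be turned into ratios to see how much the time-range filter and the compression each contribute. The sketch below parses the summary lines and computes both ratios; `parse_sizes_mb` is a hypothetical helper, and treating 1 GB as 1024 MB is an assumption about how TFA reports sizes.

```python
import re
from typing import Dict

SUMMARY = """\
Total Size of all Files Checked : 3.9GB
Total Size of Files containing required range : 129MB
Total Size of data prior to zip : 144MB
Zip file size : 8.6MB
"""

# Conversion factors to MB; binary units are an assumption.
UNIT_TO_MB = {"kB": 1 / 1024, "MB": 1.0, "GB": 1024.0}
SIZE_LINE = re.compile(r"\s*(.+?)\s*:\s*([\d.]+)\s*(kB|MB|GB)\s*$")


def parse_sizes_mb(text: str) -> Dict[str, float]:
    """Map each 'label : <number><unit>' line to its size in MB."""
    sizes: Dict[str, float] = {}
    for line in text.splitlines():
        match = SIZE_LINE.match(line)
        if match:
            label, number, unit = match.groups()
            sizes[label] = float(number) * UNIT_TO_MB[unit]
    return sizes


if __name__ == "__main__":
    sizes = parse_sizes_mb(SUMMARY)
    compression = sizes["Total Size of data prior to zip"] / sizes["Zip file size"]
    relevant = (sizes["Total Size of Files containing required range"]
                / sizes["Total Size of all Files Checked"])
    print(f"compression ratio: {compression:.1f}x")
    print(f"share of checked data in the time range: {relevant:.1%}")
```

With the figures from the slide, the zip file is roughly one seventeenth of the pre-zip data (144 / 8.6), and only about 3% of the data checked falls inside the requested time range.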

Page 39: Maximizing Oracle RAC Uptime

Trace File Analyzer (TFA) – Efficiency from A-Z

Diagram: germany and brazil (each Oracle GI | HUB with Oracle RAC), each with its own LOGs.

Page 40: Maximizing Oracle RAC Uptime

Utility Cluster

• Utility Cluster
– Centralize and standardize storage, deployment, management and diagnostics
• Architecture:
– An Oracle Grid Infrastructure based cluster
– “Solution-in-a-Box” approach on ODA

Diagram labels:
• Utility Cluster (Node1, Node2): Enterprise Management (EM) Server; Grid Home Server (Rapid Home Provisioning); Storage Server; +1
• Oracle Clusterware; Oracle ASM (ASM, ASM); Flex ASM Storage; IOsrv, IOsrv
• Node 1 and Node 2: each with one Database Domain and three Application Domains

Page 41: Maximizing Oracle RAC Uptime

The System Lifecycle

• Installation
• Implementation
• Operation
• Monitoring
• Diagnosis

Page 42: Maximizing Oracle RAC Uptime
