aix powerha Discussion_20130109

AIX POWERHA Discussion

sina@冰砖帮帮忙

2

Basics

The Cluster Manager is a daemon which runs on each cluster node. It uses
services provided by the RSCT subsystems to monitor the status of the nodes
and their interfaces. It receives information from Topology Services and uses
Group Services for inter-node communication. It invokes the appropriate
scripts in response to node or network events (recovering from SW/HW
failures, requests to online/offline a node, requests to move/online/offline a
resource group). It maintains up-to-date information about the resource
groups (status, location).

clstrmgr

3

Basics

If clstrmgr hangs or is terminated, the default action taken by SRC is to issue
halt -q, causing the system to crash. clstrmgr depends on RSCT; if topsvcs or
grpsvcs has problems starting, clstrmgr will not start either.

clstrmgr

4

Basics

Clinfo obtains updated cluster information from the Cluster Manager and
makes information about the state of the cluster, nodes, networks and
applications available. It is used by clstat, and it is optional on cluster
nodes and clients.

# startsrc -s clinfoES starts clinfo

// /usr/es/sbin/cluster/etc/rc.cluster this script also starts everything

# stopsrc -s clinfoES stops clinfo

clinfo

5

Basics

You can create a netmon.cf configuration file with a list of additional network

addresses. These addresses will only be used by topology services to send

ICMP ECHO requests to help determine an adapter's status. This

implementation is recommended in clusters with only a single network card

on each node, because topology services cannot force traffic over the single

adapter to confirm its proper operation.

The file should be in the /usr/es/sbin/cluster directory on all nodes and
contain one IP address per line. It should list remote IP labels/addresses that
are not in the cluster configuration and that can be reached from PowerHA.

netmon.cf

6

Basics

This file is used to send ICMP ECHO requests to each IP address in the file.

After sending the request to every address, netmon checks the inbound

packet count before determining whether an adapter has failed.

netmon.cf
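A minimal sketch of the one-address-per-line format described on the two netmon.cf slides. The addresses are placeholders from the 192.0.2.0/24 documentation range, and check_netmon_cf is a hypothetical helper, not something shipped with PowerHA. It only checks that each non-blank line holds a single address or label; it does not test reachability.

```shell
# check_netmon_cf FILE  (hypothetical helper, not part of PowerHA)
# Succeeds only if every non-blank line of FILE holds exactly one
# IP address or IP label, as the netmon.cf slides describe.
check_netmon_cf() (
    file=$1
    [ -r "$file" ] || { echo "cannot read $file" >&2; exit 2; }
    awk '
        NF == 0 { next }                  # ignore blank lines
        NF != 1 {                         # more than one token on a line
            printf "line %d: expected one address: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$file"
)

# Example: build a sample file in a scratch location and check it.
sample=${TMPDIR:-/tmp}/netmon.cf.sample.$$
printf '%s\n' 192.0.2.1 192.0.2.254 > "$sample"
check_netmon_cf "$sample" && echo "format OK"
rm -f "$sample"
```

On a cluster node the same check could be pointed at the real file in the /usr/es/sbin/cluster directory before it is copied to the other nodes.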

7

Basics

This file contains IP address information which helps to enable

communication between monitoring daemons on clients and the PowerHA

cluster nodes. The file resides on all PowerHA cluster servers and clients in

the /usr/es/sbin/cluster/etc/ directory.

When a monitor daemon starts up (for example clinfoES on a client), it reads

this file to know which nodes are available for communication.

(when running clstat utility from a client, the clinfoES obtains info from this

file.)

clhosts

8

Basics

ARP (Address Resolution Protocol) is the Internet communication protocol used
to dynamically map Internet addresses to physical (hardware/MAC) addresses
on local area networks.

The /usr/sbin/cluster/etc/clinfo.rc script, which is called by the clinfo utility
whenever a network or node event occurs, updates the system’s ARP cache.

ARP

9

Basics

PowerHA can be configured to change the MAC address of a network
interface by hardware address takeover (HWAT). In a switched environment,
the network switch might not always be informed promptly of the new MAC.
The clinfo.rc script is used to flush the system's ARP cache in order to reflect
changes to network IP addresses. (HWAT is only supported when using IPAT
via replacement.)

clinfo.rc

10

Basics

On clients not running clinfoES, you might have to update the local ARP

cache by pinging the client from the cluster node. In order to avoid this, add

the IP of the client to the PING_CLIENT_LIST variable in the clinfo.rc script

(/usr/es/sbin/cluster/etc/clinfo.rc). Through the use of PING_CLIENT_LIST

entries, the ARP cache of clients (and other network devices) can be

updated.

clinfo.rc
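A small sketch of what a PING_CLIENT_LIST entry might look like. The two client addresses are hypothetical placeholders, and show_client_pings is an illustrative helper that only prints the ping commands instead of running them; the real clinfo.rc script has its own logic, which is not reproduced here.

```shell
# Hypothetical client addresses; in clinfo.rc the variable is edited in place.
PING_CLIENT_LIST="192.0.2.10 192.0.2.11"

# show_client_pings (illustrative only): print one ping command per
# listed client so the list can be reviewed before it is used.
show_client_pings() {
    for client in $PING_CLIENT_LIST; do
        echo "ping -c 1 $client"
    done
}

show_client_pings
```

Multiple clients are separated by spaces inside the quoted value, so adding a client is a one-word change to the list.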

11

Basics

All cluster communication goes through clcomd. It must be running before
any cluster services can be started. The trusted IP addresses are stored in
the /usr/sbin/cluster/etc/rhosts file (owner root.system, mode 0600). Nodes
with a non-empty /usr/es../rhosts file (or missing ...rhosts file) will refuse all
HACMP-related communication with nodes not listed in their rhosts file. If an
adapter is missing or there is a format error in the file, clcomd will not
function and all connections will be denied. After the first synchronization the
HACMP ODM classes are populated, so the rhosts file can be emptied.

clcomd

12

Basics

clcomd is started via an /etc/inittab entry, which is created during PowerHA
install (clverify uses the clcomd subsystem). It uses port 6191, and it is the
transport medium for PowerHA cluster verification, global ODM changes and
remote command execution.

clcomd is managed by SRC (startsrc, stopsrc, refresh; refresh is useful to
reread the /usr/sbin/cluster/etc/rhosts file), and its logs are in
/var/hacmp/clcomd/clcomd.log

clcomd

13

Basics

RSCT is a software stack, a package of services ("client subsystems"), which
is a prerequisite for HACMP and is packaged with AIX.

-Topology Services: generates heartbeats to monitor nodes, networks and
network adapters, and diagnoses failures. When a node joins the cluster,
topology services adds the adapter information to the machine list.
This "topology" or "connectivity" information is then passed on to group
services.

RSCT

14

Basics

-Group Services: "Client subsystems" (e.g. event management subsystem,
RMC subsystems...) form groups, each with a membership list. Provides
reliable communication and protocols required for cluster operation. The main
daemon is hagsd. Group services coordinates/monitors state changes within
the cluster (e.g. node join/leave) and then passes these state changes to the
interested subscribers, for example the cluster manager or event
management. Enhanced Concurrent mode disks use RSCT group services to
control locking.

RSCT

15

Basics

-Resource Monitoring and Control (RMC): RMC notifies the Cluster
Manager about events, so that the Cluster Manager can respond to events
detected by RSCT. The main daemon is rmcd. Process application monitoring
uses RMC and therefore does not require any custom script. Dynamic Node
Priority (DNP) is calculated by the use of RMC.

RSCT

17

Basics

-Event Management: matches information about the state of system
resources and initiates the scripts needed to manage the cluster.

HACMP relies on topology services for heartbeats and group services for

reliable messaging. These services are started prior to the cluster processes.

If there are problems with these services the cluster will not start.

RSCT

18

Basics

SNMP is a popular protocol for network management. It is used for collecting
information from, and configuring, network devices, such as servers, printers,
hubs, switches, and routers.

When you run cldump, it reports: "Obtaining information via SNMP from Node:
aix11..."

SNMP

19

Basics

clsmuxpd provides SNMP support. (This applies to releases before v5.3, after
which clinfo was improved.)

clinfo is based on SNMP; it queries the clsmuxpd daemon for up-to-date
cluster information and provides a simple display of it.

clsmuxpd

20

Basics

C-SPOC helps to manage the entire cluster from a single point, either in
smitty hacmp or with the commands under /usr/es/sbin/cluster/cspoc.

C-SPOC uses clcomd for HACMP communication between nodes, so the
/etc/rhosts file is no longer used.

If a C-SPOC function fails, the failure is logged in /tmp/cspoc.log on the node
performing the operation. cspoc.log also contains the commands that were
used.

C-SPOC

21

Basics

The service IP replaces the existing address on the interface, thus only one

service IP can be configured on one interface at one time. The service IP should

be on the same subnet as one of the boot IP addresses.

Other interfaces on this node cannot be in the same subnet; they are called
standby interfaces.

These standby interfaces are used if the boot interface fails. IPAT via IP
replacement can save subnets, but requires extra hardware.

IPAT via IP replacement

22

Basics


23

Basics

If the interface holding the service IP address fails, PowerHA moves the service
IP address to another available interface on the same node and on the same
network; in this case, the resource group is not affected.

If there is no available interface on the same node, the resource group is moved
together with the service IP to another node with an available interface on the
same logical network.


24

Basics

The service IP is aliased (using the ifconfig command) onto the interface without

removing the underlying boot IP address. This means more than one service IP

label can coexist on one interface.

Each boot interface on a node must be on a different subnet. The service IP

labels can be on one or more subnets, but they cannot be the same as any of

the boot interface subnets.

Standby interfaces are not necessary, because all interfaces are labeled as boot

interfaces.

IPAT via aliasing

25

Basics

IPAT via aliasing

26

Basics

By removing the need for one interface per service IP address that the node
could host, IPAT through aliasing is more flexible and in some cases requires
less hardware. IPAT through aliasing also reduces fallover time, as it is much
faster to add an alias to an interface than to remove the base IP address and
then apply the service IP address.

IPAT via aliasing

27

Basics

When PowerHA is installed, it creates this /etc/inittab entry:

hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init >/dev/console 2>&1

// it starts clcomdES, clstrmgrES, snmpd, syslogd

HACMP start-up

28

Basics

If PowerHA is configured for IP Address Takeover:

harc:2:wait:/usr/es/sbin/cluster/etc/harc.net # HACMP for AIX network startup

When the "start at system restart" option is chosen in C-SPOC:

hacmp6000:2:wait:/usr/es/sbin/cluster/etc/rc.cluster -boot -A # Bring up Cluster

// do not use this option; manual control is better

HACMP start-up
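A short sketch for spotting which of the start-up entries above are present. It assumes the standard inittab layout of id:runlevel:action:command and reads inittab-format text on standard input; list_ha_inittab is a hypothetical helper name, not a PowerHA utility.

```shell
# list_ha_inittab (hypothetical helper): read inittab-format lines
# (id:runlevel:action:command) on stdin and print "id action" for every
# entry whose command lives under /usr/es/sbin/cluster.
list_ha_inittab() {
    awk -F: '$4 ~ /^\/usr\/es\/sbin\/cluster\// { print $1, $3 }'
}
```

On an AIX node the input could come from the inittab itself, for example `lsitab -a | list_ha_inittab`. Seeing hacmp6000 in the result would indicate that the automatic start at system restart, which the slide advises against, is enabled.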

29

Build-Configure Install HA Software

-cluster.es

-cluster.es.cspoc

-cluster.license

-cluster.man.en_US.es

# reboot

30

Build-Configure

Network and /etc/hosts

Boot interfaces are those that share the service subnet.

Standby interfaces are those that are not on the service subnet.

IPAT via IP REPLACEMENT (service IP is in the same subnet with boot IP)

IPAT via IP ALIASING (all IPs are in different subnets)

IP Aliasing in detail:

All base IP addresses on a node must be on separate subnets. (If heartbeat

monitoring over IP aliases is not used)

All service IP addresses must be on a separate subnet from any of the

base subnets.

The service IP addresses can all be in the same or different subnets.

The subnet masks must all be the same
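The aliasing rules above can be checked with a little arithmetic before the addresses go into /etc/hosts. The sketch below assumes dotted-quad IPv4 addresses without leading zeros and a single common netmask, as the last rule requires. net_of and distinct_subnets are hypothetical helper names, and the addresses in the example are placeholders.

```shell
# net_of IP MASK: print the network address of IP under MASK.
net_of() (
    IFS=.
    set -- $1 $2      # split both quads: $1..$4 = IP, $5..$8 = mask
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)

# distinct_subnets MASK IP...: succeed only if every IP is in a
# different subnet under the common MASK.
distinct_subnets() (
    mask=$1
    shift
    seen=" "
    for ip in "$@"; do
        net=$(net_of "$ip" "$mask")
        case $seen in
            *" $net "*) echo "duplicate subnet $net ($ip)" >&2; exit 1 ;;
        esac
        seen="$seen$net "
    done
)

# Example: two boot addresses of one node plus one service address.
distinct_subnets 255.255.255.0 10.10.10.2 10.10.20.2 10.10.30.5 \
    && echo "subnets are distinct"
```

Because service addresses are allowed to share a subnet with each other, only one service address per service subnet should be passed in together with the boot addresses of a node.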

31

Build-Configure

Network and /etc/hosts

IP Replacement in detail:

Base (boot) and service IP addresses on the primary adapter must be on

the same subnet.

All base IP addresses on the secondary adapters must be on separate

subnets (different from each other and from the primary adapter).

32

Build-Configure

Storage and FileSystem

-shared disk (on both nodes):

cfgmgr

chdev -l hdiskX -a pv=yes

-create enhanced concurrent vg:

lvlstmajor (on both nodes, to find a free major number)

mkvg -C -y 'orabckvg' -s '128' -n -V 50 hdiskpower50

// -n: autovaryon should be turned off

lv, fs if needed

on the other node: importvg -V 50 -y orabckvg -n hdiskpower50

33

Build-Configure

Application Script

stop/start application scripts should be created

34

Build-Configure

Extended Topology

Extended Config -> Extended Topology:

-Config. HACMP Cluster

-Config HACMP Node (set nodes and ips)

35

Build-Configure

Discover

Extended Config -> Discover

"/usr/es/sbin/cluster/etc/rhosts" file possibly needed with necessary ips

36

Build-Configure

Extended Topology

Extended Config -> Extended Topology

Configure HACMP Networks: give a name and set netmask

-enable IP address takeover with Alias? --> yes: if Aliasing

--> no: if Replacement

Configure HACMP Interface/Device: Add discovered -> comm interface

ALIASING: 1 network configured and both ...boot_1 and ...boot_2
addresses were used because they are in different subnets

IP REPLACEMENT: 1 network configured and ...boot_1 addresses will be
used because ...boot_2 IPs are in a different subnet from the service IP

// Not necessary, but here we can do verif. and synch. to see if everything is
correct: Ext. Config -> Ext. Ver...

37

Build-Configure

RG and resources

Extended Config -> Ext. resource:

-Ext. Res. Group:

-startup policy:

if IP repl. with 1 nic per network per node -> Online Using Distribution

Policy

if IP repl. with more nics on a network on the nodes -> anything

if IP aliasing -> anything

-Extended Resource:

-Appl. Server: (start, stop scripts)

-Service IP..: Configurable on Multiple Nodes -> which network -> F4 to

choose

RG and resources are ready to be related together:

Extended Config -> Extended RG -> Change/show Resources for a

RG: with F4 add:

38

Build-Configure RG and resources

-Service IP

-Appl Serv

-VG

39

Build-Configure Sync and Verify

places to start in SMITTY:

- under Standard Config.: this always runs a full verification, all aspects

of the cluster will be checked before synchronization

it has no option, it is a press and go function

- under Extended Config.: allows separation of verification and
synchronization

Emulate: changes will be tested before trying to implement

them

Actual: it implements the settings

Forced: ignore any errors, can be dangerous to a running

cluster

Verify changes only: if only few items changed it will allow

much faster verification and synchronization

Always verify and synchronize the cluster from the node on which the
change occurred to the other nodes in the cluster.

40


-!!! Problem Determination Tools > HACMP Verification: verification can

be run without causing a synchronization!!!

Error count: verification is successful if error count not

exceeded

The verify process will indicate warnings if the cluster is
capable of running but:

-may have items that are not configured in resource groups

-the recommendations for configuring the cluster have not

been followed

41


When clverify has problems, it is usually related to the ability to contact a
node.

The node to node communication is provided by clcomd.

(clcomd uses the /usr/es/sbin/cluster/etc/rhosts file for inter node security.

(in earlier releases it used the 'r' commands))

42


there are 2 log files for problem checking:

clverify: /var/hacmp/clverify/clverify.log (automatic cluster conf. mon. <-
every 24 hours, by default on the first node in alphabetical order)

clcomd: /var/hacmp/clcomd/clcomd.log

additional logs:

/var/hacmp/clverify/fail stores data from the most recent failed

verification attempt

/var/hacmp/clverify/pass stores data from the most recent passed

verification attempt

43

Command Help ODM

odmget HACMPlogs shows where the log files are

odmget HACMPcluster shows cluster version

odmget HACMPnode shows info from nodes

// changing the location of the log files: C-SPOC > Log Viewing and
Management

/etc/es/objrepos HACMP ODM files

44

Command Help POWERHA LOGS

45

Command Help RSCT LOGS

/var/ha/log RSCT logs are here

/var/ha/log/nim.topsvcs... the heartbeats are logged here (comm. is
OK between the nodes)

46

Command Help Command

clRGinfo Shows the state of RGs (in earlier HACMP clfindres was

used)

clRGinfo -p shows the node that has temporarily the highest priority

(POL)

clRGinfo -t shows the delayed timer information

clRGinfo -m shows the status of the application monitors of the
cluster

resource group state can be: online, offline, acquiring, releasing,
error, unknown

cldump (or clstat -o) detailed info about the cluster (realtime, shows

cluster status) (clstat requires a running clinfo)

cldisp detailed general info about the cluster (not realtime)

47


cltopinfo Detailed information about the network of the cluster (this

shows the data in DCD not in ACD)

cltopinfo -i good overview, same as cllsif: this also lists cluster
interfaces, it was used prior to HACMP 5.1

cltopinfo -m shows heartbeat statistics, missed heartbeats

clshowres Detailed information about the resource group(s)

cllsserv Shows which scripts will be run in case of a takeover

clrgdependency -t PARENT_CHILD -sl shows parent child dependencies

of resource groups

clshowsrv -v shows status of the cluster daemons (very good

overview!!!)

48


lssrc -ls clstrmgrES shows if cluster is STABLE or not, cluster version,

Dynamic Node Priority (pgspace free, disk busy, cpu idle)

ST_STABLE: cluster services running with resources online

NOT_CONFIGURED: cluster is not configured or node is not

synced

ST_INIT: cluster is configured but not active on this node

ST_JOINING: cluster node is joining the cluster

ST_VOTING: cluster nodes are voting to decide event execution

ST_RP_RUNNING: cluster is running a recovery program

ST_RP_FAILED: a recovery program event script has failed

ST_BARRIER: clstrmgr is in between events waiting at the

barrier

ST_CBARRIER: clstrmgr is exiting a recovery program

ST_UNSTABLE: cluster is unstable usually due to an event error
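A small sketch for picking the state out of the lssrc -ls clstrmgrES output so it can be used in a script. It assumes the output contains a line of the form "Current state: ST_STABLE"; that line format is an assumption here and should be checked against the output on the actual PowerHA level. cluster_state and is_stable are hypothetical helper names.

```shell
# cluster_state (hypothetical helper): read "lssrc -ls clstrmgrES" style
# text on stdin and print the value of the first "Current state:" line.
cluster_state() {
    awk '!found && /^Current state:/ {
        sub(/^Current state: */, "")
        print
        found = 1
    }'
}

# is_stable: succeed only if the state read from stdin is ST_STABLE.
is_stable() {
    [ "$(cluster_state)" = "ST_STABLE" ]
}
```

On a cluster node this might be used as `lssrc -ls clstrmgrES | is_stable && echo "cluster is stable"`, for example before starting a planned resource group move.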

49


lssrc -g cluster lists the running cluster daemons

lssrc -ls topsvcs shows the status of individual diskhb devices,

heartbeat intervals, failure cycle (missed heartbeats)

lssrc -ls grpsvcs gives info about connected clients, number of groups

lssrc -ls emsvcs shows the resource monitors known to the event
management subsystem

lssrc -ls snmpd shows info about snmpd

halevel -s shows PowerHA level (from 6.1)

50


cl_ping pings all the adapters of the given list (e.g.: cl_ping -w 2

aix21 aix31 (-w: wait 2 seconds))

cldiag HACMP troubleshooting tool (e.g.: cldiag debug clstrmgr -l
5 <--shows clstrmgr heartbeat infos)

cldiag vgs -h nodeA nodeB <--this checks the shared vg
definitions on the given nodes for inconsistencies

/usr/es/sbin/cluster/utilities/get_local_nodename shows the name of
this node within HACMP

/usr/es/sbin/cluster/utilities/clexit.rc this script halts the node if the
cluster manager daemon stopped incorrectly

51

Command Help Remove HA Configure

1. stop cluster on both nodes

2. remove the cluster configuration ( smitty hacmp) on both nodes

3. remove cluster filesets (starting with cluster.*)

If you are planning to do a crash test, do it with halt -q or reboot -q.
shutdown -Fr will not work, because it stops hacmp and the resource groups
gracefully (rc.shutdown), so no takeover will occur.

52

Disk HeartBeat OverView

Heartbeat disks should be used in enhanced concurrent mode. Enhanced

concurrent mode disks use RSCT group services to control locking, thus

freeing up a sector on the disk that can now be used for communication.

This sector, which was formerly used for SSA Concurrent mode disks, is

now used for writing heartbeat information.

Any disk that is part of an enhanced concurrent volume group can be used

for a diskhb network, including those used for data storage. Also, the

volume group that contains the disk used for a diskhb network does not

have to be varied on.

An enhanced concurrent volume group is not the same as a concurrent

volume group (which is part of a concurrent resource group), rather, it refers

to the mode of locking by using RSCT.

53

Disk HeartBeat How to View

lspv | grep hb

<--shows the actual state of heartbeat disks

cltopinfo -i | grep hb

<--shows what had been saved into the configuration (we have to change it

to show the actual state)

54

Disk HeartBeat How to Config

1. Create diskhb network

Extended Configuration->Extended Topology->Configure HACMP

Networks->Add a Network...

choose:diskhb

* Network Name [anything you want]

* Network Type diskhb

55

Disk HeartBeat How to Config

2. Add device:

Extended Configuration->Extended Topology->Configure HACMP Comm.

Interfaces/Dev.->Add ...

Add Pre-defined...

Communication Devices

Choose your diskhb Network Name

* Device Name [aix41_diskhb2] <--choose a

unique name

* Network Type diskhb

* Network Name net_diskhb_aix41_aix42

* Device Path [/dev/vpath4]

* Node Name [aix41]

// You will repeat this process for the other node and the other device. This

will complete both devices for the diskhb network.

56

Disk HeartBeat How to Test

DO NOT PERFORM THIS TEST WHILE HACMP IS RUNNING???

dhb_read -p devicename <--dump diskhb sector contents

dhb_read -p devicename -r <--receive data over diskhb network

dhb_read -p devicename -t <--transmit data over diskhb network

1. on one node set receiving:

/usr/sbin/rsct/bin/dhb_read -p hdisk2 -r

2. on the other node set transmit:

/usr/sbin/rsct/bin/dhb_read -p hdisk2 -t

dhb_read -p rvpath0 -r <--Note: the device name is the raw device, as
designated by the "r" preceding the device name.

If everything is OK: Link operating normally

57

Disk HeartBeat How to Monitor

root@aix41: / # lssrc -ls topsvcs

Subsystem Group PID Status

topsvcs topsvcs 921638 active

Network Name Indx Defd Mbrs St Adapter ID Group ID

VLAN200_10_20_ [ 0] 2 2 S 10.10.10.2 10.10.10.2

VLAN200_10_20_ [ 0] en11 0x41c64107 0x41c64108

HB Interval = 1.000 secs. Sensitivity = 10 missed beats

Missed HBs: Total: 0 Current group: 0

...
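To watch the missed-heartbeat counters shown in the output above without reading the whole listing, a sketch like the following could be used. It assumes the "Missed HBs: Total: N Current group: N" line format seen in that sample; missed_hb_total is a hypothetical helper name.

```shell
# missed_hb_total (hypothetical helper): read "lssrc -ls topsvcs" style
# text on stdin and print the sum of all "Missed HBs: Total:" counters.
missed_hb_total() {
    awk '/Missed HBs:/ {
        for (i = 1; i < NF; i++)
            if ($i == "Total:") sum += $(i + 1)
    }
    END { print sum + 0 }'
}
```

On a node this might be run as `lssrc -ls topsvcs | missed_hb_total`. A value that keeps growing between runs would suggest a heartbeat path, such as the diskhb network, is dropping beats.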

Thank You
