aix powerha discussion_20130109
AIX POWERHA Discussion
sina@冰砖帮帮忙
Basics
It uses services provided by the RSCT subsystems to monitor the status of
the nodes and their interfaces. It receives information from Topology Services
and uses Group Services for inter-node communication. It invokes the
appropriate scripts in response to node or network events (recovering from
SW/HW failures, a request to bring a node online/offline, a request to
move a resource group or bring it online/offline). It maintains up-to-date information
about the resource groups (status, location). It is a daemon that runs on each cluster
node.
clstrmgr
Basics
If clstrmgr hangs or is terminated, the default action taken by SRC is to issue
halt -q, causing the system to crash. Clstrmgr depends on RSCT: if
topsvcs or grpsvcs has problems starting, clstrmgr will not start
either.
clstrmgr
Basics
Clinfo obtains updated cluster information from the Cluster Manager. It
makes information about the state of the cluster, nodes, networks and
applications available. It is used by clstat and is optional on cluster nodes and clients.
# startsrc -s clinfoES starts clinfo
// /usr/es/sbin/cluster/etc/rc.cluster: this script also starts everything
# stopsrc -s clinfoES stops clinfo
clinfo
Basics
You can create a netmon.cf configuration file with a list of additional network
addresses. These addresses will only be used by topology services to send
ICMP ECHO requests to help determine an adapter's status. This
implementation is recommended in clusters with only a single network card
on each node, because topology services cannot force traffic over the single
adapter to confirm its proper operation.
The file should be in the /usr/es/sbin/cluster directory on all nodes and contain one
IP address per line. The file should contain remote IP labels/addresses that
are not in the cluster configuration and that are reachable from the cluster nodes.
netmon.cf
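As a rough illustration of the file format described above, the Python sketch below reads netmon.cf content and rejects entries that are not valid IP addresses or that belong to the cluster configuration. It is a minimal sketch under stated assumptions: it accepts numeric addresses only (real entries may also be IP labels, which would need name resolution), it assumes lines starting with '#' are comments, and the function name parse_netmon_cf is hypothetical.

```python
import ipaddress

def parse_netmon_cf(text, cluster_addresses):
    """Return the target addresses listed in netmon.cf content.

    Raises ValueError for a malformed entry or for an address that is
    already part of the cluster configuration.
    """
    cluster = {ipaddress.ip_address(a) for a in cluster_addresses}
    targets = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are ignored
        addr = ipaddress.ip_address(line)  # ValueError if not a valid IP
        if addr in cluster:
            raise ValueError(f"{addr} is part of the cluster configuration")
        targets.append(str(addr))
    return targets

if __name__ == "__main__":
    sample = "# default gateways\n10.10.10.254\n10.10.20.254\n"
    print(parse_netmon_cf(sample, ["10.10.10.1", "10.10.10.2"]))
```

Raising on the first bad line, rather than silently skipping it, keeps a typo from quietly shrinking the list of ping targets.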
Basics
Netmon uses this file to send ICMP ECHO requests to each IP address listed in it.
After sending the request to every address, netmon checks the inbound
packet count before determining whether an adapter has failed.
netmon.cf
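The inbound-packet check described above can be modeled as follows. This is a simplified sketch, not netmon's actual implementation: read_inbound_count and send_echo are hypothetical callables standing in for reading the adapter's receive counter and sending one ICMP ECHO request.

```python
def adapter_alive(count_before, count_after):
    """Treat the adapter as working if any packet arrived meanwhile."""
    return count_after > count_before

def check_adapter(read_inbound_count, send_echo, targets):
    """Ping every netmon.cf target, then compare inbound packet counts.

    read_inbound_count() and send_echo(addr) are hypothetical hooks.
    """
    before = read_inbound_count()
    for addr in targets:
        send_echo(addr)  # one ICMP ECHO request per address
    after = read_inbound_count()
    return adapter_alive(before, after)
```

Injecting the two callables keeps the decision logic testable without real network traffic.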
Basics
This file contains IP address information that helps enable
communication between monitoring daemons on clients and the PowerHA
cluster nodes. The file resides on all PowerHA cluster servers and clients in
the /usr/es/sbin/cluster/etc/ directory.
When a monitor daemon starts up (for example clinfoES on a client), it reads
this file to learn which nodes are available for communication.
(When the clstat utility runs on a client, clinfoES obtains this information from this
file.)
clhosts
Basics
The Internet communication protocol used to dynamically map Internet
addresses to physical (hardware/MAC) addresses on local area networks.
The /usr/sbin/cluster/etc/clinfo.rc script, which is called by the clinfo utility
whenever a network or node event occurs, updates the system’s ARP cache.
ARP
Basics
PowerHA can be configured to change the MAC address of a network
interface by hardware address takeover (HWAT). In a switched environment,
the network switch might not be promptly informed of the new MAC.
The clinfo.rc script is used to flush the system's ARP cache in order to reflect
changes to network IP addresses. (HWAT is only supported when using IPAT
via replacement.)
clinfo.rc
Basics
On clients not running clinfoES, you might have to update the local ARP
cache by pinging the client from the cluster node. In order to avoid this, add
the IP of the client to the PING_CLIENT_LIST variable in the clinfo.rc script
(/usr/es/sbin/cluster/etc/clinfo.rc). Through the use of PING_CLIENT_LIST
entries, the ARP cache of clients (and other network devices) can be
updated.
clinfo.rc
Basics
All cluster communication goes through clcomd. It must be running before
any cluster services can be started. The trusted IP addresses are stored in
the /usr/sbin/cluster/etc/rhosts file (owner root.system, permissions 0600).
Nodes with a non-empty /usr/es../rhosts file (or a missing ...rhosts file) will refuse all
HACMP-related communication with nodes not listed in their rhosts file. If an adapter
address is missing or there is a format error in the file, clcomd will not function and all
connections will be denied. After the first synchronization the HACMP ODM
classes are populated, so the rhosts file can be emptied.
clcomd
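The trust rules above can be summarized as a small decision function. This is a hypothetical Python model of the behavior described on this slide, not clcomd code: rhosts_text is None when the file is missing, an emptied file falls back to addresses taken from the ODM (an assumption based on the note about the first synchronization), and any malformed line denies everything.

```python
import ipaddress

def load_trusted(text):
    """Parse rhosts content (one IP address per line).

    Any malformed line makes the whole file unusable, so an empty set
    is returned and every connection is denied.
    """
    trusted = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            trusted.add(ipaddress.ip_address(line))
        except ValueError:
            return set()
    return trusted

def is_allowed(peer, rhosts_text, odm_addresses=()):
    """Decide whether clcomd would accept a connection from peer (model only)."""
    if rhosts_text is None:  # missing rhosts file: nobody is listed, refuse all
        return False
    ip = ipaddress.ip_address(peer)
    if not rhosts_text.strip():  # emptied after first sync (assumption): use ODM
        return ip in {ipaddress.ip_address(a) for a in odm_addresses}
    return ip in load_trusted(rhosts_text)
```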
Basics
Clcomd is started via an /etc/inittab entry, which is created during the PowerHA
install (clverify uses the clcomd subsystem). It uses port 6191, and it is the
transport medium for PowerHA cluster verification, global ODM changes and
remote command execution.
clcomd is managed by SRC (startsrc, stopsrc, refresh; refresh is useful to
reread the /usr/sbin/cluster/etc/rhosts file), and its logs are in
/var/hacmp/clcomd/clcomd.log.
clcomd
Basics
It is a software stack, a package of services ("client subsystems"), which is a
prerequisite for HACMP and is packaged with AIX.
-Topology Services: generates heartbeats to monitor nodes, networks and
network adapters, diagnoses failures. When a node joins the cluster,
topology services adds the adapter information to the machine list.
This "topology" or "connectivity" information is then passed on to group
services.
RSCT
Basics
-Group Services: "client subsystems" (e.g. the event management subsystem,
RMC subsystems...) form groups with a membership list. It provides reliable
communication and the protocols required for cluster operation. The main
daemon is hagsd. Group Services coordinates/monitors state changes within
the cluster (e.g. node join/leave) and then passes these state changes to the
interested subscribers, for example the cluster manager or event
management. Enhanced concurrent mode disks use RSCT group services to
control locking.
RSCT
Basics
-Resource Monitoring and Control (RMC): RMC notifies the Cluster
Manager about events so that it can respond to events reported by RSCT. The main
daemon is rmcd. Process application monitoring uses RMC and therefore does not
require any custom script. Dynamic Node Priority (DNP) is calculated by the
use of RMC.
RSCT
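Dynamic Node Priority boils down to picking the node with the best value of a monitored metric. The sketch below assumes the metric values have already been collected (in PowerHA they come from RMC); the keys pgspace_free, cpu_idle and disk_busy are hypothetical names echoing the DNP metrics listed under lssrc -ls clstrmgrES, and the numbers are illustrative.

```python
def pick_node(metrics, metric, highest=True):
    """Return the node with the highest (or lowest) value of metric.

    metrics maps node name -> {metric name: value}.
    """
    choose = max if highest else min
    return choose(metrics, key=lambda node: metrics[node][metric])

if __name__ == "__main__":
    sample = {
        "aix41": {"pgspace_free": 800, "cpu_idle": 20, "disk_busy": 70},
        "aix42": {"pgspace_free": 500, "cpu_idle": 85, "disk_busy": 10},
    }
    print(pick_node(sample, "pgspace_free"))               # most free paging space
    print(pick_node(sample, "disk_busy", highest=False))   # least busy disks
```

Free paging space and CPU idle are "higher is better" metrics, while disk busy is "lower is better", hence the highest flag.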
Basics
-Event Management: matches information about the state of system
resources and initiates the scripts needed to manage the cluster.
HACMP relies on topology services for heartbeats and group services for
reliable messaging. These services are started prior to the cluster processes.
If there are problems with these services the cluster will not start.
RSCT
Basics
It is a popular protocol for network management. It is used for collecting
information from, and configuring, network devices such as servers, printers,
hubs, switches, and routers.
When you run cldump, it reports: "Obtaining information via SNMP from Node:
aix11..."
SNMP
Basics
Clsmuxpd provides SNMP support (only in releases before v5.3, when clinfo was improved).
clinfo is based on SNMP; it queries the clsmuxpd daemon for up-to-date
cluster information and provides a simple display of it.
clsmuxpd
Basics
It helps manage the entire cluster from a single point, through smitty hacmp or
with the commands under /usr/es/sbin/cluster/cspoc.
C-SPOC uses clcomd for HACMP communication between nodes, so the
/etc/rhosts file is no longer used.
If a C-SPOC function fails, the failure is logged in
/tmp/cspoc.log on the node performing the operation.
cspoc.log also contains the commands that were run.
C-SPOC
Basics
The service IP replaces the existing address on the interface, so only one
service IP can be configured on an interface at a time. The service IP should
be on the same subnet as one of the boot IP addresses.
Other interfaces on this node cannot be in the same subnet; they are called
standby interfaces.
These standby interfaces are used if the boot interface fails. IPAT via IP
replacement can save subnets, but requires extra hardware.
IPAT via IP replacement
Basics
IPAT via IP replacement
Basics
If the interface holding the service IP address fails, PowerHA moves the service
IP address to another available interface on the same node and on the same
network; in this case, the resource group is not affected.
If there is no available interface on the same node, the resource group is moved
together with the service IP to another node with an available interface on the
same logical network.
IPAT via IP replacement
Basics
The service IP is aliased (using the ifconfig command) onto the interface without
removing the underlying boot IP address. This means more than one service IP
label can coexist on one interface.
Each boot interface on a node must be on a different subnet. The service IP
labels can be on one or more subnets, but they cannot be the same as any of
the boot interface subnets.
Standby interfaces are not necessary, because all interfaces are labeled as boot
interfaces.
IPAT via aliasing
Basics
IPAT via aliasing
Basics
By removing the need for one interface per service IP address that the node
could host, IPAT through aliasing is more flexible and in some cases requires
less hardware. IPAT through aliasing also reduces fallover time, as it is much
faster to add an alias to an interface than to remove the base IP
address and then apply the service IP address.
IPAT via aliasing
Basics
When PowerHA is installed, it creates this /etc/inittab entry:
hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init >/dev/console 2>&1
// it starts clcomdES, clstrmgrES, snmpd, syslogd
HACMP start-up
Basics
If PowerHA is configured for IP Address Takeover:
harc:2:wait:/usr/es/sbin/cluster/etc/harc.net # HACMP for AIX network startup
When the "start at system restart" option is chosen in C-SPOC:
hacmp6000:2:wait:/usr/es/sbin/cluster/etc/rc.cluster -boot -A # Bring up Cluster
// do not use this option; manual control is better
HACMP start-up
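All three start-up entries follow the /etc/inittab format Identifier:RunLevel:Action:Command. The Python sketch below splits each entry on its first three colons and reports which HACMP entries are present, for example to check whether the discouraged start-at-restart entry exists. It assumes AIX's convention that lines starting with ':' are comments, and the helper names are hypothetical.

```python
from typing import NamedTuple

class InittabEntry(NamedTuple):
    ident: str
    runlevel: str
    action: str
    command: str

# Identifiers taken from the entries shown above.
HACMP_IDS = {"hacmp", "harc", "hacmp6000"}

def parse_inittab(text):
    """Split each entry into its four colon-separated fields."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(":"):
            continue  # assumption: AIX treats lines starting with ':' as comments
        ident, runlevel, action, command = line.split(":", 3)
        entries.append(InittabEntry(ident, runlevel, action, command))
    return entries

def hacmp_entries(text):
    """Return only the HACMP-related entries."""
    return [e for e in parse_inittab(text) if e.ident in HACMP_IDS]

def starts_at_boot(text):
    """True if the C-SPOC 'start at system restart' entry is present."""
    return any(e.ident == "hacmp6000" for e in parse_inittab(text))
```

Splitting with a limit of 3 keeps any colons inside the command field intact.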
Build-Configure Install HA Software
-cluster.es
-cluster.es.cspoc
-cluster.license
-cluster.man.en_US.es
# reboot
Build-Configure
Network and /etc/hosts
Boot interfaces are those that share the service subnet.
Standby interfaces are those that are not on the service subnet.
IPAT via IP REPLACEMENT (service IP is in the same subnet as the boot IP)
IPAT via IP ALIASING (all IPs are in different subnets)
IP Aliasing in detail:
All base IP addresses on a node must be on separate subnets. (If heartbeat
monitoring over IP aliases is not used)
All service IP addresses must be on a separate subnet from any of the
base subnets.
The service IP addresses can all be in the same or different subnets.
The subnet masks must all be the same
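The aliasing rules above can be checked mechanically with Python's standard ipaddress module. This sketch assumes it is called once per node with that node's base (boot) addresses; it takes a single prefix length, which enforces the "same subnet mask" rule by construction. The function name and sample addresses are hypothetical.

```python
import ipaddress
from itertools import combinations

def check_alias_rules(base_ips, service_ips, prefix):
    """Return rule violations for IPAT via aliasing (empty list = OK).

    base_ips: boot addresses of ONE node; service_ips: service addresses;
    prefix: the shared netmask as a prefix length, e.g. 24.
    """
    def net(ip):
        return ipaddress.ip_interface(f"{ip}/{prefix}").network

    problems = []
    base_nets = [net(ip) for ip in base_ips]
    # Rule 1: every base address on the node sits on its own subnet.
    for (a, na), (b, nb) in combinations(zip(base_ips, base_nets), 2):
        if na == nb:
            problems.append(f"base {a} and {b} share subnet {na}")
    # Rule 2: no service address may sit on any base subnet.
    for s in service_ips:
        if net(s) in base_nets:
            problems.append(f"service {s} is on base subnet {net(s)}")
    return problems
```

For example, boots 10.10.10.1 and 10.10.20.1 with service addresses in 10.10.30.0/24 pass with no violations.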
Build-Configure
Network and /etc/hosts
IP Replacement in detail:
Base (boot) and service IP addresses on the primary adapter must be on
the same subnet.
All base IP addresses on the secondary adapters must be on separate
subnets (different from each other and from the primary adapter).
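The replacement rules can be checked the same way. This hypothetical sketch treats the first boot address as the primary adapter, assumes one shared prefix length, and reports the violations it finds.

```python
import ipaddress

def check_replacement_rules(primary_boot, service_ip, secondary_boots, prefix):
    """Return rule violations for IPAT via replacement (empty list = OK)."""
    def net(ip):
        return ipaddress.ip_interface(f"{ip}/{prefix}").network

    problems = []
    primary_net = net(primary_boot)
    # Rule 1: boot and service address on the primary adapter share a subnet.
    if net(service_ip) != primary_net:
        problems.append(f"service {service_ip} is not on primary subnet {primary_net}")
    # Rule 2: secondary boot addresses each use their own, distinct subnet.
    seen = {primary_net}
    for ip in secondary_boots:
        if net(ip) in seen:
            problems.append(f"secondary {ip} reuses subnet {net(ip)}")
        seen.add(net(ip))
    return problems
```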
Build-Configure
Storage and FileSystem
-shared disk (on both nodes):
cfgmgr
chdev -l hdiskX pv=yes
-create enhanced concurrent vg:
lvlstmajor (on both nodes)
mkvg -C -y orabckvg -s 128 -n -V 50 hdiskpower50
// autovaryon should be turned off (-n)
create LVs and file systems if needed
on the other node: importvg -V 50 -y orabckvg -n hdiskpower50
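The -V 50 in both mkvg and importvg must be a major number that lvlstmajor reports as free on every node. This sketch assumes each node's lvlstmajor output has already been turned into a collection of free numbers (parsing that output is left out) and picks the lowest common one; the function name and node data are illustrative.

```python
def common_free_major(free_by_node):
    """Lowest major number free on every node, or None if there is none.

    free_by_node maps node name -> iterable of free major numbers
    (assumed to be parsed from each node's lvlstmajor output beforehand).
    """
    sets = [set(numbers) for numbers in free_by_node.values()]
    if not sets:
        return None
    common = set.intersection(*sets)
    return min(common) if common else None
```

With {"aix41": range(45, 60), "aix42": [43, 47, 50, 51]} the result is 47, which would then be passed as -V 47 to mkvg on one node and to importvg on the other.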
Build-Configure
Application Script
stop/start application scripts should be created
Build-Configure
Extended Topology
Extended Config -> Extended Topology:
-Config. HACMP Cluster
-Config HACMP Node (set nodes and ips)
Build-Configure
Discover
Extended Config -> Discover
the /usr/es/sbin/cluster/etc/rhosts file may be needed, containing the necessary IPs
Build-Configure
Extended Topology
Extended Config -> Extended Topology
Configure HACMP Networks: give a name and set netmask
-enable IP address takeover with Alias? --> yes: if Aliasing
--> no: if Replacement
Configure HACMP Interface/Device: Add discovered -> comm interface
ALIASING: 1 network configured and both ...boot_1 and ...boot_2
addresses were used because they are in different subnets
IP REPLACEMENT: 1 network configured and only the ...boot_1 addresses will be
used because the ...boot_2 IPs are in a different subnet from the service IP
// Not necessary, but here we can run verification and synchronization to see if
everything is correct: Ext. Config -> Ext. Ver...
Build-Configure
RG and resources
Extended Config -> Ext. resource:
-Ext. Res. Group:
-startup policy:
if IP repl. with 1 nic per network per node -> Online Using Distribution
Policy
if IP repl. with more nics on a network on the nodes -> anything
if IP aliasing -> anything
-Extended Resource:
-Appl. Server: (start, stop scripts)
-Service IP..: Configurable on Multiple Nodes -> which network -> F4 to
choose
RG and resources are ready to be related together:
Extended Config -> Extended RG -> Change/show Resources for a
RG: with F4 add:
Build-Configure RG and resources
-Service IP
-Appl Serv
-VG
Build-Configure Sync and Verify
places to start in SMITTY:
- under Standard Config.: this always runs a full verification, all aspects
of the cluster will be checked before synchronization
it has no option, it is a press and go function
- under Extended Config.: allows separation of verification and
synchronization
Emulate: changes will be tested before trying to implement
them
Actual: it implements the settings
Forced: ignore any errors, can be dangerous to a running
cluster
Verify changes only: if only few items changed it will allow
much faster verification and synchronization
Always verify and synchronize the cluster from the node on which the
change occurred to the other nodes in the cluster.
Build-Configure Sync and Verify
-!!! Problem Determination Tools > HACMP Verification: verification can
be run without causing a synchronization!!!
Error count: verification is successful if the error count is not
exceeded
The verify process will indicate warnings if the cluster is
capable of running but:
-may have items that are not configured in resource groups
-the recommendations for configuring the cluster have not
been followed
Build-Configure Sync and Verify
When clverify has problems, it is usually related to the ability to contact a
node.
The node to node communication is provided by clcomd.
(clcomd uses the /usr/es/sbin/cluster/etc/rhosts file for inter node security.
(in earlier releases it used the 'r' commands))
Build-Configure Sync and Verify
there are 2 log files for problem checking:
clverify: /var/hacmp/clverify/clverify.log (automatic cluster conf. mon. <-
every 24 hours, by default on the first node in alphabetical order)
clcomd: /var/hacmp/clcomd/clcomd.log
additional logs:
/var/hacmp/clverify/fail stores data from the most recent failed
verification attempt
/var/hacmp/clverify/pass stores data from the most recent passed
verification attempt
Command Help ODM
odmget HACMPlogs shows where the log files are
odmget HACMPcluster shows the cluster version
odmget HACMPnode shows info about the nodes
// changing the location of the log files: C-SPOC > Log Viewing and
Management
/etc/es/objrepos HACMP ODM files
Command Help POWERHA LOGS
Command Help RSCT LOGS
/var/ha/log RSCT logs are here
/var/ha/log/nim.topsvcs... the heartbeats are logged here (shows whether
communication between the nodes is OK)
Command Help Command
clRGinfo Shows the state of RGs (in earlier HACMP clfindres was
used)
clRGinfo -p shows the node that temporarily has the highest priority
(POL)
clRGinfo -t shows the delayed timer information
clRGinfo -m shows the status of the application monitors of the
cluster resource groups. The state can be: online, offline, acquiring, releasing,
error, unknown
cldump (or clstat -o) detailed info about the cluster (realtime, shows
cluster status) (clstat requires a running clinfo)
cldisp detailed general info about the cluster (not realtime)
Command Help Command
cltopinfo Detailed information about the network of the cluster (this
shows the data in DCD not in ACD)
cltopinfo -i good overview, same as cllsif: this also lists cluster
interfaces; cllsif was used prior to HACMP 5.1
cltopinfo -m shows heartbeat statistics, missed heartbeats
clshowres Detailed information about the resource group(s)
cllsserv Shows which scripts will be run in case of a takeover
clrgdependency -t PARENT_CHILD -sl shows parent child dependencies
of resource groups
clshowsrv -v shows status of the cluster daemons (very good
overview!!!)
Command Help Command
lssrc -ls clstrmgrES shows if cluster is STABLE or not, cluster version,
Dynamic Node Priority (pgspace free, disk busy, cpu idle)
ST_STABLE: cluster services running with resources online
NOT_CONFIGURED: cluster is not configured or node is not
synced
ST_INIT: cluster is configured but not active on this node
ST_JOINING: cluster node is joining the cluster
ST_VOTING: cluster nodes are voting to decide event execution
ST_RP_RUNNING: cluster is running a recovery program
RP_FAILED: recovery program event script has failed
ST_BARRIER: clstrmgr is in between events waiting at the
barrier
ST_CBARRIER: clstrmgr is exiting a recovery program
ST_UNSTABLE: cluster is unstable usually due to an event error
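When scripting health checks, the state can be pulled out of the lssrc -ls clstrmgrES output. This sketch assumes the output contains a line of the form "Current state: ST_STABLE"; the descriptions are paraphrased from the state list above, and the function name is hypothetical.

```python
import re

# Descriptions paraphrased from the state list above.
STATE_MEANING = {
    "ST_STABLE": "cluster services running with resources online",
    "NOT_CONFIGURED": "cluster is not configured or node is not synced",
    "ST_INIT": "cluster is configured but not active on this node",
    "ST_JOINING": "cluster node is joining the cluster",
    "ST_VOTING": "cluster nodes are voting to decide event execution",
    "ST_RP_RUNNING": "cluster is running a recovery program",
    "RP_FAILED": "recovery program event script has failed",
    "ST_BARRIER": "clstrmgr is in between events waiting at the barrier",
    "ST_CBARRIER": "clstrmgr is exiting a recovery program",
    "ST_UNSTABLE": "cluster is unstable usually due to an event error",
}

def cluster_state(lssrc_output):
    """Return (state, description) from `lssrc -ls clstrmgrES` output."""
    m = re.search(r"Current state:\s*(\S+)", lssrc_output)
    if m is None:
        return None, "state line not found"
    state = m.group(1)
    return state, STATE_MEANING.get(state, "unknown state")
```

A monitoring script would typically alert on anything other than ST_STABLE, and treat a missing state line as a separate error.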
Command Help Command
lssrc -g cluster lists the running cluster daemons
lssrc -ls topsvcs shows the status of individual diskhb devices,
heartbeat intervals, failure cycle (missed heartbeats)
lssrc -ls grpsvcs gives info about connected clients and the number of groups
lssrc -ls emsvcs shows the resource monitors known to the event
management subsystem
lssrc -ls snmpd shows info about snmpd
halevel -s shows PowerHA level (from 6.1)
Command Help Command
cl_ping pings all the adapters of the given list (e.g.: cl_ping -w 2
aix21 aix31 (-w: wait 2 seconds))
cldiag HACMP troubleshooting tool (e.g.: cldiag debug clstrmgr -l
5 <--shows clstrmgr heartbeat information)
cldiag vgs -h nodeA nodeB <--this checks the shared VG
definitions on the given nodes for inconsistencies
/usr/es/sbin/cluster/utilities/get_local_nodename shows the name of
this node within HACMP
/usr/es/sbin/cluster/utilities/clexit.rc this script halts the node if the
cluster manager daemon stopped incorrectly
Command Help Remove HA Configure
1. stop cluster on both nodes
2. remove the cluster configuration ( smitty hacmp) on both nodes
3. remove cluster filesets (starting with cluster.*)
If you are planning to do a crash test, do it with halt -q or reboot -q;
shutdown -Fr will not work, because it stops HACMP and the resource groups
gracefully (rc.shutdown), so no takeover will occur.
Disk HeartBeat OverView
Heartbeat disks should be used in enhanced concurrent mode. Enhanced
concurrent mode disks use RSCT group services to control locking, thus
freeing up a sector on the disk that can now be used for communication.
This sector, which was formerly used for SSA Concurrent mode disks, is
now used for writing heartbeat information.
Any disk that is part of an enhanced concurrent volume group can be used
for a diskhb network, including those used for data storage. Also, the
volume group that contains the disk used for a diskhb network does not
have to be varied on.
An enhanced concurrent volume group is not the same as a concurrent
volume group (which is part of a concurrent resource group); rather, the term
refers to the mode of locking by using RSCT.
Disk HeartBeat How to View
lspv | grep hb
<--shows the actual state of heartbeat disks
cltopinfo -i | grep hb
<--shows what had been saved into the configuration (we have to change it
to show the actual state)
Disk HeartBeat How to Config
1. Create diskhb network
Extended Configuration->Extended Topology->Configure HACMP
Networks->Add a Network...
choose:diskhb
* Network Name [anything you want]
* Network Type diskhb
Disk HeartBeat How to Config
2. Add device:
Extended Configuration->Extended Topology->Configure HACMP Comm.
Interfaces/Dev.->Add ...
Add Pre-defined...
Communication Devices
Choose your diskhb Network Name
* Device Name [aix41_diskhb2] <--choose a
unique name
* Network Type diskhb
* Network Name net_diskhb_aix41_aix42
* Device Path [/dev/vpath4]
* Node Name [aix41]
// You will repeat this process for the other node and the other device. This
will complete both devices for the diskhb network.
Disk HeartBeat How to Test
DO NOT PERFORM THIS TEST WHILE HACMP IS RUNNING???
dhb_read -p devicename <--dump diskhb sector contents
dhb_read -p devicename -r <--receive data over diskhb network
dhb_read -p devicename -t <--transmit data over diskhb network
1. on one node set receiving:
/usr/sbin/rsct/bin/dhb_read -p hdisk2 -r
2. on the other node set transmit:
/usr/sbin/rsct/bin/dhb_read -p hdisk2 -t
dhb_read -p rvpath0 -r <--Note that the device name is the raw device, as
designated by the "r" preceding the device name.
If everything is OK: Link operating normally
Disk HeartBeat How to Monitor
root@aix41: / # lssrc -ls topsvcs
Subsystem Group PID Status
topsvcs topsvcs 921638 active
Network Name Indx Defd Mbrs St Adapter ID Group ID
VLAN200_10_20_ [ 0] 2 2 S 10.10.10.2 10.10.10.2
VLAN200_10_20_ [ 0] en11 0x41c64107 0x41c64108
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
...
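The heartbeat counters in the output above can be extracted for monitoring. This Python sketch assumes the line formats shown in the sample ("HB Interval = ... Sensitivity = ... missed beats" and "Missed HBs: Total: ... Current group: ..."), pairs them up per network in order of appearance, and flags networks whose current group has missed heartbeats. The function names are hypothetical.

```python
import re

HB_RE = re.compile(r"HB Interval = ([\d.]+) secs\. Sensitivity = (\d+) missed beats")
MISSED_RE = re.compile(r"Missed HBs: Total: (\d+) Current group: (\d+)")

def parse_topsvcs(output):
    """Return one dict per network block found in `lssrc -ls topsvcs` output."""
    hbs = HB_RE.findall(output)
    missed = MISSED_RE.findall(output)
    if not hbs or len(hbs) != len(missed):
        raise ValueError("unexpected lssrc -ls topsvcs format")
    return [
        {
            "hb_interval_s": float(interval),
            "sensitivity": int(sensitivity),
            "missed_total": int(total),
            "missed_current": int(current),
        }
        for (interval, sensitivity), (total, current) in zip(hbs, missed)
    ]

def networks_missing_beats(parsed):
    """Indexes of networks whose current group has missed heartbeats."""
    return [i for i, net in enumerate(parsed) if net["missed_current"] > 0]
```

For the aix41 sample above, parse_topsvcs returns a single entry with a 1.0 s interval, a sensitivity of 10 and no missed heartbeats.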
Thank You
SINA@冰砖帮帮忙 Make Presentation much more fun