AP Function Change


Uppgjord (även faktaansvarig om annan) - Prepared (also subject responsible if other)

Kontr - Checked Datum - Date Rev

1(64)

Dokansv/Godk - Doc respons/Approved File

Nr - No.

Ericsson Development and Research Inc

4/002 01-CAL 119 0401 Uen A

FUNCTION CHANGE

Abstract

This document describes the implementation of Function Change (FCH) upgrade of ACS, ACS-based application software, Large Building Block (LBB) and third party product (3pp) software as well as system parameter editing on the Windows NT platform. The functionality described in this document is the complete FCH.

Application

This document forms a base for implementation, user documentation, test and maintenance of the product(s) related to FCH and is not intended for the customer or the user of the system.

Contents

1 GENERAL
1.1 SYSTEM ENVIRONMENT
1.2 REQUIREMENT REFERENCES
1.3 DESIGN PRINCIPLES
1.4 HARDWARE AND SYSTEM SOFTWARE

2 EXTERNAL INTERFACES
2.1 PROVIDED EXTERNAL INTERFACES
2.2 USED EXTERNAL INTERFACES

3 USE CASES
3.1 INITIATING A FCH SESSION
3.2 ENDING A SESSION

4 STRUCTURE
4.1 RESPONSIBILITIES
4.2 INTERFACES

5 SOFTWARE UNITS
5.1 FCHEXESRC
5.2 FCHLIBSRC

6 PROCESSES

7 PERSISTENT STORAGE

8 ERROR HANDLING

UAB/I/GBC (T Wocalewski)

DESIGN SPECIFICATION

B

QABKULD

2000-11-15

2000

102 62-CNZ 222 59 Uen


9 FUNCTION CHANGE

10 START, STOP AND RESTART

11 CONFIGURATION

12 CAPACITY
12.1 DATA FOR CAPACITY ESTIMATION
12.2 CAPACITY ESTIMATION

13 SPECIAL FEATURES

14 REFERENCES

15 ANNEXES
15.1 ANNEX REVISION HISTORY


1 GENERAL

This document describes how to provide Function Change, i.e. online software upgrade of AP software with minimum system downtime and maintained high availability.

The implementation described requires a reboot of the system, and provides transactional behavior, fallback capability and evaluation of an upgraded system before committing.

The implementation is based on the third-party products Microsoft Cluster Server (MSCS) and InstallShield, and also uses, or prepares for the use of, the CXC product Backup and Restore (BUR).

1.1 SYSTEM ENVIRONMENT

Figure 1.1 The system environment. [Figure not reproduced. It shows FCH together with MSCS, InstallShield, WIN32, PRC, AEH, PHA, the Registry and BUR.]

1.2 REQUIREMENT REFERENCES

See ref.[2] for details on requirements.

1.3 DESIGN PRINCIPLES

The internal design of FCH is object oriented, whereas the command line interfaces used to execute a Function Change session are procedural. Everything is written in C++.


FCH is designed to be transactional, in the sense that it can always return the system to a known working state. Much effort has been put into making it as robust as possible.

1.4 HARDWARE AND SYSTEM SOFTWARE

Force APG40 LBB, Microsoft Windows NT 4.0 Advanced Server.

2 EXTERNAL INTERFACES

2.1 PROVIDED EXTERNAL INTERFACES

FCH provides the following external interfaces:

2.1.1 fchstart

Command line interface to initiate and perform a FCH upgrade of the system.

2.1.2 fchfb

Command line interface to perform a fallback to the system that existed prior to the FCH session.

2.1.3 fchcommit

Command line interface to transfer necessary data to the unmodified node and prepare the system for commit of the new configuration.

2.1.4 fchend

Command line interface to install the new configuration on the passive node of the system and end the FCH session. Also used to end a failed FCH session.

2.1.5 fchrst

Command line interface to restore the uncommitted node, using BUR, in case of a severe failure during a FCH session.

2.1.6 FCH pipe command interface

Used by PRC to perform an automatic fallback during FCH. PRC sends the command “fallback” to the pipe.

2.2 USED EXTERNAL INTERFACES

The FCH application uses the following external interfaces:

1 AP Event Report (AEH) API


2 AP Process Control (PRC) CLI

3 AP Parameter Handling (PHA) CLI

4 AP Backup and Restore (BUR) CLI. At present, FCH only prepares for use of the BUR CLI.

5 MSCS API

6 InstallShield CLI

7 Microsoft Win32 API

The interfaces are described in the following sections.

2.2.1 AP Event Report API

The AP Event Report API, ACS_AEH_EvReport, is used to send alarms and events from FCH.

2.2.2 AP Process Control CLI

FCH uses the prcconf CLI to reconfigure the MSCS cluster database and the prcgen CLI to generate the PRC Service Start Schedule file.

2.2.3 AP Parameter Handling CLI

FCH uses the phacreate CLI to syntax check parameter files and update the PHA parameter database. FCH also uses the phatrans CLI to transfer parameters between different versions of CXCs.

2.2.4 AP Backup and Restore CLI

At present, FCH does not use the BUR CLI directly to restore a single node. In case of a restore, FCH prepares the system for restore and the operator has to invoke the needed BUR command manually, i.e. fcc_restore or Burrestore.

2.2.5 MSCS API

FCH uses the MSCS API to control cluster resources and to perform failovers.

2.2.6 InstallShield CLI

FCH uses the InstallShield CLIs setup.exe and isuninst.exe to perform installation and removal of CXC software packages.

2.2.7 Microsoft Win32 API

The Win32 API is used extensively to implement the FCH functionality.


3 USE CASES

3.1 INITIATING A FCH SESSION

Note that the different fchstart options described below can be combined in any way.

3.1.1 Installing CXC packages

3.1.1.1 Description

Installation and upgrade of CXC packages is initiated with the fchstart command using the -d option and a directory as argument. The directory must contain the CXC packages to be installed, in the form of self-extracting WinZip files. The command unpacks the packages to a specific location and checks whether each CXC package is already installed. If so, it compares the revision of the installed package with the revision of the package to be installed. All unique CXC packages, i.e. the revisions that are not already installed, are displayed as a list from which the operator may select which CXCs to install.
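The revision check described above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the FCH code: the Package struct, the selectablePackages function and the identifiers and revision strings used with them are hypothetical, and revisions are compared as plain strings because this document does not say how revisions are ordered.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one CXC package found in the -d directory.
struct Package {
    std::string id;        // package identity (illustrative format)
    std::string revision;  // revision string (illustrative format)
};

// Returns the candidates that are "unique" in the sense of the text:
// either the package is not installed at all, or the installed revision
// differs from the candidate revision.
std::vector<Package> selectablePackages(
        const std::map<std::string, std::string>& installed,  // id -> installed revision
        const std::vector<Package>& candidates) {
    std::vector<Package> result;
    for (const Package& p : candidates) {
        auto it = installed.find(p.id);
        if (it == installed.end() || it->second != p.revision) {
            result.push_back(p);
        }
    }
    return result;
}
```

A candidate whose identity and revision both match an installed package is left out of the list, so the operator is only offered packages that would change the system.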

FCH is a state machine, implemented in a transactional way, so when fchstart begins, the FCH state is set to Installing.

All existing rin files, with their corresponding instances, are removed from the system before the CXC package updates. After installation, the rin files that should exist in the new system are added again. There are two ways to add or update rin files: by using the -i option, or through CXC installation.

After all updates have been performed, the operator is prompted to switch to the new system. If the operator selects n for no, the FCH session is aborted and the node is restored to its previous state, with the state set to noFCH.

If the operator selects y for yes, the state is set to Reboot and the node is rebooted, after which the FCH server component performs the switch over to the new system, with the state initially set to Failover1 etc.
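The state handling described above can be sketched as a small transition function. The state names noFCH, Installing, Reboot, Failover1 and Reboot2 are taken from this document. The FchEvent names, the nextState function and the transition from Reboot2 back to noFCH are assumptions made for illustration only. The states after Failover1 are left out.

```cpp
// State names follow the document; the enum itself is illustrative.
enum class FchState { NoFCH, Installing, Reboot, Failover1, Reboot2 };

// Hypothetical events that move a session from one state to the next.
enum class FchEvent { StartSession, OperatorYes, OperatorNo, NodeRebooted, Error };

// Writes the next state to `next` and returns true if the event is valid
// in the current state. Returns false and leaves `next` untouched otherwise.
bool nextState(FchState current, FchEvent event, FchState& next) {
    switch (current) {
    case FchState::NoFCH:
        if (event == FchEvent::StartSession) { next = FchState::Installing; return true; }
        break;
    case FchState::Installing:
        if (event == FchEvent::OperatorYes) { next = FchState::Reboot; return true; }
        if (event == FchEvent::OperatorNo)  { next = FchState::NoFCH; return true; }
        if (event == FchEvent::Error)       { next = FchState::Reboot2; return true; }
        break;
    case FchState::Reboot:
        if (event == FchEvent::NodeRebooted) { next = FchState::Failover1; return true; }
        break;
    case FchState::Reboot2:
        // Assumption: after the fallback reboot the session is over.
        if (event == FchEvent::NodeRebooted) { next = FchState::NoFCH; return true; }
        break;
    case FchState::Failover1:
        break;  // later states are outside this sketch
    }
    return false;
}
```

Keeping the transitions in one function makes it easy to see which state the system is in after each step, which is what a fallback needs in order to know how much to revoke.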


3.1.1.2 Flow of Events

[Sequence diagram between Operator, fchstart and FCHLIB, not reproduced. Messages in order: fchstart -d (a), initial check (b), check resources (c), initiate session (d), initiate pkg handling (e), get new pkg list (f), list packages (g), enter selection (h), install package (i), get new pkg list (j), add all rin files (k), rebuild config (l), verify before switch (m), prompt (n), answer (o), set state Reboot and initiate reboot (p). The package selection steps are marked as a loop.]


a The operator executes fchstart -d <arg>, where <arg> is a directory on the system disk where the CXC packages to install are located.

b FCH verifies that the cluster server is running, that the directory given as argument exists, that no other FCH session is going on, and that the FCH server is running. It also checks the amount of free disk space, and makes backup copies of the FCH and ACS binaries, which are used for execution of the old binaries.

c FCH verifies all necessary system resources. It checks the connection to the other node and the data disk, that neither a Soft Function Change (SFC) nor BUR is in progress, that the cluster quorum resource is available, that the current node is the passive node, that none of the nodes are paused, that all cluster resources are online and that the FCH special directories are empty. It also flushes the cache of all disks.

d FCH initiates the new session by raising the alarm AP FUNCTION CHANGE IN PROGRESS. It also pauses the current node and makes a backup copy of the PRC service start schedule file. The state is set to Installing.

e FCH initiates package handling, making sure that the FCH package directories exist and initiating the FCH package transaction log. FCH also builds a list of currently installed packages, stops all resources on the current node and decompresses the new packages. FCH also clears all resource instances from the service control database, together with the corresponding resource instance files.

f FCH builds a list of the new packages.

g FCH prints the list of new packages to the operator and prompts him to select packages to install.

h The operator enters a selection.

i FCH updates the package install transaction log and installs the selected package(s).

j FCH builds a new list of packages, containing the remaining new packages. Steps g to j are then iterated until all packages have been selected or the operator chooses to leave the install menu.

k FCH adds all resource instances to the service control database, together with the corresponding resource instance files.

l FCH rebuilds the PRC service start schedule file, to be used later in the switch over to the new system, and makes a backup copy of the file.

m FCH once again verifies the system resources. It checks that the quorum resource is available, that the current node is the passive node, that the current node is paused and that the other node is not paused.

n FCH prompts the operator to confirm that he wishes to switch to the new system configuration.


o The operator confirms or denies. If the operator denies, the FCH session is aborted.

p FCH sends an event specifying that this is a controlled FCH reboot, sets the state to Reboot, initiates a reboot of the system using the prcboot command, and exits.
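The verification steps in the flow above (initial check, check resources, verify before switch) all follow the same pattern: a series of named conditions is evaluated and the session cannot proceed unless every one holds. A minimal sketch of that pattern follows. The Check struct and firstFailedCheck function are hypothetical, and the real checks would call the MSCS and Win32 APIs rather than the stand-in callables used here.

```cpp
#include <functional>
#include <string>
#include <vector>

// One named precondition. `run` returns true when the condition holds.
struct Check {
    std::string name;
    std::function<bool()> run;
};

// Evaluates the checks in order and returns the name of the first one
// that fails, or an empty string if all checks pass.
std::string firstFailedCheck(const std::vector<Check>& checks) {
    for (const Check& c : checks) {
        if (!c.run()) {
            return c.name;
        }
    }
    return "";
}
```

Returning the name of the failed check gives the caller something concrete to report to the operator before the session is aborted.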

3.1.1.3 Exceptional Flow of Events

If the operator chooses to abort the session at any prompt, or if an error occurs that FCH cannot handle, the session is aborted and all changes made to the system are revoked. FCH uses the state machine states to revoke the changes: fallback is done, the state is set to Reboot2 and the system is rebooted to activate the old software. Examples of errors that cannot be handled are missing or failed system resources, missing or faulty input data, and node-to-node communication failure.
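Revoking all changes made during a session is commonly implemented as an undo log that is replayed in reverse order. The sketch below shows that general technique only. This document does not describe the format of the FCH package transaction log, so the TransactionLog class and its methods are hypothetical and are not the FCH implementation.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Generic undo log: every completed step registers how to undo itself.
class TransactionLog {
public:
    // Records a completed step together with the action that revokes it.
    void record(const std::string& what, std::function<void()> undo) {
        entries_.push_back(Entry{what, std::move(undo)});
    }

    // Revokes all recorded steps, newest first, and empties the log.
    void revoke() {
        while (!entries_.empty()) {
            entries_.back().undo();
            entries_.pop_back();
        }
    }

    std::size_t size() const { return entries_.size(); }

private:
    struct Entry {
        std::string what;
        std::function<void()> undo;
    };
    std::vector<Entry> entries_;
};
```

Undoing in reverse order matters because later steps may depend on earlier ones, for example a package installed after the resources on the node were stopped.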

3.1.2 Deleting CXC packages

3.1.2.1 Description

Deletion of CXC packages is initiated with the fchstart command using the -r option. All installed CXCs are displayed in a list from which the operator may select CXCs to delete from the system.

FCH is a state machine, implemented in a transactional way, so when fchstart begins, the FCH state is set to Installing.

All existing rin files, with their corresponding instances, are removed from the system before the CXC package updates. After installation, the rin files that should exist in the new system are added again. There are two ways to add or update rin files: by using the -i option, or through CXC installation.

After all updates have been performed, the operator is prompted to switch to the new system. If the operator selects n for no, the FCH session is aborted and the node is restored to its previous state.

If the operator selects y for yes, the node is rebooted, after which the FCH server component performs the switch over to the new system.

Note that this is a powerful option that can make a system unusable.


3.1.2.2 Flow of Events

[Sequence diagram between Operator, fchstart and FCHLIB, not reproduced. Messages in order: fchstart -r (a), initial check (b), check resources (c), initiate session (d), initiate pkg handling (e), get current pkg list (f), list packages (g), enter selection (h), remove package (i), get current pkg list (j), rebuild config (k), verify before switch (l), prompt (m), answer (n), initiate reboot (o). The package selection steps are marked as a loop.]


a The operator executes fchstart -r.

b FCH verifies that the cluster server is running, that no other FCH session is going on, and that the FCH server is running. It also checks the amount of free disk space, and makes backup copies of the FCH and ACS binaries, which are used for secondary execution.

c FCH verifies all necessary system resources. It checks the connection to the other node and the data disk, that no Soft Function Change (SFC) is in progress, that the cluster quorum resource is available, that the current node is the passive node, that none of the nodes are paused, that all cluster resources are online and that the FCH special directories are empty. It also flushes the cache of all disks.

d FCH initiates the new session by raising the alarm AP FUNCTION CHANGE IN PROGRESS. It also pauses the current node and makes a backup copy of the PRC service start schedule file.

e FCH initiates package handling, initiating the FCH package transaction log. FCH also builds a list of currently installed packages and stops all resources on the current node. FCH also clears all resource instances from the service control database, together with the corresponding resource instance files.

f FCH retrieves the list of the installed packages.

g FCH prints the list of installed packages to the operator and prompts him to select packages to remove.

h The operator enters a selection.

i FCH updates the package remove transaction log and removes the selected package(s).

j FCH builds a new list of packages, containing the remaining installed packages. Steps g to j are then iterated until all packages have been selected or the operator chooses to leave the remove menu.

k FCH rebuilds the PRC service start schedule file, to be used later in the switch over to the new system, and makes a backup copy of the file.

l FCH once again verifies the system resources. It checks that the quorum resource is available, that the current node is the passive node, that the current node is paused and that the other node is not paused.

m FCH prompts the operator to confirm that he wishes to switch to the new system configuration.

n The operator confirms or denies. If the operator denies, the FCH session is aborted.

o FCH sends an event specifying that this is a controlled FCH reboot, sets the state to Reboot, initiates a reboot of the system using the prcboot command, and exits.
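The menu iteration in steps g to j can be sketched as a loop that ends either when nothing is left to select or when the operator leaves the menu. The runSelectionMenu function and the convention that a negative or out-of-range index means "leave the menu" are assumptions for illustration. The actual removal of a package is left out.

```cpp
#include <functional>
#include <string>
#include <vector>

// `choose` is shown the current menu and returns an index into it.
// Any index outside the menu is treated as "leave the menu".
// Returns the selected entries in the order they were chosen.
std::vector<std::string> runSelectionMenu(
        std::vector<std::string> remaining,
        const std::function<int(const std::vector<std::string>&)>& choose) {
    std::vector<std::string> picked;
    while (!remaining.empty()) {
        int index = choose(remaining);
        if (index < 0 || index >= static_cast<int>(remaining.size())) {
            break;  // operator left the menu
        }
        picked.push_back(remaining[index]);
        // Rebuild the menu so it only holds the packages not yet selected.
        remaining.erase(remaining.begin() + index);
    }
    return picked;
}
```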


3.1.2.3 Exceptional Flow of Events

If the operator chooses to abort the session at any prompt, or if an error occurs that FCH cannot handle, the session is aborted and all changes made to the system are revoked. FCH uses the state machine states to revoke the changes: fallback is done, the state is set to Reboot2 and the system is rebooted to activate the old software. Examples of errors that cannot be handled are missing or failed system resources, missing or faulty input data, and node-to-node communication failure.

3.1.3 Editing CXC parameters

3.1.3.1 Description

Editing of CXC parameters is initiated with the fchstart command using the -p option. All installed CXCs with parameters are displayed in a list from which the operator may select the CXCs whose parameters he wants to edit. The parameter file of the selected CXC is opened in a text editor where the operator may edit parameter values. The edited parameter files are checked for syntax errors.

FCH is a state machine, implemented in a transactional way, so when fchstart begins, the FCH state is set to Installing.

After all updates have been performed, the operator is prompted to switch to the new system. If the operator selects n for no, the FCH session is aborted and the node is restored to its previous state.

If the operator selects y for yes, the node is rebooted, after which the FCH server component performs the switch over to the new system.


3.1.3.2 Flow of Events

[Sequence diagram between Operator, fchstart and FCHLIB, not reproduced. Messages in order: fchstart -p (a), initial check (b), check resources (c), initiate session (d), initiate pkg handling (e), get current pkg list (f), list packages (g), enter selection (h), edit parameter file (i), syntax check par file (j), update pha database (k), verify before switch (l), prompt (m), answer (n), initiate reboot (o). The selection and editing steps are marked as a loop.]


a The operator executes fchstart -p.

b FCH verifies that the cluster server is running, that no other FCH session is going on, and that the FCH server is running. It also checks the amount of free disk space, and makes backup copies of the FCH and ACS binaries, which are used for secondary execution.

c FCH verifies all necessary system resources. It checks the connection to the other node and the data disk, that neither a Soft Function Change (SFC) nor BUR is in progress, that the cluster quorum resource is available, that the current node is the passive node, that none of the nodes are paused, that all cluster resources are online and that the FCH special directories are empty. It also flushes the cache of all disks.

d FCH initiates the new session by raising the alarm AP FUNCTION CHANGE IN PROGRESS. It also pauses the current node and makes a backup copy of the PRC service start schedule file.

e FCH initiates package handling, initiating the FCH package transaction log.

f FCH builds a list of all installed packages that have PHA parameters.

g FCH prints the list of installed packages with PHA parameters to the operator and prompts him to select packages to edit.

h The operator enters a selection.

i The parameter file of the selected package is backed up and opened in a text editor. The operator makes his changes, saves the file and exits the editor.

j FCH syntax checks the edited file using the phacreate command. If the file contains syntax errors, the operator is prompted to re-edit the file or abort the session. Steps g to j are iterated until the operator chooses to leave the parameter edit menu.

k FCH updates the PHA database with the edited files using the phacreate command.

l FCH once again verifies the system resources. It checks that the quorum resource is available, that the current node is the passive node, that the current node is paused and that the other node is not paused.

m FCH prompts the operator to confirm that he wishes to switch to the new system configuration.

n The operator confirms or denies. If the operator denies, the FCH session is aborted.

o FCH sends an event specifying that this is a controlled FCH reboot, sets the state to Reboot, initiates a reboot of the system using the prcboot command, and exits.
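Steps i and j describe an edit, check, and re-edit cycle for one parameter file. A minimal sketch of that cycle is shown below. The three callables are stand-ins: in FCH the edit would launch a text editor and the syntax check would run the phacreate command, neither of which is shown here. The names editUntilValid and EditResult are hypothetical.

```cpp
#include <functional>

enum class EditResult { Accepted, Aborted };

// Repeats the edit until the syntax check passes, or until the operator
// declines to re-edit a file that failed the check.
EditResult editUntilValid(const std::function<void()>& edit,
                          const std::function<bool()>& syntaxOk,
                          const std::function<bool()>& operatorWantsReedit) {
    for (;;) {
        edit();
        if (syntaxOk()) {
            return EditResult::Accepted;
        }
        if (!operatorWantsReedit()) {
            return EditResult::Aborted;  // the caller aborts the session
        }
    }
}
```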


3.1.3.3 Exceptional Flow of Events

If the operator chooses to abort the session at any prompt, or if an error occurs that FCH cannot handle, the session is aborted and all changes made to the system are revoked. FCH uses the state machine states to revoke the changes: fallback is done, the state is set to Reboot2 and the system is rebooted to activate the old software. Examples of errors that cannot be handled are missing or failed system resources, missing or faulty input data, and node-to-node communication failure.

3.1.4 Update of CXC parameters with offline editing

3.1.4.1 Description

Update of CXC parameters is initiated with the fchstart command using the -P option with a file as an argument. The file must contain a full-path list of files to replace the existing CXC13NNNN.par files. The original files are backed up and then replaced with the new files.

The updated parameter files are checked for syntax errors.

FCH is a state machine, implemented in a transactional way, so when fchstart begins, the FCH state is set to Installing.

After all updates have been performed, the operator is prompted to switch to the new system. If the operator selects n for no, the FCH session is aborted and the node is restored to its previous state.

If the operator selects y for yes, the state is set to Reboot and the node is rebooted, after which the FCH server component performs the switch over to the new system.


3.1.4.2 Flow of Events

[Sequence diagram between Operator, fchstart and FCHLIB, not reproduced. Messages in order: fchstart -P <file> (a), initial check (b), check resources (c), initiate session (d), initiate pkg handling (e), get current pkg list (f), build new list (g), update parameter files (h), syntax check par files (i), update pha database (j), prepare local fallback (k), verify before switch (l), prompt (m), answer (n), initiate reboot (o).]


a The operator executes fchstart -P <file>, where <file> is a list of new CXCNNNNN.par parameter files. FCH checks that <file> exists, that the listed files exist, and that the listed files are free of syntax errors.

b FCH verifies that the cluster server is running, that no other FCH session is going on, and that the FCH server is running. It also checks the amount of free disk space, and makes backup copies of the FCH and ACS binaries, which are used for secondary execution.

c FCH verifies all necessary system resources. It checks the connection to the other node and the data disk, that neither a Soft Function Change (SFC) nor BUR is in progress, that the cluster quorum resource is available, that the current node is the passive node, that none of the nodes are paused, that all cluster resources are online and that the FCH special directories are empty. It also flushes the cache of all disks.

d FCH initiates the new session by raising the alarm AP FUNCTION CHANGE IN PROGRESS. It also pauses the current node and makes a backup copy of the PRC service start schedule file.

e FCH initiates package handling, initiating the FCH package transaction log.

f FCH builds a list of all installed packages that have PHA parameters.

g FCH builds a list from the new parameter files listed in <file>.

h The old parameter files of the new list are backed up and the new ones are inserted instead.

i FCH syntax checks the new files using the phacreate command.

j FCH updates the PHA database with the edited files using the phacreate command.

k The parameter files in the FCH system directory are saved on the other node, so that a local fallback can be done on that node.

l FCH once again verifies the system resources. It checks that the quorum resource is available, that the current node is the passive node, that the current node is paused and that the other node is not paused.

m FCH prompts the operator to confirm that he wishes to switch to the new system configuration.

n The operator confirms or denies. If the operator denies, the FCH session is aborted.

o FCH sends an event specifying that this is a controlled FCH reboot, sets the state to Reboot, initiates a reboot of the system using the prcboot command, and exits.


3.1.4.3 Exceptional Flow of Events

If the operator chooses to abort the session at any prompt, or if an error occurs that FCH cannot handle, the session is aborted and all changes made to the system are revoked. FCH uses the state machine states to revoke the changes: fallback is done, the state is set to Reboot2 and the system is rebooted to activate the old software. Examples of errors that cannot be handled are missing or failed system resources, missing or faulty input data, and node-to-node communication failure.

3.1.5 Replacing LBB files

3.1.5.1 Description

Replacement of LBB files, i.e. any files in the system, is initiated with the fchstart command using the -l option and a file as argument. The file must contain a list of the files to replace and the files to replace them with, each file pair separated by a semicolon. If the file to replace does not exist, it is assumed to be a new file. The original files are backed up and then replaced with the new files. If the original file did not exist, an empty file is created as backup. The empty file is later removed if a fallback takes place.
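Reading such a list file can be sketched as follows. The exact file layout is not specified in this document, so the sketch assumes one pair per line, written as the file to replace, a semicolon, and the file to replace it with. The FilePair struct and parseReplaceList function are hypothetical names.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One entry of the list: the file to replace and its replacement.
struct FilePair {
    std::string target;
    std::string replacement;
};

// Parses text with one "target;replacement" pair per line (assumed layout).
// Blank lines are skipped. Returns false on the first malformed line.
bool parseReplaceList(const std::string& text, std::vector<FilePair>& out) {
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // tolerate Windows line endings
        }
        if (line.empty()) {
            continue;
        }
        std::string::size_type sep = line.find(';');
        if (sep == std::string::npos || sep == 0 || sep + 1 == line.size()) {
            return false;  // missing separator or missing path
        }
        out.push_back(FilePair{line.substr(0, sep), line.substr(sep + 1)});
    }
    return true;
}
```

Rejecting a malformed line before anything is replaced matches the description of faulty input data as an error that ends the session.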

FCH is a state machine, implemented in a transactional way, so when fchstart begins, the FCH state is set to Installing.

After all updates have been performed, the operator is prompted to switch to the new system. If the operator selects n for no, the FCH session is aborted and the node is restored to its previous state.

If the operator selects y for yes, the node is rebooted, after which the FCH server component performs the switch over to the new system.


3.1.5.2 Flow of Events

[Figure: sequence diagram between Operator, fchstart and FCHLIB, showing fchstart -l <arg> (a), initial check (b), check resources (c), initiate session (d), initiate pkg handling (e), replace LBB files (f), verify before switch (g), prompt (h), answer (i) and initiate reboot (j).]

a The operator executes fchstart -l <arg>, where <arg> is a file containing the list of files to be replaced and the files to replace them with.

b FCH verifies that the cluster server is running, that the file given as argument exists, that no other FCH session is going on, and that the FCH server is running. It also checks the amount of free disk space and that valid BUR backup files exist. It also makes backup copies of the FCH and ACS binaries.

c FCH verifies all necessary system resources. It checks the connection to the other node and the data disk, that no Soft Function Change (SFC) is in progress, that the cluster quorum resource is available, that the current node is the passive node, that none of the nodes are paused, that all cluster resources are online and that the FCH special directories are empty. It also flushes the cache of all disks.

d FCH initiates the new session by raising the alarm AP FUNCTION CHANGE IN PROGRESS. It also pauses the current node and makes a backup copy of the PRC service start schedule file.

e FCH initiates package handling, initiating the FCH package transaction log.

f FCH makes backup copies of the files to be replaced, and then replaces them.

g FCH once again verifies the system resources. It checks that the quorum resource is available, that the current node is the passive node, that the current node is paused and that the other node is not paused.

h FCH prompts the operator to confirm the switch to the new system configuration.

i The operator confirms or denies. If the operator denies, the FCH session is aborted.

j FCH sends an event specifying that this is a controlled FCH reboot, initiates a reboot of the system using the prcboot command and exits.
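The verification before the switch (step g) amounts to four independent checks. The sketch below, in Python, shows one way to collect all failed checks before deciding whether to prompt the operator. The NodeView type and its field names are hypothetical and do not correspond to any actual FCHLIB interface.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    """Hypothetical snapshot of the cluster as seen from the current node."""
    quorum_available: bool
    current_is_passive: bool
    current_paused: bool
    other_paused: bool


def verify_before_switch(view):
    """Return a list of failed checks; an empty list means the switch may proceed."""
    failures = []
    if not view.quorum_available:
        failures.append("quorum resource is not available")
    if not view.current_is_passive:
        failures.append("current node is not the passive node")
    if not view.current_paused:
        failures.append("current node is not paused")
    if view.other_paused:
        failures.append("other node is paused")
    return failures
```

Collecting every failure rather than stopping at the first gives the operator a complete picture in a single error report.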

3.1.5.3 Exceptional Flow of Events

If the operator selects to abort the session at any prompt, or if an error occurs that FCH cannot handle, the session will be aborted and all changes made to the system will be revoked. FCH uses the state machine states to revoke the changes. Fallback is done, the state is set to Reboot2 and the system is rebooted to activate the old software. Examples of errors that cannot be handled are missing or failed system resources, missing or faulty input data and node to node communication failure.

3.1.6 Adding and replacing resource instance files

3.1.6.1 Description

Some products, such as FOS, need to create several instances of a cluster resource service. These instances are called resource instances. The resource instances are described in resource instance files and are complements to the original service described in the ACS_PRC_Config file.

Adding or replacing a resource instance file, a so-called .rin file, can be seen as a special case of adding or replacing an LBB file. The same logic is used with some extra handling. Adding and/or replacing a .rin file is initiated with the fchstart command using the -i option and a file as argument. The file must contain a list of resource instance files to replace and/or add and the files to replace them with, each file pair separated by a semicolon.

If the file to replace does not exist, it is assumed to be a new file. The original files are backed up and then replaced with the new files. If the original file did not exist, an empty file is created as backup. The empty file is removed later if a fallback takes place.

All existing rin files with their corresponding instances are removed from the system before CXC package updates. After installation, the rin files that should exist in the new system are added again. There are two ways to add or update rin files: by using the -i option or through CXC installation.

FCH is a state machine, implemented in a transactional way, so when fchstart begins, the FCH state is set to Installing.

After all updates have been performed, the operator is prompted to switch to the new system. If the operator selects n for no, the FCH session will be aborted and the node will be restored to its previous state.

If the operator selects y for yes, the node will be rebooted, after which the FCH server component will perform the switch over to the new system.

3.1.6.2 Flow of Events

[Figure: sequence diagram between Operator, fchstart and FCHLIB, showing fchstart -i <arg> (a), initial check (b), check resources (c), initiate session (d), replace files, exec prcgen & prcconf, verify before switch (g), prompt (h), answer (i) and initiate reboot (j).]

a The operator executes fchstart -i <arg>, where <arg> is a file containing the list of files to be replaced and the files to replace them with.

b FCH verifies that the cluster server is running, that the file given as argument exists, that no other FCH session is going on, and that the FCH server is running. It also checks the amount of free disk space. It also makes backup copies of the FCH and ACS binaries.

c FCH verifies all necessary system resources. It checks the connection to the other node and the data disk, that no Soft Function Change (SFC) is in progress, that the cluster quorum resource is available, that the current node is the passive node, that none of the nodes are paused, that all cluster resources are online and that the FCH special directories are empty. It also flushes the cache of all disks.

d FCH initiates the new session by raising the alarm AP FUNCTION CHANGE IN PROGRESS. It also pauses the current node and makes a backup copy of the PRC service start schedule file.

e FCH clears all resource instances from the service control database together with the corresponding resource instance files. FCH makes backup copies of the files to be replaced, and then replaces them.

f prcgen is executed to create the PRC_Config file with the updated resource instances. prcconf is run to create the new cluster database.

g FCH once again verifies the system resources. It checks that the quorum resource is available, that the current node is the passive node, that the current node is paused and that the other node is not paused.

h FCH prompts the operator to confirm the switch to the new system configuration.

i The operator confirms or denies. If the operator denies, the FCH session is aborted.

j FCH sends an event specifying that this is a controlled FCH reboot, initiates a reboot of the system using the prcboot command and exits.

3.1.6.3 Exceptional Flow of Events

If the operator selects to abort the session at any prompt, or if an error occurs that FCH cannot handle, the session will be aborted and all changes made to the system will be revoked. FCH uses the state machine states to revoke the changes. Fallback is done, the state is set to Reboot2 and the system is rebooted to activate the old software. Examples of errors that cannot be handled are missing or failed system resources, missing or faulty input data and node to node communication failure.

3.1.7

3.1.8 Upgrading LBB and 3pp software

3.1.8.1 Description

Upgrade of LBB and 3pp software is initiated with the fchstart command using the -L option. After initial checking, an LBBShell> command prompt is displayed, and the operator may enter commands and execute upgrade packages. Reboots are also permitted during this phase.

FCH is a state machine, implemented in a transactional way, so when fchstart -L begins, the FCH state is set to LBBReboot. This state is kept as long as the operator chooses to install and reboot. When all reboots are done and the operator types “l”, the state is set to Installing.

After all updates have been performed, the operator is prompted to switch to the new system. If the operator selects n for no, the FCH session will be aborted and the node must be restored using the fchrst component.

If the operator selects y for yes, the node will be rebooted, after which the FCH server component will perform the switch over to the new system.

3.1.8.2 Flow of Events

[Figure: sequence diagram between Operator, fchstart and FCHLIB, showing fchstart -L (a), initial check (b), check resources (c), initiate session (d), initiate pkg handling (e), LBBShell prompt (f), enter command (g) (loop over f and g), continue session (h), verify before switch (i), prompt (j), answer (k) and initiate reboot (l).]

a The operator executes fchstart -L.

b FCH verifies that the cluster server is running, checks whether an LBB upgrade reboot has been done, that no other FCH session is going on, and that the FCH server is running. It also checks the amount of free disk space and that valid BUR backup files exist. It also makes backup copies of the FCH and ACS binaries.

c FCH verifies all necessary system resources. It checks the connection to the other node and the data disk, that no Soft Function Change (SFC) is in progress, that the cluster quorum resource is available, that the current node is the passive node, that none of the nodes are paused, that all cluster resources are online and that the FCH special directories are empty. It also flushes the cache of all disks.

d FCH initiates the new session by setting a registry value indicating that this is an LBB upgrade session and raising the alarm AP FUNCTION CHANGE IN PROGRESS. It also pauses the current node and makes a backup copy of the PRC service start schedule file.

e FCH initiates package handling and the FCH package transaction log, stops all cluster resources on the current node and sets the FCH state to LBBReboot1.

f FCH displays the LBBShell> prompt to the operator.

g The operator enters a DOS command. Steps f and g are iterated until the operator has performed all the updates required.

h FCH sets the state to Installing and continues the FCH session. Installation and removal of CXCs, parameter edits etc. can be performed here, just as in a normal FCH session.

i FCH once again verifies the system resources. It checks that the quorum resource is available, that the current node is the passive node, that the current node is paused and that the other node is not paused.

j FCH prompts the operator to confirm the switch to the new system configuration.

k The operator confirms or denies. If the operator denies, the FCH session is aborted.

l FCH sends an event specifying that this is a controlled FCH reboot, initiates a reboot of the system using the prcboot command and exits.
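The prompt loop in steps f to h can be sketched as follows in Python. This is a simplified illustration, not the LBBShell implementation: the command source, the run callback and the dictionary used as a state store are hypothetical, and the sketch assumes that typing "l" is what leaves the loop, as stated in the description.

```python
def lbb_shell(commands, run, store):
    """Run operator commands until 'l' is entered, then switch to Installing.

    commands: iterable of raw input lines (stands in for the LBBShell> prompt)
    run:      callback that executes one command (hypothetical)
    store:    dict holding the FCH state (hypothetical persistence layer)
    Returns the number of commands executed.
    """
    store["state"] = "LBBReboot1"
    executed = 0
    for raw in commands:
        cmd = raw.strip()
        if cmd == "l":
            # Operator is done with LBB upgrades; continue as a normal session.
            store["state"] = "Installing"
            return executed
        if cmd:
            run(cmd)
            executed += 1
    # Input ended without 'l' (for example because of a reboot):
    # the state stays LBBReboot1 so the session can be resumed.
    return executed
```

Because the state only changes when "l" is entered, a reboot in the middle of the loop leaves LBBReboot1 in place, which is what allows fchstart -L to detect that an LBB upgrade reboot has been done.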

3.1.8.3 Exceptional Flow of Events

If the operator selects to abort the session at any prompt, or if an error occurs that FCH cannot handle, the session will be aborted, but unlike in the other use cases the node will not be restored automatically. The fchrst component must be used to achieve this. Examples of errors that cannot be handled are missing or failed system resources, missing or faulty input data and node to node communication failure.

3.1.9 Switch over

3.1.9.1 Description

Switching over to the new system, i.e. making the upgraded node the active node to test the new configuration, is handled by the FCH server component after fchstart has performed a reboot. As the server component starts up after a reboot, it checks whether this was a reboot after an fchstart (Reboot state), and if so proceeds to send events to both nodes about the successful FCH reboot. FCH sends events about the switchover attempt and goes to sleep for (by default) 5 minutes. FCH checks whether a cluster configuration change is necessary and sets the Failover1 state. All cluster resources are stopped, both nodes are resumed and the Move1 state is set. Common resource groups are moved to the modified node. If a configuration change is necessary, the Config1 state is set and the current configuration is deleted, then the state is set to Config1B and the new configuration is added. This node is resumed and the other node paused. The Supervision state is set and FCH starts all the resources on the upgraded node. The current node is paused and the other resumed. The resources on the other node are started. Then the current node is resumed and the other paused. Finally, FCH sends an event saying that the switchover was successful and that supervision has now begun.
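The order of states during a switch over can be summarised in a short Python sketch. The state names are taken from this section; the function itself is hypothetical. The Failover2 state is not mentioned in the description above but appears in the flow of events, so its position here is an assumption based on that flow.

```python
def switchover_states(config_changed):
    """Return the sequence of FCH states passed through during a switch over.

    config_changed: True if the cluster configuration has to be replaced,
    in which case the Config1 and Config1B states are visited.
    """
    states = ["Reboot", "Failover1", "Move1"]
    if config_changed:
        states += ["Config1", "Config1B"]
    states += ["Failover2", "Supervision"]
    return states
```

Because each step records its state before acting, a server that restarts in, say, Config1B knows that the old configuration has already been deleted and only the new one remains to be added.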

3.1.9.2 Flow of Events

[Figure: sequence diagram between FCH Server, FCH Supervisor and FCHLIB, showing initiate supervisor (a), handle orphan resources (b), report boot (c), initiate switch over (d), report switch attempt (e), check PRC config (f), stop cluster (g), fail over (h), edit cluster db (i), prepare cluster start (j), start current node (k), start other node (l) and report switch (m).]

a The supervisor thread is started at boot. It starts by checking that it can access the cluster, checking for cyclic reboots, checking and setting timer values, setting a mutex to prevent other FCH processes from being executed, and checking the FCH state and whether it is executing on the passive or the active node. In this use case it is on the passive node and the FCH state is Reboot, i.e. the node is coming back up after an fchstart boot.

b The supervisor checks for orphan resources, i.e. cluster resource groups that have no owner node, and assigns them to the correct node.

c The supervisor sends an event about the successful fchstart reboot and creates a reboot file to prevent PRC from counting this reboot as a spontaneous reboot.

d The supervisor calls the switchover function and the state is set to Failover1.

e switchover sends an event reporting that a switch attempt is in progress.

f switchover checks whether the PRC service start schedule has been modified, and if so sets a boolean to indicate that the cluster database needs to be updated (done in step i). In this use case the file has been changed.

g switchover stops all cluster resources. Any resource that fails to come offline is subsequently killed.

h The current node is resumed, the FCH state is set to Move1 and the resource groups are moved to the current node, making it the active node.

i switchover sends an event reporting that it is about to edit the cluster database. The old configuration is then removed and the new configuration inserted. The FCH state is set to Config1B and an event is sent reporting that the cluster database was successfully edited.

j The FCH state is set to Failover2 and the PRC reboot log file is updated to indicate that the last reboot was an FCH reboot. The current node is then paused.

k The current (now active) node is resumed (and the other paused), the FCH state is set to Supervision, and the cluster resources on the current node are started. FCH first waits for PRC cluster control to come online, and then verifies that all other resources have also come online.

l The current node is now paused and the other resumed. The cluster resources on the other (now passive) node are started and the switchover function waits for them to come online. After this the other node is paused and this one resumed.

m The supervisor moves the alarm AP FUNCTION CHANGE IN PROGRESS to the new active node by ceasing it on the other node and raising it again on the current node. Finally, it sends an event reporting that the FCH switch over was successful and that the system is now in supervision mode.

3.1.9.3 Exceptional Flow of Events

Non-stopping errors are logged to the FCH log file. Stopping errors are logged and an event is sent reporting what error occurred. Examples of stopping errors are resources failing to come online during switch over and failed cluster database updates.

3.2 ENDING A SESSION

3.2.1 Commit new system

3.2.1.1 Description

The commit command’s main purpose is to establish that the new AP is committed and that there is no way back. It also copies the data needed to upgrade the old node. It can be called in various situations: to commit and copy again after the old node has been restored, or to copy and commit after an interrupted fchcommit command.

Committing the new system configuration is initiated with the fchcommit command during the supervision period (i.e. after a successful switch over). This will end the supervision period and copy all necessary data, such as CXC packages and parameter files, to the other node. LBB and 3pp software packages will not be copied, however. These must be transferred to the other node by the operator.

3.2.1.2 Flow of Events

[Figure: sequence diagram between Operator, fchcommit and FCHLIB, showing fchcommit (a), initial check (b), verify before commit (c), backup system binaries (d), commit packages (e), commit cluster config (f), commit LBB files (g), commit parameters (h) and finish commit (i).]

a The operator executes fchcommit.

b FCH verifies that the cluster server is running, that the FCH state is Supervision and that no other FCH processes are running.

c FCH verifies that the connections to the other node and the data disk are OK, that the quorum resource is available, that the current node is the active node, that the current node is not paused and that the other node is paused.

d FCH makes backup copies of the FCH and ACS binaries to the other node. The state is set to Committing.

e FCH commits the CXC packages by copying new packages to the other node and updating the FCH transaction log on the other node to prepare for installation and removal during fchend.

f The backup of the old PRC service start schedule file is deleted.

g All edited LBB files are copied to the other node for later installation during fchend.

h All edited parameter files are copied to the other node for later update of the PHA database during fchend.

i The FCH state is set to CommitDone to indicate that the commit was successfully ended.

3.2.1.3 Exceptional Flow of Events

If the fchcommit command fails or is interrupted, it will simply exit with an error message. The command may then be executed again to finish the commit. Failures during commit are normally node to node communication errors, preventing the command from copying files.
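A commit that can be re-run after an interruption needs its copy step to be repeatable. The Python sketch below shows one way to achieve that. It is an illustration under assumptions: the function name, the destination directory standing in for the other node, and the .part suffix are all hypothetical.

```python
import shutil
from pathlib import Path


def commit_copy(files, dest_dir):
    """Copy files to dest_dir, skipping files that are already identical there.

    Returns the names of the files that were actually copied, so that a
    repeated run after an interruption only does the remaining work.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in files:
        dst = dest_dir / src.name
        if dst.exists() and dst.read_bytes() == src.read_bytes():
            continue  # already committed in an earlier run
        tmp = dst.with_name(dst.name + ".part")
        shutil.copyfile(src, tmp)  # copy under a temporary name first
        tmp.replace(dst)           # then rename, so dst is never half-written
        copied.append(src.name)
    return copied
```

Running the function a second time with unchanged inputs copies nothing, which mirrors the behaviour described above where fchcommit is simply executed again to finish the commit.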

3.2.2 Fall back to previous system

3.2.2.1 Description

Fall back to the system that existed before the FCH session was started is initiated with the fchfb command. This will raise the FCH failed alarm, stop all resources, restore the cluster configuration, move the resource groups back to the unmodified node and start the resources again, in essence performing a reversed switch over.

After this, if no LBB/3pp software was upgraded, the command undoes all changes made during the fchstart command, deleting new CXCs, reinstalling old CXCs and restoring modified files. Finally the node is rebooted.

If LBB/3pp software was upgraded, the command only restores the cluster configuration and performs the reversed switchover. After this the operator is prompted to execute fchrst, and the command exits.

After the fall back has been performed (or, in the case of LBB/3pp upgrade, the restore) the session must be ended using the fchend component.
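The two fall back variants differ only in what happens after the reversed switch over. The Python sketch below lists the resulting actions as plain strings; it is a hypothetical summary of the description above, not an interface of the fchfb command.

```python
def fallback_plan(lbb_upgraded):
    """Return the ordered list of fall back actions for the two cases."""
    plan = [
        "raise FCH failed alarm",
        "stop all resources",
        "restore cluster configuration",
        "move resource groups to unmodified node",
        "start resources",
    ]
    if lbb_upgraded:
        # LBB/3pp changes cannot be undone by fchfb itself.
        plan.append("prompt operator to execute fchrst")
    else:
        plan += ["undo changes made during fchstart", "reboot node"]
    plan.append("end session with fchend")
    return plan
```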

3.2.2.2 Flow of Events

[Figure: sequence diagram between Operator, fchfb and FCHLIB, showing fchfb (a), initial check (b), verify before fallback (c), prompt for fallback (d), answer (e), initiate fallback (f), prepare for switch (g), delete cluster config (h), fail over (i), insert cluster config (j), start other node (k), remove instances and rin files (l), fall back LBB files (m), fall back packages (n), add old instances and rin files (o), fall back CXC parameters (p), added LBB files removed (q), add node cluster config (r) and finish fallback & reboot (s).]

a The operator executes fchfb.

b FCH verifies that the cluster server is running, that the FCH server is running on both nodes, checks that no other fall back or FCH process is running, sets a mutex to prevent other FCH processes from being started and checks that the FCH state is Supervision.

c FCH checks that the quorum resource is available, that the current node is the active node, that the FCH state is Supervision, that the current node is not paused and that the other node is paused.

d FCH prompts the operator to perform fall back.

e The operator confirms or denies. If the operator denies, the fall back is aborted.

f The FbFailover1 state is set. The cluster resources are stopped. Any resource that fails to come offline is subsequently killed. FCH checks whether the PRC service start schedule file has changed and sets a boolean to indicate whether the cluster configuration is to be updated later on. It also ceases the alarm AP FUNCTION CHANGE IN PROGRESS and raises the alarm AP FUNCTION CHANGE FAILED.

g FCH copies the old PRC service start schedule file to the other node and sends an event to report that a switch attempt is in progress.

h The FCH state is set to Config3, the new cluster configuration is deleted and the FCH state is then set to Move2.

i Both nodes are resumed and FCH moves all cluster resource groups to the other node. The upgraded node is paused again.

j The FCH state is set to Config4, the old cluster configuration is inserted on the old other node, the FCH state is set to FbFailover2 and the PRC reboot log file is updated to indicate that the coming reboot is an FCH reboot. The current node is then paused.

k The cluster resources on the old other node are taken online. FCH waits for all resources to come online.

l All instances of services corresponding to existing rin files are removed.

m Any replaced LBB files are restored.

n CXC packages that were changed (installed, updated, removed) are restored.

o The old instances of services with their corresponding rin files are restored.

p Changed CXC parameters are restored.

q Any added LBB files are removed.

r The FCH state is set to Config5, an event is sent reporting that an attempt to edit the cluster database is in progress, the cluster configuration for the current node is deleted and the old configuration is reinserted.

s The FCH state is set to FbFailover3, an event reporting that the fallback was successful is sent and the old PRC service start schedule files on both nodes are deleted. The FCH state is then set to FbReboot, an event is sent to report that a reboot after fall back is pending and the node is rebooted.
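The fall back steps above pass through a fixed order of states. A small Python sketch can make that order explicit and reject out-of-order transitions. The state names come from steps f to s; the list and the advance function are hypothetical and not part of FCHLIB.

```python
# Order of FCH states during fall back, as named in steps f to s.
FALLBACK_STATES = [
    "Supervision",
    "FbFailover1",
    "Config3",
    "Move2",
    "Config4",
    "FbFailover2",
    "Config5",
    "FbFailover3",
    "FbReboot",
]


def advance(current, target, order=FALLBACK_STATES):
    """Return target if it is the state directly after current, else raise."""
    i = order.index(current)  # raises ValueError for an unknown state
    if i + 1 >= len(order) or order[i + 1] != target:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

Checking each transition this way catches a skipped step early, for example an attempt to move resource groups (Move2) before the new cluster configuration has been deleted (Config3).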

3.2.2.3 Exceptional Flow of Events

If the fchfb command fails, the fchrst command must be used to restore the node. In extreme cases, where even fchrst does not work, a full BUR emergency restore of both nodes may be necessary.

3.2.3 End successful FCH session

3.2.3.1 Description

Ending of a successful FCH session, i.e. after a successful fchcommit was executed, is initiated with the fchend command. The command will upgrade the unmodified node to the same state as the upgraded node. Removed CXCs will be deleted, new and upgraded CXCs installed and edited files updated.

The state is set to EndInstalling when the upgrade starts.

After this, the command reboots the node and the supervisor component ends the session after the reboot.

3.2.3.2 Flow of Events

[Figure: sequence diagram between Operator, fchend and FCHLIB, showing fchend (a), initial check (b), initiate end session (c), prepare upgrade (d), remove rin files and commit LBB files (e), commit packages (f), commit parameters and rin files (g), change cluster config (h) and end session (i).]

a The operator executes fchend.

b FCH verifies that the cluster server is running, that the FCH state is CommitDone, that no other FCH processes are running and that the FCH server is running on both nodes. It also checks the amount of available disk space and makes backup copies of the FCH binaries.

c FCH verifies that the current node is the passive node, that the current node is paused and that the other node is not paused. It also rebuilds the PRC service start schedule file.

d FCH sends an event reporting that a switch attempt is in progress, stops all cluster resources on the current node and sets the FCH state to EndInstalling.

e FCH removes the current instances based on the rin files and removes the rin files themselves. FCH replaces all LBB files that were changed during fchstart on the other node.

f FCH installs, upgrades and removes the CXC packages that were changed during fchstart, making the current node software configuration identical to the other node. Setupservices is run to set up random users for services.

g FCH updates the CXC parameter files and the PHA database to the same status as on the other node. All instances based on the current rin files are added.

h FCH rebuilds the PRC service start schedule file and checks whether the file has changed. In this use case it has changed. FCH sets the FCH state to Config2, sends an event reporting that the cluster database is about to be edited, deletes the old cluster configuration, sends another edit event, sets the FCH state to Config2B, adds the new cluster configuration and sets the FCH state to EndInstallDone.

i FCH sends an event reporting that the switch attempt was successful, sets the FCH state to EndReboot, sends an event reporting that a reboot is about to take place and reboots the node.
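The effect of step f, making the software configuration of the current node identical to the other node, can be illustrated with a Python sketch. FCH itself works from the transaction log prepared during fchcommit; this sketch instead derives the same kind of actions by comparing two hypothetical package maps (package name to version), which is an assumption made purely for illustration.

```python
def package_actions(current, other):
    """Return (action, package) pairs that bring `current` in line with `other`.

    current, other: dicts mapping package name to version (hypothetical format).
    """
    actions = []
    for name in sorted(set(current) | set(other)):
        have = current.get(name)
        want = other.get(name)
        if want is None:
            actions.append(("remove", name))    # package was removed on the other node
        elif have is None:
            actions.append(("install", name))   # package is new on the other node
        elif have != want:
            actions.append(("upgrade", name))   # version differs
    return actions
```

For example, with current = {"A": "R1A", "B": "R1A"} and other = {"A": "R2A", "C": "R1A"}, the result is an upgrade of A, a removal of B and an installation of C.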

3.2.3.3 Exceptional Flow of Events

If the fchend command fails, the fchrst command may be used to restore the node. The fchcommit command may then be executed again, and then the fchend command can be executed once more to upgrade the unmodified node.

3.2.4 End a successful FCH session with LBB/3pp software upgrade

3.2.4.1 Description

Ending of a successful FCH session where LBB/3pp software was upgraded is initiated with the fchend command. The command will display the LBBShell> prompt so that the operator may perform the same upgrades on this node as well. Just as during the fchstart command, reboots are permitted at the LBBShell> prompt, and the operator may continue the session after the reboot by executing the command again.

When the LBB upgrade starts, the state is set to LBBReboot2.

After the LBB upgrades have been finished, any other changes made during the FCH session, i.e. CXC install, delete or parameter changes, will also be performed. After this, the command reboots the node and the supervisor component ends the session after the reboot.

3.2.4.2 Flow of Events

[Figure: sequence diagram between Operator, fchend and FCHLIB, showing fchend (a), initial check (b), initiate end session (c), prepare upgrade (d), LBBShell prompt (e), command (f) (loop over e and f), commit packages (g), commit LBB files (h), commit parameters (i), change cluster config (j) and end session (k).]

a The operator executes fchend.

b FCH verifies that the cluster server is running, checks whether LBB software is to be upgraded (in this use case it is), that the FCH state is CommitDone, that no other FCH processes are running and that the FCH server is running on both nodes. It also checks the amount of available disk space and makes backup copies of the FCH binaries.

c FCH verifies that the current node is the passive node, that the current node is paused and that the other node is not paused. It also rebuilds the PRC service start schedule file.

d FCH sends an event reporting that a switch attempt is in progress, stops all cluster resources on the current node and sets the FCH state to LBBReboot2.

e FCH displays the LBBShell> prompt to the operator.

f The operator enters a DOS command. Steps e and f are iterated until the operator selects to leave the menu or abort the session.

g FCH installs, upgrades and removes the CXC packages that were changed during fchstart, making the current node software configuration identical to the other node.

h FCH replaces all LBB files that were changed during fchstart on the other node.

i FCH updates the CXC parameter files and the PHA database to the same status as on the other node.

j FCH rebuilds the PRC service start schedule file and checks if the file has changed. In this use case it has changed. FCH sets the FCH state to Config2, sends an event reporting that the cluster database is about to be edited, deletes the old cluster configuration, sends another edit event, sets the FCH state to Config2B, adds the new cluster configuration and sets the FCH state to EndInstallDone.

k FCH sends an event reporting that the switch attempt was successful, sets the FCH state to EndReboot, sends an event reporting that a reboot is about to take place and reboots the node.
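The initial check in step b is a set of independent preconditions that must all hold before fchend proceeds. The following is a minimal Python sketch of that idea, not the FCH implementation. The function name `failed_checks` and the lambda probes are hypothetical; real probes would query the cluster and the registry.

```python
from typing import Callable, List, Tuple

# A check is a description paired with a probe that returns True when it passes.
Check = Tuple[str, Callable[[], bool]]


def failed_checks(checks: List[Check]) -> List[str]:
    """Run every check and return the descriptions of those that failed."""
    return [description for description, probe in checks if not probe()]


# Hypothetical probes with hard-coded results, for illustration only.
checks = [
    ("cluster server is running", lambda: True),
    ("FCH state is CommitDone", lambda: True),
    ("no other FCH process is running", lambda: False),
    ("FCH server is running on both nodes", lambda: True),
]
problems = failed_checks(checks)
```

Collecting all failures, rather than stopping at the first one, lets the operator see every unmet precondition in one run.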

3.2.4.3 Exceptional Flow of Events

If the fchend command fails or the operator aborts the session, the fchrst command may be used to restore the node. The fchcommit command may then be executed again, and then the fchend command can be executed once more to upgrade the unmodified node.

3.2.5 End failed FCH session after fall back

3.2.5.1 Description

If the session was ended with fchfb, the command performs a last clean up and ends the session, thereby enabling a new session to be initiated with the fchstart command.


3.2.5.2 Flow of Events

a The operator executes fchend.

b FCH verifies that the cluster server is running, that the FCH state is End, that no other FCH process is executing and that the FCH server is running on both nodes. It also makes a backup of the FCH binaries.

c FCH verifies that the current node is the passive node.

d If the current node is paused, FCH resumes it. If the cluster resource groups are stopped, FCH starts them and verifies that they come online.

e FCH ceases the alarm AP FUNCTION CHANGE FAILED, resets the LBB upgrade keys in the registry and sets the FCH state to noFCH.

f FCH deletes the old PRC service start schedule files on both nodes and cleans up the FCH temporary directories.

3.2.5.3 Exceptional Flow of Events

Since this use case involves very few and quite simple operations, errors rarely occur. Should the command fail or be interrupted, it can be executed again.

[Figure: sequence diagram between Operator, fchend and FCHLIB. Calls in order: fchend (a), initial check (b), initiate end session (c), resume node (d), end session (e), clean up (f). Each call returns ok.]

3.2.6 Fall back using FCH restore


3.2.6.1 Description

FCH restore is initiated with the command fchrst. The backup image must reside in the partitions for both nodes. The command first restores the cluster configuration to its previous state and then proceeds to make the unmodified node active.

Before the state CommitDone, the newly upgraded node will be restored, if necessary. From the state CommitDone and later, the old node will be restored if necessary.

After this, the command prints a message on the screen on how to proceed with the restore using BUR, and then exits. The operator uses BUR to restore the node from the backup image, and then reboots the node.

After the boot the fchend command must be used to end the session.

In this particular use case a fchstart with LBB software upgrade has been performed successfully, but the operator wants to fall back anyway. The FCH state is Supervision and the operator initiates the command from the passive (unmodified) node.

Note that fchrst must be started from a particular node because it is adapted to the old BUR. In the future it should be possible to start fchrst from any node.


3.2.6.2 Flow of Events

a The operator executes fchrst from the passive node.

b FCH verifies that the cluster server is running, and that no other FCH processes are running. If another FCH process is detected, the operator is prompted whether to continue anyway. FCH also checks the FCH state and verifies that the command is being executed on the correct node (this depends on the current FCH state).

c The operator is also prompted to confirm that he wants to continue with the operation.

d The FCH state is set to FbFailover1.

e The FCH alarms are moved to the current node.

f FCH stops all cluster resources and verifies that they all go offline.

[Figure: sequence diagram between Operator, fchrst and FCHLIB. Calls in order: fchrst (a), initial check (b), prompt (c), handle supervision (d), handle alarms (e), stop cluster (f), check cluster config (g), handle switch over (h), check cluster config (i), start current node (j), initiate restore (k). Each call returns ok.]


g FCH checks if the PRC service start schedule file has changed. In this use case it has not changed, so the cluster configuration does not need to be updated. The FCH state is set to Move2.

h FCH resumes both nodes, moves the cluster resource groups to the current node and pauses the other node.

i FCH again checks the PRC service schedule file for changes and then sets the FCH state to FbFailover2.

j FCH verifies that the other node is paused, and if it is not, pauses it. FCH starts the cluster resources on the current node and verifies that they all come online.

k FCH sets the FCH state to Restore and lets the user restore the other node.

3.2.6.3 Exceptional Flow of Events

Should the fchrst command fail, a BUR emergency restore of both nodes will be necessary.

3.2.7 End failed FCH session after restore

3.2.7.1 Description

If the session was ended with a FCH restore, the fchend command makes sure the cluster configuration is correct, does a final clean up and ends the session, thereby enabling a new session to be started.

The state is Restore when fchend starts.


3.2.7.2 Flow of Events

a The operator executes fchend.

b FCH verifies that the cluster server is running, that the FCH state is End, that no other FCH process is executing and that the FCH server is running on both nodes. It also makes a backup of the FCH binaries.

c FCH verifies that the current node is the passive node.

d FCH copies the PRC service start schedule file from the other node and checks if it differs from the PRC service start schedule file on the current node. In this use case they differ. FCH then stops all cluster resources on the current node, sends an event reporting that the cluster database is about to be edited, deletes the old cluster configuration, sends another edit event and adds the new cluster configuration.

e If the current node is paused, FCH resumes it. If the cluster resource groups are stopped, FCH starts them and verifies that they come online.

f FCH ceases the alarm AP FUNCTION CHANGE FAILED, resets the LBB upgrade keys in the registry and sets the FCH state to noFCH.

g FCH deletes the old PRC service start schedule files on both nodes and cleans up the FCH temporary directories.
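Step d hinges on a byte-for-byte comparison of two schedule files, and the cluster configuration is only edited when they differ. The sketch below illustrates that decision in Python, using the standard `filecmp` module. The function name and the action strings are assumptions for illustration; the real edit is done through PRC and is not shown.

```python
import filecmp
from typing import List


def cluster_config_actions(local_schedule: str, remote_copy: str) -> List[str]:
    """Return the step d actions needed, or an empty list if the files match."""
    # shallow=False forces a content comparison instead of comparing file metadata.
    if filecmp.cmp(local_schedule, remote_copy, shallow=False):
        return []
    return [
        "stop all cluster resources on the current node",
        "send event: cluster database is about to be edited",
        "delete the old cluster configuration",
        "send event: cluster database is about to be edited",
        "add the new cluster configuration",
    ]
```

Returning the plan as data keeps the comparison separate from the risky edit itself, which matters because a failed edit at this point is the critical case for this use case.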

[Figure: sequence diagram between Operator, fchend and FCHLIB. Calls in order: fchend (a), initial check (b), initiate end session (c), check cluster config (d), resume node (e), end session (f), clean up (g). Each call returns ok.]


3.2.7.3 Exceptional Flow of Events

This is similar to the case of ending a session after fall back, but there is a critical point if the PRC service start schedule file has changed. If the cluster configuration edit fails there is a risk of single point of failure where you might be forced to do a BUR Restore.

3.2.8 Re-commit (re-install) new system using FCH restore

3.2.8.1 Description

If fchend should fail when committing a new system configuration, the FCH restore component can be used to restore the uncommitted node, i.e. the node where fchend failed. The fchrst command is initiated from the active, committed node and it uses BUR to restore the uncommitted node.

When the node has been restored, the fchcommit command can be executed again as normal, followed by the fchend command to re-commit the new system.

In this use case the fchend command was interrupted in the middle of installing CXC packages, and the FCH state is EndInstalling. The operator executes fchrst from the active node with the backup file for the passive node as argument.

Note that fchrst is adapted to the old BUR and in the future it should be possible to run it from both nodes, not just one.


3.2.8.2 Flow of Events

a The operator executes fchrst on the active node.

b FCH verifies that the cluster server is running, and that no other FCH processes are running. If another FCH process is detected, the operator is prompted whether to continue anyway. FCH also checks the FCH state and verifies that the command is being executed on the correct node (this depends on the current FCH state).

c The operator is prompted to confirm that he wants to continue with the operation.

d The FCH state is set to Restore2.

e FCH sets the FCH state to Restore2 and lets the operator execute BUR to restore the other (old) node.

f After the restore is finished, the operator executes fchcommit. See chapter 3.2.1.

g After the fchcommit, the operator executes fchend to finish the session. See chapter 3.2.3.

3.2.8.3 Exceptional Flow of Events

Should the fchrst command fail, a BUR emergency restore of both nodes will be necessary.

[Figure: sequence diagram between Operator, fchrst, FCHLIB, fchcommit and fchend. Calls in order: fchrst (a), initial check (b), prompt operator (c), handle EndInstalling (d), initiate restore (e), fchcommit (f), fchend (g). Each call returns ok.]


4 STRUCTURE

The block FCH consists of three software units, two CAA and one CXC, as shown below.

CNZ 222 59 FCH SW Product
CAA 109 0261 FCH executables source container
CAA 109 0262 FCH library source container
CXC 137 413 FCH binaries container

4.1 RESPONSIBILITIES

4.1.1 FCHEXESRC

This software unit implements the FCH user interface commands, internal commands and supervisor component.

4.1.2 FCHLIBSRC

This software unit implements the FCH core functionality Dynamic Link Library (DLL) used by the FCHEXESRC components.

4.2 INTERFACES

FCH has no external interface.

5 SOFTWARE UNITS

5.1 FCHEXESRC

5.1.1 Components

The FCH executables consist of seven components: ACS_FCH_Server, fchcommit, fchend, fchevent, fchfb, fchrst and fchstart.

[Figure: FCH block structure. FCH (CNZ 222 59) consists of FCHEXESRC (CAA 109 0261), FCHLIBSRC (CAA 109 0262) and FCHBIN (CXC 137 413).]


5.1.1.1 ACS_FCH_Server

This component is implemented as a Windows NT Service and contains two sub components:

1 FCH remote execute server - a named pipe that allows FCH components to execute commands on the other node. The following commands can be executed remotely:

a sync - flush file system cache to disk for a specified volume.

b prcconf - change cluster configuration.

c fchevent - send event or alarm.

d kill - terminate a process.

e prcboot - reboot the node.

f test - test the communication, one pipe bidirectional.

g test2 - test the communication, two pipes bidirectional.

2 FCH supervisor - handles reboots and crashes during a FCH session. It performs different actions depending on the FCH state:

a Perform switch over after a reboot initiated by fchstart.

b Perform clean up and alarm raising and ceasing after a reboot initiated by fchfb and fchend.

c Handle fall back and clean up in case of an uncontrolled reboot during a FCH session.
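The remote execute server only accepts a fixed set of commands, which maps naturally onto a dispatch table. The following Python sketch illustrates that allow-list idea only. The request format (`"command arg1 arg2"`), the function names and the placeholder handlers are assumptions; the named pipe transport and the real command implementations are not shown.

```python
from typing import Callable, Dict, List

Handler = Callable[[List[str]], str]


def _placeholder(name: str) -> Handler:
    """Build a stand-in handler that only reports what it was asked to do."""
    def handler(args: List[str]) -> str:
        return f"{name} called with {args}"
    return handler


# Command names are taken from the list above; the handlers are placeholders.
REMOTE_COMMANDS: Dict[str, Handler] = {
    name: _placeholder(name)
    for name in ("sync", "prcconf", "fchevent", "kill", "prcboot", "test", "test2")
}


def dispatch(request: str) -> str:
    """Parse a request line and run the matching handler, rejecting anything else."""
    parts = request.split()
    if not parts:
        raise ValueError("empty request")
    name, args = parts[0], parts[1:]
    if name not in REMOTE_COMMANDS:
        raise ValueError(f"command not allowed remotely: {name}")
    return REMOTE_COMMANDS[name](args)
```

Rejecting unknown names up front means the other node can never be asked to run an arbitrary program through this channel.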

5.1.1.2 fchcommit

This component ends the supervision period and copies all necessary data, such as CXC software packages and parameter files, to the other node to prepare it for upgrade. LBB and 3pp software upgrades are not copied, however. They must be transferred manually by the operator.

5.1.1.3 fchend

This component has two functionalities: to upgrade the old node after a commit, and to clean up after a FCH session that ended with fallback or restore.

5.1.1.4 fchevent

This component is used to send events, raise alarms and cease alarms. All event and alarm handling in FCH has been implemented in this component to minimize dependencies between FCH and other ACS components such as AEH.


5.1.1.5 fchfb

This component allows the FCH session to be aborted and reverts the system to the configuration that existed before the FCH was initiated. In case of a normal FCH of CXC software or parameters, it performs a complete return to the old system. If LBB and 3pp software was upgraded, it performs a switch over, i.e. it returns control of the system to the unmodified node, but does not restore software or parameters, which must be handled by fchrst in this case.

5.1.1.6 fchrst

This component is used to restore one node using the BUR restore functionality. It is used to achieve fall back when LBB/3pp software has been upgraded, to restore the old node if fchend fails in order to allow re-commit of the new system, and generally to handle severe errors during a FCH session where normal FCH functionality cannot restore the system, for instance if a blue screen occurs during fchstart.

5.1.1.7 fchstart

This component is used to initiate a FCH session and to upgrade the first node. It allows the operator to select CXC packages for install and delete, edit CXC parameter files online or offline, add or replace LBB files, add or replace resource instance files and upgrade LBB and 3pp software.

5.2 FCHLIBSRC

5.2.1 Classes

The FCH library is implemented as a Windows DLL and provides the major part of the FCH functionality. It contains the classes described below.

5.2.1.1 ACS_FCH_ClusterControl

This class implements the methods in FCH to control the MSCS. This includes starting and stopping of resources, ordered failover, node and resource status control, reconfiguration of the cluster database via PRC, and pausing and resuming cluster nodes.

5.2.1.2 ACS_FCH_Common

This class is the base class and implements common functions used by all the other classes, such as event reporting, activity and error logging and various I/O and file handling functions.

5.2.1.3 ACS_FCH_Error

This class implements FCH error message handling.


5.2.1.4 ACS_FCH_Exception

This is an exception class for FCH.

5.2.1.5 ACS_FCH_LBBFiles

This class implements replacing of LBB files, i.e. arbitrary files in the system. It has methods for backing up and replacing a file, fall back and commit.
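The backup, replace, fall back and commit operations form a small per-file transaction. The sketch below shows one way such a transaction could look in Python. It is not the ACS_FCH_LBBFiles implementation: the class name `FileReplacement` and the `.fchbak` backup suffix are hypothetical.

```python
import os
import shutil


class FileReplacement:
    """Replace one file while keeping a backup until commit or fall back."""

    def __init__(self, target: str, backup_suffix: str = ".fchbak"):
        self.target = target
        self.backup = target + backup_suffix  # hypothetical naming scheme
        self.replaced = False

    def replace(self, new_file: str) -> None:
        # Keep the first backup only, so a repeated replace cannot overwrite
        # the original version with an already replaced one.
        if os.path.exists(self.target) and not os.path.exists(self.backup):
            shutil.copy2(self.target, self.backup)
        shutil.copy2(new_file, self.target)
        self.replaced = True

    def fall_back(self) -> None:
        if not self.replaced:
            return
        if os.path.exists(self.backup):
            os.replace(self.backup, self.target)  # restore the old version
        else:
            os.remove(self.target)  # the file was newly added, so remove it
        self.replaced = False

    def commit(self) -> None:
        if os.path.exists(self.backup):
            os.remove(self.backup)  # the old version is no longer needed
        self.replaced = False
```

Keeping the backup next to the target until commit is what makes fall back possible at any point during the session.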

5.2.1.6 ACS_FCH_Package

This class implements handling of CXC packages. It has methods for installing and removing CXC packages, version checking, listing of CXC packages, transactional install and remove logs for packages, fall back and commit.
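A transactional install and remove log makes fall back a matter of replaying the log backwards with each action inverted. The following Python sketch shows that principle under stated assumptions: the class `PackageLog` and its methods are hypothetical, the log is held in memory rather than in the registry, and the package names in the usage are invented.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (action, package name)


class PackageLog:
    """In-memory stand-in for an install/remove transaction log."""

    _INVERSE = {"install": "remove", "remove": "install"}

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def record(self, action: str, package: str) -> None:
        if action not in self._INVERSE:
            raise ValueError(f"unknown action: {action}")
        self.entries.append((action, package))

    def fall_back_plan(self) -> List[Entry]:
        """Undo steps: the inverse of each logged action, newest first."""
        return [(self._INVERSE[a], p) for a, p in reversed(self.entries)]

    def commit(self) -> None:
        """After commit there is nothing left to undo."""
        self.entries.clear()


log = PackageLog()
log.record("install", "PKG_A")  # hypothetical package names
log.record("remove", "PKG_B")
plan = log.fall_back_plan()
```

Undoing in reverse order matters when a later action depends on an earlier one.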

5.2.1.7 ACS_FCH_Parameter

This class implements editing of CXC parameter files. It has methods for backup and edit of parameter files, syntax check, updating the PHA parameter database, fall back and commit.

5.2.1.8 lbbfile

This class is used to represent a LBB file.

5.2.1.9 ACS_FCH_Exception

Handles error messages in combination with exceptions.

5.2.1.10 rinUpdate

This class is used to represent a resource instance and its relations.

5.2.1.11 parfile

Represents a parameter file and its replacements.

5.2.1.12 ACS_FCH_Time

Represents boot time measurement and time measurement tied to a chosen registry key.

6 PROCESSES

FCH does not implement any supervised processes, but requires that all supervised processes are online. FCH also supervises the PRC Cluster Control process during switch over, to make sure it is properly stopped and started.


7 PERSISTENT STORAGE

FCH uses the following persistent storages:

a The FCH activity log acs_fch_activity.log, which is located in <ACS_LOGS>\FCH on the data disk. A node local log file is also kept in C:\ACS\logs\FCH with the same name. This is mainly used by the ACS_FCH_Server component to log startup and shutdown of the service, but is also used as FCH activity log in case the data disk cannot be accessed.

b FCH stores CXC packages to be installed during the FCH session at C:\ACS\data\FCH\new.

c FCH stores installed and committed CXC packages at C:\ACS\data\FCH\current.

d FCH stores backups of ACS files and binaries in C:\ACS\data\FCH\bin and C:\ACS\data\FCH\fchbin.

e FCH uses the registry key HKEY_LOCAL_MACHINE\Cluster\FCHIP to store the current FCH state.

f FCH uses the registry key HKEY_LOCAL_MACHINE\Cluster\LBB to store a boolean value to indicate if LBB software upgrade is in progress.

g FCH uses the registry key HKEY_LOCAL_MACHINE\Cluster\ASKBOOT to store a boolean value to indicate if the operator is to be able to select whether he wants to boot at the end of fchstart, fchfb and fchend. This is used for testing purposes only and should never be used on site.

h FCH uses the registry structure under HKEY_LOCAL_MACHINE\Software\Ericsson\AdjunctProcessor\ACS\FCH to store installation and removal transaction logs.

i FCH uses the registry key HKEY_LOCAL_MACHINE\Cluster\ORIGNODE to establish on which node the FCH session started.

j HKEY_LOCAL_MACHINE\Cluster\BOOTCOUNT_NODEN is used to count the number of reboots on that node during one state.

k HKEY_LOCAL_MACHINE\Cluster\BEGSW is used to measure the time from when a switchover starts until it is finished.

l HKEY_LOCAL_MACHINE\Cluster\OLDBOOTSTATE_NODEN is used to verify that the reboot occurred in the same state.

m HKEY_LOCAL_MACHINE\Cluster\BOOTTIME_NODEN is used to establish when a boot occurred and how much time has passed since it.

n HKEY_LOCAL_MACHINE\Cluster\OLDBOOTTIME_NODEN is compared with the previous value to verify whether a new boot has occurred.

o lopt is used to save the argument to -l (-i) so that it can be used again on the other (old) node to do the same LBB files update there.
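Item a describes a log on the data disk with a node local log as fallback when the data disk cannot be accessed. A minimal Python sketch of that fallback is shown here. The function name `append_activity` and its parameters are assumptions; only the log file name comes from the list above.

```python
import os

LOG_NAME = "acs_fch_activity.log"


def append_activity(message: str, data_disk_dir: str, local_dir: str) -> str:
    """Append a message to the activity log and return the path written to."""
    for directory in (data_disk_dir, local_dir):
        try:
            os.makedirs(directory, exist_ok=True)
            path = os.path.join(directory, LOG_NAME)
            with open(path, "a", encoding="utf-8") as log_file:
                log_file.write(message + "\n")
            return path
        except OSError:
            continue  # this location is unavailable, try the next one
    raise OSError("no activity log location is writable")
```

Trying the data disk first keeps a single shared log in the normal case, while the local directory ensures that a record survives when the shared disk is unavailable.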


8 ERROR HANDLING

All errors are logged in the FCH activity log file. Depending on the situation, FCH either performs the action again to attempt to bypass intermittent errors, ignores the error and continues the FCH session if the error is not serious, or, if the error cannot be handled or bypassed, prints an error message on the console and exits.
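The three outcomes described above (retry, ignore, abort) can be expressed as one wrapper around each step. The Python sketch below illustrates the policy only. The exception classes `TransientError` and `MinorError`, the retry count and the logger name are assumptions and do not come from the FCH source.

```python
import logging

log = logging.getLogger("fch.activity")  # stand-in for the FCH activity log


class TransientError(Exception):
    """An intermittent error that may succeed on a later attempt."""


class MinorError(Exception):
    """An error that is not serious enough to stop the session."""


def run_step(step, retries: int = 3):
    """Run a step, retrying, ignoring or aborting depending on the error."""
    for attempt in range(1, retries + 1):
        try:
            return step()
        except TransientError as exc:
            log.error("attempt %d failed: %s", attempt, exc)
        except MinorError as exc:
            log.error("ignored: %s", exc)
            return None
        except Exception as exc:
            log.error("fatal: %s", exc)
            print(f"FCH error: {exc}")
            raise SystemExit(1)
    print("FCH error: step still failing after retries")
    raise SystemExit(1)
```

Every branch logs before acting, which matches the rule that all errors end up in the activity log regardless of how they are handled.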

9 FUNCTION CHANGE

NA

10 START, STOP AND RESTART

FCH has no supervised processes.

11 CONFIGURATION

FCH has one configuration file, <AP_HOME>\ACS\etc\FCH_service_def. It contains the name of the FCH service, which is added to the LCTBIN file SetupService.def. This allows the FCH service to be configured by the SetupServices command.

12 CAPACITY

12.1 DATA FOR CAPACITY ESTIMATION

FCH will have small impact on system capacity.

12.2 CAPACITY ESTIMATION

NA

13 SPECIAL FEATURES

NA

14 FCH, THE STATE MACHINE

14.1 PAUSING THE NODES AND GROUP OWNERSHIP

In the Microsoft Cluster, there is an important concept called cluster node pausing. It is normally used for maintenance. If a node is paused, cluster resource groups cannot be moved to that node. There is an exception, however: if a group can belong to several nodes, a node goes down, and the only node available to the groups is a paused node, the groups will be moved to that node anyway, with the groups offline. I.e. a group can temporarily belong to a paused node but is then offline. If a group can only belong to one node and that node is down, it will have no owner.
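The pausing rules above can be captured in a small decision function. The Python sketch below models only what the paragraph states. The function name, the node state strings (`"up"`, `"paused"`, `"down"`) and the tuple return value are assumptions made for illustration; it is not a model of the MSCS API.

```python
from typing import Dict, List, Optional, Tuple


def failover_target(possible_owners: List[str], failed_node: str,
                    node_state: Dict[str, str]) -> Tuple[Optional[str], bool]:
    """Return (new owner, online) for a group whose owner node went down."""
    candidates = [n for n in possible_owners if n != failed_node]
    # A running, unpaused node takes the group and brings it online.
    for node in candidates:
        if node_state.get(node) == "up":
            return node, True
    # Exception: a paused node is the only one left, so it gets the group offline.
    for node in candidates:
        if node_state.get(node) == "paused":
            return node, False
    # No other possible owner: the group has no owner.
    return None, False
```

For example, a group that may run on nodes A and B, with A down and B paused, ends up owned by B but offline.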

FCH uses the pausing of a node to prevent any “spontaneous” moves of groups. Why? We shall see below.

14.2 THE CLUSTER DATABASE

The cluster database is really two registry databases which are equal or made equal. A change log exists on the data disk.

Now, let’s say a FCH session has upgraded one of the nodes in the cluster, including the cluster database. For example, a cluster resource has been added, and this resource belongs to the current node. One could be tempted to believe that a restore of the upgraded node would revert the FCH session to its previous state. This is, however, not the case. The addition of the cluster resource affects the database on both nodes. If one node is restored, one of the “identical” databases will be different from the other. In this case, the database with the latest timestamp will “win” and the operator will have the old original node with an upgraded database! The consequence of this is that the cluster database needs special handling to be reverted to its original state. When the cluster database has been changed, FCH uses the PRC command prcconf to update or revert the database. It should be obvious now that a failover with an upgrade of the database needs the cluster resources on the executing node to be offline.
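The "latest timestamp wins" rule explains why restoring one node does not undo a cluster database change. The Python sketch below reproduces the scenario with a deliberately simplified model. The class `ClusterDbCopy`, the integer timestamps and the resource names are assumptions; the real cluster database is a registry hive with a change log.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ClusterDbCopy:
    node: str
    timestamp: int
    resources: FrozenSet[str]


def surviving_copy(a: ClusterDbCopy, b: ClusterDbCopy) -> ClusterDbCopy:
    """When two copies differ, the one with the latest timestamp is kept."""
    return a if a.timestamp >= b.timestamp else b


# Node A is restored from a backup taken before the resource was added.
restored = ClusterDbCopy("A", 100, frozenset({"disk"}))
# Node B still holds the database that was updated during the FCH session.
other = ClusterDbCopy("B", 200, frozenset({"disk", "new_resource"}))
winner = surviving_copy(restored, other)
```

Here the restored node still ends up with `new_resource` in its database, which is exactly the situation that makes an explicit revert via prcconf necessary.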

Any failover should therefore be made offline to allow the database to be changed using prcconf.

FCH uses states to keep track of what has been done, so it can properly fall back the system should a failure occur.

The use of states is extremely important when keeping track of cluster database changes.

14.3 NODE AVAILABILITY.

A very important use of states is when an AP node becomes unavailable due to node failure, for example if the upgraded node crashes while FCH is in the middle of an upgrade of the other node. FCH should then ensure that the remaining node becomes available as quickly as possible. It uses FCH states to achieve this.

14.4 FCH STATES.

Among other things, FCH has always been a state machine. In this APG40 NT version, with a 2-node cluster, the states are more of a transaction log where each state represents a set of actions and a direction. In the APG30 double-partition Unix version, any state could be changed to the state "Failed", where FCH returned to the original partition, ceased the alarms etc. Thus it was more of a real state machine.

In all the states from Installing to CommitDone it is possible to use fchfb to fall back or fchrst to restore the upgraded system.

In each of the states before CommitDone, if a reboot or other interrupt or failure occurs, automatic activation of the original non-upgraded node will occur. If it is a non-LBB upgrade, automatic fallback of the upgraded system will take place. If something goes wrong during fallback, or if an LBB upgrade has taken place, FCH restore (fchrst) execution is necessary.
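The recovery rule for failures before CommitDone depends on two facts: whether LBB was upgraded and whether the automatic fallback itself succeeded. The Python sketch below encodes that rule as a plan of actions. The function name and the action strings are illustrative assumptions.

```python
from typing import List


def recovery_actions(lbb_upgraded: bool, fallback_failed: bool = False) -> List[str]:
    """Actions after a failure in any state before CommitDone."""
    actions = ["activate the original non-upgraded node"]
    if lbb_upgraded:
        # LBB changes cannot be fallen back automatically.
        actions.append("operator runs fchrst")
    else:
        actions.append("automatic fallback of the upgraded system")
        if fallback_failed:
            actions.append("operator runs fchrst")
    return actions
```

In every case the first action is the same, which reflects the priority of getting a working node active before anything is repaired.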

After the state CommitDone, only a local fallback (INGO1 addition) of the non-upgraded node can take place (during the upgrade attempt). Also, if the newly upgraded node fails, the old non-upgraded node can be activated.

[Figure: state diagram. Example, failure in Move1 state. States shown: noFCH, Committing, Move1, Move2, Failure.]


14.4.1 Successful FCH session until supervision (normal flow of events)

To initiate the FCH session, the operator executes fchstart. The command will upgrade the system. Then, fchstart sets the Reboot state and reboots, and ACS_FCH_Server takes care of the remaining steps until Supervision.

Assume that node A is being upgraded. fchstart will reboot the system and ACS_FCH_Server will switch over and make node A active after the reboot. The newly upgraded system is now active and being supervised by the operator and ACS_PRC_ClusterControl.

Table 14.1

1. noFCH - No FCH is going on. If initiation of FCH is desired, the fchstart command is started from the passive cluster node.

2. Installing - The passive cluster node is paused (the concept of paused and resumed nodes is used to decide whether a failover can be done to the node or not), the FCH in progress alarm is raised, the cluster resource group(s) owned by the current passive node are shut down to ensure a problem-free installation, and a system edit/update is done (see separate chapter) on the passive cluster node. Finally, the operator is prompted for a reboot to activate the new system.

2B. LBBReboot1 - This special state tells fchstart that LBB is being upgraded (fchstart -L) and that several consecutive reboots can occur. The state is changed to Installing when the operator types “l” (leave) in the LBB window.

[Figure: state diagram. Example, failure in Move1 state, LBB upgrade. States shown: noFCH, Committing, Move1, Move2, Failure, FbFailover2, Restore, LBBReboot.]


3. Reboot - The state is set by fchstart before the reboot of the system to activate the new software. Events are sent. After the reboot, “reboot success” events are sent and the state is changed to Failover1.

4. Failover1 - All cluster groups except “Cluster Group” are brought down. Both cluster nodes are resumed to enable failover (so-called MoveClusterGroup). The cluster is now ready for an offline failover. Online failover is not suitable for FCH since the cluster database might be changed.

5. Move1 - Move (fail over) the cluster groups. Only the cluster groups that have more than one node owner can be failed over. If the owner of a cluster group was the non-upgraded node A, the new owner will be the upgraded node B. The current updated node is now active with its system upgraded.

6. Config1 - If the old and new PRC_Config files are different, that is, if the cluster configuration has changed, delete the cluster database based on the current old config file (current configuration).

7. Config1B - Create a new cluster database based on the new updated config file (new cluster resource configuration).

8. Failover2 - This state indicates that the failover and any configuration change are done. Pause the other node and resume the current one.

9. Supervision - The current upgraded node is started. This node is paused and the other node is resumed. The other node is started. The upgraded node is started first, which is somewhat more complex than the other way around. The reason for using the more complex solution is improved “ISP”, in service performance. Both nodes are now started and the operator should observe the system for at least 2 hours. He can choose to fall back using fchfb, commit the AP using fchcommit or, in the worst case, restore using the fchrst command.

10. Committing - The operator should normally repeat the fchcommit command should a failure happen. If there is no other alternative, a fallback can be tried.
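Because the states act as a transaction log, the supervisor can work out after a reboot which states remain before Supervision is reached. The Python sketch below illustrates this with the numbered states from the table. The list `NORMAL_FLOW` and both function names are assumptions, and the optional state LBBReboot1 is left out because its exact position in the sequence is not stated here.

```python
from typing import List, Optional

# Numbered states 1 to 10 from Table 14.1, in table order.
NORMAL_FLOW = [
    "noFCH", "Installing", "Reboot", "Failover1", "Move1",
    "Config1", "Config1B", "Failover2", "Supervision", "Committing",
]


def next_state(state: str) -> Optional[str]:
    """Return the state that follows in the normal flow, or None at the end."""
    index = NORMAL_FLOW.index(state)  # raises ValueError for unknown states
    return NORMAL_FLOW[index + 1] if index + 1 < len(NORMAL_FLOW) else None


def steps_until(state: str, goal: str = "Supervision") -> List[str]:
    """Return the states still to be passed, from after state up to goal."""
    start, end = NORMAL_FLOW.index(state), NORMAL_FLOW.index(goal)
    return NORMAL_FLOW[start + 1:end + 1]
```

For instance, `steps_until("Reboot")` lists the states that ACS_FCH_Server goes through after the reboot initiated by fchstart.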

Table 14.1


14.4.2 Exceptional flow of events up to CommitDone (fallback or restore)

In all states up to (but not including) CommitDone, the FCH session might be reverted to the situation that existed prior to the Function Change. This is due to a failure, for example an unexpected reboot, or operator intervention using for example fchfb, Function Change fallback. FCH can be interrupted in any state and a fallback will start. An alternative to FCH fallback is a single node restore, which is implemented by fchrst, Function Change restore. fchrst is always needed when LBB has been upgraded.

11. FbFail-over1

Check if cluster database needs to be reverted.Cease FCH in progress alarm and raise FCH failedalarm. Stop all groups except for “cluster group”.This state is entered from Supervision state.

12. Config3 If cluster database has changed, delete the parts ofthe cluster database based on the newPRC_ConfigFile. Only the resources belonging tothe non-upgraded node gets deleted.

13. Move2 Failover (Failback) to original non-upgraded node.

14. Config4 If cluster database has changed, activate old data-base (used at initialisation of the FCH session)using the old PRC config file. The resources withthe currently upgraded node as owner areunchanged.

15. FbFailover2

Pause the currently upgraded node. If LBB was upgraded, exit the fchfb command (or return from the ACS_FCH_Server thread) now to let the operator restore the node using fchrst. If LBB was not upgraded, all upgrades (or downgrades) are reverted to their original status. The fallback occurs automatically (initiated by ACS_PRC_ClusterControl) or by operator intervention using the fchfb command.

16. Restore If the user chose to restore the system due to an LBB upgrade or for another reason, the Restore state is set before the actual restore. fchend then does a possible update of the cluster database and finishes the FCH session by executing the actions of the End state.

17. Config5 Delete the resources in the cluster database belonging to the upgraded node using the new PRC_Config file. Create a new cluster database for the upgraded node using the old PRC_Config file.

18. FbFailover3

The fallback (of node and cluster database) is done. Send events and do some cleanup.
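The order of fallback states 11 to 18 can be sketched as a function that lists the states whose actions apply. This is a simplified reading of the table, not the FCH implementation: the function name and flags are hypothetical, the states are assumed to be visited in numerical order, and Config3 and Config4 are assumed to be skipped (rather than entered as no-ops) when the cluster database is unchanged.

```python
def fallback_path(cluster_db_changed, lbb_upgraded):
    """List the fallback states (11 to 18) whose actions apply.

    Assumption: states follow the numbering in the table above.
    """
    path = ["FbFailover1"]
    if cluster_db_changed:
        path.append("Config3")      # delete parts based on the new config
    path.append("Move2")            # failover to the non-upgraded node
    if cluster_db_changed:
        path.append("Config4")      # activate the old database
    path.append("FbFailover2")      # pause the upgraded node
    if lbb_upgraded:
        # fchfb exits here; the operator restores the node with fchrst.
        path.append("Restore")
        return path
    path += ["Config5", "FbFailover3"]
    return path
```

With an unchanged cluster database and no LBB upgrade the sketch yields FbFailover1, Move2, FbFailover2, Config5, FbFailover3.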

14.4.3 Misc exceptional flow of events, fallback from Installing or Failover1 state

19. FbReboot Set this state when FbFailover3 is ready. This state is used when a calling function wants to give a reboot order during fallback. Set FbReboot2 state.

20. FbReboot2 Set this state before reboot. Create a reboot file to communicate to PRC that the reboot should not be counted. Do the actual reboot. After reboot, send some events and clean up.

21. End Pause the other node, resume the fallen-back node, start up the services, pause this node and resume the other node again. Run fchend and ensure that the fallen-back node is executing again, cease the alarm and clean up.

4. Failover1 If this state (see above) gets interrupted, pause the upgraded node and start the other node. Set Installing state.

2. Installing If fallback occurs and Failover1 was the previous state, this state is set. All upgrades are reverted and the state is set to Reboot2.

37. Reboot2 If the state was Reboot2, this was the upgraded node and the other node was down, set state StartOrigNode.

38. StartOrigNode

This special state ensures that the upgraded node gets started, online and active. Otherwise, the old other node would be online and active.

39. EndOrigNode

This special state shows that the original, formerly upgraded node is now online and active. Normally, after a fallback, the other node would be online and active. fchend can be run, alarms are ceased, the state is set to noFCH and cleanup is performed.
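The fallback from Failover1 or Installing described above can be read as a small transition function. The sketch below is an illustration under assumptions: the function names and keyword flags are hypothetical, and the case where Reboot2 is reached while the other node is up is not described in this section, so the sketch stops there.

```python
def next_state(state, *, is_upgraded_node=True, other_node_down=True):
    """Return the next state during fallback from Failover1 or Installing."""
    if state == "Failover1":
        return "Installing"        # upgraded node paused, other node started
    if state == "Installing":
        return "Reboot2"           # all upgrades have been reverted
    if state == "Reboot2":
        if is_upgraded_node and other_node_down:
            return "StartOrigNode"
        return None                # not covered by this section
    if state == "StartOrigNode":
        return "EndOrigNode"
    if state == "EndOrigNode":
        return "noFCH"             # fchend run, alarms ceased, cleanup done
    raise ValueError(f"state not part of this flow: {state}")


def walk(start, **conditions):
    """Follow next_state() until noFCH or until the flow is unspecified."""
    path = [start]
    while path[-1] != "noFCH":
        following = next_state(path[-1], **conditions)
        if following is None:
            break
        path.append(following)
    return path
```

Calling walk("Failover1") follows the special path through StartOrigNode and EndOrigNode to noFCH.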

14.4.4 Normal flow of events until noFCH is set and FCH session is successful.

If fchcommit was successful and state CommitDone was set, the FCH session has to be ended by installing the passive current node to make it equal to the active node.

22. CommitDone

This state is set after state Committing when a successful fchcommit has been executed. The newly upgraded node is active and has been approved by the operator.

23. LBBReboot2

This special state is set if the user has upgraded the LBB during the FCH session. The operator can do several consecutive reboots to install drivers etc. It is the operator’s responsibility to follow the procedures exactly as was done on the original node.

24. EndInstalling

This corresponds to the Installing state but on the other node. FCH will automatically update the system exactly as was done on the originally updated node, with the exception of LBB upgrades.

25. Config2 If the cluster database was updated, the corresponding changes for this node will be done online. In this state, the old configuration will be deleted.

26. Config2B If the cluster database was updated, the corresponding changes for this node will be done online. In this state, the new configuration will be added.

27. EndInstallDone

Installation of node is ready. Send events.

28. EndReboot The system will be rebooted to activate the new software. Events are sent.

29. EndRebootDone

The system has successfully rebooted. Events are sent and alarms ceased. State noFCH is set and cleanup is performed.
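States Config2 and Config2B split the cluster database update for a node into a delete step followed by an add step. A minimal sketch of that two-step pattern is given below. The data model is purely an assumption for illustration: the cluster database is represented as a set of (owner node, resource name) pairs and a configuration as a set of resource names. The real database and the PRC config file format are not described here.

```python
def delete_config(cluster_db, node, old_config):
    """Config2 step: remove the node's resources named in the old configuration."""
    return {
        (owner, resource)
        for owner, resource in cluster_db
        if not (owner == node and resource in old_config)
    }


def add_config(cluster_db, node, new_config):
    """Config2B step: add the node's resources named in the new configuration."""
    return cluster_db | {(node, resource) for resource in new_config}


# Hypothetical example: node B is brought in line with a new configuration.
db = {("A", "svc1"), ("A", "svc3"), ("B", "svc1"), ("B", "svc2")}
after_delete = delete_config(db, "B", {"svc1", "svc2"})
after_add = add_config(after_delete, "B", {"svc1", "svc3"})
```

Keeping the two steps separate mirrors the separate states in the table, so an interruption between them leaves a well-defined state to resume or revert from.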

14.4.5 Exceptional flow of events. Active node not available.

If the active, newly upgraded node becomes unavailable, the old node must become the active one. A failover cannot be done right away, since the cluster database has to be reverted, if necessary. The states are bidirectional.

Table 14.2

22. CommitDone

This state is set after state Committing when a successful fchcommit has been executed. The newly upgraded node is active and has been approved by the operator.

30. InitWrongNode

A failure of the active upgraded node has occurred and a check is done to see if a cluster database change is necessary. Resume the node to be able to move cluster groups without a current owner. Stop all groups except for the cluster group. Start the cluster group if offline. Resume both nodes. Failover to the current non-upgraded node.

31. Config6 If the cluster database has changed, delete the services belonging to the current node using the new PRC_Config file.

32. Config6B If the cluster database has changed, add the resources belonging to the current node using the old PRC_Config file.

33. InitWrongNodeDone

The switch to the old non-upgraded node has been done. Resume this node, pause the other node, ensure that the Cluster group is online and start the current node.

34. EndWrongNode

The old node is up and running.
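The bidirectional relation between CommitDone and EndWrongNode can be sketched as a lookup of stable states. The event names are hypothetical labels chosen for this sketch, and the intermediate states listed above (InitWrongNode, Config6, Config6B, InitWrongNodeDone) are deliberately not modelled.

```python
# Stable states and the (hypothetical) events that move between them.
TRANSITIONS = {
    ("CommitDone", "upgraded_node_down"): "EndWrongNode",
    ("EndWrongNode", "upgraded_node_up"): "CommitDone",
}


def on_node_event(state, event):
    """Return the stable state reached after an event; unknown pairs keep the state."""
    return TRANSITIONS.get((state, event), state)
```

An event that does not apply to the current state, such as upgraded_node_up while already in CommitDone, leaves the state unchanged in this sketch.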

14.4.6 Exceptional flow of events, local fallback or restore of old node

If the non-upgraded node fails during upgrade, local fallback of this node is necessary. In the worst case, if the newly upgraded node fails during upgrade of the inactive node, both a local fallback and a switchover to this node might be necessary.

If restore of the old node is required, due to, for example, a failed LBB upgrade, fchrst is run by the operator and the Restore2 state is set.

24. EndInstalling

When part of the fallback procedure, fallback of packages, parameters etc. will take place.

35. FbEndReboot

This state is set after the EndInstalling state. FCH will try to reboot to activate the old software.

36. FbEndRebootDone

Reboot has occurred and old software is activated.

22. CommitDone

This state is set after state FbEndRebootDone. The newly upgraded node is active and has been approved by the operator. The old node is not yet upgraded.

Figure: transitions between CommitDone and EndWrongNode. Example: old node active, then new node active again. CommitDone to EndWrongNode: upgraded node down, old node active. EndWrongNode to CommitDone: upgraded node up, upgraded node active again.

15 REFERENCES

[1] 2/1551-ANZ 222 01 Uen Adjunct Computer Subsystem, Terms and Abbreviations

[2] 5/1056-ANZ 222 03 Uen Adjunct Computer Subsystem (ACS) - System Version Control in the AP.

23. LBBReboot2

If this state or another state belonging to the old node is interrupted by a failure, the operator runs fchrst and the Restore2 state is set.

40. Restore2 When this state is set, restore should be done on the old node. When the restore is done, fchcommit must be run again to commit the AP system.
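The two recovery paths for the old node in section 14.4.6 (local fallback, or restore followed by a repeated fchcommit) can be written down as ordered steps. The function name and the (kind, name) tuple layout are assumptions for illustration only. The states and commands are those named in the section.

```python
def old_node_recovery_steps(restore_required):
    """Return ordered (kind, name) steps for recovering the old node."""
    if restore_required:
        return [
            ("command", "fchrst"),     # run by the operator
            ("state", "Restore2"),     # restore is done on the old node
            ("command", "fchcommit"),  # must be run again after the restore
        ]
    return [
        ("state", "EndInstalling"),    # fallback of packages, parameters etc.
        ("state", "FbEndReboot"),      # reboot to activate the old software
        ("state", "FbEndRebootDone"),
        ("state", "CommitDone"),       # old node is not yet upgraded
    ]
```

In both cases the session is left committed on the upgraded node, so the upgrade of the old node can be attempted again.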

16 ANNEXES

16.1 ANNEX REVISION HISTORY

Rev  Date        Prepared  Description
PA1  1999-12-09  UABRUDO   First revision.
PA2  2000-05-26  UABRUDO   Updated for CM12 delivery.
A    2000-05-29  UABRUDO   Firm revision.
PB1  2000-11-13  QABKULD   INGO1 updates.