HP OpenView Operations for UNIX: Advanced tools and techniques

IT Operations
Operations Management

HP Software Universe
June 18-22, 2007 | The Venetian | Las Vegas, Nevada

© 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.


Memorial Sloan-Kettering Cancer Center

OVO for UNIX
Advanced Tools & Techniques

Gregg Roman
Hans Chou

3 MAIN AREAS

• OVO for UNIX - Automation Tools
• Response Center 101
• HP OpenView Enhancements

GOAL

ONE GOOD IDEA

THEMES

CENSORED

CENSORED

CENSORED

CENSORED

CENSORED

Policies and Procedures are just a list of Mistakes we’ve made.

Monitoring Infrastructure

HP OVO 8.24 & NNM 7.51
HP 3440s running HP-UX 11.11
MC/Service Guard
BMC Patrol Agent
Telalert

Log File

•Local OVO Agent Problem

Response Center 101

Kill OVO Agent
    ovc -kill

Remove Agent Queue Files
    rm /var/opt/OV/tmp/OpC/*
    rm /var/opt/OV/tmp/public/OpC/*

Start OVO Agent
    ovc -start

Stop OVO Server Processes
    ovstop

Remove Server Queue Files
    rm -r /var/opt/OV/share/tmp/mgmt_sv/*

Start OVO Server Processes
    ovstart
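The agent steps above could be wrapped in one function so they always run in the same order. This is only a sketch: `run`, `reset_agent_queues` and the `DRY_RUN` switch are hypothetical names, and the `-f` on `rm` is an addition so that an empty queue directory does not stop the sequence. By default nothing is executed; the commands are only printed.

```shell
#!/bin/sh
# Sketch only: DRY_RUN, run and reset_agent_queues are illustrative names,
# not part of OVO. The ovc and rm commands are the ones from the slide.

run() {
    # In dry-run mode (the default) print the command instead of running it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reset_agent_queues() {
    # Stop the agent, clear its queue files, start it again.
    run ovc -kill || return 1
    # -f is an addition: do not fail when a queue directory is already empty.
    run rm -f /var/opt/OV/tmp/OpC/* /var/opt/OV/tmp/public/OpC/* || return 1
    run ovc -start
}
```

Calling `reset_agent_queues` prints the three commands; setting `DRY_RUN=0` first would execute them, which should only be done on a node where the agent really is wedged.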

Checks and fixes database inconsistencies in the OVO database
Run 'opcdbidx -all' without stopping server.
Look in OVO error logfile for errors, then run with -dupl_fix option
We now run this daily as a cron job

Detect and correct inconsistencies between the NNM IP Topology and Object databases
"Clean-up and remove"
Command can be run several times

Checks consistencies between the NNM Map and Object databases
Checks for invalid symbols and submaps
Removes and updates objects & symbols

Remove all entries from the SNMP Configuration cache

Updates object status in the topology database

Process field registration files

Recommendations:

• Create maintenance script
• Run this during scheduled outages (monthly, quarterly)

Tip: Google your next OpenView error

Is everything working?

Automatic Notification Override

Override the auto-acknowledge
Message appears in Message Browser
Auto-Action runs a script 'email.sh' or 'page.sh'
Is it working? Buttons to press to send an email or page to anyone.

5/4/2007 PROBLEM: opc_adm makes changes not reflected on Operator maps
5/19/2007 SOLUTION: Pop-up reminder every 12 hours
6/6/2007 IMPROVEMENT: Notify operator if GUI open for more than 12 hours
6/24/2007 PROBLEM: Operators do not exit GUI gracefully
7/12/2007 PROBLEM: DBA interventions necessary to remove Oracle user table entry
7/30/2007 SOLUTION: Automated inconsistency checking
8/18/2007 IMPROVEMENT: Opcsessions shows sessions & inconsistencies, removes table entries

OpenView Administrator makes changes over the course of the day
Changes will not reflect on Operator maps until maps are restarted (e.g. Message Group, Node Bank, Tools)


An 'opcwall' message is sent to all Operator maps every 12 hours


A script checks how long each Operator map has been open
An 'opcwall' message is sent to the offending Operator's GUI if the map has been up for more than 12 hours
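The 12-hour test itself is simple arithmetic on timestamps. The sketch below assumes the session start time is already available as epoch seconds; how that value is obtained, and the delivery of the message through 'opcwall', are site-specific and left out. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: session_too_old and notify_if_stale are illustrative names.
# Times are epoch seconds; obtaining the session start time is not shown.

LIMIT_SECONDS=$((12 * 60 * 60))

session_too_old() {
    # $1 = session start, $2 = current time. True when older than 12 hours.
    [ $(( $2 - $1 )) -gt "$LIMIT_SECONDS" ]
}

notify_if_stale() {
    # $1 = operator name, $2 = session start, $3 = current time.
    # Prints the reminder text; a real script would hand it to opcwall.
    if session_too_old "$2" "$3"; then
        echo "NOTIFY $1: GUI open more than 12 hours, please restart it"
    fi
}
```

A session that is exactly 12 hours old is not yet flagged; only sessions past the limit produce a reminder.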


Operators do not exit the GUI gracefully (reboot PC, exit Reflections/Exceed)
Oracle table entry not cleared
Can't log in using the same ID


DBA needs to be contacted to remove Oracle table entry


Automated comparison between the GUI process IDs and Oracle User Table entries
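One way to picture this comparison is as a set difference: entries recorded in the user table minus GUI processes that are actually running. The sketch below assumes both sides have already been dumped to plain files with one process ID per line; that file format, the function name, and the use of `mktemp` are assumptions, and the query against Oracle is not shown.

```shell
#!/bin/sh
# Sketch only: stale_entries is an illustrative name. Input files hold one
# process ID per line; producing them (ps output, database query) is not shown.

stale_entries() {
    # $1 = PIDs recorded in the user table
    # $2 = PIDs of GUI processes currently running
    # Prints PIDs that are in $1 but not in $2, in sort order.
    t1=$(mktemp) && t2=$(mktemp) || return 1
    sort -u "$1" > "$t1"
    sort -u "$2" > "$t2"
    # comm -23 keeps lines unique to the first (sorted) file.
    comm -23 "$t1" "$t2"
    rm -f "$t1" "$t2"
}
```

Anything this prints is a candidate for cleanup: a table entry whose GUI process no longer exists.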


Opcsessions shows:
• Motif GUI sessions
• Java GUI sessions
• Template GUIs open
• Oracle User Table entries

The '-k' option removes inconsistencies in the Oracle User Table

Output of 'opcsessions'

Remove all "If - Then" Instructions

Auto-action runs a script 'workhour.sh' or 'offhour.sh'
Scripts determine the day of week & time of day, and exit accordingly.
Instruction Interface has similar logic and displays the appropriate Instructions to follow.
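The day-and-time decision behind such scripts could look like the sketch below. The schedule is an assumption (Monday to Friday, 08:00 to 17:59) and `classify_hours` is a hypothetical name; the talk does not state the actual working hours or show the scripts.

```shell
#!/bin/sh
# Sketch only: classify_hours is an illustrative name and the schedule
# (Mon-Fri, 08:00-17:59) is assumed, not taken from the talk.

classify_hours() {
    # $1 = day of week, 1 (Monday) .. 7 (Sunday), as printed by `date +%u`
    # $2 = hour of day, 0 .. 23, without a leading zero
    if [ "$1" -ge 1 ] && [ "$1" -le 5 ] && [ "$2" -ge 8 ] && [ "$2" -lt 18 ]; then
        echo workhour
    else
        echo offhour
    fi
}

# Usage with the current time (leading zero stripped from the hour):
# classify_hours "$(date +%u)" "$(date +%H | sed 's/^0//')"
```

Keeping the rule in one function means the auto-action script and the Instruction Interface can share the same definition of "work hours" instead of each carrying its own If-Then text.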

Operations Daily Procedures into OpenView

Convert all time-based manual procedures into OVO Schedules.
Use Yellow (Minor) to represent a Post-It.

More Intellectual Property $$$

SYNCCONF - (MC/SG) Daily copy of all config files to secondary node. (/etc/hosts, Tools, Telalert)
ARCHCONF - Dedicated filesystem stores all OpenView configuration. Any file ever restored. Retention = 30 days.
OVO <-> /etc/hosts
OVO -> NNM (Not in NNM delete > 2 wks, No interface under OVO icon)
OVO <-> OVO (2 environments)
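A 30-day retention rule like the one mentioned for ARCHCONF can be expressed with `find`. This is a sketch under assumptions: `list_expired` is a hypothetical name, the archive directory is passed in by the caller, and the function only lists candidates rather than deleting them.

```shell
#!/bin/sh
# Sketch only: list_expired is an illustrative name; the archive location
# is supplied by the caller and nothing is deleted here.

list_expired() {
    # $1 = archive directory, $2 = retention in days
    # -mtime +N matches files last modified more than N full days ago.
    find "$1" -type f -mtime +"$2" -print
}
```

Reviewing the list before removing anything is the cautious choice for a filesystem whose purpose is to make restores possible.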

Ovintegrator

/etc/hosts file logic
No need to keep separate "Master Device List"
Department & Notification method

OpenView Add/Delete Program

Replaces a manual "to-do" list
Integrates with ovintegrator (places it in correct section)
Eliminates pilot error

STEP 1: Confirm nodename in NNM by "snmpwalking" the device
STEP 2: Create SNMP configuration
STEP 3: Add the nodename to /etc/hosts
STEP 4: Update /etc/opt/OV/share/conf/netmon.noDiscover file
STEP 5: Add the node to Node Bank and Node Layout Group
STEP 6: Push templates

NOTE: The Delete Program includes downloading the Alarm History, clearing the cache, and deleting the node from NNM.
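STEP 3 is the kind of step where a repeated run should not create duplicates. The sketch below covers only that step, against a hosts-format file given as a parameter; `add_host_entry` is a hypothetical name, and the placement of the entry in the "correct section" that ovintegrator expects is not attempted here.

```shell
#!/bin/sh
# Sketch only: add_host_entry is an illustrative name. It appends to the
# end of the file and does not handle ovintegrator's sections.

add_host_entry() {
    # $1 = hosts file, $2 = IP address, $3 = node name
    # The awk check succeeds when the name already appears as a host field
    # on a non-comment line.
    if awk -v name="$3" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }' "$1"; then
        echo "exists: $3"
    else
        printf '%s\t%s\n' "$2" "$3" >> "$1"
        echo "added: $3"
    fi
}
```

Running it twice with the same node name adds the line once and reports "exists" the second time, which is what removes the "pilot error" of a duplicated entry.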

How do we hide planned outages?

Trouble Tickets
Reports not useful
Eliminate Operations manual tasks

Blackout Program

Blackout Script - Menu driven
Ability to process spreadsheet
Fully integrated with Change Control
Eliminates Manual procedures
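Processing a spreadsheet of planned outages could come down to a lookup like the one sketched here. The CSV layout (node, start, end, with times as epoch seconds) and the name `in_blackout` are assumptions made for illustration; the talk does not describe the real file format or how messages are suppressed once a match is found.

```shell
#!/bin/sh
# Sketch only: in_blackout is an illustrative name and the CSV layout
# node,start_epoch,end_epoch is assumed, not taken from the talk.

in_blackout() {
    # $1 = CSV file, $2 = node name, $3 = time to test (epoch seconds)
    # Succeeds when any row for the node covers the given time (inclusive).
    awk -F, -v node="$2" -v now="$3" '
        $1 == node && now + 0 >= $2 + 0 && now + 0 <= $3 + 0 { hit = 1 }
        END { exit !hit }' "$1"
}
```

A caller would use the exit status, for example `in_blackout outages.csv db01 "$now" && echo suppressed`, where `outages.csv` stands for whatever export the change-control process produces.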

Planning for the Future…

• OVO to OVO Sync (2 separate environments)

• Blackberry initiated "OpenView Health status"
