
Page 1: Advanced Server Monitoring and Alert Notifications

© 2013 Wellesley Information Services. All rights reserved.

Advanced Server Monitoring and Alert Notifications

Andy Pedisich Technotics

Page 2: Advanced Server Monitoring and Alert Notifications

Your Presenter

• One half of a pair of two hard-working IBM® Notes® Administrators/Developers who have worked with IBM® Notes® and IBM Domino® since version 2.1
  • From Technotics, Inc. in Philadelphia, Pennsylvania – USA
• Andy Pedisich
  • 28 years in IT
  • 19 years with Lotus Notes
• Rob Axelrod
  • 23 years in IT
  • 19 years with Lotus Notes

Page 3: Advanced Server Monitoring and Alert Notifications

What We’ll Cover …

• Setting up the foundation for guarding your domain
• Working with event generators and event handlers
• Selecting a notification method
• Customizing recommended actions in Domino Domain Monitoring
• Tracking problem servers
• Finding and tracking events that show on the console, but not in the log
• Using LotusScript to access server statistics
• Wrap-up

Page 5: Advanced Server Monitoring and Alert Notifications

Requirements for Efficient and Accurate Statistics Collection

• Two things are required for statistics collection:
  • The Collect task must be running on any server that is designated to collect the statistics
    • Not all servers should run the Collect task, only servers designated as collecting servers
  • The EVENTS4 Monitoring Configuration database must have at least one Statistics Collection document
    • Minimum collection time should be an hour

Page 6: Advanced Server Monitoring and Alert Notifications

There Is a Special Replica ID for Your EVENTS4.NSF

• The replica ID of system databases, such as EVENTS4, is derived from the replica ID of the Domino directory

Database | Replica ID
NAMES.NSF | 852564AC:004EBCCF
CATALOG.NSF | 852564AC:014EBCCF
EVENTS4.NSF | 852564AC:024EBCCF
ADMIN4.NSF | 852564AC:034EBCCF

• Notice that the first two digits after the colon for the EVENTS4.NSF replica are 02
  • Make sure EVENTS4.NSF has the same replica ID on every server
  • Check by opening a copy from every server and putting it on your desktop
  • There’s some code on the next slide to help you do that
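The pattern in the table can be written down as a small check. This is only a sketch in Python, based on nothing more than the four replica IDs shown above: the offset table, both function names, and the second server's replica ID in the usage note are made up for illustration, and you would still collect the real replica IDs yourself.

```python
# Offsets observed in the table above; not an official list.
SYSTEM_DB_OFFSETS = {
    "NAMES.NSF": 0x00,
    "CATALOG.NSF": 0x01,
    "EVENTS4.NSF": 0x02,
    "ADMIN4.NSF": 0x03,
}


def expected_replica_id(names_replica_id, database):
    """Derive a system database's replica ID from the directory's replica ID."""
    first, second = names_replica_id.upper().split(":")
    offset = SYSTEM_DB_OFFSETS[database.upper()]
    # Swap the first two hex digits after the colon for the database's offset.
    return f"{first}:{offset:02X}{second[2:]}"


def servers_with_wrong_replica(names_replica_id, events4_ids_by_server):
    """Return servers whose EVENTS4.NSF replica ID differs from the expected one."""
    expected = expected_replica_id(names_replica_id, "EVENTS4.NSF")
    return sorted(
        server
        for server, replica_id in events4_ids_by_server.items()
        if replica_id.upper() != expected
    )
```

For example, passing "852564AC:004EBCCF" and a dictionary such as {"Mail1/domlab": "852564AC:024EBCCF", "Mail2/domlab": "85257B11:0052A3F0"} to servers_with_wrong_replica would flag only Mail2/domlab.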

Page 7: Advanced Server Monitoring and Alert Notifications

Add a Button to Your Toolbar

• Add this code to a button on your toolbar
  • This is courtesy of Thomas Bahn
  • He’s a smart guy, nice guy, and sometimes brings chocolates to his friends from Europe
  • www.assono.de/blog

REM {Use the Domino directory on your mail server to list the servers};
_names := @Subset(@MailDbName; 1) : "names.nsf";

REM {Pick one or more servers from the Servers view, returning column 3};
_servers := @PickList([Custom]; _names; "Servers"; "Select servers"; "Select servers to add database from"; 3);

REM {Ask for the database path, defaulting to log.nsf};
_db := @Prompt([OkCancelEdit]; "Enter database"; "Enter the file name and path of the database to add."; "log.nsf");

REM {Add the database from each selected server to the workspace};
@For(n := 1; n <= @Elements(_servers); n := n + 1;
   @Command([AddDatabase]; _servers[n] : _db)
)

Page 8: Advanced Server Monitoring and Alert Notifications

Add a Database Icon from All Servers to the Desktop

• This code will prompt you to pick the servers that have the database you want on your desktop
  • Then it will prompt for the name of the database
  • And open it on all the servers you’ve selected
• Use it to make sure every EVENTS4.NSF in your domain is the same replica

Page 10: Advanced Server Monitoring and Alert Notifications

Event Monitoring Details

• Enough setting up already!
• Event monitors of all types are set in the EVENTS4 database
• Two broad categories of events:
  • Event handlers
    • Specify the action that Domino takes when a specific event occurs
  • Event generators
    • Each type of event generator has a view that provides a list of all event generators, plus additional configuration information

Page 11: Advanced Server Monitoring and Alert Notifications

Event Generators

• Event generators deal with specific Notes/Domino issues
• There are six types of event generators:
  • Database Event Generator
  • Domino Server Response Event Generator
  • Mail-Routing Event Generator
  • Statistic Event Generator
  • Task Status Event Generator
  • TCP Server Event Generator
  • Some are used more than others
• We’ll stick to the more popular ones that every administrator should use, for starters

Page 12: Advanced Server Monitoring and Alert Notifications

Here’s One That Everyone Should Use

• The ACL of Names.nsf should rarely change, so monitor it!
  • Alarms should go off if it changes
• Select Names.nsf
  • Choose either a single server or all servers in the domain
• I like to pick all servers in the domain
  • Admins won’t get away with anything!
  • But I do get a storm of messages when an ACL change occurs
    • Every server tells me about the change

Page 13: Advanced Server Monitoring and Alert Notifications

Unused Space Event Generator

• This is an example of the Events system actually doing something automatically when a certain condition exists
  • It’s questionable – it executes the Compact task immediately upon detecting that the free space threshold has been exceeded
  • I could see this event being used on archive servers
  • And I wish there was a way to run it during specific hours

Page 14: Advanced Server Monitoring and Alert Notifications

Domino Server Response Generator

• One server checks others by sending a probe
  • It’s a good idea to try opening Names.nsf
  • If you can’t open Names.nsf, then something is wrong!
• Default is every three minutes
• Default response time tolerance is 1,000 milliseconds (one second)
  • Your settings will depend on your own environment

Page 15: Advanced Server Monitoring and Alert Notifications

More About Probes

• The response time is a bit on the harsh side
  • If you leave it at 1,000 milliseconds (one second), you will receive a lot of notifications
  • You should make it ten seconds, or whatever the metrics in your Service Level Agreement (SLA) require
• Also, be careful which servers you choose to probe other servers
  • Try to pick probing servers that are on the same LAN as the probed servers
  • Otherwise, your probing will actually be testing network latency, rather than the servers themselves
  • I have used these probes as a method of testing exactly that: network latency
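To get a feel for how much the tolerance matters, here is a rough Python sketch that counts how many probe results would breach a given tolerance. The response times are invented for illustration and the function name is hypothetical; real numbers would come from your own probe history.

```python
def count_breaches(samples_ms, tolerance_ms):
    """Count probe responses that took longer than the tolerance."""
    return sum(1 for ms in samples_ms if ms > tolerance_ms)


# Hypothetical probe response times in milliseconds (made up for illustration).
samples_ms = [420, 1350, 980, 2200, 760, 1100, 12500, 890]

# Compare the default tolerance with the ten-second setting suggested above.
for tolerance_ms in (1_000, 10_000):
    breaches = count_breaches(samples_ms, tolerance_ms)
    print(f"tolerance {tolerance_ms} ms: {breaches} notification(s)")
```

Running the same comparison against a day of real response times is a quick way to pick a tolerance that matches your SLA without flooding the admin group.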

Page 16: Advanced Server Monitoring and Alert Notifications

Statistic Event Generators

• Statistic Event Generators monitor a specific Domino or platform statistic
  • They can let you know when a stat goes over a particular threshold
  • These stat event generators are extremely valuable
    • Smart administrators use them every day!

Page 17: Advanced Server Monitoring and Alert Notifications

Complete Listing of All Statistics Is in EVENTS4.NSF

• The Monitoring Configuration database (EVENTS4.NSF) supplies documents detailing thresholds for each statistic
  • 1,193 statistic documents are available
  • The complete listing is in the view Statistics by Name
• But only 166 of them are considered useful for setting thresholds and are found in the default statistics view
  • The default statistics thresholds view only shows documents where the field “useful” is equal to the word “Yes”

Page 18: Advanced Server Monitoring and Alert Notifications

Finding the “Not Useful” Stats

• You might find that a statistic you need has been marked as not useful

• To see which are marked as not useful, full text index the EVENTS4.nsf

• Create an advanced query checking the field useful = “No”
  • You might discover a statistic whose threshold would be just right for using

Page 19: Advanced Server Monitoring and Alert Notifications

Why Are Most Stats Considered “Not Useful” for Thresholds?

• One setting on the advanced query controls whether a statistic will appear in the drop-down list when you’re setting an event generator
  • Note that there are no Agent statistics in this list

Page 20: Advanced Server Monitoring and Alert Notifications

Why No Agent Stats

• It’s not that the Agent stats aren’t useful
  • They might not be valuable for threshold tracking
• In some releases, Agent.Hourly.UsedRunTime has a data type of text
  • We can’t set a threshold with text values

Page 21: Advanced Server Monitoring and Alert Notifications

We Do Have a Nice Way of Seeing That Stat, Though

• Technotics has created a super-customized version of the Monitoring Results database, STATREP.NSF
  • Technotics R8.5.3 statrep
  • It’s the stock statrep with added views
• One of these valuable views is the Agent Stats view
• You can download this from: www.andypedisich.com
  • Look for the Admin2013 link

Page 22: Advanced Server Monitoring and Alert Notifications

Show Me the Stats

• When you issue a SHOW STAT command at the console, Domino dumps every statistic it is tracking

• Every one of these statistics is in every single one of the documents in the STATREP.NSF database
  • All you need is a view to see them

Page 23: Advanced Server Monitoring and Alert Notifications

Static Statistics Are Not Useful for Thresholds

• Statistics that don’t change usually represent the operating environment of the server
  • Server.Version.Notes = Release 8.5.3
  • Server.Version.OS = Windows NT 5.0
  • Server.CPU.Type = Intel Pentium
  • Disk.D.Size = 71,847,784,448
  • Mem.PhysicalRAM = 527,433,728
  • Platform.Network.1.AdapterName = Intel[R] PRO_1000 MT Server Adapter
• Think these stats aren’t helpful? They are!
• You can take a pretty detailed worldwide server inventory
  • Just by looking at the fields in STATREP.NSF
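As a rough illustration of the inventory idea, the Python sketch below turns "Name = value" statistic lines, such as the ones listed above, into one inventory row per server. It works on text you have already exported or copied; the function names and the choice of inventory fields are assumptions made for this example, not part of Domino.

```python
# Static statistics to keep for the inventory (taken from the list above).
INVENTORY_STATS = (
    "Server.Version.Notes",
    "Server.Version.OS",
    "Server.CPU.Type",
    "Disk.D.Size",
    "Mem.PhysicalRAM",
)


def parse_stats(text):
    """Parse 'Name = value' lines into a dictionary of strings."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            stats[name.strip()] = value.strip()
    return stats


def inventory_row(stats):
    """Keep only the static statistics that describe the server itself."""
    return {name: stats.get(name, "") for name in INVENTORY_STATS}


def as_number(value):
    """Convert a statistic such as '71,847,784,448' to an integer."""
    return int(value.replace(",", ""))


sample = """\
Server.Version.Notes = Release 8.5.3
Server.Version.OS = Windows NT 5.0
Server.CPU.Type = Intel Pentium
Disk.D.Size = 71,847,784,448
Mem.PhysicalRAM = 527,433,728
Server.Users = 212
"""

row = inventory_row(parse_stats(sample))
disk_bytes = as_number(row["Disk.D.Size"])
```

The Server.Users line in the sample is invented to show that changing statistics are simply left out of the inventory row.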

Page 24: Advanced Server Monitoring and Alert Notifications

Wizard Lets You Choose the Method of Handling the Event

• There are lots of methods of event handling
  • Which one you choose depends a lot on your infrastructure
  • We’re going to talk more about the notification methods in the next section of the presentation
• For now, just remember that an event generator is fairly worthless by itself
  • Unless you have an effective event handler that tells you, in its own way, what is going on with your servers

Page 25: Advanced Server Monitoring and Alert Notifications

Event Handlers Are an Exquisite Gift

• They can give you a heads-up about issues provided by event generators

• They also give you a free-form way of being alerted of anything that happens in the Domino server log and most of what happens on the Domino server console

• You can use event handlers to respond to generators and certain add-in tasks
  • They are most valuable for picking out text on the console that will mean trouble if ignored
• We’re going to focus on this type of event handling, since it is less intuitive than responding to generators or add-ins

Page 26: Advanced Server Monitoring and Alert Notifications

Basics of the Event Handler Configuration

• 3 screens to deal with
• Decide whether you want to track an event on just a few servers or all servers
  • You might want to track a particular event on mail servers only
• Decide what triggers a notification
  • We’re going for free-form, so we will select “any event that matches a criteria”

Page 27: Advanced Server Monitoring and Alert Notifications

Second Set of Choices for Event Handling

• When working with console events, select:
  • “Events can be of any type”
  • “Events can be of any severity”
• Then look for a particular string of text in the event message
  • This can be absolutely any text that appears on the console
  • We will explain why we are picking the text “full administrator access” in a moment

Page 28: Advanced Server Monitoring and Alert Notifications

Final Set-Up Tab for Event Handling

• Define action to occur when the text appears

• We’ve selected email notification
  • But there are over a dozen others that we will discuss in a few moments
• Note: You can control the time of day the event handler is on the job
  • I wish they did that for event generators

Page 29: Advanced Server Monitoring and Alert Notifications

Why Did We Monitor the Text Full Access Administrator?

• It is the highest level of administrative access to the server
  • Manager access with all access privileges enabled to all databases on the server, regardless of the ACL settings or reader name settings
  • Access to any unencrypted data on the server
• Your security model should make Full Access Administrator (FAA) almost unnecessary
  • When FAA is turned on, you want to know about it to prevent some hooligan from doing shenanigans

Page 30: Advanced Server Monitoring and Alert Notifications

Other Words You Should Track with Event Handlers

• “Deleted by”
  • This generally means someone has deleted a database
  • Usually their mail file if they have manager access
  • You’ll be getting out the back-up tapes in a minute

01/05/2013 04:02:17 PM Opened live remote console session for Andrew M Pedisich/DomLab
01/05/2013 04:04:50 PM Database ArchiveOfIncriminatingPhotos.nsf deleted by Andrew M Pedisich/DomLab

Page 31: Advanced Server Monitoring and Alert Notifications

Other Bad Words to Watch for

• Here are some other words and expressions to watch for:

Expression | Issue
An exception occurred while writing data into database | Bad news all around. You’re going to have to get to the database and run some maintenance.
Replication cannot proceed | Replication cannot proceed because it cannot maintain uniform access control list on replicas. This is a result of “Enforce Consistent ACL.”
RRV bucket is corrupt | RRV stands for Record Relocation Vector. It is a pointer that tells Notes where to find a specific NoteID, and it is bad if it’s corrupted. You can try a fixup, but it might be borked and need a new replica.
Truncated | Try fixup. Maybe. Maybe not.
Device error | Uh oh
Database is corrupt; cannot allocate space | This one is bad, too
B-tree structure is invalid | You never want to see a b-tree error. It usually means you have to replace the database.
Extremely inefficient | Agent Manager: Full text operations on database “xyz.nsf” which is not full-text indexed
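The matching an event handler does on console text can be imitated offline, which is handy for checking old logs against a new list of phrases. The Python sketch below is only an illustration of case-insensitive substring matching; it is not how Domino implements event handlers, and the function name is hypothetical. The phrases come from the last two slides.

```python
# Phrases taken from the "Deleted by" slide and the table above.
WATCH_PHRASES = (
    "deleted by",
    "an exception occurred while writing data into database",
    "replication cannot proceed",
    "rrv bucket is corrupt",
    "truncated",
    "device error",
    "database is corrupt; cannot allocate space",
    "b-tree structure is invalid",
    "extremely inefficient",
)


def find_alerts(lines, phrases=WATCH_PHRASES):
    """Return (phrase, line) pairs for every line containing a watched phrase."""
    hits = []
    for line in lines:
        lowered = line.lower()
        for phrase in phrases:
            if phrase in lowered:
                hits.append((phrase, line.rstrip()))
    return hits


sample_log = [
    "01/05/2013 04:02:17 PM Opened live remote console session for Andrew M Pedisich/DomLab",
    "01/05/2013 04:04:50 PM Database ArchiveOfIncriminatingPhotos.nsf deleted by Andrew M Pedisich/DomLab",
]

for phrase, line in find_alerts(sample_log):
    print(f"[{phrase}] {line}")
```

Short phrases such as “truncated” will match a lot of harmless text, so expect to tune the list the same way you would tune the event handlers themselves.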

Page 33: Advanced Server Monitoring and Alert Notifications

We’re Circling Back to Notification Methods

• Here is the panoply of notification methods
• The most widely-used notification method is to send an email to an admin group when a problem occurs
  • And yet, that is also very risky, since the email system itself might be the problem

Page 34: Advanced Server Monitoring and Alert Notifications

Paging Dr. Howard, Dr. Fine, Dr. Howard …

• 14 ways to be notified – these 2 are the most widely used
  • But not necessarily the best to use
• Paging notification is a good choice, but not if you are paging through a third-party phone system, like Verizon or AT&T
  • They generally require an email to be sent
  • They have no Service Level Agreement – NONE!
• Sadly, due to budget and resource constraints, we generally see these two methods, mail and paging, used the most in production environments

Method | Result | Comments
Mail | Mails the event to a person or to a mail-in database | Good for most events in multi-protocol environments, but as mentioned, it’s bad if the mail system goes down
Pager | Uses the mail address of an alphanumeric pager | OK, but limited value because it uses the mail system; if mail itself is down, there are issues

Page 35: Advanced Server Monitoring and Alert Notifications

The Most Important Notification Options

• These two are the best, and there’s one more that’s not listed

Method | Result | Comments
SNMP Trap | Sends the event as an SNMP trap. Select this method only if the specified server is running the Event Interceptor task and the Domino SNMP Agent. | This is truly an ideal notification method because it does not depend on Notes protocols actually working
Forward event to Tivoli Event Console | Allows the Tivoli Enterprise Console (TEC) to receive IBM Domino events and reformat them as TEC events. The reformatted TEC event is then sent to the TEC server that you specify in the Configuration Settings document. | Check with the Tivoli team to see if it’s possible to use this in your environment

Page 36: Advanced Server Monitoring and Alert Notifications

Customized Tivoli Package

• In one case, I developed a custom monitoring solution that fed trouble tickets into a version of the Tivoli Event Console that was not supported by the Domino Tivoli event handler system
  • When you have to deal with extreme monitoring capability with high reliability, you sometimes need to get in deep
  • This is very effective because it uses the postemsg.exe executable at the OS level to send the message to the TEC
  • Note that the message is carefully crafted to form a large command string which sends the ticket to Tivoli
  • Check with your Tivoli team to see if you can take advantage of this method

Page 37: Advanced Server Monitoring and Alert Notifications

Customized Tivoli Package (cont.)

• As someone who creates a lot of Domino monitoring solutions, I often have to bend the rules and do some development (Ugh!)
  • An executable called postemsg.exe was placed on the C: drive of a Windows server that was the central Domino monitoring hub
• This is very effective because it uses the postemsg.exe executable at the OS level to send the message to the TEC
  • With some knowledge of LotusScript, I crafted a system to monitor servers and send results back to the Tivoli event console

' Build the postemsg command line in pieces: executable, config file, severity, and message text
vMess1 = {C:\Windows\System32\postemsg.exe -f F:\TECAlerts\tecserver.cfg -r CRITICAL -m "} + vLongMessage + {" }
' Name of the server being reported on
vMess2 = {hostname="} + vReportServerName + {" }
' Site-specific event slots
vMess3 = {sub_source="MESSAGINGLOTUS" Mynotify_supportfilter="1" MyNotify_severity="2" }
vMess4 = {MyNotify_tin="0066" MyNotify_atin="0066" MyNotify_msg="Domino mail server outage" }
vMess5 = {MyNotify_srcplatform="W" MyNotify_processreturncode="0" MyNotify_correlation="0" }
' The event class and source go last
vMess6 = {MyNotify_app="DominoMail" MyNotify_env="Production" MESSAGING_LOTUS MESSAGING}
' Join the pieces and run the command at the OS level
vMess = vMess1 + vMess2 + vMess3 + vMess4 + vMess5 + vMess6
result = Shell(vMess, 6)
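If your monitoring hub runs scripts outside of Notes, the same command can be assembled as a list of arguments instead of one long string, which avoids hand-quoting each value. The Python sketch below only builds the argument list in the order used on the slide and does not run anything; the function name, the sample message, and the reduced set of slots are assumptions for illustration, and your Tivoli team decides which slots are really required.

```python
def build_postemsg_args(message, host, slots, event_class, source,
                        exe=r"C:\Windows\System32\postemsg.exe",
                        config=r"F:\TECAlerts\tecserver.cfg",
                        severity="CRITICAL"):
    """Return the postemsg command as a list of arguments."""
    args = [exe, "-f", config, "-r", severity, "-m", message, f"hostname={host}"]
    # One slot=value argument per entry, in insertion order.
    args.extend(f"{name}={value}" for name, value in slots.items())
    # The event class and source come last, as on the slide.
    args.extend([event_class, source])
    return args


# A few of the slots from the slide; the message text is made up.
slots = {
    "sub_source": "MESSAGINGLOTUS",
    "MyNotify_severity": "2",
    "MyNotify_msg": "Domino mail server outage",
}
args = build_postemsg_args(
    "Mail1/domlab is not responding",
    "Mail1/domlab",
    slots,
    "MESSAGING_LOTUS",
    "MESSAGING",
)
# On a machine where postemsg.exe exists, subprocess.run(args) would send the event.
```

Because each value is its own list element, a message containing spaces or quotes does not need any escaping.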

Page 39: Advanced Server Monitoring and Alert Notifications

DDM Is an Advanced Topic and Is Best Used by New Admins

• Domino Domain Monitoring (DDM) is a powerful, yet complex tool, that is often overlooked by administrators

• If you are using Domino 6, 7, or 8, you are already a proud owner of Domino Domain Monitoring Database, and could already be using its powerful functionality

• If you’re not using DDM, you see this with each server start

01/22/2013 11:49:08 AM Warning: All Domino Domain Monitoring probes are disabled resulting in the loss of valuable diagnostic information. Please configure DDM probes in events4.nsf. Assess DDM reports in ddm.nsf.

Page 40: Advanced Server Monitoring and Alert Notifications

DDM Backs Up Its Discoveries with Explanations

• DDM explains the probable cause, possible solution, and sometimes corrective actions
  • That’s right; actions that will actually correct the problem you’re experiencing
• These are stored in the EVENTS4.NSF and are configurable by you
  • Let’s look for the error “ATTEMPT TO ACCESS DATABASE BY”

Page 41: Advanced Server Monitoring and Alert Notifications

Looking in the View, “Event Messages by Text”

• We can find that error message in the EVENTS4.NSF
  • And discover how we might change the report DDM produces

Page 42: Advanced Server Monitoring and Alert Notifications

The Cause, Solution, and Corrective Action Are Listed

• This document lists the probable cause, possible solution, and corrective action
  • These are supplied by Lotus and include the code in the corrective action

Page 43: Advanced Server Monitoring and Alert Notifications

Click the Link to the Modular Corrective Action

• Clicking the link will take you to the code
  • This could be in formula language or LotusScript

Page 44: Advanced Server Monitoring and Alert Notifications

The Modular Corrective Action Is Re-Usable

• At the bottom of the modular action, there is a list of other error text messages that also use this action
  • That same action that was written only a single time can be used as a corrective action multiple times

Page 45: Advanced Server Monitoring and Alert Notifications

Modular Documents – Cause, Solution, and Corrective Actions

• Domino 8 comes with over 1,000 modular documents
  • Chances are your solutions are already there for most issues
  • You can use any of the same solutions provided by IBM for your custom solution
  • Or you can add brand new ones

Page 46: Advanced Server Monitoring and Alert Notifications

Modular Documents Let You Describe Issues

• Modular documents let you add your own probable cause and possible solution text
  • And create corrective actions built with formula code and LotusScript agents

Page 47: Advanced Server Monitoring and Alert Notifications

You Can Add to the Solutions That Will Display with the Error

• Select the custom entries tab and add the description
• A custom solution of composing an email to the target user can be inserted

Page 48: Advanced Server Monitoring and Alert Notifications

Changes the DDM Report

• The modular document now has the “compose an email” choice


Page 49: Advanced Server Monitoring and Alert Notifications

It Starts the Email for You

• The code plugs in the user’s name and the database that was being accessed
  • And it’s all done with modular documents in EVENTS4.NSF

Page 50: Advanced Server Monitoring and Alert Notifications

Role in DDM ACL That Will Restrict Who Can Use Actions

• Many events have corrective actions associated with them
  • Only users with the Execute CA role in the DDM ACL are able to access the command actions and the corrective action text and links
  • This ensures that only qualified team members will be able to make the changes

Page 52: Advanced Server Monitoring and Alert Notifications

Dealing with Problematic Servers

• Sometimes there are servers with issues that crop up
  • We would like to collect statistics for analysis from these systems more frequently than the standard statistics collection interval allows
  • If you try to add a second collection interval on a server, you’ll get this:

Page 53: Advanced Server Monitoring and Alert Notifications

Each Server Is Allowed to Collect Stats with Only One Interval

• A server can only have one collection interval
  • You must create a second collection document for another server
  • Don’t forget to add the “collect” task to servertasks= in NOTES.INI
• Let’s look at a server that has CPU spikes
• First, we create a statistics collection document for a second server to take statistics from our problem server
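Forgetting the Collect task is easy to check for. The Python sketch below looks at the text of a NOTES.INI file, reports whether a task is listed on the ServerTasks line, and returns an updated copy with the task appended if it is missing. The function names and the sample file contents are made up for illustration, and the sketch works on a string rather than touching a live server's file.

```python
def has_server_task(ini_text, task="collect"):
    """Return True if the task is listed on the ServerTasks line."""
    for line in ini_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip().lower() == "servertasks":
            tasks = [item.strip().lower() for item in value.split(",")]
            return task.lower() in tasks
    return False


def add_server_task(ini_text, task="Collect"):
    """Return the INI text with the task appended to ServerTasks if missing."""
    if has_server_task(ini_text, task):
        return ini_text
    lines = []
    found = False
    for line in ini_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and not found and name.strip().lower() == "servertasks":
            # Append to the existing comma-separated list.
            line = f"{name}={value.rstrip()},{task}" if value.strip() else f"{name}={task}"
            found = True
        lines.append(line)
    if not found:
        lines.append(f"ServerTasks={task}")
    return "\n".join(lines) + "\n"


# Made-up NOTES.INI fragment for a collecting server.
sample_ini = "[Notes]\nServerTasks=Update,Replica,Router,AMgr,AdminP\n"
updated_ini = add_server_task(sample_ini)
```

Run it against a copy of the file first; a change to ServerTasks only takes effect the next time the server starts.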

Page 54: Advanced Server Monitoring and Alert Notifications

Set the Collection Interval for Five Minutes

• Set collection interval for 5 minutes
  • Do not check any filters!!!
    • They tell the collector to ignore the statistics you checked
• Note that stats are being logged to a database called ProblemServer.NSF
  • Used exclusively to track CPU utilization of the Traveler task
  • Note that the data in this example has been fictionalized for effect

Page 55: Advanced Server Monitoring and Alert Notifications

Create a Special View That Tracks CPU Utilization for Traveler

• In this case, it’s the Traveler CPU we want to track
• We create a custom view for the collecting database that only has the server name, the time of collection, and the statistic called Platform.Process.Traveler.1.PctCpuUtil
  • This will be used to easily create a graph of the CPU activity

Page 56: Advanced Server Monitoring and Alert Notifications

Collect the Data, Copy It as a Table from the Custom View

• After collecting a week’s worth of data, we examine the CPU utilization
• All the data in the view is selected using Ctrl-A
  • It is copied as a table
  • Copying views as a table is my favorite feature in Notes
• A Monitoring Results template is posted on my Web site
  • A URL to this template is included at the end of the presentation

Page 57: Advanced Server Monitoring and Alert Notifications

Data Has Been Copied to a Spreadsheet

• A simple paste of the data puts it into a spreadsheet where we are ready to turn it into a chart


Page 58: Advanced Server Monitoring and Alert Notifications

Use the Tools in Your Spreadsheet to Create a Graph

• Select the columns Collection Time and Traveler CPU
• Create a graph from the data
  • In this example, a scatter chart type with smooth lines is being used

Page 59: Advanced Server Monitoring and Alert Notifications

The Resulting Graph

• This produces an excellent graph of the CPU utilization over a ten-day period with samples being taken at intervals of 5 minutes
  • And it took less than 5 minutes to make this chart
  • One adjustment was made to the x-axis formatting and the legend was removed
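The same pasted table can also be summarized with a short script when you want numbers rather than a picture. This Python sketch assumes the paste is tab-separated with three columns (server, collection time, CPU percent), has no header row, and uses the timestamp layout shown; all of that depends on how your view and client are set up, and the sample rows are invented.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout; adjust to match how your view displays the collection time.
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_view_table(text):
    """Parse tab-separated rows of (server, collection time, CPU percent)."""
    rows = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        server, stamp, pct = row
        rows.append((server, datetime.strptime(stamp, TIME_FORMAT), float(pct)))
    return rows


def summarize(rows):
    """Return the average CPU percent and the row with the highest reading."""
    average = sum(pct for _, _, pct in rows) / len(rows)
    peak = max(rows, key=lambda row: row[2])
    return average, peak


# Made-up sample rows, standing in for a paste from the custom view.
pasted = (
    "Mail1/domlab\t01/04/2013 10:55:00 PM\t12.5\n"
    "Mail1/domlab\t01/04/2013 11:00:00 PM\t97.0\n"
    "Mail1/domlab\t01/04/2013 11:05:00 PM\t40.5\n"
)
rows = parse_view_table(pasted)
average, peak = summarize(rows)
print(f"average {average:.1f}%, peak {peak[2]:.1f}% at {peak[1]}")
```

The spreadsheet chart is still the fastest way to see the shape of the spikes; a script like this is more useful when you want to compare several servers or weeks.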

Page 60: Advanced Server Monitoring and Alert Notifications

Demonstration

• Creating a graph of results from a custom view of collected data


Page 62: Advanced Server Monitoring and Alert Notifications

Some Events Occur on the Console, but Not in the Log

• Note: In this example, the server stops reporting at 11:04 pm
• Then, at 11:27 pm, it is back on line
• What happened in the interim?

Name: Mail1/domlab
Time: 01/04 11:02:05 PM
Miscellaneous Events:
01/04/2013 11:04:17 PM Pulling icl.ntf from Maill2/domlab icl.ntf
01/04/2013 11:04:31 PM Access control is set in catalog.nsf to not allow replication from BES02/domlab catalog.nsf
01/04/2013 11:04:31 PM Access control is set in mail2/domlab catalog.nsf to not allow replication from catalog.nsf
01/04/2013 11:04:33 PM Pulling ddm.nsf from Mail2/domlab ddm.nsf
01/04/2013 11:04:35 PM Pushing ddm.nsf to Mail2/domlab ddm.nsf
01/04/2013 11:04:38 PM Finished replication with server Mail2/domlab
01/04/2013 11:04:43 PM Router: Transferred 1 messages to MAIL2.domlab.COM (host MAIL02.domlabUSA.COM) via SMTP
01/04/2013 11:04:51 PM Opened session for Mail2/domlab (Release 8.5.2FP1)

Name: Mail1/domlab
Time: 01/04 11:27:11 PM - 01/04 11:27:47 PM
Miscellaneous Events:
01/04/2013 11:27:11 PM Recovery Manager: Restart Recovery complete. (196/1686 databases needed full/partial recovery)
01/04/2013 11:27:11 PM Informational - The DAOS catalog is not synchronized. Deletions will be postponed. Please run 'tell daosmgr resync' at the next convenient opportunity to re-synchronize.
01/04/2013 11:27:12 PM Event Monitor started
01/04/2013 11:27:12 PM Warning: All Domino Domain Monitoring probes are disabled res
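A silent stretch like this one is easy to miss by eye in a long log. As a rough sketch, the Python below walks through exported log lines and reports any gap between consecutive timestamps that is longer than a threshold. The ten-minute threshold and the function name are arbitrary choices for illustration, and the timestamp pattern assumes the format shown in the log above.

```python
import re
from datetime import datetime, timedelta

# Matches timestamps such as "01/04/2013 11:04:51 PM" at the start of a line.
STAMP = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M)\b")


def find_gaps(lines, threshold=timedelta(minutes=10)):
    """Return (last_seen, next_seen) pairs where the log went quiet."""
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # headers and continuation lines carry no timestamp
        current = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


log_lines = [
    "01/04/2013 11:04:43 PM Router: Transferred 1 messages to MAIL2.domlab.COM (host MAIL02.domlabUSA.COM) via SMTP",
    "01/04/2013 11:04:51 PM Opened session for Mail2/domlab (Release 8.5.2FP1)",
    "Name: Mail1/domlab",
    "01/04/2013 11:27:11 PM Recovery Manager: Restart Recovery complete. (196/1686 databases needed full/partial recovery)",
    "01/04/2013 11:27:12 PM Event Monitor started",
]

for last_seen, next_seen in find_gaps(log_lines):
    print(f"quiet from {last_seen} to {next_seen} ({next_seen - last_seen})")
```

A quiet log is not always a crash, so treat each gap as a pointer to the matching stretch of CONSOLE.LOG rather than as proof of an outage.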

Page 63: Advanced Server Monitoring and Alert Notifications

There Is Action in the CONSOLE.LOG

• CONSOLE.LOG and other logs are in the folder called IBM_TECHNICAL_SUPPORT under the data folder

• The CONSOLE.LOG on a server often contains data that has been seen on the Domino server console, but not in the server log
  • It shows there was a Long Held Lock Dump and then a panic!

Lock(Mode=SIX* LockID(DB DB=G:\Lotus\Domino\Data\mail\web\Complaints.nsf))
Waiters countNonIntentLocks = 1 countIntentLocks = 1, queuLength = 95
[Req(Status=Granted Mode=IS Class=Manual Nest=0 Cnt=1 Tran=0 Func=N/A m\lkmgr.cpp:159 [0D64:0002-0D60])
rm_lkmgr_cpp:2070
rm_lkmgr_cpp:1306
nsfsem1_c:169
nsfsem1_c:1020
nsfsem6_c:503
Req(Status=Granted Mode=SIX Class=Manual Nest=0 Cnt=1 Tran=0 Func=N/A inplace.c:153 [099C:0165-12FC])
LkMgr END Long Held Lock Dump ------------------
01/04/2013 11:04:51 PM Opened session for Terry Mallory/domlab (Release 8.5.2FP2)
01/04/2013 11:04:51 PM Closed session for Terry Mallory/domlab Databases accessed: 1 Documents read: 0 Documents written: 0
The server process terminated abnormally with the exit status = 1. Please send this information and the collected nsd log to IBM Support. This process will now Panic in order to start fault recovery operations.

Page 64: Advanced Server Monitoring and Alert Notifications

Why Did This Happen?

• In this case, there was a large number of email messages with big attachments waiting to be processed in the MAIL.BOXES

• The server was relatively underpowered
• Plus, I think the messages were part of an emailing made by a CEO
  • And we all know, the most visible executives have the worst time with any piece of messaging software

Page 65: Advanced Server Monitoring and Alert Notifications

Here’s Another Example of Helpful Console Logging

• I entered the following into the Domino server console
  • Tell traveler stat show
• That command generates hundreds of lines of statistics and other information
  • It shows clearly on the server console

Page 66: Advanced Server Monitoring and Alert Notifications

Here’s Another Reason for Console Logging

• Here’s the Domino server log showing me doing several furious requests to the Traveler task to Tell traveler stat show

• I get nothing

> tell traveler stat show
01/06/2013 12:24:49 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show
> tell traveler stat show
01/06/2013 12:24:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
01/06/2013 12:24:55 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show
01/06/2013 12:24:55 PM Directory Cataloger finished processing names.nsf: Directory Catalog has no Configuration record
01/06/2013 12:25:43 PM AMgr: Start executing agent 'PullFromAdmin4' in 'certreq.nsf' by Executive '1'
01/06/2013 12:25:43 PM AMgr: 'Admin/Servers/DomLab' is the agent signer of agent 'PullFromAdmin4' in 'certreq.nsf'
01/06/2013 12:25:43 PM AMgr: 'Agent 'PullFromAdmin4' in 'certreq.nsf' will run on behalf of 'Andrew M Pedisich/DomLab'
01/06/2013 12:25:44 PM AMgr: Start executing agent 'SubmitToAdmin4' in 'certreq.nsf' by Executive '1'
01/06/2013 12:25:44 PM AMgr: 'Admin/Servers/DomLab' is the agent signer of agent 'SubmitToAdmin4' in 'certreq.nsf'
01/06/2013 12:25:44 PM AMgr: 'Agent 'SubmitToAdmin4' in 'certreq.nsf' will run on behalf of 'Andrew M Pedisich/DomLab'
01/06/2013 12:25:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show

Check the IBM_TECHNICAL_SUPPORT Folder

• CONSOLE.LOG from the IBM_TECHNICAL_SUPPORT folder on the server

• Whenever there are server issues, don’t forget to check the console.log for evidence

01/06/2013 12:25:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
tell traveler stat show
  CPU.Pct.000-010 = 7
  ClusterCache.Access = 1
  Constrained.count = 0
  Constrained.state = false
  DB.Connections = 1
  DB.Connections.Idle = 1
  DB.Connections.Max = 7000
  DCA.C.CheckAccessRights = 2
  DCA.C.Count.NSFDbClose = 3
  DCA.C.Count.NSFDbOpen = 3
  DCA.C.Count.NSFNoteClose = 2
  DCA.C.Count.NSFNoteOpen = 2
  DCA.C.HTMLCreateConverter = 1
  DCA.C.HTMLDestroyConverter = 1
  DCA.C.ModDoc.RunCount = 1
  DCA.C.ModDoc.SyncableDocs = 1

Console Logging Configuration

• To start a console log permanently on your servers, add this to the NOTES.INI
  Console_Log_Enabled=1
• Use the following values
  0 – Disable Console Log file logging
  1 – Enable Console Log file logging
• You can also toggle logging to the Console Log file from the server console
  Use the start consolelog and stop consolelog commands
• Obviously, this is an important feature and you’d want it to be enabled all the time
• Set a maximum size of almost 100 MB for the console log using the following parameter
  Console_Log_Max_Kbytes=100000

Console Mirroring

• You can also use Console Mirroring, which is slightly different from normal console logging
• Console log mirroring causes a new server thread to be created
  It monitors all messages written to the Console Log file and duplicates these messages into another file
  When this file is filled, the thread closes the mirrored file and creates a new file into which subsequent messages are written
• Console log mirroring has three related NOTES.INI settings:
  Console_Log_Mirror=1 – Enables the mirroring feature
  Retain_Mirror_Logs=1 – Prevents deletion of previous mirrors when Domino starts
  Console_Log_Max_Kbytes= – Sets the max size of the Console Log/mirror files
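
Taken together, the console logging and mirroring settings might look like the NOTES.INI fragment below. This is only a sketch: it assumes you want both logging and mirroring turned on, and it reuses the 100000 KB size from the console logging slide. The right size for your servers is a judgment call.

```ini
Console_Log_Enabled=1
Console_Log_Max_Kbytes=100000
Console_Log_Mirror=1
Retain_Mirror_Logs=1
```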

Can You Be an Admin/Dev Person?

• When you’re an admin, there are a lot of reasons to learn LotusScript
  Write your own agents that gather statistics and monitor servers
• LotusScript lets you ask for a statistic on all of your servers, one by one, then store it in a database and produce alerts and notifications
  These can be more sophisticated than native Notes monitoring
• The following are two examples of coding that you might find helpful
  If you have buddies in the Dev side of the house, they might find this interesting
  Generally, Dev people don’t do applications that help administrators
  Their focus is on user applications
• These two snippets can give you an idea of the potential you have when dealing with statistics and LotusScript
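
As a rough idea of what asking every server "one by one" could look like, here is a minimal sketch. It relies only on NotesSession.SendConsoleCommand, the same call the deck's own snippets use. The server list is hard-coded and the name mail01/domlab is hypothetical; storing the results and raising alerts are left out, and an unreachable server would raise an error that this sketch does not handle.

```lotusscript
Sub Initialize
	Dim session As New NotesSession
	Dim servers(1) As String
	Dim reply As String

	' Hypothetical server list; a real agent might read names from a view
	servers(0) = "admin/domlab"
	servers(1) = "mail01/domlab"

	ForAll serverName In servers
		' Ask each server for the same statistic, one by one
		reply = session.SendConsoleCommand(CStr(serverName), "show stat mail.totalpending")
		Print CStr(serverName) & ": " & reply
	End ForAll
End Sub
```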

Gathering Statistics Using LotusScript Is Easy

• Here’s an agent that simply issues a Domino server console command
  Then shows you the value in a MessageBox
• It’s pretty cool for 10 lines of code

Sub Initialize
    Dim session As New NotesSession
    Dim vServerName As String
    Dim vConsoleCommand As String
    Dim vConsoleReturn As String
    ' The console command to send and the server to send it to
    vConsoleCommand = "sho stat server.trans.total"
    vServerName = "admin/domlab"
    ' Send the command and capture whatever the console returns
    vConsoleReturn = session.SendConsoleCommand(vServerName, vConsoleCommand)
    MessageBox(vConsoleReturn)
End Sub

The Mail.TotalPending Statistic

• This stat was introduced in Release 5, and I use it all the time in monitoring servers for mail backing up
• From SPR# BSAW4HFMPY
  www-304.ibm.com/support/docview.wss?uid=sim43d86a0d3e79e0e6785256a8500737f2b
  Added a new Mail.TotalPending statistic that shows the count of messages pending in mail.box
• This statistic is updated once every 5 minutes by the Server task
  Does not depend on the Router task for updates
• Provides information about total backlog of mail in the event that the router is hung or not started
  High value indicates that a mail routing problem needs further investigation

Here’s a Similar Code Snippet That Gets Total Pending Mail

• This is from a much larger agent that runs every 5 minutes on 70 servers
• Remember, LotusScript lets you issue console commands
  Then, take the results of the command and take other actions
• Our job is to parse out the number 130 from the show stat command
  show stat mail.totalpending
• We’re grabbing the stat Mail.TotalPending, which looks like this on the console
  Mail.TotalPending = 130
  1 statistics found

Here’s the Meat and Potatoes

• Mail.TotalPending = 130
  1 statistics found
• Then, it’s being parsed out so that only the number is grabbed
  vLocStart = InStr(1,vConsoleReturn,"=",5 )+2
    Gives the location 2 characters past the = sign, where the number starts
  vLocEnd = InStr(1,vConsoleReturn,Chr(13),5 ) - vLocStart
    Gives the length of the number, which ends at the carriage return Chr(13)
  vStatStr = Mid(vConsoleReturn,vLocStart,vLocEnd)
    That’s the number as a string, which is then converted to a number

Here’s the Meat and Potatoes (cont.)

• Mail.TotalPending = 130
  1 statistics found
• Here’s a snippet of code that gets you the mail.totalpending statistic

vConsoleCommandPending = "sh stat mail.totalpending"

' Ask the console how many messages are pending
vConsoleReturn = session.SendConsoleCommand(vServerName, vConsoleCommandPending)

' Pull out just the number between the = sign and the end of the line
vLocStart = InStr(1,vConsoleReturn,"=",5 )+2
vLocEnd = InStr(1,vConsoleReturn,Chr(13),5 ) - vLocStart
vStatStr = Mid(vConsoleReturn,vLocStart,vLocEnd)
'Print "Pending: " + Str(vMailTotalPending) + " Pending: " + vStatStr

' Convert the string to a number
vMailPending = Val(vStatStr)
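
The parsing above assumes the console always returns a well-formed line. If a server returns an error, or the statistic does not exist, there is no = sign to find. One way to guard against that is sketched below. The function name ParseStatValue and the -1 "nothing found" return value are made up for this illustration and are not part of the original agent.

```lotusscript
' Hypothetical helper, not part of the original agent.
' Returns the numeric value from single-statistic "show stat" output,
' or -1 when there is no "=" sign or the value is not numeric.
Function ParseStatValue(consoleReturn As String) As Double
	Dim eqPos As Long
	Dim lineEnd As Long
	Dim valueStr As String

	ParseStatValue = -1
	eqPos = InStr(consoleReturn, "=")
	If eqPos = 0 Then Exit Function

	' The value ends at the first line break after the "=" sign
	lineEnd = InStr(eqPos, consoleReturn, Chr(13))
	If lineEnd = 0 Then lineEnd = InStr(eqPos, consoleReturn, Chr(10))
	If lineEnd = 0 Then lineEnd = Len(consoleReturn) + 1

	valueStr = Trim(Mid(consoleReturn, eqPos + 1, lineEnd - eqPos - 1))
	If valueStr = "" Then Exit Function
	If IsNumeric(valueStr) Then ParseStatValue = CDbl(valueStr)
End Function
```

A calling agent could then write vMailPending = ParseStatValue(vConsoleReturn) and treat a negative result as "no reading from this server" rather than as zero pending messages.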

LotusScript and Monitoring/Alerting – A Great Pair of Tools

• You get the advantage of automation with the power of monitoring and alerting
• Stop issues before they become problems
• Don’t forget, download the custom statrep Technotics Statrep 8.5.3 from: www.andypedisich.com

Where to Find More Information

• www-01.ibm.com/support/docview.wss?uid=swg27008849
  Notes/Domino Best Practices: Performance (IBM, 2010).
• www-10.lotus.com/ldd/__00256C3E0030650D.nsf/0/1F2EBFCA1F35CA71852571DB00618159?Open
  Harry Peebles, “Domino Domain Monitoring (DDM) Educational Resources” (IBM, 2006).
• www-01.ibm.com/support/docview.wss?uid=swg21293213
  How Does the notes.ini File Parameter ‘server_session_timeout’ Affect Server Performance? (IBM, 2010).
• www.ibm.com/developerworks/lotus/library/domino-server-crashes/
  Kiran Bellari, “Troubleshooting Lotus Domino Hangs and Crashes” (developerWorks, 2006).

7 Key Points to Take Home

• Write your own program in LotusScript or formula language and add it to DDM’s corrective actions

• Collect statistics from problem servers by creating a second collecting server in your domain

• Console logs collect everything that happens on the console, including messages from tasks and from NOTES.INI debug parameters

• Check the replica ID for the Events4.NSF in your domain to ensure it is the same on all servers

7 Key Points to Take Home (cont.)

• Full Administrator Access is a powerful tool that should be monitored for proper usage

• Event handlers can notify you about any message that appears on the console

• Email is the most widely used notification system, but it is also the riskiest

Thank You for Attending Our Session!

• Please don’t forget to fill out your evaluations. We read them all!
• Please feel free to stop us and ask questions or just have pleasant conversations

Contact [email protected]

www.technotics.com
www.andypedisich.com
