© 2013 Wellesley Information Services. All rights reserved.
Advanced Server Monitoring and Alert Notifications
Andy Pedisich Technotics
Your Presenter
• One half of a pair of hard-working IBM® Notes® Administrators/Developers who have worked with IBM® Notes® and IBM Domino® since version 2.1, from Technotics, Inc. in Philadelphia, Pennsylvania, USA
• Andy Pedisich: 28 years in IT, 19 years with Lotus Notes
• Rob Axelrod: 23 years in IT, 19 years with Lotus Notes
2
What We’ll Cover …
• Setting up the foundation for guarding your domain
• Working with event generators and event handlers
• Selecting a notification method
• Customizing recommended actions in Domino Domain Monitoring
• Tracking problem servers
• Finding and tracking events that show on the console, but not in the log
• Using LotusScript to access server statistics
• Wrap-up
3
Requirements for Efficient and Accurate Statistics Collection
• Two things are required for statistics collection:
The Collect task must be running on any server that is designated to collect the statistics. Not all servers should run the Collect task, only servers designated as collecting servers
The EVENTS4 Monitoring Configuration database must have at least one Statistics Collection document. The minimum collection interval should be an hour
5
There Is a Special Replica ID for Your EVENTS4.NSF
• The replica ID of system databases, such as EVENTS4, is derived from the replica ID of the Domino directory:
Database | Replica ID
NAMES.NSF | 852564AC:004EBCCF
CATALOG.NSF | 852564AC:014EBCCF
EVENTS4.NSF | 852564AC:024EBCCF
ADMIN4.NSF | 852564AC:034EBCCF
• Notice that the first two digits after the colon in the EVENTS4.NSF replica ID are 02. Make sure EVENTS4.NSF has the same replica ID on every server by opening a copy from every server and putting it on your desktop. There's some code on the next slide to help you do that
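Once you have the replica IDs from each server's copy in hand, a short Python sketch can do the comparison for you. The function and server names here are hypothetical, and the derivation rule is only the pattern visible in the table above (the first two hex digits after the colon change from 00 to 03), not a documented IBM algorithm.

```python
# Hypothetical helper: predicts system-database replica IDs from the NAMES.NSF
# replica ID, using the pattern in the table (assumption: only the first two
# hex digits after the colon differ: 00, 01, 02, 03).
SYSTEM_DB_CODES = {
    "NAMES.NSF": "00",
    "CATALOG.NSF": "01",
    "EVENTS4.NSF": "02",
    "ADMIN4.NSF": "03",
}


def expected_replica_id(names_replica_id: str, db_name: str) -> str:
    """Return the replica ID the table's pattern predicts for db_name."""
    left, right = names_replica_id.upper().split(":")
    code = SYSTEM_DB_CODES[db_name.upper()]
    # Swap the first two hex digits after the colon for the database's code.
    return f"{left}:{code}{right[2:]}"


def mismatched_servers(names_replica_id: str, events4_ids: dict) -> list:
    """Servers whose EVENTS4.NSF replica ID differs from the predicted one."""
    expected = expected_replica_id(names_replica_id, "EVENTS4.NSF")
    return sorted(
        server for server, rid in events4_ids.items() if rid.upper() != expected
    )
```

For example, `mismatched_servers("852564AC:004EBCCF", {"Mail1/DomLab": "852564AC:024EBCCF", "Mail2/DomLab": "85256AAA:11111111"})` returns `["Mail2/DomLab"]`, the server whose EVENTS4.NSF is not the right replica.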
6
Add a Button to Your Toolbar
• Add this code to a button on your toolbar. This is courtesy of Thomas Bahn (www.assono.de/blog). He's a smart guy, a nice guy, and sometimes brings chocolates to his friends from Europe
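REM {Server of your mail file plus names.nsf: the Domino directory to pick servers from};
_names := @Subset(@MailDbName; 1) : "names.nsf";
REM {Let the user select one or more servers from the Servers view};
_servers := @PickList([Custom]; _names; "Servers"; "Select servers"; "Select servers to add database from"; 3);
REM {Ask for the database file name and path, defaulting to log.nsf};
_db := @Prompt([OkCancelEdit]; "Enter database"; "Enter the file name and path of the database to add."; "log.nsf");
REM {Add the database icon from each selected server to the workspace};
@For( n := 1; n <= @Elements(_servers); n := n + 1; @Command([AddDatabase]; _servers[n] : _db) )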
7
Add a Database Icon from All Servers to the Desktop
• This code prompts you to pick the servers that have the database you want on your desktop. Then it prompts for the name of the database and adds it to your desktop from every server you selected
• Use it to make sure all copies of EVENTS4.NSF in your domain are the same replica
8
Event Monitoring Details
• Enough setting up already!
• Event monitors of all types are set in the EVENTS4 database
• Two broad categories of events:
Event handlers: specify the action that Domino takes when a specific event occurs
Event generators: each type of event generator has a view that provides a list of all event generators, plus additional configuration information
10
Event Generators
• Event generators deal with specific Notes/Domino issues
• There are six types of event generators:
Database Event Generator
Domino Server Response Event Generator
Mail-Routing Event Generator
Statistic Event Generator
Task Status Event Generator
TCP Server Event Generator
Some are used more than others
• We'll stick to the more popular ones that every administrator should use, for starters
11
Here’s One That Everyone Should Use
• The ACL of Names.nsf should rarely change, so monitor it! Alarms should go off if it changes
• Select Names.nsf. Choose either a single server or all servers in the domain
• I like to pick all servers in the domain. Admins won't get away with anything! But I do get a storm of messages when an ACL change occurs: every server tells me about the change
12
Unused Space Event Generator
• This is an example of the Events system actually doing something automatically when a certain condition exists. It's questionable: it executes the Compact task immediately upon detecting that the free space threshold has been exceeded. I could see this event being used on archive servers. And I wish there were a way to run it during specific hours
13
Domino Server Response Generator
• One server checks others by sending a probe. It's a good idea to try opening Names.nsf. If you can't open Names.nsf, then something is wrong!
• Default is every three minutes
• Default response time tolerance is 1,000 msecs (one second). Your settings will depend on your own environment
14
More About Probes
• The default response time tolerance is a bit on the harsh side. If you leave it at 1,000 msecs (one second), you will receive a lot of notifications. You should make it ten seconds, or whatever the metrics in your Service Level Agreement (SLA) require
• Also, be careful which servers you choose to probe other servers. Try to pick probing servers that are on the same LAN as the probed servers. Otherwise, your probing will actually be testing network latency rather than the servers themselves. I have used these probes as a method of testing exactly that: network latency
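To see why the one-second default is noisy, here is a small Python sketch that counts how many probe results would trip a given tolerance. The response times are invented sample values, and treating "exceeds" as strictly greater than the tolerance is an assumption; the probe itself runs inside Domino, not in this script.

```python
# Sketch: how the response-time tolerance changes the number of probe alerts.
DEFAULT_TOLERANCE_MS = 1_000   # default tolerance from the previous slide
SLA_TOLERANCE_MS = 10_000      # the ten-second value suggested above

# Probes run every three minutes by default, so one server gets 480 per day.
PROBES_PER_DAY = 24 * 60 // 3


def slow_probes(response_times_ms, tolerance_ms):
    """Return the response times that exceed the tolerance (would alert)."""
    return [t for t in response_times_ms if t > tolerance_ms]
```

With made-up samples `[250, 1200, 900, 3400, 11000]`, the default tolerance flags three results, while the ten-second tolerance flags only the 11-second one. Across 480 probes a day, that difference adds up quickly.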
15
Statistic Event Generators
• Statistic Event Generators monitor a specific Domino or platform statistic. They can let you know when a stat goes over a particular threshold. These stat event generators are extremely valuable. Smart administrators use them every day!
16
Complete Listing of All Statistics Is in EVENTS4.NSF
• The Monitoring Configuration database (EVENTS4.NSF) supplies a document detailing thresholds for each statistic. 1,193 statistic documents are available, and the complete listing is in the Statistics by Name view
• But only 166 of them are considered useful for setting thresholds, and those are found in the default statistics view. The default statistics thresholds view only shows documents where the field "useful" is equal to the word "Yes"
17
Finding the “Not Useful” Stats
• You might find that a statistic you need has been marked as not useful
• To see which are marked as not useful, full-text index EVENTS4.NSF
• Create an advanced query checking the field useful = "No". You might discover a statistic whose threshold would be just right for your needs
18
Why Are Most Stats Considered "Not Useful" for Thresholds?
• That one "useful" setting, the field the advanced query checks, controls whether a statistic appears in the drop-down list when you're setting up an event generator. Note that there are no Agent statistics in this list
19
Why No Agent Stats
• It's not that the Agent stats aren't useful. They might just not be valuable for threshold tracking
• In some releases, Agent.Hourly.UsedRunTime has a data type of text, and we can't set a threshold with text values
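The text-versus-number problem is easy to see in a small Python sketch that evaluates a threshold against "Name = Value" lines like those SHOW STAT prints. The function names and the "5 sec" text value are hypothetical; the point is only that a text value has nothing to compare against a numeric threshold, so the sketch reports it as `None` instead of guessing.

```python
# Sketch: threshold checks on "Name = Value" statistic lines.
def parse_stat_value(line: str):
    """Split 'Name = Value' into (name, float), or (name, None) for text values."""
    name, _, raw = line.partition("=")
    raw = raw.strip().replace(",", "")  # tolerate values like 71,847,784,448
    try:
        return name.strip(), float(raw)
    except ValueError:
        return name.strip(), None


def over_threshold(line: str, threshold: float):
    """True/False for numeric stats; None when the value is text."""
    _, value = parse_stat_value(line)
    if value is None:
        return None
    return value > threshold
```

`over_threshold("Mail.TotalPending = 130", 100)` is `True`, while a text-valued line such as `"Agent.Hourly.UsedRunTime = 5 sec"` yields `None`.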
20
We Do Have a Nice Way of Seeing That Stat, Though
• Technotics has created a super-customized version of the Monitoring Results database, STATREP.NSF: the Technotics R8.5.3 statrep. It's the stock statrep with added views
• One of these valuable views is the Agent Stats view
• You can download this from www.andypedisich.com. Look for the Admin2013 link
21
Show Me the Stats
• When you issue a SHOW STAT command at the console, Domino dumps every statistic it is tracking
• Every one of these statistics is in every single one of the documents in the STATREP.NSF database. All you need is a view to see them
22
Static Statistics Are Not Useful for Thresholds
• Statistics that don't change usually represent the operating environment of the server:
Server.Version.Notes = Release 8.5.3
Server.Version.OS = Windows NT 5.0
Server.CPU.Type = Intel Pentium
Disk.D.Size = 71,847,784,448
Mem.PhysicalRAM = 527,433,728
Platform.Network.1.AdapterName = Intel[R] PRO_1000 MT Server Adapter
• Think these stats aren't helpful? They are!
• You can take a pretty detailed worldwide server inventory just by looking at the fields in STATREP.NSF
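As a rough illustration of the inventory idea, this Python sketch picks the static statistics out of "Name = Value" lines and keeps one record per server. How you get those lines out of STATREP.NSF (an export, a copied view, an agent) is left open, and the choice of which stats count as "inventory" is an assumption for the example.

```python
# Sketch: build a per-server inventory record from static statistics.
INVENTORY_STATS = (
    "Server.Version.Notes",
    "Server.Version.OS",
    "Server.CPU.Type",
    "Mem.PhysicalRAM",
)


def inventory_record(stat_lines):
    """Keep only the inventory statistics from 'Name = Value' lines."""
    record = {}
    for line in stat_lines:
        name, sep, value = line.partition("=")
        if sep and name.strip() in INVENTORY_STATS:
            record[name.strip()] = value.strip()
    return record
```

Run over the sample lines above, it returns the Notes release, OS, CPU type, and physical RAM, ready to be collected into a table with one row per server.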
23
Wizard Lets You Choose the Method of Handling the Event
• There are lots of methods of event handling. Which one you choose depends a lot on your infrastructure. We're going to talk more about the notification methods in the next section of the presentation
• For now, just remember that an event generator is fairly worthless by itself, unless you have an effective event handler that tells you, in its own way, what is going on with your servers
24
Event Handlers Are an Exquisite Gift
• They can give you a heads-up about issues detected by event generators
• They also give you a free-form way of being alerted to anything that happens in the Domino server log and most of what happens on the Domino server console
• You can use event handlers to respond to generators and certain add-in tasks. They are most valuable for picking out text on the console that will mean trouble if ignored
• We're going to focus on this type of event handling, since it is less intuitive than responding to generators or add-ins
25
Basics of the Event Handler Configuration
• 3 screens to deal with
• Decide whether you want to track an event on just a few servers or all servers. You might want to track a particular event on mail servers only
• Decide what triggers a notification. We're going for free-form, so we will select "any event that matches a criteria"
26
Second Set of Choices for Event Handling
• When working with console events, select "Events can be of any type" and "Events can be of any severity"
• Then look for a particular string of text in the event message. This can be absolutely any text that appears on the console. We will explain why we are picking the text "full administrator access" in a moment
27
Final Set-Up Tab for Event Handling
• Define the action to occur when the text appears
• We've selected email notification, but there are over a dozen others that we will discuss in a few moments
• Note: You can control the time of day the event handler is on the job. I wish they did that for event generators
28
Why Did We Monitor the Text "Full Administrator Access"?
• It is the highest level of administrative access to the server: Manager access with all access privileges enabled to all databases on the server, regardless of ACL settings or Readers field settings, plus access to any unencrypted data on the server
• Your security model should make FAA almost unnecessary. When FAA is turned on, you want to know about it to prevent some hooligan from doing shenanigans
29
Other Words You Should Track with Event Handlers
• "Deleted by": This generally means someone has deleted a database, usually their own mail file if they have Manager access. You'll be getting out the backup tapes in a minute
01/05/2013 04:02:17 PM Opened live remote console session for Andrew M Pedisich/DomLab
01/05/2013 04:04:50 PM Database ArchiveOfIncriminatingPhotos.nsf deleted by Andrew M Pedisich/DomLab
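If you also want the details (who deleted what, and when) rather than just an alert, a regular expression can pull them out of lines like the sample above. This Python sketch is written against that one sample line; other releases may word the message differently, and a database path containing spaces would need a looser pattern.

```python
import re

# Sketch: extract timestamp, database, and user from "deleted by" log lines.
# The pattern matches the sample line above (assumption: same wording elsewhere).
DELETED_BY = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M) "
    r"Database (?P<db>\S+) deleted by (?P<who>.+)$"
)


def find_deletions(log_lines):
    """Return (timestamp, database, user) for each deletion message."""
    hits = []
    for line in log_lines:
        match = DELETED_BY.match(line.strip())
        if match:
            hits.append((match["stamp"], match["db"], match["who"]))
    return hits
```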
30
Other Bad Words to Watch For
• Here are some other words and expressions to watch for:
31
Expression | Issue
An exception occurred while writing data into database | Bad news all around. You're going to have to get to the database and run some maintenance.
Replication cannot proceed | Replication cannot proceed because it cannot maintain a uniform access control list on replicas. This is a result of "Enforce Consistent ACL."
RRV bucket is corrupt | RRV stands for Record Relocation Vector. It is a pointer that tells Notes where to find a specific NoteID, and it is bad if it's corrupted. You can try a fixup, but the database might be borked and need a new replica.
Truncated | Try fixup. Maybe. Maybe not.
Device error | Uh oh
Database is corrupt; cannot allocate space | This one is bad, too
B-tree structure is invalid | You never want to see a b-tree error. It usually means you have to replace the database.
Extremely inefficient | Agent Manager: Full text operations on database "xyz.nsf" which is not full-text indexed
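The event handler does this matching inside Domino, but the same idea is handy when you are combing through a saved log or CONSOLE.LOG after the fact. This Python sketch flags lines containing any of the expressions above; case-insensitive matching is a choice made for the sketch, not a claim about how Domino's event handler compares text.

```python
# Sketch: flag console/log lines containing the trouble expressions above.
WATCH_EXPRESSIONS = (
    "full administrator access",
    "deleted by",
    "An exception occurred while writing data into database",
    "Replication cannot proceed",
    "RRV bucket is corrupt",
    "Truncated",
    "Device error",
    "Database is corrupt; cannot allocate space",
    "B-tree structure is invalid",
    "Extremely inefficient",
)


def flag_lines(lines, expressions=WATCH_EXPRESSIONS):
    """Yield (line, expression) for every line containing a watched expression."""
    lowered = [e.lower() for e in expressions]
    for line in lines:
        text = line.lower()
        for original, needle in zip(expressions, lowered):
            if needle in text:
                yield line, original
```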
We’re Circling Back to Notification Methods
• Here is the panoply of notification methods
• The most widely used notification method is to send an email to an admin group when a problem occurs. And yet, that is also very risky, since the email system itself might be the problem
33
Paging Dr. Howard, Dr. Fine, Dr. Howard …
• 14 ways to be notified; these 2 are the most widely used, but not necessarily the best to use
• Paging notification is a good choice, but not if you are paging through a third-party phone system, like Verizon or AT&T. They generally require an email to be sent, and they have no Service Level Agreement. NONE!
• Sadly, due to budget and resource constraints, we generally see these two methods, mail and paging, used the most in production environments
Method | Result | Comments
Mail | Mails the event to a person or to a mail-in database | Good for most events in multi-protocol environments, but as mentioned, it's bad if the mail system goes down
Pager | Uses the mail address of an alphanumeric pager | OK, but of limited value because it uses the mail system; if mail itself is down, there are issues
34
The Most Important Notification Options
• These two are the best, and there's one more that's not listed
Method | Result | Comments
SNMP Trap | Sends the event as an SNMP trap. Select this method only if the specified server is running the Event Interceptor task and the Domino SNMP Agent. | This is truly an ideal notification method because it does not depend on Notes protocols actually working
Forward event to Tivoli Event Console | Allows the Tivoli Enterprise Console (TEC) to receive IBM Domino events and reformat them as TEC events. The reformatted TEC event is then sent to the TEC server that you specify in the Configuration Settings document. | Check with the Tivoli team to see if it's possible to use this in your environment
35
Customized Tivoli Package
• In one case, I developed a custom monitoring solution that fed trouble tickets into a version of the Tivoli Event Console that was not supported by the Domino Tivoli event handler system. When you have to deliver extreme monitoring capability with high reliability, you sometimes need to get in deep. This is very effective because it uses the postemsg.exe executable at the OS level to send the message to the TEC. Note that the message is carefully crafted to form a large command string, which sends the ticket to Tivoli. Check with your Tivoli team to see if you can take advantage of this method
36
Customized Tivoli Package (cont.)
• As someone who creates a lot of Domino monitoring solutions, I often have to bend the rules and do some development (Ugh!). An executable called postemsg.exe was placed on the C: drive of a Windows server that was the central Domino monitoring hub
• This is very effective because it uses the postemsg.exe executable at the OS level to send the message to the TEC. With some knowledge of LotusScript, I crafted a system to monitor servers and send results back to the Tivoli Event Console
' Build the postemsg command line piece by piece. {} are LotusScript string
' delimiters, so the embedded double quotes need no escaping.
vMess1 = {C:\Windows\System32\postemsg.exe -f F:\TECAlerts\tecserver.cfg -r CRITICAL -m "} + vLongMessage + {" }
vMess2 = {hostname="} + vReportServerName + {" }
vMess3 = {sub_source="MESSAGINGLOTUS" Mynotify_supportfilter="1" MyNotify_severity="2" }
vMess4 = {MyNotify_tin="0066" MyNotify_atin="0066" MyNotify_msg="Domino mail server outage" }
vMess5 = {MyNotify_srcplatform="W" MyNotify_processreturncode="0" MyNotify_correlation="0" }
' The last two words are the TEC event class and source
vMess6 = {MyNotify_app="DominoMail" MyNotify_env="Production" MESSAGING_LOTUS MESSAGING}
vMess = vMess1 + vMess2 + vMess3 + vMess4 + vMess5 + vMess6
' Run the command; window style 6 = minimized, without focus
result = Shell(vMess, 6)
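If you ever port this to a scripting language outside Notes, building the command as an argument list rather than one concatenated string leaves the quoting to the runtime. This is a minimal Python sketch under that assumption: the executable path, config path, flags, slot names, class, and source are copied from the LotusScript above, the function name is hypothetical, and the sketch only builds the list; it does not send anything.

```python
# Sketch: the postemsg command above as an argument list (not executed here).
# Paths, flags, slot names, class, and source come from the LotusScript example.
POSTEMSG = r"C:\Windows\System32\postemsg.exe"
TEC_CONFIG = r"F:\TECAlerts\tecserver.cfg"


def postemsg_args(message: str, hostname: str) -> list:
    """Return argv for postemsg; pass it to subprocess.run() to send the event."""
    slots = {
        "hostname": hostname,
        "sub_source": "MESSAGINGLOTUS",
        "Mynotify_supportfilter": "1",
        "MyNotify_severity": "2",
        "MyNotify_tin": "0066",
        "MyNotify_atin": "0066",
        "MyNotify_msg": "Domino mail server outage",
        "MyNotify_srcplatform": "W",
        "MyNotify_processreturncode": "0",
        "MyNotify_correlation": "0",
        "MyNotify_app": "DominoMail",
        "MyNotify_env": "Production",
    }
    args = [POSTEMSG, "-f", TEC_CONFIG, "-r", "CRITICAL", "-m", message]
    args += [f"{key}={value}" for key, value in slots.items()]
    # Event class and source: the last two positional arguments in the original.
    args += ["MESSAGING_LOTUS", "MESSAGING"]
    return args
```

Check the slot names and class with your Tivoli team before relying on them; they are specific to that one customer's TEC rules.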
37
DDM Is an Advanced Topic and Is Best Used by New Admins
• Domino Domain Monitoring (DDM) is a powerful yet complex tool that is often overlooked by administrators
• If you are using Domino 7 or 8, you are already the proud owner of the Domino Domain Monitoring database (DDM.NSF), and could already be using its powerful functionality
• If you're not using DDM, you see this with each server start:
01/22/2013 11:49:08 AM Warning: All Domino Domain Monitoring probes are disabled resulting in the loss of valuable diagnostic information. Please configure DDM probes in events4.nsf. Assess DDM reports in ddm.nsf.
39
DDM Backs Up Its Discoveries with Explanations
• DDM explains the probable cause, possible solution, and sometimes corrective actions. That's right: actions that will actually correct the problem you're experiencing
• These are stored in EVENTS4.NSF and are configurable by you. Let's look for the error "ATTEMPT TO ACCESS DATABASE BY"
40
Looking in the View, “Event Messages by Text”
• We can find that error message in EVENTS4.NSF and discover how we might change the report DDM produces
41
The Cause, Solution, and Corrective Action Are Listed
• This document lists the probable cause, possible solution, and corrective action. These are supplied by Lotus and include the code in the corrective action
42
Click the Link to the Modular Corrective Action
• Clicking the link will take you to the code. This could be in formula language or LotusScript
43
The Modular Corrective Action Is Re-Usable
• At the bottom of the modular action, there is a list of other error text messages that also use this action. An action written only once can be used as the corrective action for many different errors
44
Modular Documents – Cause, Solution, and Corrective Actions
• Domino 8 comes with over 1,000 modular documents. Chances are your solutions are already there for most issues. You can use any of the same solutions provided by IBM for your custom solution, or you can add brand new ones
45
Modular Documents Let You Create and Describe Issues
• Modular documents let you add your own probable cause and possible solution text. They also let you create corrective actions with formula code and LotusScript agents
46
You Can Add to the Solutions That Will Display with the Error
• Select the Custom Entries tab and add the description
• A custom solution of composing an email to the target user can be inserted
47
Changes the DDM Report
• The modular document now has the “compose an email” choice
48
It Starts the Email for You
• The code plugs in the user's name and the database that was being accessed. And it's all done with modular documents in EVENTS4.NSF
49
Role in DDM ACL That Will Restrict Who Can Use Actions
• Many events have corrective actions associated with them. Only users with the Execute CA role in the DDM ACL are able to access the command actions and the corrective action text and links. This ensures that only qualified team members will be able to make the changes
50
Dealing with Problematic Servers
• Sometimes there are servers with issues that crop up. We would like to collect statistics for analysis from these systems more frequently than the standard statistics collection interval allows. If you try to add a second collection interval on a server, you'll get this:
52
Each Server Is Allowed to Collect Stats with Only One Interval
• A server can only have one collection interval. You must create a second collection document for another server. Don't forget to add the "collect" task to servertasks= in NOTES.INI
• Let's look at a server that has CPU spikes
• First, we create a statistics collection document for a second server to take statistics from our problem server
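Forgetting the Collect task is an easy mistake, so here is a small Python sketch that checks a copy of NOTES.INI for it. It assumes simple Name=Value lines and a comma-separated ServerTasks list, and matches names case-insensitively; the function names are hypothetical.

```python
# Sketch: is "Collect" in the ServerTasks line of a NOTES.INI file?
def read_notes_ini(text: str) -> dict:
    """Parse NOTES.INI-style text into {lowercase name: value}."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():  # skips section headers such as [Notes]
            settings[name.strip().lower()] = value.strip()
    return settings


def runs_collect(ini_text: str) -> bool:
    """True if 'Collect' appears in the comma-separated ServerTasks list."""
    tasks = read_notes_ini(ini_text).get("servertasks", "")
    return any(task.strip().lower() == "collect" for task in tasks.split(","))
```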
53
Set the Collection Interval for Five Minutes
• Set the collection interval to 5 minutes. Do not check any filters!!! They tell the collector to ignore the statistics you checked
• Note that stats are being logged to a database called ProblemServer.NSF, used exclusively to track CPU utilization of the Traveler task. Note that the data in this example has been fictionalized for effect
54
Create a Special View That Tracks CPU Utilization for Traveler
• In this case, it's the Traveler CPU we want to track
• We create a custom view for the collecting database that only has the server name, the time of collection, and the statistic called Platform.Process.Traveler.1.PctCpuUtil. This will be used to easily create a graph of the CPU activity
55
Collect the Data, Copy It as a Table from the Custom View
• After collecting a week's worth of data, we examine the CPU utilization
• All the data in the view is selected using Ctrl-A and copied as a table. Copying views as a table is my favorite feature in Notes
• A Monitoring Results template is posted on my Web site. A URL to this template is included at the end of the presentation
56
Data Has Been Copied to a Spreadsheet
• A simple paste of the data puts it into a spreadsheet where we are ready to turn it into a chart
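If you just want the numbers before building the chart, a few lines of Python can summarize the same data. This sketch assumes the copied table was saved as tab-separated text with the custom view's three columns (server, collection time, Traveler CPU); the 80% spike level and the function name are arbitrary choices for the example.

```python
import csv
import io
import statistics


# Sketch: summarize Traveler CPU samples from tab-separated view data.
def cpu_summary(tsv_text: str, spike_pct: float = 80.0) -> dict:
    """Return sample count, peak, mean, and number of samples above spike_pct."""
    cpu = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 3:
            continue
        try:
            cpu.append(float(row[2]))
        except ValueError:
            continue  # header row or blank cell
    if not cpu:
        raise ValueError("no numeric CPU values found in the third column")
    return {
        "samples": len(cpu),
        "peak": max(cpu),
        "mean": round(statistics.mean(cpu), 1),
        "spikes": sum(1 for value in cpu if value > spike_pct),
    }
```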
57
Use the Tools in Your Spreadsheet to Create a Graph
• Select the columns Collection Time and Traveler CPU
• Create a graph from the data. In this example, a scatter chart type with smooth lines is being used
58
The Resulting Graph
• This produces an excellent graph of the CPU utilization over a ten-day period, with samples taken at 5-minute intervals. And it took less than 5 minutes to make this chart. One adjustment was made to the x-axis formatting, and the legend was removed
59
Demonstration
• Creating a graph of results from a custom view of collected data
60
Some Events Occur on the Console, but Not in the Log
• Note: In this example, the server stops reporting at 11:04 pm
• Then, at 11:27 pm, it is back online
• What happened in the interim?
Name: Mail1/domlab Time: 01/04 11:02:05 PM Miscellaneous Events:
01/04/2013 11:04:17 PM Pulling icl.ntf from Maill2/domlab icl.ntf
01/04/2013 11:04:31 PM Access control is set in catalog.nsf to not allow replication from BES02/domlab catalog.nsf
01/04/2013 11:04:31 PM Access control is set in mail2/domlab catalog.nsf to not allow replication from catalog.nsf
01/04/2013 11:04:33 PM Pulling ddm.nsf from Mail2/domlab ddm.nsf
01/04/2013 11:04:35 PM Pushing ddm.nsf to Mail2/domlab ddm.nsf
01/04/2013 11:04:38 PM Finished replication with server Mail2/domlab
01/04/2013 11:04:43 PM Router: Transferred 1 messages to MAIL2.domlab.COM (host MAIL02.domlabUSA.COM) via SMTP
01/04/2013 11:04:51 PM Opened session for Mail2/domlab (Release 8.5.2FP1)
Name: Mail1/domlab Time: 01/04 11:27:11 PM - 01/04 11:27:47 PM Miscellaneous Events:
01/04/2013 11:27:11 PM Recovery Manager: Restart Recovery complete. (196/1686 databases needed full/partial recovery)
01/04/2013 11:27:11 PM Informational - The DAOS catalog is not synchronized. Deletions will be postponed. Please run 'tell daosmgr resync' at the next convenient opportunity to re-synchronize.
01/04/2013 11:27:12 PM Event Monitor started
01/04/2013 11:27:12 PM Warning: All Domino Domain Monitoring probes are disabled res
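A silent stretch like the 11:04 pm to 11:27 pm gap above is itself a clue, and it is easy to spot mechanically. This Python sketch reports pairs of consecutive log timestamps that are further apart than a chosen limit. It assumes each line starts with the "MM/DD/YYYY HH:MM:SS AM" format shown in these excerpts; the 15-minute limit is an arbitrary example, and a quiet server can of course have legitimate gaps.

```python
from datetime import datetime, timedelta

# Sketch: find stretches where the log went quiet for longer than max_quiet.
STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # e.g. 01/04/2013 11:04:51 PM


def log_gaps(lines, max_quiet=timedelta(minutes=15)):
    """Return (last_before, first_after) timestamp pairs around each long gap."""
    stamps = []
    for line in lines:
        try:
            stamps.append(datetime.strptime(line[:22], STAMP_FORMAT))
        except ValueError:
            continue  # line without a leading timestamp
    return [
        (earlier, later)
        for earlier, later in zip(stamps, stamps[1:])
        if later - earlier > max_quiet
    ]
```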
62
There Is Action in the CONSOLE.LOG
• CONSOLE.LOG and other logs are in the folder called IBM_TECHNICAL_SUPPORT under the data folder
• The CONSOLE.LOG on a server often contains data that has been seen on the Domino server console, but not in the server log It shows there was a Long Held Lock Dump and then a panic!
Lock(Mode=SIX* LockID(DB DB=G:\Lotus\Domino\Data\mail\web\Complaints.nsf)) Waiters countNonIntentLocks = 1 countIntentLocks = 1, queuLength = 95
[Req(Status=Granted Mode=IS Class=Manual Nest=0 Cnt=1Tran=0 Func=N/A m\lkmgr.cpp:159 [0D64:0002-0D60])
rm_lkmgr_cpp:2070 rm_lkmgr_cpp:1306 nsfsem1_c:169 nsfsem1_c:1020 nsfsem6_c:503
Req(Status=Granted Mode=SIX Class=Manual Nest=0 Cnt=1Tran=0 Func=N/A inplace.c:153 [099C:0165-12FC])
LkMgr END Long Held Lock Dump ------------------
01/04/2013 11:04:51 PM Opened session for Terry Mallory/domlab (Release 8.5.2FP2)
01/04/2013 11:04:51 PM Closed session for Terry Mallory/domlab Databases accessed: 1 Documents read: 0 Documents written: 0
The server process terminated abnormally with the exit status = 1. Please send this information and the collected nsd log to IBM Support. This process will now Panic in order to start fault recovery operations.
63
Why Did This Happen?
• In this case, there was a large number of email messages with big attachments waiting to be processed in the MAIL.BOX databases
• The server was relatively underpowered
• Plus, I think the messages were part of a mass mailing sent by a CEO. And we all know the most visible executives have the worst time with any piece of messaging software
64
Here’s Another Example of Helpful Console Logging
• I entered the following into the Domino server console: Tell traveler stat show
• That command generates hundreds of lines of statistics and other information. It shows clearly on the server console
65
Here’s Another Reason for Console Logging
• Here's the Domino server log showing me making several furious requests to the Traveler task with Tell traveler stat show
• I get nothing
> tell traveler stat show
01/06/2013 12:24:49 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show
> tell traveler stat show
01/06/2013 12:24:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
01/06/2013 12:24:55 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show
01/06/2013 12:24:55 PM Directory Cataloger finished processing names.nsf: Directory Catalog has no Configuration record
01/06/2013 12:25:43 PM AMgr: Start executing agent 'PullFromAdmin4' in 'certreq.nsf' by Executive '1'
01/06/2013 12:25:43 PM AMgr: 'Admin/Servers/DomLab' is the agent signer of agent 'PullFromAdmin4' in 'certreq.nsf'
01/06/2013 12:25:43 PM AMgr: 'Agent 'PullFromAdmin4' in 'certreq.nsf' will run on behalf of 'Andrew M Pedisich/DomLab'
01/06/2013 12:25:44 PM AMgr: Start executing agent 'SubmitToAdmin4' in 'certreq.nsf' by Executive '1'
01/06/2013 12:25:44 PM AMgr: 'Admin/Servers/DomLab' is the agent signer of agent 'SubmitToAdmin4' in 'certreq.nsf'
01/06/2013 12:25:44 PM AMgr: 'Agent 'SubmitToAdmin4' in 'certreq.nsf' will run on behalf of 'Andrew M Pedisich/DomLab'
01/06/2013 12:25:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show
66
Check the IBM_TECHNICAL_SUPPORT Folder
• CONSOLE.LOG from the IBM_TECHNICAL_SUPPORT folder on the server
• Whenever there are server issues, don’t forget to check the console.log for evidence
01/06/2013 12:25:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
tell traveler stat show
CPU.Pct.000-010 = 7
ClusterCache.Access = 1
Constrained.count = 0
Constrained.state = false
DB.Connections = 1
DB.Connections.Idle = 1
DB.Connections.Max = 7000
DCA.C.CheckAccessRights = 2
DCA.C.Count.NSFDbClose = 3
DCA.C.Count.NSFDbOpen = 3
DCA.C.Count.NSFNoteClose = 2
DCA.C.Count.NSFNoteOpen = 2
DCA.C.HTMLCreateConverter = 1
DCA.C.HTMLDestroyConverter = 1
DCA.C.ModDoc.RunCount = 1
DCA.C.ModDoc.SyncableDocs = 1
67
Console Logging Configuration
• To enable console logging permanently on your servers, add this line to NOTES.INI:
Console_Log_Enabled=1
• Use the following values:
0 – Disable Console Log file logging
1 – Enable Console Log file logging
• You can also toggle logging to the Console Log file from the server console, using the start consolelog and stop consolelog commands
• Obviously, this is an important feature, and you'd want it enabled all the time
• Set a maximum size of almost 100MB for the console log using the following parameter:
Console_Log_Max_Kbytes=100000
68
Console Mirroring
• You can also use console mirroring, which is slightly different from normal console logging
• Console log mirroring causes a new server thread to be created. It monitors all messages written to the Console Log file and duplicates these messages into another file. When this file is filled, the thread closes the mirrored file and creates a new file into which subsequent messages are written
• Console log mirroring has three related NOTES.INI settings:
Console_Log_Mirror=1 – Enables the mirroring feature
Retain_Mirror_Logs=1 – Prevents deletion of previous mirrors when Domino starts
Console_Log_Max_Kbytes= – Sets the maximum size of the Console Log/mirror files
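To confirm that every server actually has these settings, you could compare each server's NOTES.INI against the values from these two slides. This Python sketch does that for one file's text; the target values are the ones the slides use (including the optional mirroring settings), and the parsing assumes simple Name=Value lines with case-insensitive names.

```python
# Sketch: which console-log settings from these slides are missing or different?
RECOMMENDED = {
    "console_log_enabled": "1",
    "console_log_max_kbytes": "100000",
    "console_log_mirror": "1",
    "retain_mirror_logs": "1",
}


def missing_console_settings(ini_text: str) -> list:
    """Return the recommended settings that are absent or set to another value."""
    found = {}
    for line in ini_text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            found[name.strip().lower()] = value.strip()
    return sorted(k for k, v in RECOMMENDED.items() if found.get(k) != v)
```

A server with only `Console_Log_Enabled=1` and `Console_Log_Max_Kbytes=100000` would be reported as missing the two mirroring settings.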
69
Can You Be an Admin/Dev Person?
• When you're an admin, there are a lot of reasons to learn LotusScript. Write your own agents that gather statistics and monitor servers
• LotusScript lets you ask for a statistic on all of your servers, one by one, then store it in a database and produce alerts and notifications. These can be more sophisticated than native Notes monitoring
• The following are two examples of code that you might find helpful. If you have buddies on the Dev side of the house, they might find this interesting. Generally, Dev people don't write applications that help administrators; their focus is on user applications
• These two snippets can give you an idea of the potential you have when dealing with statistics and LotusScript
71
Gathering Stats Using LotusScript Is Easy
• Here's an agent that simply issues a Domino server console command and then shows you the result in a MessageBox
• It’s pretty cool for 10 lines of code
Sub Initialize
   Dim session As New NotesSession
   Dim vServerName As String
   Dim vConsoleCommand As String
   Dim vConsoleReturn As String
   ' The console command to run and the server to run it on
   vConsoleCommand = "sho stat server.trans.total"
   vServerName = "admin/domlab"
   ' Send the command and capture the console output as a string
   vConsoleReturn = session.SendConsoleCommand(vServerName, vConsoleCommand)
   MessageBox(vConsoleReturn)
End Sub
72
The Mail.TotalPending Statistic
• This stat was introduced in Release 5, and I use it all the time when monitoring servers for mail backing up
• From SPR# BSAW4HFMPY www-304.ibm.com/support/docview.wss?uid=sim43d86a0d3e79e0e6785256a8500737f2b
  Added a new Mail.TotalPending statistic that shows the count of messages pending in mail.box
• This statistic is updated once every 5 minutes by the Server task
  Does not depend on the Router task for updates
• Provides information about the total backlog of mail in the event that the router is hung or not started
  A high value indicates that a mail routing problem needs further investigation
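Because the statistic is refreshed only every 5 minutes, a single high reading may just be a momentary spike. A minimal Python sketch, assuming you keep a history of Mail.TotalPending samples taken once per update, could alert only when the backlog stays above a threshold for several samples in a row. The threshold and the three-sample window are hypothetical values, not recommendations from this session.

```python
from typing import Sequence


def sustained_backlog(samples: Sequence[int], threshold: int, consecutive: int = 3) -> bool:
    """Return True when the last `consecutive` samples all exceed `threshold`.

    Assumes one sample per Server-task update (every 5 minutes), so the
    hypothetical default of 3 means roughly 15 minutes of sustained backlog.
    """
    if consecutive <= 0:
        raise ValueError("consecutive must be positive")
    if len(samples) < consecutive:
        # Not enough history yet to call it a sustained backlog
        return False
    return all(value > threshold for value in samples[-consecutive:])
```

Requiring several consecutive high readings trades a few minutes of detection delay for fewer false alarms.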
Here’s a Similar Code Snippet That Gets Total Pending Mail
• This is from a much larger agent that runs every 5 minutes on 70 servers
• Remember, LotusScript lets you issue console commands
  Then, take the results of the command and act on them
• Our job is to parse out the number 130 from the output of the show stat command
  Show stat mail.totalpending
• We’re grabbing the stat mail.totalpending, which looks like this on the console:
  Mail.TotalPending = 130
  1 statistics found
Here’s the Meat and Potatoes
• Mail.TotalPending = 130
  1 statistics found
• Then, it’s parsed so that only the number is grabbed
  vLocStart = InStr(1, vConsoleReturn, "=", 5) + 2
    Gives the location 2 characters past the = sign, where the number starts
  vLocEnd = InStr(1, vConsoleReturn, Chr(13), 5) - vLocStart
    Gives the length of the number: the position of the carriage return, Chr(13), minus the start position
  vStatStr = Mid(vConsoleReturn, vLocStart, vLocEnd)
    That’s the number as a string, which is then converted to a number
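Here is the same InStr/Mid arithmetic sketched in Python, so you can try it without a Domino server. Python string positions start at 0 instead of 1, but the +2 offset past the = sign carries over unchanged. The console text is assumed to end each line with a carriage return, as the Chr(13) search implies; the sketch also accepts a bare line feed. One deliberate difference: LotusScript’s Val returns 0 for text it cannot read as a number, while this sketch raises an error instead.

```python
def parse_stat_value(console_return: str) -> int:
    """Extract the number after the first '=' in `show stat` console output."""
    equals = console_return.find("=")  # like InStr(1, vConsoleReturn, "=", 5)
    if equals == -1:
        raise ValueError("no '=' found in console output")
    start = equals + 2  # skip the '=' and the single space after it
    # End of the number: the first carriage return (Chr(13)) after the start;
    # a bare line feed is also accepted in case of Unix-style line endings.
    end = len(console_return)
    for terminator in ("\r", "\n"):
        pos = console_return.find(terminator, start)
        if pos != -1:
            end = min(end, pos)
    # Slice [start:end] plays the role of Mid(vConsoleReturn, vLocStart, length)
    return int(console_return[start:end])
```

For example, parse_stat_value("Mail.TotalPending = 130\r\n1 statistics found\r\n") returns 130.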
Here’s the Meat and Potatoes (cont.)
• Mail.TotalPending = 130
  1 statistics found
• Here’s a snippet of code that gets you the mail.totalpending statistic
vConsoleCommandPending = "sh stat mail.totalpending"
' Let's ask the console how many messages are pending
vConsoleReturn = session.SendConsoleCommand(vServerName, vConsoleCommandPending)
' Start of the number: 2 characters past the = sign
vLocStart = InStr(1, vConsoleReturn, "=", 5) + 2
' Length of the number: position of the carriage return minus the start
vLocEnd = InStr(1, vConsoleReturn, Chr(13), 5) - vLocStart
vStatStr = Mid(vConsoleReturn, vLocStart, vLocEnd)
'Print "Pending: " + Str(vMailTotalPending) + " Pending: " + vStatStr
' Convert the string to a number
vMailPending = Val(vStatStr)
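The fixed-position parsing depends on the requested statistic being the first line of the console output. As an alternative technique (a regular expression, not what the agent above uses), the Python sketch below turns every "Name = value" line into a dictionary entry, so a statistic can be looked up by name and a missing or non-numeric statistic comes back as None instead of a wrong number. The line format is assumed to match the console output shown on these slides.

```python
import re
from typing import Dict, Optional

# One "Name = value" pair per line, e.g. "Mail.TotalPending = 130"
_STAT_LINE = re.compile(r"^\s*([\w.]+)\s*=\s*(.*?)\s*$", re.MULTILINE)


def parse_show_stat(console_return: str) -> Dict[str, str]:
    """Map lower-cased statistic names to their raw string values."""
    return {name.lower(): value for name, value in _STAT_LINE.findall(console_return)}


def stat_as_int(console_return: str, stat_name: str) -> Optional[int]:
    """Return the named statistic as an int, or None if missing or not numeric."""
    raw = parse_show_stat(console_return).get(stat_name.lower())
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        return None
```

Lower-casing the names mirrors the case-insensitive InStr comparison in the LotusScript, so "mail.totalpending" and "Mail.TotalPending" find the same entry.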
LotusScript and Monitoring/Alerting – A Great Pair of Tools
• You get the advantage of automation with the power of monitoring and alerting
• Stop issues before they become problems
• Don’t forget, download the custom statrep Technotics Statrep 8.5.3 from: www.andypedisich.com
Where to Find More Information
• www-01.ibm.com/support/docview.wss?uid=swg27008849
  “Notes/Domino Best Practices: Performance” (IBM, 2010).
• www-10.lotus.com/ldd/__00256C3E0030650D.nsf/0/1F2EBFCA1F35CA71852571DB00618159?Open
  Harry Peebles, “Domino Domain Monitoring (DDM) Educational Resources” (IBM, 2006).
• www-01.ibm.com/support/docview.wss?uid=swg21293213
  “How Does the notes.ini File Parameter ‘server_session_timeout’ Affect Server Performance?” (IBM, 2010).
• www.ibm.com/developerworks/lotus/library/domino-server-crashes/
  Kiran Bellari, “Troubleshooting Lotus Domino Hangs and Crashes” (developerWorks, 2006).
7 Key Points to Take Home
• Write your own program in LotusScript or formula language and add it to DDM’s corrective actions
• Collect statistics from problem servers by creating a second collecting server in your domain
• Console logs collect everything that happens on the console, including messages from tasks and from NOTES.INI debug parameters
• Check the replica ID for the Events4.NSF in your domain to ensure it is the same on all servers
7 Key Points to Take Home (cont.)
• Full Administrator Access is a powerful tool that should be monitored for proper usage
• Event handlers can notify you about any message that appears on the console
• Email is the most widely used notification system, but it is also the riskiest
Thank You for Attending Our Session!
• Please don’t forget to fill out your evaluations. We read them all!
• Please feel free to stop us and ask questions or just have pleasant conversations
Contact [email protected]
www.technotics.com
www.andypedisich.com