
CA Wily Introscope®

Sizing and Performance Guide

Version 8.0

Date: 08-2008


Copyright © 2008, CA. All rights reserved.

Wily Technology, the Wily Technology Logo, Introscope, and All Systems Green are registered trademarks of CA.

Blame, Blame Game, ChangeDetector, Get Wily, Introscope BRT Adapter, Introscope ChangeDetector, Introscope Environment Performance Agent, Introscope ErrorDetector, Introscope LeakHunter, Introscope PowerPack, Introscope SNMP Adapter, Introscope SQL Agent, Introscope Transaction Tracer, SmartStor, Web Services Manager, Whole Application, Wily Customer Experience Manager, Wily Manager for CA SiteMinder, and Wily Portal Manager are trademarks of CA. Java is a trademark of Sun Microsystems in the U.S. and other countries. All other names are the property of their respective holders.

For help with Introscope or any other product from CA Wily Technology, contact Wily Technical Support at 1-888-GET-WILY ext. 1 or [email protected].

If you are the registered support contact for your company, you can access the support Web site directly at http://support.wilytech.com.

We value your feedback

Please take this short online survey to help us improve the information we provide you. Link to the survey at: http://tinyurl.com/6j6ugb

6000 Shoreline Court, Suite 200, South San Francisco, CA 94080

US Toll Free: 888 GET WILY ext. 1
US: +1 630 505 6966
Fax: +1 650 534 9340
Europe: +44 (0)870 351 6752
Asia-Pacific: +81 3 6868 2300
Japan Toll Free: 0120 974 580
Latin America: +55 11 5503 6167

www.wilytech.com

Table of Contents

Chapter 1 Introscope Sizing and Performance Introduction . . . . . . 9

New and changed features in Introscope 8.0 . . . . . . . . . 10

Agent load balancing . . . . . . . . . . . . . . . . . 10

Agent metric aging. . . . . . . . . . . . . . . . . . 10

Changed Heap Capacity (%) metric . . . . . . . . . . . . 10

Changed Metric Count metric . . . . . . . . . . . . . . 10

Changed way of determining events. . . . . . . . . . . . 11

Changed Number of Inserts metric . . . . . . . . . . . . 11

Changed Overall Capacity (%) metric . . . . . . . . . . . 11

Dynamic instrumentation . . . . . . . . . . . . . . . 11

Enterprise Manager dead metric removal . . . . . . . . . . 11

How to detect metric explosions . . . . . . . . . . . . . 11

Metric clamping . . . . . . . . . . . . . . . . . . . 11

MOM hot failover . . . . . . . . . . . . . . . . . . 12

MOM sizing limits examples. . . . . . . . . . . . . . . 12

New metric for Collector Metrics Received Per Interval . . . . . 12

New metric for Historical Metric Count . . . . . . . . . . . 12

New metric for Number of Historical Metrics . . . . . . . . . 12

New metric for Transaction Traces Dropped Per Interval. . . . . 12

New tab for CPU Overview . . . . . . . . . . . . . . . 12

New tab for Enterprise Manager Overview . . . . . . . . . 13

New tab for Metric Count . . . . . . . . . . . . . . . 13

Ping time threshold properties . . . . . . . . . . . . . . 13

Running multiple Collectors on one machine . . . . . . . . . 13

Scalability . . . . . . . . . . . . . . . . . . . . . 13

SmartStor metadata stored in uncompressed format . . . . . . 14

SQL statements, statement normalizers, and metric explosions . . 14


Support for RAID 5 data storage . . . . . . . . . . . . . 14

Transaction Trace component clamp . . . . . . . . . . . 14

Chapter 2 EM Requirements and Recommendations . . . . . . . . . 15

Enterprise Manager overview . . . . . . . . . . . . . . . 17

Enterprise Manager databases . . . . . . . . . . . . . . 20

Factors that affect the Introscope environment . . . . . . . . 20

Factors that affect EM maximum capacity . . . . . . . . . . 21

Differences between EMs and J2EE servers . . . . . . . . . 22

About Introscope system size . . . . . . . . . . . . . . 26

Enterprise Manager health . . . . . . . . . . . . . . . . 27

About the Enterprise Manager Overview tab . . . . . . . . . 27

About EM health and supportability metrics . . . . . . . . . 28

Harvest Duration metric . . . . . . . . . . . . . . . . 29

Number of Collector Metrics . . . . . . . . . . . . . . 30

Collector Metrics Received Per Interval metric . . . . . . . . 31

Converting Spool to Data metric . . . . . . . . . . . . . 32

Overall Capacity (%) metric . . . . . . . . . . . . . . 33

Heap Capacity (%) metric . . . . . . . . . . . . . . . 34

Troubleshooting Enterprise Manager health . . . . . . . . . 35

Additional supportability metrics . . . . . . . . . . . . . 38

SmartStor overview . . . . . . . . . . . . . . . . . . 40

About SmartStor spooling and reperiodization . . . . . . . . 40

Report generation and performance . . . . . . . . . . . . 43

Concurrent historical queries and performance . . . . . . . . 43

About SmartStor and flat file archiving . . . . . . . . . . . 43

MOM overview . . . . . . . . . . . . . . . . . . . . 44

Collector overview. . . . . . . . . . . . . . . . . . . 44

Collector metric capacity and CPU usage . . . . . . . . . . 45

About the CPU Overview tab . . . . . . . . . . . . . . 46

Enterprise Manager basic requirements . . . . . . . . . . . 47

Enterprise Manager file system requirements . . . . . . . . 47

EM OS disk file cache memory requirements . . . . . . . . . 47

Enterprise Manager heap sizing . . . . . . . . . . . . . 48

SmartStor requirements . . . . . . . . . . . . . . . . 49

Each EM requires SmartStor on a dedicated disk or I/O subsystem . 49

SmartStor Duration metric limit . . . . . . . . . . . . . 50


MOM and Collector EM requirements . . . . . . . . . . . . 51

Local network requirement for MOM and Collectors . . . . . . 51

When to run reports, custom scripts, and large queries . . . . . 52

Introscope 8.0 EM settings and capacity . . . . . . . . . . . 53

Estimating Enterprise Manager databases disk space needs . . . 53

SmartStor settings and capacity . . . . . . . . . . . . . . 55

Setting the SmartStor dedicated controller property . . . . . . 55

Planning for SmartStor storage using SAN . . . . . . . . . 57

Planning for SmartStor storage using SAS controllers . . . . . 57

Enterprise Manager thread pool and available CPUs . . . . . . 57

Collector and MOM settings and capacity . . . . . . . . . . . 58

MOM disk subsystem sizing requirements . . . . . . . . . . 58

MOM hardware requirements . . . . . . . . . . . . . . 59

MOM to Collectors connection limits . . . . . . . . . . . . 59

MOM to Workstation connection limits . . . . . . . . . . . 60

Metric load limit on MOM-Collector systems . . . . . . . . . 60

Configuring a cluster to support 1,000,000 MOM metrics . . . . 61

MOM hot failover . . . . . . . . . . . . . . . . . . 62

Agent load balancing on MOM-Collector systems . . . . . . . 63

Avoid Management Module hot deployments . . . . . . . . . 68

Collector applications limits . . . . . . . . . . . . . . . 69

Collector metrics limits . . . . . . . . . . . . . . . . 69

Collector events limits . . . . . . . . . . . . . . . . 70

Collector agent limits . . . . . . . . . . . . . . . . . 70

Collector hardware requirements . . . . . . . . . . . . . 71

Collector with metrics alerts limits . . . . . . . . . . . . 71

Collector to MOM clock drift limit . . . . . . . . . . . . . 71

Reasons Collectors combine slices . . . . . . . . . . . . 72

Increasing Collector capacity with more and faster CPUs . . . . 73

Standalone EM hardware requirements example . . . . . . . 74

Running multiple Collectors on one machine . . . . . . . . . 74

Chapter 3 Metrics Requirements and Recommendations . . . . . . . 77

Metrics background . . . . . . . . . . . . . . . . . . 78

About metrics groupings and metric matching . . . . . . . . 78

8.0 metrics setup, settings, and capacity . . . . . . . . . . . 79

Matched metrics limits . . . . . . . . . . . . . . . . 79


Inactive and active metric groupings and EM performance . . . . 80

Performance and metrics groupings using the wildcard (*) symbol . 80

SmartStor metrics limits . . . . . . . . . . . . . . . . 80

Virtual agent metrics match limits . . . . . . . . . . . . 80

About alerted metrics and slow Workstation startup . . . . . . 81

About aggregated metrics and Management Module hot deployments 81

Detecting metrics leaks . . . . . . . . . . . . . . . . . 81

Metrics leak causes . . . . . . . . . . . . . . . . . 82

Finding a metrics leak. . . . . . . . . . . . . . . . . 82

Metrics for diagnosing a metrics leak . . . . . . . . . . . 83

Detecting metric explosions . . . . . . . . . . . . . . . 84

Metric explosion causes . . . . . . . . . . . . . . . . 84

Finding a metric explosion . . . . . . . . . . . . . . . 85

Investigator metrics and tab for diagnosing metric explosions. . . 85

How Introscope prevents metric explosions . . . . . . . . . 91

SQL statements and metric explosions . . . . . . . . . . . 92

SQL statement normalizers . . . . . . . . . . . . . . . 94

Enterprise Manager dead metric removal . . . . . . . . . . 96

Metric clamping . . . . . . . . . . . . . . . . . . . 96

SmartStor metadata files are uncompressed . . . . . . . . . 98

Chapter 4 Workstation and WebView Requirements and Recommendations 99

Workstation and WebView background . . . . . . . . . . . 100

8.0 Workstation and WebView requirements . . . . . . . . . 100

OS RAM requirements for Workstations running in parallel . . . . 100

WebView and Enterprise Manager hosting requirement . . . . . 100

8.0 Workstation and WebView setup, settings, and capacity . . . . 101

Workstation to standalone EM connection capacity. . . . . . . 101

Workstation to MOM connection capacity . . . . . . . . . . 102

WebView server capacity . . . . . . . . . . . . . . . 103

WebView server guidelines . . . . . . . . . . . . . . . 103

Top N graph metrics limit per Workstation . . . . . . . . . 103

Chapter 5 Agent Requirements and Recommendations . . . . . . . 105

Agent background . . . . . . . . . . . . . . . . . . . 106

About virtual agents . . . . . . . . . . . . . . . . . 106

Agent sizing setup, settings, and capacity . . . . . . . . . . 107


Agent metrics reporting limit . . . . . . . . . . . . . . 107

About the Metric Count tab . . . . . . . . . . . . . . . 107

Transaction Trace component clamp . . . . . . . . . . . 108

Agent maximum load when disabling Boundary Blame . . . . . 109

Configuring agent heuristics subsets . . . . . . . . . . . 109

Virtual agent metrics match limits . . . . . . . . . . . . 109

Virtual agent reported applications capacity . . . . . . . . . 110

Agents limits per Collector . . . . . . . . . . . . . . . 110

Agent heap sizing . . . . . . . . . . . . . . . . . . 110

High agent CPU overhead from deep nested front-end transactions . 111

Dynamic instrumentation . . . . . . . . . . . . . . . . 112

Appendix A Introscope 8.0 Sizing and Performance FAQs . . . . . . . 113

Appendix B Sample Introscope 8.0 Collector and MOM Sizing Limits by OS 119

Sample Introscope 8.0 Collector sizing limits table . . . . . . . 119

Sample Introscope 8.0 MOM sizing limits table . . . . . . . . . 122

Index . . . . . . . . . . . . . . . . . . . . . . . . . . 125


CHAPTER 1: Introscope Sizing and Performance Introduction

This document contains background, instructions, best practices, and tips for optimizing the sizing and performance of your Introscope 8.0 deployment and environment. Use it in conjunction with the following Introscope 8.0 documentation:

Introscope Configuration and Administration Guide

Introscope Installation and Upgrade Guide

Introscope Java Agent Guide

Introscope .NET Agent Guide

Introscope Overview Guide

Introscope WebView Guide

Introscope Workstation User Guide

For additional information about this product, you can take the CA Wily Technology Education Services class, Introscope®: Enterprise Manager (EM) Capacity Management. For more information, go to http://www.wilytech.com/services/education.html.

In addition, CA Wily Technology Professional Services and Technical Support have service offerings to address specific needs in your application management environment.

Where to get the latest version of this book

You can find the most current version of this book on the CA Wily Community site at https://community.wilytech.com/. Check back periodically to see if the book has been updated.

NOTE: The Wily Community Site is for use by registered members of the Wily User Community. If you need a user account, you can request one at the site.


New and changed features in Introscope 8.0

The following sections detail new or changed features in Introscope 8.0 that affect sizing and performance.

Agent load balancing

Introscope 8.0 agents (8.0 agents only) in a clustered environment can connect to the MOM and be load-balanced to a Collector. Pre-8.0 agents must connect directly to a Collector. The MOM also keeps the metric load balanced between Collectors by ejecting participating 8.0 agents from over-burdened Collectors. A participating agent is one that connected to the MOM. The ejected agents reconnect to the MOM and are reallocated to under-burdened Collectors. To configure agent load balancing, see the Introscope Configuration and Administration Guide. To understand how agent load balancing affects Introscope performance, see Agent load balancing on MOM-Collector systems on page 63.
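The eject-and-reallocate behavior described above can be sketched as follows. This is an illustrative simulation, not CA Wily's actual algorithm; the `tolerance` parameter and the move-one-agent-at-a-time policy are assumptions.

```python
# Sketch of MOM-style agent load balancing (illustrative only): agents on
# over-burdened Collectors are "ejected", reconnect via the MOM, and are
# reallocated to under-burdened Collectors.

def rebalance(collectors, tolerance=0.10):
    """collectors: dict of Collector name -> list of agent ids. Moves agents
    one at a time from the most-loaded to the least-loaded Collector until
    the spread is within `tolerance` of the average load."""
    avg = sum(len(a) for a in collectors.values()) / len(collectors)
    moves = []
    while True:
        heaviest = max(collectors, key=lambda c: len(collectors[c]))
        lightest = min(collectors, key=lambda c: len(collectors[c]))
        if len(collectors[heaviest]) - len(collectors[lightest]) <= max(1, avg * tolerance):
            break
        agent = collectors[heaviest].pop()   # "eject" a participating agent
        collectors[lightest].append(agent)   # it reconnects and is reallocated
        moves.append((agent, heaviest, lightest))
    return moves
```

For example, starting from eight agents on one Collector and two on another, the sketch settles at five and five.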

Agent metric aging

By default, agent metric aging periodically removes dead metrics from the agent memory cache. This helps prevent metric explosions. See About agent metric aging on page 91.

Changed Heap Capacity (%) metric

The Heap Capacity (%) metric is now created by the Enterprise Manager periodically asking the JVM for its maximum heap size and how much of it is currently in use (based on the GC Heap: In Use Post GC (mb) metric). Formerly this metric was calculated as a ratio of the current total heap to the heap in use. See Overall Capacity (%) metric on page 33 and Heap Capacity (%) metric on page 34.
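Restated as arithmetic, the calculation described above reduces to a simple ratio (a sketch of the description, not the EM's actual code):

```python
def heap_capacity_pct(in_use_post_gc_mb, max_heap_mb):
    """Heap Capacity (%) as described above: heap in use after the last GC
    (the 'GC Heap: In Use Post GC (mb)' metric) as a percentage of the
    maximum heap the JVM reports."""
    return 100.0 * in_use_post_gc_mb / max_heap_mb
```

So a JVM reporting 512 MB in use post-GC against a 2048 MB maximum heap would show 25% heap capacity.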

Changed Metric Count metric

The Metric Count metric Investigator node, which was previously under the Agent Stats node, is now here:

Custom Metric Host (Virtual)|Custom Metric Process (Virtual)|Custom Metric Agent (Virtual) (*SuperDomain*)|Metric Count

See Metric Count metric on page 85.


Changed way of determining events

The way that the Enterprise Manager handles Transaction Trace incoming events has changed, and uses new and changed metrics. See Events and Transaction Traces on page 36.

Changed Number of Inserts metric

The former Data Store|Transactions:Number of Inserts metric was renamed to Data Store|Transactions:Number of Inserts Per Interval. This metric value now shows the number of Transaction Traces placed into the Transaction Trace insert queue during an interval. Previously this metric showed the number of Transaction Traces that were reported to the Enterprise Manager. See Events and Transaction Traces on page 36 and Collector events limits on page 70.

Changed Overall Capacity (%) metric

The Overall Capacity (%) metric is calculated using an additional value from the CPU Capacity (%) metric value. See Overall Capacity (%) metric on page 33.
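The guide does not state the exact formula at this point; one plausible reading, shown here purely as an assumption, is that the Enterprise Manager reports the worst (highest) of its component capacity metrics, with CPU Capacity (%) newly included in the set for 8.0:

```python
def overall_capacity_pct(heap_pct, cpu_pct, smartstor_pct):
    """Hypothetical sketch: Overall Capacity (%) taken as the maximum of
    the individual capacity metrics, with CPU Capacity (%) included as of
    8.0. The real calculation may weight the components differently; see
    'Overall Capacity (%) metric' on page 33."""
    return max(heap_pct, cpu_pct, smartstor_pct)
```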

Dynamic instrumentation

Introscope uses dynamic instrumentation (also called dynamic ProbeBuilding) to implement new and changed PBDs without restarting managed applications or the Introscope agent. Dynamic instrumentation affects CPU utilization, memory, and disk utilization. See Dynamic instrumentation on page 112.

Enterprise Manager dead metric removal

Starting with Introscope 8.0, when a metric has not produced data for more than eight minutes (default), it is removed from the Investigator tree. See Enterprise Manager dead metric removal on page 96.
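The eight-minute eviction rule above can be sketched as a timestamp check; the dictionary-based bookkeeping is illustrative, not the Enterprise Manager's internal data structure:

```python
import time

DEAD_AFTER_SECONDS = 8 * 60  # default: eight minutes without data

def prune_dead_metrics(last_data_at, now=None):
    """Sketch of Enterprise Manager dead metric removal: drop from the
    Investigator tree any metric that has produced no data for more than
    eight minutes. `last_data_at` maps metric name -> timestamp (seconds)
    of its last reported value."""
    now = time.time() if now is None else now
    return {m: t for m, t in last_data_at.items()
            if now - t <= DEAD_AFTER_SECONDS}
```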

How to detect metric explosions

Introscope 8.0 includes a number of new metrics and capabilities to help you detect metric explosions. For more information, see Detecting metric explosions on page 84.

Metric clamping

Several properties that limit, or clamp, the number of metrics on the agent and the Enterprise Manager help to prevent spikes in the number of reported metrics (metric explosions) on the Enterprise Manager. See Metric clamping on page 96.
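The effect of such a clamp can be sketched as follows. The class below is an illustration of the clamping idea only; the actual property names, defaults, and behavior are covered under Metric clamping on page 96.

```python
class MetricClamp:
    """Illustrative metric clamp (not the EM's implementation): metrics
    already registered keep reporting, but once the clamp value is reached
    no new metric names are accepted, capping a metric explosion."""

    def __init__(self, limit):
        self.limit = limit
        self.metrics = set()

    def report(self, name, value):
        if name not in self.metrics and len(self.metrics) >= self.limit:
            return False            # clamped: new metric dropped
        self.metrics.add(name)
        return True                 # accepted
```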


MOM hot failover

If the MOM gets disconnected or goes down due to, for example, a hardware or network failure, you can configure a second MOM to take over using hot failover. See MOM hot failover on page 62.

MOM sizing limits examples

CA Wily now provides MOM hardware and cluster requirements examples. See Sample Introscope 8.0 MOM sizing limits table on page 122.

New metric for Collector Metrics Received Per Interval

The Collector Metrics Received Per Interval metric is the total sum of Collector metric data points that the MOM has received during each 15-second time period, including data queries. It is a simple way of gauging how much load metric data queries are placing on the cluster. See Collector Metrics Received Per Interval metric on page 31.

New metric for Historical Metric Count

The Historical Metric Count metric shows the total number of metrics from an agent that are either live or recently active. The Enterprise Manager uses this metric to decide whether to start clamping more metrics from the agent. For more information, see Historical Metric Count metric on page 88.

New metric for Number of Historical Metrics

A new metric, Number of Historical Metrics, tracks the number of metrics for which Introscope has historical data in SmartStor. For more information, see Number of Historical Metrics metric on page 89.

New metric for Transaction Traces Dropped Per Interval

A new metric, Performance.Transactions.Num.Dropped.Per.Interval, shows the number of Transaction Traces that the Enterprise Manager could not handle during the interval and were dropped. See Events and Transaction Traces on page 36.

New tab for CPU Overview

By viewing the CPU Overview tab you can assess agent CPU health and performance-related statistics in one centralized location. See About the CPU Overview tab on page 46.


New tab for Enterprise Manager Overview

By viewing the EM Overview tab you can assess a number of EM health and performance-related statistics and components in one centralized location. See About the Enterprise Manager Overview tab on page 27 and Enterprise Manager Overview tab on page 90.

New tab for Metric Count

By viewing the Metric Count tab you can assess the number and distribution of agent and resource metrics in one centralized location. See About the Metric Count tab on page 107.

Ping time threshold properties

For optimal Workstation response times, Collector ping times should average no higher than 500 ms. Ping times of 10 seconds or longer indicate a slow Collector that may be overloaded. Ping times over the 10 second threshold cause the Enterprise Manager|MOM|Collectors|<host@port>:Connected metric to display a value of 2. You can adjust this threshold for your environment.

In Introscope 8.0, there is an additional ping time threshold of 60 seconds. If the ping time exceeds this value, the MOM automatically disconnects from the Collector associated with the slow ping time. A disconnected Collector causes the Enterprise Manager|MOM|Collectors|<host@port>:Connected metric to display a value of 3. You can adjust this threshold for your environment. See Local network requirement for MOM and Collectors on page 51.
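The two thresholds above map onto the Connected metric's values as sketched below. The threshold constants are the 8.0 defaults quoted in the text (both adjustable); the healthy value of 1 is an assumption, since this section only shows the values 2 and 3.

```python
SLOW_PING_SECONDS = 10        # at or above this, Connected reports 2
DISCONNECT_PING_SECONDS = 60  # above this, the MOM disconnects the Collector

def connected_value(ping_seconds):
    """Sketch of the Enterprise Manager|MOM|Collectors|<host@port>:Connected
    values implied by the ping-time thresholds described above."""
    if ping_seconds > DISCONNECT_PING_SECONDS:
        return 3   # MOM has disconnected the slow Collector
    if ping_seconds >= SLOW_PING_SECONDS:
        return 2   # slow Collector, possibly overloaded
    return 1       # assumed healthy value (not stated in this section)
```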

Running multiple Collectors on one machine

By following CA Wily’s guidelines, you can set up multiple Collectors on a single machine. See Running multiple Collectors on one machine on page 74.

Scalability

Introscope 8.0 includes a number of scalability improvements, which are documented across this guide:

Each Collector Enterprise Manager can handle up to 500 K metrics (varies according to hardware), about twice the Introscope 7.x Enterprise Manager metric limit.

Collectors can take advantage of additional CPUs to increase these limits:

number of applications per Collector

number of agents per Collector


number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager).

Each MOM can connect to a five million metric cluster (10 collectors, 500 K metrics per collector), which is a five-fold increase in clustered Enterprise Manager scale.

The MOM now requires more powerful hardware than Collectors. See MOM hardware requirements on page 59.

Support for 50 concurrent Workstation connections

» Important The limits may differ substantially depending on the specific platform and hardware used in your environment.
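The quoted limits lend themselves to back-of-the-envelope headroom checks. The sketch below uses the figures from this section (500 K metrics per Collector, 10 Collectors per MOM, so a 5 M-metric cluster); as the Important note says, real limits vary substantially by platform and hardware.

```python
COLLECTOR_METRIC_LIMIT = 500_000   # per-Collector limit quoted above
MAX_COLLECTORS_PER_MOM = 10        # cluster size quoted above

def cluster_headroom(collector_metric_counts):
    """Given the live metric count on each Collector, return the remaining
    headroom per Collector and for the cluster as a whole."""
    assert len(collector_metric_counts) <= MAX_COLLECTORS_PER_MOM
    per_collector = [COLLECTOR_METRIC_LIMIT - n for n in collector_metric_counts]
    cluster_limit = MAX_COLLECTORS_PER_MOM * COLLECTOR_METRIC_LIMIT
    return per_collector, cluster_limit - sum(collector_metric_counts)
```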

SmartStor metadata stored in uncompressed format

To increase SmartStor’s speed in reading stored metadata files, starting with Introscope 8.0, all new metadata files are written in an uncompressed format. See SmartStor metadata files are uncompressed on page 98.

SQL statements, statement normalizers, and metric explosions

Metric explosions can be caused by a number of factors, including poorly written and long SQL statements. Introscope includes four SQL statement normalizers to address long SQL statements. The regular expression SQL statement normalizer is new for Introscope 8.0. CA Wily recommends that you use this normalizer before the other normalizers provided with Introscope, as the regular expression SQL statement normalizer allows you to configure regular expressions and normalize any characters or sequence of characters in the SQL statement. See SQL statements and metric explosions on page 92.
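The idea behind a regular-expression SQL statement normalizer can be sketched as follows. This is an illustration of the technique in Python, not Introscope's normalizer or its configuration syntax: literal values are replaced so that structurally identical statements collapse to one metric name instead of exploding into thousands.

```python
import re

# Illustrative normalization rules (assumptions, not Introscope defaults):
NORMALIZERS = [
    (re.compile(r"'[^']*'"), "?"),   # quoted string literals -> ?
    (re.compile(r"\b\d+\b"), "?"),   # bare numeric literals  -> ?
]

def normalize_sql(statement):
    """Collapse literal values in a SQL statement so that statements
    differing only in their literals produce the same metric name."""
    for pattern, replacement in NORMALIZERS:
        statement = pattern.sub(replacement, statement)
    return statement
```

For example, `SELECT * FROM orders WHERE id = 42 AND name = 'bob'` and the same query with different literals both normalize to `SELECT * FROM orders WHERE id = ? AND name = ?`.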

Support for RAID 5 data storage

CA Wily now supports Redundant Array of Inexpensive Disks (RAID) 5 for data storage. See Setting the SmartStor dedicated controller property on page 55.

Transaction Trace component clamp

In the case of an infinitely expanding transaction—for example when a servlet executes hundreds of object interactions and backend SQL calls—Introscope clamps the Transaction Trace, resulting in a truncated trace. This helps prevent the JVM from running out of memory. The clamped Transaction Traces are marked as truncated in the Workstation Transaction Trace Viewer. See Transaction Trace component clamp on page 108.
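The clamping behavior above can be sketched as truncate-and-flag; the clamp value here is illustrative, not the documented default (see Transaction Trace component clamp on page 108):

```python
def clamp_trace(components, clamp=5000):
    """Sketch of the Transaction Trace component clamp: when a transaction
    accumulates more components than `clamp` allows, the trace is truncated
    and flagged so the Workstation can mark it as truncated, protecting the
    JVM from running out of memory."""
    if len(components) <= clamp:
        return components, False
    return components[:clamp], True   # truncated trace, marked as such
```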

CHAPTER 2: EM Requirements and Recommendations

This chapter provides background and specifics to help you understand how to size and tune your Enterprise Manager for good performance. In this chapter you’ll find the following topics:

Enterprise Manager overview . . . . . . . . . . . . . . . 17

Factors that affect the Introscope environment . . . . . . . . . 20

Factors that affect EM maximum capacity . . . . . . . . . . . 21

Differences between EMs and J2EE servers. . . . . . . . . . . 22

Enterprise Manager health . . . . . . . . . . . . . . . . 27

About EM health and supportability metrics . . . . . . . . . . 28

SmartStor overview . . . . . . . . . . . . . . . . . . 40

About SmartStor spooling and reperiodization. . . . . . . . . . 40

Report generation and performance . . . . . . . . . . . . . 43

Concurrent historical queries and performance . . . . . . . . . 43

About SmartStor and flat file archiving . . . . . . . . . . . . 43

MOM overview . . . . . . . . . . . . . . . . . . . . 44

Collector overview . . . . . . . . . . . . . . . . . . 44

Enterprise Manager basic requirements . . . . . . . . . . . . 47

Enterprise Manager file system requirements . . . . . . . . . . 47

EM OS disk file cache memory requirements . . . . . . . . . . 47

Each EM requires SmartStor on a dedicated disk or I/O subsystem . . . 49

SmartStor requirements. . . . . . . . . . . . . . . . . 49

MOM and Collector EM requirements. . . . . . . . . . . . . 51

Local network requirement for MOM and Collectors . . . . . . . . 51

When to run reports, custom scripts, and large queries. . . . . . . 52

Introscope 8.0 EM settings and capacity . . . . . . . . . . . 53

Estimating Enterprise Manager databases disk space needs . . . . . 53

SmartStor settings and capacity . . . . . . . . . . . . . . 55

Setting the SmartStor dedicated controller property . . . . . . . . 55

Collector and MOM settings and capacity . . . . . . . . . . . 58


MOM disk subsystem sizing requirements . . . . . . . . . . . 58

MOM to Collectors connection limits . . . . . . . . . . . . . 59

MOM to Workstation connection limits . . . . . . . . . . . . 60

Metric load limit on MOM-Collector systems . . . . . . . . . . 60

Configuring a cluster to support 1,000,000 MOM metrics . . . . . . 61

MOM hot failover . . . . . . . . . . . . . . . . . . . 62

Agent load balancing on MOM-Collector systems . . . . . . . . . 63

Avoid Management Module hot deployments . . . . . . . . . . 68

Collector applications limits . . . . . . . . . . . . . . . 69

Collector metrics limits . . . . . . . . . . . . . . . . . 69

Collector events limits . . . . . . . . . . . . . . . . . 70

Collector agent limits . . . . . . . . . . . . . . . . . . 70

Collector hardware requirements . . . . . . . . . . . . . . 71

Collector with metrics alerts limits . . . . . . . . . . . . . 71

Collector to MOM clock drift limit . . . . . . . . . . . . . . 71

Reasons Collectors combine slices . . . . . . . . . . . . . 72

Increasing Collector capacity with more and faster CPUs . . . . . . 73

Standalone EM hardware requirements example . . . . . . . . . 74

Running multiple Collectors on one machine . . . . . . . . . . 74


Enterprise Manager overview

The Enterprise Manager (EM) is an integral component of the Introscope system. An Enterprise Manager is a server that collects, performs calculations on, and stores metrics reported by multiple agents. In a simple Introscope environment such as the one shown in the figure below, a single standalone Enterprise Manager collects, persists, and processes all the agent metrics, then supplies the resultant data for viewing in the Introscope Workstation or WebView browser instances.


In a more complex environment, as shown in the figure below, Enterprise Managers in the role of Collectors can be clustered so that their collected metrics data is compiled in a single Manager of Managers (MOM) Enterprise Manager. The MOM provides a unified view of all the metrics to the connected Workstation and WebView instances.

» Note In cases where the data is specific to a single Enterprise Manager or where clustering makes no difference to the topic, this guide uses the generic term Enterprise Manager. However in some cases, Collectors and MOM Enterprise Managers perform different functions that require different sizing capacity guidelines or result in different performance behaviors. In these cases, the term Collector or MOM is used as appropriate. While the Collector and MOM perform very different functions within a cluster, the system requirements are quite similar with the exception of data persistence, as the MOM persists relatively little data in its role.


In an Introscope deployment, the agent collects application and environmental metrics and relays them to the Enterprise Manager. Multiple physical agents can be configured into a single virtual agent, which enables an aggregated, logical view of the metrics reported by multiple agents.

To an Introscope Enterprise Manager, an application is an agent-specific association of metrics that is derived from the Java application .war files deployed on the managed J2EE application server. In an Introscope Enterprise Manager Investigator metric tree, applications, which are agent-specific, are found under the Frontends node, as shown in the following figure.

» Note You can have multiple applications running within a single JVM, but you can assign only one Introscope agent per JVM to collect the performance data.


Enterprise Manager databases

The Enterprise Manager writes to three separate databases: SmartStor, Transaction Event database (traces.db) and the metrics baselining (heuristics) database (baselines.db).

Introscope features such as Transaction Tracing, Transaction Trace sampling, and metrics baselining (heuristics) incur additional load on the disk subsystem. The Transaction Event database (traces.db) and the metrics baselining (heuristics) database (baselines.db) can share a disk with each other; however, SmartStor MUST be located on a separate, dedicated disk or I/O subsystem.

In the default Enterprise Manager installation process, the SmartStor data directory defaults to the target Enterprise Manager installation directory. However, for optimal performance, move the SmartStor data directory to a separate physical disk from the Enterprise Manager installation directory. For heavy-duty, production Enterprise Managers, disk I/O is the primary bottleneck for Enterprise Manager capacity, so CA Wily strongly recommends the use of multiple drives.

For more information, see SmartStor requirements on page 49.

Factors that affect the Introscope environment

The first questions to answer when considering your Introscope environment are: How many Java application server processes do I want to monitor (the number of agents)? And how many metrics per server, on average, will be generated (metrics per agent)? The answers depend on the complexity of the server and on the agent instrumentation settings. For more information, see the Introscope Configuration and Administration Guide.
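Those two numbers support a first-pass sizing estimate. The sketch below multiplies them and divides by a per-Collector limit; the 500 K figure comes from this guide's 8.0 scalability notes (and is hardware-dependent), while the 20% headroom margin is an assumed safety factor, not a documented recommendation.

```python
import math

def collectors_needed(num_agents, metrics_per_agent,
                      collector_limit=500_000, headroom=0.20):
    """Estimate how many Collectors a deployment needs: total metric load
    divided by the usable per-Collector capacity, keeping `headroom`
    (an assumed 20% safety margin) free on each Collector."""
    total_metrics = num_agents * metrics_per_agent
    usable = collector_limit * (1 - headroom)
    return math.ceil(total_metrics / usable)
```

For example, 300 agents averaging 2,000 metrics each is a 600 K-metric load, which needs two Collectors at 400 K usable metrics apiece.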

The capacity of the Enterprise Manager is dependent on the hardware it is running on as well as other complicating factors. For example, one factor is the JVM being used for the Enterprise Manager on the platform under consideration. The Enterprise Manager performs much better when its underlying JVM uses concurrent garbage collection (traditional garbage collection can halt the system when it is busy), and JVMs that support concurrent garbage collection are preferred.

If the CA Wily sizing recommendations are exceeded, the system becomes more likely to experience sudden sluggish behavior when too many operations occur simultaneously. You can use the Overall Capacity metric for alerting purposes; for more information, see Overall Capacity (%) metric on page 33. One such limit, the metrics limit, is the number of metrics that can safely be written to the disk I/O system.


» Important On typical server configurations, the metrics limit is usually the primary limitation on the capacity of the Enterprise Manager. This is a critical factor when sizing an Enterprise Manager.

CPU performance, network bandwidth, and availability of RAM are also influential, but disk I/O seek time is typically the primary bottleneck. In Introscope 8.0, exceeding the limits found in the Sample Introscope 8.0 Collector sizing limits table on page 119 can bring the system to a state where you begin to see performance problems, depending on which resource is impacted. Overloaded disk I/O typically causes combined time slices and sluggish Workstation refresh times. Lack of RAM causes memory exceptions during spool file conversion, as too many metrics are tracked. Network bandwidth problems cause slow cluster response time and, more rarely, may cause agents to be dropped. Lagging CPU causes performance problems, including calculators failing to update and alerts being missed.

As another example, as seen in the Sample Introscope 8.0 Collector sizing limits table on page 119, the recommended limit for monitored applications (maximum number of applications) for a Windows-based Enterprise Manager is about 170% of that for a Solaris machine. In the case of applications, the limit depends strongly on the performance characteristics of the CPUs available to the Enterprise Manager, since applications create alerts that must be calculated every time slice.

Factors that affect EM maximum capacity

The maximum capacity of an Enterprise Manager can be reduced by the factors listed below, each followed by where to find more information:

SmartStor is NOT on a separate disk drive or I/O subsystem. See Each EM requires SmartStor on a dedicated disk or I/O subsystem on page 49.

The maximum number of metrics placed in metric groupings is exceeded (if metric groupings are used). See Matched metrics limits on page 79.

Boundary Blame is disabled and maximum loads are not redistributed across all Enterprise Managers. See the Sample Introscope 8.0 Collector sizing limits table on page 119.

The Enterprise Manager runs at greater than the 40-50% average CPU utilization range. See Collector metric capacity and CPU usage on page 45.

The sum of all metrics behind every Top N graph viewed by every Workstation instance exceeds 100,000. See Top N graph metrics limit per Workstation on page 103.

More than 4 concurrent historical queries are issued against SmartStor. See Concurrent historical queries and performance on page 43.

SmartStor is used in conjunction with flat file archiving. See About SmartStor and flat file archiving on page 43.

Improper sizing is used for Enterprise Managers, Workstations, metrics, and agents. See all chapters in this book.

Differences between EMs and J2EE servers

Users who maintain enterprise application servers are accustomed to purchasing hardware that scales well with their applications, and have a general understanding of target utilization levels and capacity. Although the Enterprise Manager itself is a Java server, it neither behaves nor performs like a typical J2EE server, so it should not be modeled as one when purchasing hardware or performing an Enterprise Manager capacity forecast.

J2EE servers and web applications receive requests for work at irregular intervals, with varying load throughout the day. Therefore the J2EE server only performs as much work as is requested of it in a given interval.

Under standard usage, incoming user requests to a J2EE server are serviced by a pool of worker threads, which perform the necessary business logic in servlets and pools of EJBs. The servlets and EJBs in turn make requests to external databases or systems. In a well-designed J2EE application, each of these worker threads is:

largely independent from the other threads

free to obtain the resources and information needed to satisfy the request

not forced through a common checkpoint for synchronization (although not all J2EE applications are well designed).

Therefore, in most situations, application servers scale well in throughput by adding CPUs, because each CPU can run additional worker threads to satisfy more requests. Occasionally one request might be slowed down, but whether it takes 100 milliseconds (ms) or 5 seconds, the rest of the system does not come to a halt. Only in the event of an external bottleneck, such as a database, can all threads come to a halt waiting for data. Eventually the request threads all become busy, and the application server slows to a crawl, maintaining most of its throughput while rejecting additional requests for work. When the bottleneck is relieved, the system begins to service requests again and returns to normal.

In contrast, the Enterprise Manager behaves very differently because of its architecture and the nature of the work it performs. Introscope monitors production systems in real time, and provides information, warnings, and alerts in real time. To accomplish this, the Enterprise Manager operates as a real-time system as well. The Enterprise Manager receives a continual flow of data from agents every 7.5 seconds. Once every 15 seconds, the Enterprise Manager must do all of the following:

examine all of the metric data that it has received for the interval for consistency

perform calculations

perform actions, such as fire alerts or send messages

store the data to disk

respond to Workstation requests for live data

handle incoming events (Transaction Traces, errors, and so on) and persist them.


For the most part, the Enterprise Manager can only use two threads to perform calculations and actions on the large set of agent-generated data, and only a single thread to perform the data storage. If the Enterprise Manager is unable to complete these operations within the 15-second interval, it may fall behind and never catch up with all the processing that needs to be completed, because another set of data arrives. The Enterprise Manager then continually combines data or suffers from sluggish performance as it attempts to process and write more data than it can handle. There are internal buffers to allow for bursts of activity so that the Enterprise Manager can catch up, but if the Enterprise Manager has too many metrics being reported, these buffers fill up quickly.

The Enterprise Manager is very different from a J2EE server in this regard, because the standard J2EE server does not examine data requests on a regularly scheduled basis to decide what to do with them. The Enterprise Manager's scenario is more similar to the classic factory conveyor belt analogy, in which a continual stream of finished products (data) arrives for two workers to examine. The two workers must then transfer the product packages (metric data) to a single worker, who drives the packaged data in a truck down a single-lane road to a warehouse, where several more workers offload the packages from the truck into storage (the SmartStor database).

Because of the nature of the tasks that the Enterprise Manager performs, there are currently limitations in the number of CPUs that the Enterprise Manager can use effectively. A minimum of 2 CPUs is required for optimum performance. However, the use of 4 CPUs increases performance by allowing more of the following:

number of applications per Collector

number of agents per Collector

number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager).

Using more than 4 CPUs does not enhance performance further. However, CA Wily recommends faster CPUs, because each of the threads can then examine the data much faster. For the maximum limits on 4-CPU Enterprise Managers for matched metrics, see Matched metrics limits on page 79.


Another difference between J2EE servers and Enterprise Managers is in how they perform data processing: J2EE servers are largely batch processors, while Enterprise Managers largely perform real-time processing. In a J2EE application, work queues up and is handled as quickly as possible; as the machine slows down, the batch processes simply take longer and longer. In contrast, the Enterprise Manager, which has some batch processing functions (for example, responding to historical data query requests), handles most data flow in real time. The Enterprise Manager can take whatever time it needs to process incoming data, as long as it finishes within the 15-second harvest duration period. Once the Enterprise Manager takes longer than that time frame, it starts to combine data. Sizing a real-time system can be difficult because you need to size for the maximum load on the machine, not the average load. If you only size for the average load, then during maximum load times you'll lose data.

Other ways in which Enterprise Managers perform additional work, and have limitations atypical of standard J2EE systems, include:

Introscope Workstations provide different load characteristics than typical Web clients. Workstations allow users to view live data in real time. Depending on the feature or data requested, a Workstation can be a continual tax on the Enterprise Manager even if no user is watching the console, as the Enterprise Manager continues to serve data. In contrast, if a user stops interacting with a browser-based Web application, the data/refresh requests typically stop.

Workstations can perform historical queries for data, which cause the Enterprise Manager to retrieve data from storage. This can interfere with the Enterprise Manager's ability to effectively process and store incoming agent data due to disk contention. J2EE systems don't typically serve requests directly from databases or have disk contention issues.

The Enterprise Manager periodically reorders and reperiodizes stored data. Incoming metric data is written sequentially to a spool file, which is reorganized and indexed once every hour. This reorganization is a resource-expensive (CPU- and disk I/O-intensive) operation that can interfere with the Enterprise Manager's ability to process and store incoming data. J2EE servers don't typically perform periodic, intense housekeeping operations such as reperiodization.

Agents can experience metric leaks over time, without the user knowing, which causes more data to be processed by the Enterprise Manager. A metric leak occurs when the number of registered metrics being reported by agents continually increases. This means that a properly configured system can drift over time into a problem state.

An Enterprise Manager, for all configurations, should run AT MOST within the 40% to 50% CPU utilization range in a steady state. This provides the additional headroom necessary for periodic operations, such as SmartStor spooling, reperiodizing, and user Workstation requests (alert requests) that may saturate the CPU. Typically, J2EE systems can be run much closer to saturation because there are no hidden operations that consume CPU above and beyond steady state. In the event the system is saturated, the J2EE server refuses incoming requests to alleviate the pressure.

No other applications or processes should run on an Enterprise Manager machine, to avoid contention for the system resources available to the Enterprise Manager.

Enterprise Managers (both Collectors and MOM) queue up incoming data query requests and aggregate the data as it is read in from SmartStor.
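The metric-leak drift described above can be caught early with a simple trend check on the registered metric counts reported by agents. The sketch below is illustrative only (the function name and the strictly-monotonic rule are assumptions, not an Introscope feature); a production check would tolerate noise and agent restarts:

```python
# Hedged sketch: flag a potential metric leak when periodic samples of the
# registered metric count keep rising. This illustrates the idea only.

def looks_like_metric_leak(hourly_metric_counts: list[int]) -> bool:
    """True when every sample is strictly greater than the one before."""
    return (
        len(hourly_metric_counts) >= 2
        and all(b > a for a, b in zip(hourly_metric_counts,
                                      hourly_metric_counts[1:]))
    )
```

A steadily climbing series suggests instrumentation is registering new metrics without bound; a flat or oscillating series does not.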

About Introscope system size

Introscope system size is determined by workload and business logic.

Introscope workload is comprised of:

total applications monitored

total metrics monitored

total agents monitored

number of Enterprise Managers.

Introscope business logic handles the data collected in the monitoring operations and determines what will be done with the data. Introscope business logic operations include determining or handling the following:

total number of metric groupings

maximum number of metrics in a metric grouping

number of metrics persisted per minute

calculators

alerts

management modules containing a lot of dashboards, calculators, alerts, and so on

large numbers of reports

Top N graphs.


Enterprise Manager health

You can monitor and assess Enterprise Manager health in two ways, by viewing the:

Enterprise Manager Overview tab (see About the Enterprise Manager Overview tab, below)

Enterprise Manager health and supportability metrics (see About EM health and supportability metrics on page 28)

The Enterprise Manager generates and collects metrics about itself that are useful in assessing its health and determining how well it is performing under its workload. These are sometimes referred to as supportability metrics because these metrics help support the healthy functioning of the Enterprise Manager.

About the Enterprise Manager Overview tab

By viewing the Enterprise Manager Overview tab you can assess a number of Enterprise Manager health and performance-related statistics and components in one centralized location.

To view the Enterprise Manager Overview tab

1 Select the Enterprise Manager node under the Custom Metric Agent.

2 Click the Overview tab in the right pane.

The Overview tab displays the following graphs:

EM Capacity (%)

EM CPU Utilization

Heap Utilization

Harvest, SmartStor, and GC Durations

Number of Metrics

EM Databases (MB)

Number of Agents

Number of Workstations

About EM health and supportability metrics

Enterprise Manager metrics appear in the Investigator tree, under:

Custom Metric Host (Virtual)
 Custom Metric Process (Virtual)
  Custom Metric Agent (Virtual) (SuperDomain)
   Enterprise Manager

In a clustered environment, the MOM's metrics also appear under the tree path shown above. However, in a clustered environment, Collector supportability metrics show up in the same Custom Metric Host (Virtual) and Custom Metric Process (Virtual) path location, but the last name includes (CollectorHostName@PortNumber).


The Investigator tree with the MOM and one Collector looks like this:

Custom Metric Host (Virtual)
 Custom Metric Process (Virtual)
  Custom Metric Agent (Virtual) (SuperDomain)
   Enterprise Manager
  Custom Metric Agent (Virtual) (Collector1@5001) (SuperDomain)
   Enterprise Manager

For more information, see the Introscope Configuration and Administration Guide.

When you deploy Enterprise Managers into your Introscope environment, you'll need to look at the Enterprise Manager health and supportability metrics to find out what's really happening in your monitoring solution.

Harvest duration, Collector Metrics Received Per Interval, SmartStor spool file conversion, and Overall Capacity (%) are several of the more significant indicators of problems in an Enterprise Manager.

For more information, see

Harvest Duration metric on page 29

Collector Metrics Received Per Interval metric on page 31

Converting Spool to Data metric on page 32

Overall Capacity (%) metric on page 33

Additional supportability metrics on page 38.

Harvest Duration metric

The Harvest Duration metric shows the time in milliseconds (during a 15-second time slice) spent harvesting data. It is generally a good indicator of whether or not the Enterprise Manager is keeping up with the current workload. You can find this metric at the following location in the Investigator tree:

Custom Metric Host (Virtual) | Custom Metric Process (Virtual) | Custom Metric Agent (Virtual) (*SuperDomain*) | Enterprise Manager | Tasks | Harvest Duration (ms)


The Harvest Duration metric value should be less than 3,000 ms (3 seconds) and should not exceed 7,500 ms (7.5 seconds). The harvest operation usually causes CPU activity to spike for the full harvest duration, and the CPU is often almost idle for the rest of the 15 seconds. If the harvest duration is too long, investigate reducing the metric load on the overloaded Enterprise Manager by having agents report to separate Enterprise Managers, or consider moving the Enterprise Manager to a platform with faster CPUs.
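The thresholds above can be expressed as a small sketch. The function and the status labels are hypothetical, not Introscope APIs; only the 3,000 ms and 7,500 ms figures come from the guide:

```python
# Illustrative classification of a Harvest Duration sample against the
# guide's thresholds: under 3,000 ms is healthy; 7,500 ms is the ceiling.

def harvest_duration_status(duration_ms: float) -> str:
    if duration_ms < 3000:
        return "ok"
    if duration_ms <= 7500:
        return "investigate"   # above target but below the hard ceiling
    return "overloaded"        # the EM cannot keep up with its workload
```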

Number of Collector Metrics

The Number of Collector Metrics metric shows the total number of metrics currently being tracked in the cluster. You can find the Number of Collector Metrics metric here in the Investigator tree:

Custom Metric Host (Virtual) | Custom Metric Process (Virtual) | Custom Metric Agent (Virtual) (*SuperDomain*) | Enterprise Manager | MOM | Number of Collector Metrics


Collector Metrics Received Per Interval metric

The Collector Metrics Received Per Interval metric is a simple way of gauging how much load metric data queries are placing on the cluster. This metric is the total sum of Collector metric data points that the MOM has received in each 15-second time period, including data queries. You can find the Collector Metrics Received Per Interval metric here in the Investigator tree:

Custom Metric Host (Virtual) | Custom Metric Process (Virtual) | Custom Metric Agent (Virtual) (*SuperDomain*) | Enterprise Manager | MOM | Collector Metrics Received Per Interval

» Tip Consult this metric regularly.

A large Collector Metrics Received Per Interval metric value, coupled with degradation of the cluster, indicates that the MOM has been asked to read too much metric data from the Collectors. This overloading is the result of some combination of the following:

too many Workstations connected

too many queries (especially historical queries) being run

user alerts and calculators set up to evaluate too many metrics

Although all resource loading issues combine to affect overall cluster performance, a large Collector Metrics Received Per Interval metric value, which reflects too many metric reads, is different from a metric explosion (see Detecting metric explosions on page 84), which is the result of too many metric writes by the agents. This means, in particular, that reducing metric load on your Collectors may not solve issues on the MOM related to a high Collector Metrics Received Per Interval metric value.

If your Collector Metrics Received Per Interval value seems too high, check the number of Workstations attached and whether most are in Live mode. If this fails to solve the issue, check that you do not have alerts set up to evaluate too many metrics in the system. You can do this by searching, and sorting by value, all metrics named:

Enterprise Manager | Internal | Alerts: Number of Evaluated Metrics


If the Collector Metrics Received Per Interval value remains high after carrying out the suggestions above, you can also set the introscope.enterprisemanager.query.datapointlimit property in the EnterpriseManager.properties file to specify the maximum number of metric data points the Enterprise Manager will return from any single query. This read clamp ensures that user queries that accidentally match too much metric data do not negatively impact system performance.

» Important Clamping the Collector metrics prevents cluster degradation, but queries and alerts that are clamped do not fully evaluate all metrics they match.
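As a sketch, the read clamp described above might be configured like this in EnterpriseManager.properties. The limit value shown is illustrative only, not an official recommendation; choose a value appropriate for your environment:

```properties
# Clamp the number of metric data points any single query may return.
# 100000 is an illustrative value; tune it for your environment.
introscope.enterprisemanager.query.datapointlimit=100000
```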

Converting Spool to Data metric

The Converting Spool to Data metric tracks whether or not the spool to data conversion task is running. You can find this metric at the following location in the Investigator tree:

Custom Metric Host (Virtual) | Custom Metric Process (Virtual) | Custom Metric Agent (Virtual) (*SuperDomain*) | Enterprise Manager | Data Store | SmartStor | Tasks | Converting Spool to Data

When this task is running, the metric has a value of 1. When this task is not running, it has a value of 0. If this metric stays at a value of 1 for more than 10 minutes per hour, this indicates that reorganizing the SmartStor spool file is taking too long. This problem is often progressive. As the spooling time gets longer hour after hour, the Enterprise Manager usually becomes noticeably less responsive overall because the Enterprise Manager is putting more and more effort into reorganizing the spool file.

For better performance, add more physical memory (RAM) to the machine. Adding more RAM can help increase the size of OS disk file cache and should reduce the amount of time the conversion task takes. The amount of RAM that will help varies between operating systems, however a good general rule is to dedicate 1 GB RAM for the OS disk cache. In general at full load, you should configure a Collector to use 1.5 GB heap memory. If you are running a MOM near maximum capacity (for example, a 5 million metric cluster or 1 million subscribed MOM metrics), the MOM must run on a 64-bit JVM with a 12 GB heap size. The machine must have physical RAM of at least 14 GB. For more information, see Configuring a cluster to support 1,000,000 MOM metrics on page 61.


Additionally, a server host typically requires approximately 500 MB for the operating system (this varies based on hardware and OS). When SmartStor starts the re-spooling operation, the operating system starts reading the spool file into the file cache memory (which is part of the OS, not the Enterprise Manager Java virtual machine). For example, with 200,000 metrics, the spool file will usually be over 1.5 GB. For optimum performance, the file cache should be large enough to accommodate the entire spool file, so the host machine should have between 3 and 4 GB of physical RAM. 32-bit Windows machines use a fixed file cache limited to approximately 1 GB, whereas UNIX systems generally have a configurable file cache limit. This must be physical memory, not virtual memory (swap space). Enterprise Manager performance degrades dramatically if the host machine starts paging to and from virtual memory.
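The RAM arithmetic above can be sketched as a rough estimator. Everything here is an assumption for illustration: the helper itself is hypothetical, and it extrapolates linearly from the guide's example figures (200,000 metrics producing roughly a 1.5 GB spool file, about 0.5 GB for the OS, and a 1.5 GB Collector heap):

```python
# Rough Collector-host RAM sizing sketch, extrapolated linearly from the
# guide's example figures. Not official sizing guidance.

GB_SPOOL_PER_METRIC = 1.5 / 200_000   # assumed linear spool-file growth
OS_GB = 0.5                           # typical OS footprint per the guide
COLLECTOR_HEAP_GB = 1.5               # recommended Collector heap at full load

def estimate_physical_ram_gb(metric_count: int) -> float:
    """OS + EM heap + a file cache big enough to hold the whole spool file."""
    spool_gb = metric_count * GB_SPOOL_PER_METRIC
    return OS_GB + COLLECTOR_HEAP_GB + spool_gb
```

For the guide's 200,000-metric example this lands inside the stated 3 to 4 GB range.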

For more information about the converting spool to data task, see About SmartStor spooling and reperiodization on page 40.

Overall Capacity (%) metric

The Enterprise Manager Overall Capacity (%) metric estimates the percentage of the Enterprise Manager’s capacity that is consumed. You can find it at this location in the Investigator tree:

Custom Metric Host (Virtual) | Custom Metric Process (Virtual) | Custom Metric Agent (Virtual) (*SuperDomain*) | Enterprise Manager: Overall Capacity (%)

The Overall Capacity (%) metric is computed in part from the following metrics, which you can find at this location in the Investigator tree:

Custom Metric Host (Virtual) | Custom Metric Process (Virtual) | Custom Metric Agent (Virtual) (*SuperDomain*) | Enterprise Manager | Health

CPU Capacity (%) (added into the computation in Release 8.0). See Additional supportability metrics on page 38.

Harvest Capacity (%). See Additional supportability metrics on page 38.

Heap Capacity (%). See Heap Capacity (%) metric, below.

Incoming Data Capacity (%). See Additional supportability metrics on page 38.

SmartStor Capacity (%). See Additional supportability metrics on page 38.


The Overall Capacity (%) metric is more valuable over a long period of time than for a specific 15-second time slice. Because the Overall Capacity metric is based on real-time metrics, you may see the value spike well above 100% when, for example, the hardware's I/O subsystem is briefly overloaded. However, the Enterprise Manager tends to recover from these spikes automatically if they are not long-lasting. In general, a brief spike (for example, to 200%) isn't cause for concern, but over a long period of time the Overall Capacity should ideally average about 75%. Generally, if the Overall Capacity value is 50%, you should be able to double the load (+/- 15%) to reach a 100% capacity value.

» Note SmartStor hourly and nightly conversion times are not factored into the Overall Capacity metric, however hourly and nightly operations do affect how much metric load the Enterprise Manager is capable of handling.

During time periods that the Overall Capacity (%) metric spikes to high values (for example 600%), at least one of the other metrics listed above should also show a spike. Investigating and understanding the source of the secondary spike might help pinpoint the root cause of the resource issue.

For example, the problem might be found by looking at the Heap Capacity (%) metric, which feeds into Overall Capacity (%) metric. See Heap Capacity (%) metric, below.

Heap Capacity (%) metric

The Heap Capacity (%) metric is determined by what percentage of heap the JVM is currently using (based on the GC Heap: In Use Post GC (mb) metric).

» Note The Heap Capacity (%) metric reports 100% while a 25% buffer of actual heap still remains. For example, if the total heap is 1000 MB and the current heap usage is 750 MB, then this metric value is 100%. This buffer is included because Java needs heap space for normal operations.
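The 25% buffer described in the note can be written as a worked formula: the metric reaches 100% when the JVM is using 75% of its current heap. The function name is illustrative, not an Introscope API:

```python
# Illustrative Heap Capacity (%) computation: 100% corresponds to the JVM
# using 75% of its current heap, leaving the 25% buffer the note describes.

def heap_capacity_pct(used_mb: float, total_heap_mb: float) -> float:
    return used_mb / (0.75 * total_heap_mb) * 100.0
```

The guide's example of 750 MB used out of a 1000 MB heap evaluates to 100%.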

Depending on how you’ve set and launched the JVM heap options, the JVM may start with a very small heap and grow it over time. The Heap Capacity (%) metric is based on the current JVM heap size, not what the heap size could become. CA Wily recommends that you set the Introscope heap settings so that the minimum heap equals the maximum heap.


Troubleshooting Enterprise Manager health

Every 15 seconds, the Enterprise Manager gathers and records health metrics about itself. There are two ways you can view these metrics to troubleshoot Enterprise Manager health and performance:

examine the Enterprise Manager health and supportability metrics in the Investigator tree. For more information see About EM health and supportability metrics on page 28.

examine the perflog.txt file.

» Related Knowledge Base article(s):

Perflog Values in Introscope 7.1

The Investigator tree Enterprise Manager health and supportability metrics are easy to view and interpret, so this is the first place you should look to understand your Enterprise Manager’s current health. Perflog.txt is often valuable to CA Wily Support.

Several examples of how you can use the perflog.txt file are provided in the topics below.

Harvest Duration

You can find the Harvest.HarvestDuration metric value in perflog.txt.

» Note Metric names such as this appear in perflog.txt output in verbose mode. By default, perflog.txt is generated in a compacted mode.


SmartStor Duration

You can find the Smartstor.Duration metric value in perflog.txt.

» Note Metric names such as this appear in perflog.txt output in verbose mode. By default, perflog.txt is generated in a compacted mode.

Events and Transaction Traces

The Enterprise Manager attempts to insert all incoming events into a Transaction Trace insert queue. The number of events in the queue at any time is shown in the Performance.Transactions.TT.Queue.Size metric.

If the Transaction Trace insert queue is not full, an incoming event is counted by the performance.transaction.num.inserts.per.interval metric.

If the Transaction Trace insert queue is full when a new event comes in, the event is dropped. For Introscope 8.0, you can view a new metric, Performance.Transactions.Num.Dropped.Per.Interval that shows the number of Transaction Traces that the Enterprise Manager could not handle during the interval and were dropped.

You can find these metric values in perflog.txt.


If you want to know how many events the Enterprise Manager received from agents during an interval, add the performance.transaction.num.inserts.per.interval metric to the Performance.Transactions.Num.Dropped.Per.Interval metric.

Although one would expect the values for the performance.transaction.num.inserts.per.interval metric and Performance.Transactions.TT.Queue.Size metric for an interval to be identical, that is generally not the case due to these factors:

metric counts are based on frequent samples of the system

samples of these two metrics are not taken at the same time

the system is very active (numeric counts vary quickly and greatly)

If, for example, at one sample time the number of inserted events is 500, this implies that the Transaction Trace insert queue should have a positive value and you would expect to see a value of 500 as well for the Performance.Transactions.TT.Queue.Size metric. However, by the time the Transaction Trace insert queue is sampled, it can be empty and record a sample number of zero.
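The insert-or-drop accounting described above can be sketched with a hypothetical bounded queue. The class and counter names are illustrative, not actual Enterprise Manager internals; the comments map each counter to the metric it mimics:

```python
# Sketch of a bounded Transaction Trace insert queue: events are inserted
# while there is room, and dropped (and counted) once the queue is full.
from collections import deque

class TraceInsertQueue:
    def __init__(self, capacity: int):
        self.queue = deque()
        self.capacity = capacity
        self.inserts_per_interval = 0  # ~ performance.transaction.num.inserts.per.interval
        self.dropped_per_interval = 0  # ~ Performance.Transactions.Num.Dropped.Per.Interval

    def offer(self, event) -> bool:
        if len(self.queue) >= self.capacity:
            self.dropped_per_interval += 1   # queue full: the event is dropped
            return False
        self.queue.append(event)
        self.inserts_per_interval += 1
        return True
```

The total number of events received in the interval is the sum of the two counters, matching the addition described above.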


Additional supportability metrics

There are a number of supportability metrics to help you monitor the health of your system and Enterprise Manager. Brief descriptions follow; see the Introscope Configuration and Administration Guide for more information.

Supportability metric name

Investigator tree location Description

CPU Capacity (%) Enterprise Manager|Health Same as EM CPU Used (%) (see below). Duplicated to easily relate to Overall Capacity (%) metric, which now takes into account this metric.

Number of Agents Enterprise Manager | Connections The number of currently connected agents.

The Enterprise Manager's perflog.txt file records and reports the number of actual agents connected in the Agent.NumberOfAgents metric value.

EM CPU Used (%) Enterprise Manager|CPU The percent of the total available CPU was used by running Enterprise Managers during the time period specified.

Note: This number does not reflect other processes running on the server or overall server CPU in use, but rather how much CPU the particular Enterprise Manager used. This metric is acquired from the JVM using an API introduced in the JDK 1.5. Therefore, it is supported only on some platforms.

Harvest Capacity (%) Enterprise Manager|Health Percent of time needed for the data harvest in a 15000 ms (15 second) time slice, where 100% is the full 15 seconds. For example, if the data harvest takes 15000 ms, then this metric value is 100.

Incoming Data Capacity (%)

Enterprise Manager|Health The capacity of the Enterprise Manager to handle incoming data, based on an internal metric that indicates the number of incoming metrics yet to be processed. This internal metric is divided by twice the total number of metrics. For example, if 150,000 metrics are in the to-be-processed queue and the Enterprise Manager has a total of 300,000 metrics, the incoming data capacity will be 25%.


Number of Metrics
Location: Enterprise Manager|Connections
Description: The metric load on an Enterprise Manager. When an agent disconnects, this number drops.

SmartStor Capacity (%)
Location: Enterprise Manager|Health
Description: Percent of time needed for the SmartStor write process in a 15,000 ms (15 second) time slice, where 100% is the full 15 seconds. For example, if the SmartStor write duration is 15,000 ms, this metric value is 100.

Write Duration (ms)
Location: Data Store|SmartStor|MetaData
Description: The portion of the SmartStor Capacity (%) metric time (see above) spent writing metadata. If this metric value does not change proportionately as the SmartStor Capacity (%) metric value increases or decreases, there may be an issue with the file system.
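The Incoming Data Capacity (%) formula described above can be sketched as a quick calculation. This is a hypothetical helper for illustration only, not part of the product:

```python
# Sketch of the Incoming Data Capacity (%) formula: the internal count of
# incoming metrics not yet processed, divided by twice the total number of
# metrics on the Enterprise Manager, expressed as a percentage.
def incoming_data_capacity(queued: int, total: int) -> float:
    """Queued metrics divided by twice the total, as a percentage."""
    return queued / (2 * total) * 100

# The guide's example: 150,000 queued out of 300,000 total metrics -> 25%
print(incoming_data_capacity(150_000, 300_000))  # 25.0
```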


SmartStor overview

Introscope 7.1 included significant optimizations in disk read/write synchronization that take advantage of a dedicated SmartStor disk. All performance improvements and sizing increases starting with Introscope 7.1 depend on those optimizations.

SmartStor first writes to disk the data that agents send to the Enterprise Manager/Collector, and performs all other operations after that. For example, if 10 users run large historical queries (over 1,000 metrics/query) at the same time, the Enterprise Manager performs more slowly. Users experience sluggish Workstation response times because SmartStor is simultaneously writing new agent metric data, running extensive user queries, generating reports, and converting files to the faster query file format. Workstation queries are slow (or metric data is aggregated) because the disk is overloaded.

About SmartStor spooling and reperiodization

SmartStor writes live incoming data to disk in a spool format that is fast to write but slow to query. Every hour, at the top of the hour, SmartStor takes the spool file from the previous hour and reformats it into a SmartStor data file. The SmartStor data file, which is faster and easier to search than the spool file, optimizes historical query responses. This Introscope process, referred to as spool to data conversion (or conversion), typically takes 10 minutes, although conversion times vary across hardware because of differences in memory, CPU power, and disk read/write speeds.

A conversion time longer than 10 minutes is a potential warning sign of an overloaded Enterprise Manager. Most importantly, the conversion time should not get longer every hour. That is a sure sign that the system is becoming overloaded, and often indicates metric creep, in which the number of registered metrics reported by agents is continually increasing.

The most common cause of excessively long SmartStor spool to data conversion times is a file cache size that is too small to perform the required operations. This situation can be addressed by adding more physical memory. The conversion process is usually the first process to show problems if SmartStor is not using a dedicated I/O subsystem.

SmartStor reperiodization is the process by which archived data files are compressed to reduce the total size of the SmartStor directory. Reperiodization is performed in two stages after midnight by default. For information about how to configure this multi-tier reperiodization, see the Introscope Configuration and Administration Guide.


Reperiodization is both I/O and CPU intensive, as the data archive files are read, the data is compacted by aggregating multiple time slices, and then the resulting data is written back to SmartStor. This means that the period after midnight is the busiest time for an Enterprise Manager. The entire reperiodization process should not take more than two hours. During this time, no other Enterprise Manager operation such as report generation (see Report generation and performance on page 43) or OS-level operation should be scheduled.

» Note If the Enterprise Manager is stopped in the middle of reperiodization, it will, upon restart, delete the partially written files and restart reperiodization after 45 minutes. This restart may not occur during the regularly scheduled reperiodization time. The 45 minute delay allows the system to register all its agents and metrics before launching the restart of this compute-intensive reperiodization task.

SmartStor spooling and reperiodization can be verified in the Enterprise Manager log in verbose mode, which records that the spooling process starts at the top of the hour. Under standard conditions, within 10 minutes, a second recorded message reports that the spooling process has completed. In addition there are three SmartStor management metrics, which you can find at this location in the Investigator tree:

Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Data Store | SmartStor | Tasks.

As shown in the figure below, the three tasks that are monitored are:

Spool to Data Conversion

Data Appending

Reperiodization


These tasks have metric values that oscillate from 0 to 1 when the respective task is running. You can see when those tasks are running and how long they are taking by selecting a task in the tree, then picking an appropriate time from the Time Range drop down list in the Viewer pane.

Top of the hour problems are generally related to slow SmartStor spooling. Early morning (after 6 A.M.) problems are usually due to reperiodization not being completed quickly enough. This usually implies that the Enterprise Manager is excessively loaded. For more information, see EM OS disk file cache memory requirements on page 47.


Report generation and performance

Generating Introscope reports is very expensive in terms of CPU and disk access. The cost is primarily based on two factors:

the number of graphs (total amount of data)

the report time period (historical range)

Reports that are either larger than 50 graphs or longer than 24 hours should not be scheduled during the hours when SmartStor is reperiodizing (usually midnight to 3:00 A.M.) because of high CPU activity and the large amount of disk activity.

Concurrent historical queries and performance

The best way to avoid disk performance problems from historical queries is to have most Introscope Workstation users view data in Live mode. Use Historical mode only for in-depth analysis, like troubleshooting and reports. On systems under heavy metric load, make sure that users are not all attempting to perform historical queries (which attempts to access the SmartStor historical archive) at the same time. CA Wily recommends a maximum of four concurrent historical queries, although this limit may differ depending on the performance of your hardware. You should also be aware that this limit decreases during spool-to-data file conversion at the top of each hour, and at midnight during reperiodization.

You can also set the introscope.enterprisemanager.query.datapointlimit property in the EnterpriseManager.properties file to specify a maximum number of metric data points the Enterprise Manager will return from any single query. This read clamp ensures that user queries that accidentally match too much metric data do not negatively impact system performance.
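For illustration, the read clamp might be set in the Enterprise Manager properties file as follows. The value shown is purely an example, not a documented default or recommendation:

```properties
# Example only: cap the metric data points returned by any single query.
# Choose a limit appropriate to your hardware and query patterns.
introscope.enterprisemanager.query.datapointlimit=100000
```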

About SmartStor and flat file archiving

Flat file archiving is an alternate format that can be used for metric data storage instead of SmartStor. Unlike SmartStor, the flat file format writes the data in readable ASCII, which takes considerably more disk space than SmartStor's format. If you use flat file archiving, you have the option of configuring the flat file data to be gzipped. This reduces the amount of disk space needed considerably, but is CPU intensive to write and extremely CPU intensive to read.

CA Wily has three recommendations about SmartStor and flat file archiving.


First, avoid using SmartStor and flat file archiving at the same time. Flat file archiving duplicates some of the functionality of SmartStor. In addition, flat file archiving’s compression feature (if enabled) requires noticeable CPU resources that can adversely affect the Enterprise Manager’s performance when the compression feature periodically runs. In the event that flat file archiving must be used, configure the smallest possible number of metrics to be logged.

Second, do not use flat file archiving in production. Readable metric values are most useful in a QA debug environment.

Third, SmartStor should not be located on the same disk as a flat file archive. SmartStor should be on its own dedicated disk. For more information, see SmartStor settings and capacity on page 55.

MOM overview

MOMs are CPU intensive, in contrast to Collectors, which are I/O and CPU intensive. For more information about MOM requirements, see MOM and Collector EM requirements on page 51 and Collector and MOM settings and capacity on page 58.

Collector overview

Collectors are I/O intensive, and perform most of Introscope's difficult and intensive calculation processing work.

Cluster performance is dominated by the Collectors. Given the synchronous communication model between the MOM and Collectors, the responsiveness of a MOM (in terms of data refresh to the Workstation) is tied to the responsiveness of the Collectors. Any performance problem causing response problems in a Collector will be magnified by the MOM. For more information, see Collector to MOM clock drift limit on page 71.

If you are upgrading a Collector from 6.x to 8.0, then as long as there is a dedicated disk for SmartStor and Boundary Blame is turned on, there should be enough resources left on the same host to handle the new functionality, including metric baselining (heuristics) and creating virtual agents. If you need to migrate a 6.x Enterprise Manager to become an 8.0 Collector, see:

» Related Knowledge Base article(s):

Migrating a 6.x Enterprise Manager to an 8.0 Collector (KB 1630)


Collector metric capacity and CPU usage

If a Collector is at maximum capacity, as shown in the Sample Introscope 8.0 Collector sizing limits table on page 119, you may look at the CPU and see that the system doesn't appear busy. You may wonder why Introscope requirements don't allow adding more metrics or agents to the system.

The reason is that CPU monitoring tools show a snapshot. At full load, the Collector runs at 100% CPU usage for 3-4 seconds, and is then idle until the next round of agent data processing. This happens every 7.5 seconds, which is how the 45% average CPU utilization recommendation is derived. The initial 3-4 seconds is the harvest time, recorded as the Harvest Duration metric; it must be less than 4 seconds. For more information about the Harvest Duration metric, see Enterprise Manager health on page 27.
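As a back-of-the-envelope check on that recommendation, the duty-cycle arithmetic looks like this. This is an illustrative sketch, not an official formula:

```python
# Duty-cycle arithmetic behind the ~45% average CPU figure (illustration
# only): a full-load Collector burns roughly 3-4 s of CPU per 7.5 s
# agent-data processing cycle.
def avg_cpu_utilization(busy_seconds: float, cycle_seconds: float = 7.5) -> float:
    """Average CPU utilization (%) given busy time per processing cycle."""
    return busy_seconds / cycle_seconds * 100

# 3.5 s busy per 7.5 s cycle is close to the 45% guidance
print(round(avg_cpu_utilization(3.5)))  # 47
```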

The time between harvests allows the Collector to service Workstations, perform Transaction Traces, and handle SmartStor spooling and reperiodization. Unless you're looking at a high resolution CPU/Memory/I/O trace of the Collector between 12:00 midnight and 3:00 A.M., you can't get a true picture of a Collector's resource usage.

At midnight the usage pattern of everything the Collector does changes dramatically because it's about to start reperiodization. At that point, the Collector gets very busy and typically CPU utilization jumps to 80% to 90%.

Also, if your CPU monitoring tool is sampling or averaging CPU snapshots over an interval longer than one second, you may not see the intense activity spikes that can cause the Collector to back up and run into problems.

There are certain operations that can easily saturate the Collector's CPU, such as Transaction Tracing, large numbers of connected Workstations, large numbers of events, large historical queries, and large reports. The Collector must have additional headroom in order to handle those peaks of activity, or else it will fall behind in its processing tasks, resulting in undesirable system behavior.

While a Collector's CPU usage may not look busy at one point in time, it will look busy if you turn on a large Transaction Trace or if you connect 10 more Workstations, or run a big historical query. That's why CA Wily recommends so much additional CPU headroom.

On average, you can't have any more than 40% steady state usage because there are too many other operations that can immediately cause the Collector to use 100% CPU. At that point you'll start to see Workstation sluggishness and combined time slices.


About the CPU Overview tab

By viewing the CPU Overview tab you can assess agent CPU health and performance-related statistics in one centralized location.

To view the CPU Overview tab

1 Select the CPU node under the agent.

2 Click the Overview tab in the right pane.

Study the CPU Utilization graph as shown in the figure below.


Enterprise Manager basic requirements

There are several basic requirements for every Enterprise Manager.

Typically an Enterprise Manager needs 2 to 4 CPUs depending on the hardware platform.

More CPUs will not improve performance. An Enterprise Manager with fewer CPUs than recommended results in the system performing poorly.

All Enterprise Managers need a minimum of 3 GB OS RAM to effectively run at anything close to full load.

EVERY Collector Enterprise Manager must have a dedicated disk I/O subsystem for SmartStor with no other processes competing for it.

After those basic requirements, system performance is determined by the speed of the CPUs, the speed of the I/O subsystems, and the file cache performance.

» WARNING The recommendations for maximum metrics/Enterprise Manager, agents/Enterprise Manager, physical memory, and so on, should be strictly followed. If you are seeing less CPU utilization than the recommended maximum threshold (at full metrics load), it is NOT a reason to add additional load (above CA Wily recommendations) to the Collector. In general, metrics load is highly I/O bound rather than CPU intensive, so even with CPU cycles available, the Enterprise Manager can get I/O bound on metric data and the whole system can start slowing down.

Enterprise Manager file system requirements

Make sure that the file system used for the Enterprise Manager files baselines.db and traces.db is a local disk and not a network file system (NFS). Otherwise, serious performance degradation can result.

EM OS disk file cache memory requirements

How much OS memory does each Enterprise Manager need? At full load, typically 1.5 GB of JVM heap space is allocated to the Enterprise Manager process in the JVM properties, but on top of that there must be at least another 1 GB of free physical RAM over and above the requirements of the OS. The CA Wily recommendation is a minimum of 3 GB for a system running an Enterprise Manager, and preferably 4 GB.

» Note If you are running a MOM near maximum capacity (for example, a 5 million metric cluster or 1 million subscribed MOM metrics), the MOM must run on a 64-bit JVM with a 12 GB heap size. The machine must have physical RAM of at least 14 GB. For more information, see Configuring a cluster to support 1,000,000 MOM metrics on page 61.


If your hardware allows it, CA Wily recommends running the OS in 64-bit mode to take advantage of the large file cache. The file cache is important for the Enterprise Manager when doing SmartStor maintenance like spooling and reperiodization. This cache resides in physical RAM, and is dynamically adjusted by the OS during runtime based on available physical RAM. Therefore, our recommendation is for 4 GB RAM.

As general guidance, each Enterprise Manager should have about 1.5 GB of OS file cache available in its memory.

Top of the hour problems are usually related to SmartStor spooling, which is best addressed by additional physical memory, especially disk file cache. The single biggest influencing factor for SmartStor spooling is the file cache size. Typically, 32-bit Windows allows a file cache of just under 1 GB, while SmartStor spooling files at full load are typically closer to 2 GB. That difference in size creates performance pressure. By providing a larger OS file cache, you give the OS enough room to read the entire spool file into memory, process it, and write it straight back out into the SmartStor archive as a data file.

Enterprise Manager heap sizing

The appropriate Enterprise Manager heap settings depend on your Enterprise Manager OS, hardware, and the metric load. The Enterprise Manager GC parallel flag you’ll need to set also depends on the Enterprise Manager OS version.

In the heap settings examples below, note that when the total number of metrics that the Enterprise Manager monitors changes, the heap settings also change.

Each example below lists the Enterprise Manager hardware (OS version), RAM (GB), and total metrics monitored, followed by example Enterprise Manager GC flag settings:

2x2.8GHz Xeon HT (Win 2K Adv Server), 2 GB RAM, 90,000 metrics:
lax.nl.java.option.additional=-server -Xms512m -Xmx512m -showversion -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m

2x2.8GHz Xeon HT (Win 2K Adv Server), 3 GB RAM, 210,000 metrics:
lax.nl.java.option.additional=-server -Xms800m -Xmx800m -showversion -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m


If you are operating a high-performance Introscope environment, contact CA Wily Professional services for the appropriate Enterprise Manager JVM heap settings.

SmartStor requirements

Each EM requires SmartStor on a dedicated disk or I/O subsystem

In Introscope 7, significant performance improvements were made in SmartStor that freed up CPU resources for other features such as virtual agents, calculators, Transaction Tracing and sampling, and applications with associated heuristic calculations (baselining). What matters to SmartStor is concurrent I/O throughput and how many disk spindles are servicing those requests. Having SmartStor on a second dedicated disk is required to take advantage of these enhancements.

Point the SmartStor location to a dedicated disk or disk array separate from the Transaction Event database (traces.db) and the metrics baseline (heuristics) database (baselines.db). Verify that SmartStor file persistence is actually going to that separate disk. Ensuring that the SmartStor data directory is on its own disk is the top solution to many Introscope performance issues.

When SmartStor is not on its own dedicated disk, the first indication that there is a problem is when there are SmartStor spooling problems. For more information, see About SmartStor spooling and reperiodization on page 40.

» Note For information about a spreadsheet to help you determine your SmartStor disk requirements, see the Introscope Configuration and Administration Guide.

Two more heap settings examples, continuing the table above (Enterprise Manager hardware and OS version, RAM, total metrics monitored, and example GC flag settings):

2x2.8GHz Xeon HT (Win 2K Adv Server), 3 GB RAM, 400,000 metrics:
lax.nl.java.option.additional=-server -Xms1400m -Xmx1400m -showversion -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m

2x2.8GHz Xeon HT (Win 2K Adv Server), 4 GB RAM, 500,000 metrics:
lax.nl.java.option.additional=-server -Xms1500m -Xmx1500m -showversion -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m


SmartStor Duration metric limit

The Enterprise Manager collects health metrics about itself. Starting with Introscope 7, the Smartstor Duration metric value is recorded, which tracks how long it takes SmartStor to write data during every 15-second metric harvest cycle.

As shown in the figure below, you can view the SmartStor Duration metric in this location in the Investigator tree:

Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Tasks | Smartstor Duration (ms).

Under standard Enterprise Manager conditions, the average Smartstor Duration value should be less than 3,500 ms (3.5 sec). The Smartstor Duration value MUST be less than 15,000 ms (15 sec); a value greater than 15 seconds indicates a critically overloaded Enterprise Manager. For more information, see Enterprise Manager health on page 27 and the Introscope Configuration and Administration Guide.
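As an illustration, the thresholds above could be encoded in a simple classification helper. This is a hypothetical sketch, not part of Introscope:

```python
# Hypothetical helper classifying the Smartstor Duration (ms) metric against
# the documented thresholds: < 3,500 ms is normal on average, and a value
# of 15,000 ms or more indicates a critically overloaded Enterprise Manager.
def smartstor_duration_status(duration_ms: float) -> str:
    if duration_ms < 3500:
        return "normal"
    if duration_ms < 15000:
        return "elevated"
    return "critical: EM overloaded"

print(smartstor_duration_status(2800))  # normal
```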

Here’s the SmartStor Duration metric location.


MOM and Collector EM requirements

Some sizing requirements and performance issues apply to any Enterprise Manager, be it a MOM or a Collector. However, some apply specifically to the MOM or to the Collector, because MOMs are CPU intensive while Collectors are both I/O and CPU intensive. The following topics describe general Enterprise Manager requirements as well as MOM- and Collector-specific requirements.

Local network requirement for MOM and Collectors

Whenever possible, a MOM and its Collectors should be in the same data center, preferably in the same subnet. Even when crossing a firewall or passing through any kind of router, optimal response time is difficult to maintain. If the MOM and a Collector are across a router or, worse yet, a packet-sniffing firewall router, response time can slow dramatically. To keep the cluster operational, the MOM drops any Collector that meets either of these conditions:

appears unresponsive through the network for more than 60 seconds (see information about the ping time threshold below)

indicates its system clock has skewed more than three seconds from the MOM’s clock (see Collector to MOM clock drift limit on page 71).

For optimal Workstation responsiveness, the ping metric, which is reported by the MOM for each Collector each time slice, should be less than 500 ms.

» Note The Introscope ping metric monitors only the lower boundary of the round-trip response time from the MOM to each Collector. This ping time is not the same as the network ping time, which is the sending of an ICMP echo request and getting an echo response.

To view the ping metric, use the Search tab to view metrics named “ping” in the supportability metric section of the Investigator tree. You will find a ping metric reported for each Collector.

Ping times of 10 seconds or longer indicate a slow Collector that may be overloaded. Ping times over the 10 second threshold cause the Enterprise Manager|MOM|Collectors|<host@port>:Connected metric to display a value of 2. You can adjust this threshold for your environment by changing the introscope.enterprisemanager.clustering.manager.slowcollectorthreshold property in the IntroscopeEnterpriseManager.properties file. For more information, see the Introscope Configuration and Administration Guide.


In Introscope 8.0, there is an additional ping time threshold of 60 seconds. If the ping time exceeds this value, the MOM automatically disconnects from the Collector associated with the slow ping time. This prevents the entire cluster from hanging, which is a side effect of when one Collector in a cluster is greatly underperforming.

A disconnected Collector causes the Enterprise Manager|MOM|Collectors|<host@port>:Connected metric to display a value of 3. You can adjust this threshold for your environment by changing the introscope.enterprisemanager.clustering.manager.slowcollectordisconnectthreshold property in the IntroscopeEnterpriseManager.properties file. For more information, see the Introscope Configuration and Administration Guide.
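For reference, both ping thresholds can be tuned in IntroscopeEnterpriseManager.properties. The snippet below restates the documented 10-second and 60-second thresholds, assuming the property values are expressed in milliseconds (an assumption; confirm the units in your properties file comments):

```properties
# Assumed to be in milliseconds; these restate the documented thresholds.
# Slow-Collector warning threshold (Connected metric shows 2 above this):
introscope.enterprisemanager.clustering.manager.slowcollectorthreshold=10000
# Disconnect threshold (MOM drops the Collector above this):
introscope.enterprisemanager.clustering.manager.slowcollectordisconnectthreshold=60000
```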

» Tip You can set an alert on the Enterprise Manager|MOM|Collectors|<host@port>:Connected metric value. For more information on creating and configuring alerts, see the Introscope Configuration and Administration Guide.

When a Collector disconnects from the MOM, the metric flow from that Collector to the MOM stops. This means you will see a data gap in the Workstation metric reporting. However, the Collector is still gathering and persisting the agent metrics. When the Collector reconnects to the MOM, you can run a historical query to see the metrics reported during the disconnected period.

When to run reports, custom scripts, and large queries

In order to avoid sluggish Workstation response times and general poor performance, it's important that reports, custom scripts, and large queries not be run when the Collector is scheduled to do heavy processing. Introscope schedules heavy CPU and disk I/O intensive tasks at the top of each hour (the spooling process) and during reperiodization, which is typically scheduled to occur daily between midnight and 3 A.M. During these times, do not run or schedule other heavy processing.


Introscope 8.0 EM settings and capacity

The topics below describe EM-related settings and capacity limits required to set up, maintain, and configure your Introscope 8.0 environment.

Estimating Enterprise Manager databases disk space needs

In planning for your Introscope environment, you may wonder “How much disk space do I need for all my Introscope databases?” To answer this question, you’ll need to calculate your disk space needs for the three databases in which Introscope stores data: SmartStor, traces.db, and baselines.db.

SmartStor, which should be on its own dedicated disk, is used to store metric data coming from agents. For more information, see SmartStor overview on page 40. The SmartStorSizing8.0.x.y.xls spreadsheet, which is located in the <Introscope_Home>/examples directory, can help you determine your SmartStor disk space requirements. For information about using the spreadsheet, see the Introscope Configuration and Administration Guide.

traces.db contains all Transaction Traces and events data, such as error snapshots. This database spans multiple files. One file is created per day and this data is kept for the number of days specified in the IntroscopeEnterpriseManager.properties file. In the example file snippet below, the daily file is stored for 14 days.

introscope.enterprisemanager.transactionevents.storage.max.data.age=14

baselines.db stores all of the Introscope metrics baselining (heuristic) data in a single file.

The traces.db and baselines.db databases collect and maintain data at different rates. Therefore, to determine the database disk space needs for your Enterprise Manager you will have to perform disk space calculations for traces.db and baselines.db separately, then sum the two calculations.


traces.db disk space calculation example

Estimating the disk space needed for your Introscope traces.db file starts by answering the questions “How many events do I want to keep?” and “How many days do I want to keep these events?” Once you’ve answered these questions, determining the disk space needed involves the calculations shown in the example below. By substituting your own values for Total days to store data and Events/day, you can estimate your Enterprise Manager’s traces.db disk space requirements.

Introscope agent average bytes/event = 4096

» Note The approximate default average size of an event when stored on disk is 4 K.

Total days to store data = 36

Events/day = 1000 events/minute x 60 min/hr x 24 hrs/day = 1,440,000

» Note This is the maximum load number for events/day based on 1,000 events/min, which is an example maximum load for a Windows machine. See the Sample Introscope 8.0 Collector sizing limits table on page 119 for other maximum load examples.

Bytes required/day = 4096 (bytes/event) x 1,440,000 events/day = 5,898,240,000

GB required/day = 5,898,240,000 bytes required/day / (1024 x 1024 x 1024) = 5.49 GB

Total disk space required = 36 (total days to store data) x 5.49 GB required/day = 198 GB
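The arithmetic above can be captured in a small sizing helper. This is an illustrative sketch; the 4 KB/event figure is the approximate default average size stated above:

```python
# Reproduces the traces.db sizing arithmetic: bytes/event x events/day x days.
def traces_db_bytes(events_per_min: int, days: int,
                    bytes_per_event: int = 4096) -> int:
    """Total traces.db bytes for the given event rate and retention period."""
    events_per_day = events_per_min * 60 * 24
    return bytes_per_event * events_per_day * days

# The guide's example: 1,000 events/min kept for 36 days -> roughly 198 GB
gb = traces_db_bytes(1000, 36) / (1024 ** 3)
print(round(gb))  # 198
```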

baselines.db disk space calculation example

The baselines.db file should rarely exceed 2 GB. Estimating the disk space needed for your Introscope baselines.db file starts by answering the questions “How many nodes are on my Enterprise Manager?” and “How many agents are reporting data to my Enterprise Manager?” Once you’ve answered those questions, determining the disk space needed involves the calculations shown in the example below. By substituting the number of nodes on your Enterprise Manager, which depends on application count and Blamed backend count, and the number of agents, you can estimate your Enterprise Manager’s baselines.db disk space requirements.


This baselines.db calculation example makes the following assumptions:

» Note The numbers below are only examples; they are NOT provided as recommendations for any Introscope environment.

Nodes/each Overview dashboard = 100 (which is very big)

Heuristics/node = 3

Objects generated by each heuristic (in steady state) = 2 (objects/hr/heuristics) x 24 hrs/day x 7 days/week = 336 objects/week

100 nodes x 3 heuristics/node x 336 objects/week = ~100,000 baseline objects/agent/week

NOTE: Baselines roll over at weekly boundaries. Every baseline is stored in 30-minute increments across a week. Once you roll into the next week, the baseline data from the previous week is loaded and then updated with the current week’s data.

# agents reporting data to the Enterprise Manager = 200

Baseline objects/agent/week = 100,000 (from the calculation above)

Bytes/baseline object = 100

MB/agent = 100,000 baseline objects/agent/week x 100 bytes/object = 10 MB/agent/week

The baselines.db file size = 10 MB/agent x 200 agents = 2 GB.
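The same arithmetic can be sketched as a helper function. This is illustrative only; the inputs are the example numbers above, not recommendations:

```python
# Reproduces the baselines.db sizing arithmetic: objects per agent per week
# times bytes per object, summed across agents.
def baselines_db_mb(nodes: int, heuristics_per_node: int, agents: int,
                    bytes_per_object: int = 100) -> float:
    """Approximate baselines.db size in MB for the given topology."""
    objects_per_week = 2 * 24 * 7                 # 336 objects/heuristic/week
    per_agent = nodes * heuristics_per_node * objects_per_week  # ~100,800
    return per_agent * bytes_per_object * agents / 1_000_000    # bytes -> MB

# The guide's example: 100 nodes, 3 heuristics/node, 200 agents -> ~2 GB
print(round(baselines_db_mb(100, 3, 200) / 1000, 1))  # 2.0
```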

SmartStor settings and capacity

Setting the SmartStor dedicated controller property

Starting in Introscope 7.1, there is a dedicated controller property that tells the Collector that there is a dedicated SmartStor disk.

» Note Only the Collector requires a dedicated SmartStor database and dedicated controller. MOM machines also have a SmartStor instance, but due to the vastly smaller metrics load, are able to house the SmartStor instance on the same disk as other MOM components.

In IntroscopeEnterpriseManager.properties (the Enterprise Manager properties file), the property can be seen as:

introscope.enterprisemanager.smartstor.dedicatedcontroller=true

Providing a separate disk for SmartStor AND setting the dedicated controller property to true increases the total number of metrics an Enterprise Manager can handle, because together they allow better sharing of disk resources. This enables a number of performance enhancements, including:

Larger virtual agents can be created.

Agents can report a larger number of applications.

More calculators can be used.

More Management Module logic is possible.

Workstation responsiveness improves.

The dedicated controller property is set to false by default. You MUST provide a dedicated disk for SmartStor in order to set this property to true; it cannot be set to true if there is only a single disk for each Collector. The reason is that with a single disk for both Collector operations AND SmartStor, context switching would be performed at the disk level (rather than the software level). This could cause severe Collector and possibly OS performance problems.

When the dedicated controller property is set to false, the Collector assumes that there is one disk for all Enterprise Manager operations, and therefore uses one disk-writing lock. This means that only one area at a time is written. For example, the Collector will write only to SmartStor or only to the heuristics database that supports the Investigator Overview dashboard.

Performance disadvantages to having the dedicated controller property set to false are:

Only one I/O task can be running at a time.

SmartStor writes are in shorter segments.

The disk's seek pointer is invalidated after each context switch.

If there is a second disk for SmartStor, but the property is set to false, there is no performance gain by having a second disk for SmartStor.

Collector sizing recommendations are reduced by 50%.

When the dedicated controller property is set to true, the Collector uses two locks: one lock is dedicated to SmartStor, and the second lock is for everything else. Performance advantages to setting the dedicated controller property to true include:

SmartStor I/O tasks can run concurrently with other I/O tasks, which improves the Enterprise Manager’s overall metric-handling ability.

SmartStor can write in larger segments.

The seek pointer remembers its last write placement.

The lock dedicated to SmartStor reduces interruption from the metrics (heuristics) database (baselines.db), which stores metrics baseline data.

For instructions about how to set the SmartStor dedicated controller property to true, see the Introscope Configuration and Administration Guide.

If a Redundant Array of Independent Disks (RAID) configuration is desired, CA Wily recommends RAID 0 or RAID 5. Each SmartStor database MUST reside on its own dedicated RAID setup.

All the restrictions above apply to all the varied storage choices available (local disks, external storage solutions such as SAN, and so on). The SmartStor requirement for a separate disk/controller DOES NOT mean that a separate host adapter (such as a Fibre Channel adapter, SCSI adapter, and so on) is required. It only means that a separate, dedicated physical disk or RAID setup is used for each SmartStor database.

To determine whether a machine being considered for SmartStor offers a dedicated disk, you may need to find out whether the machine has multiple controllers (that is, multiple physical hard drives). It is important to understand that multiple partitions on the same drive share a controller, which is not an appropriate environment for a SmartStor instance. You can use tools such as du (disk usage) on UNIX/Linux, or the Device Manager on Windows, to determine whether two drives are logically different or physically different. It is critical that the drives are physically different.

Planning for SmartStor storage using SAN

If you plan to use SAN for SmartStor storage, then each logical unit number (LUN) requires a dedicated physical disk. If you have configured two or more LUNs to represent partitions or subsets of the same physical disk, this does not meet the requirements needed for SmartStor’s dedicated disk.

Planning for SmartStor storage using SAS controllers

If you plan to use a serial-attached SCSI (SAS) controller for SmartStor storage, you are using a host bus adapter (HBA) with multiple channels that operate simultaneously. Each full-duplex channel is known as a SAS port; each port transfers data at 3 Gb (gigabits) per second. One SAS controller can serve Enterprise Managers that store SmartStor as well as the traces.db and baselines.db data. What's important is that SmartStor has a dedicated disk; in this case, that means SmartStor has its own dedicated SAS port.

Enterprise Manager thread pool and available CPUs

The Enterprise Manager has a pool of threads that do the work of harvesting metrics every 15 seconds. The size of the Enterprise Manager thread pools is based on hosting one Enterprise Manager per machine. However, if more than one Enterprise Manager is running on a single host machine, this results in an excessive number of threads.

For example, if you are running five Enterprise Managers on a machine with eight quad-core CPUs (32 cores), each Enterprise Manager bases the size of its thread pools on the 32 available CPUs. This configuration can reduce throughput due to context switching as the threads from all five Enterprise Managers contend for the 32 available CPUs.

In Introscope 8.0, the Enterprise Manager properties file (IntroscopeEnterpriseManager.properties) includes the new available processors (CPUs) property to tell the Enterprise Manager how many processors (CPUs) it can expect to have available:

introscope.enterprisemanager.availableprocessors=

See the Introscope Configuration and Administration Guide for more information about setting this property.

To continue the example, in the case where there are five Enterprise Managers on a host machine with 32 CPUs, you would allocate six processors for each Enterprise Manager. You’d then set the available processors property to six as shown:

introscope.enterprisemanager.availableprocessors=6
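The per-Enterprise-Manager allocation in this example is simple division of the machine's CPUs among the co-hosted Enterprise Managers. A sketch (the note that the remaining cores are left for other work is an inference, not a statement from the guide):

```python
# Splitting a machine's CPUs among co-hosted Enterprise Managers
# (example from the text: five EMs on a 32-core machine).
total_cpus = 32   # 8 quad-core CPUs
num_ems = 5

cpus_per_em = total_cpus // num_ems            # value for availableprocessors
leftover = total_cpus - cpus_per_em * num_ems  # cores not assigned to any EM

print(cpus_per_em, leftover)  # 6 2
```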

Collector and MOM settings and capacity

The topics below describe the Collector and MOM settings and capacity limits required to set up, maintain, and configure your Introscope 8.0 environment.

MOM disk subsystem sizing requirements

The MOM requires more powerful hardware (CPUs) than a Collector, with the exception of the disk subsystem (the MOM usually performs little I/O). For some examples, see the Sample Introscope 8.0 MOM sizing limits table on page 122.

The MOM does have a SmartStor instance, which persists metrics generated by virtual agents, as well as alert states and calculator results from the calculated metrics and supportability metrics from its peer Collectors. However, the MOM doesn't need a second dedicated disk for SmartStor, because the lesser metric load being reported to the MOM doesn't require a second disk.

MOM hardware requirements

The maximum load for a MOM-Collector system is 5,000,000 (5 million) metrics. Although a MOM can connect to Collectors representing a total of 5,000,000 metrics, in each 15-second time slice the MOM itself can only receive 1,000,000 (one million) metrics total. Any or all of these 1,000,000 MOM metrics can be associated with calculators and alerts (metrics associated with calculators and alerts are called subscribed metrics). For more information about subscribed metrics, see About metrics groupings and metric matching on page 78.

To handle this load, the MOM requires more powerful hardware (CPUs) than a Collector, with the exception of the disk subsystem (the MOM usually performs little I/O). For some examples, see the Sample Introscope 8.0 MOM sizing limits table on page 122. If you are running a MOM near maximum capacity (for example, a 5 million metric cluster or 1 million subscribed MOM metrics), the MOM must run on a 64-bit JVM with a 12 GB heap size, and the machine must have at least 14 GB of physical RAM. For more information, see Configuring a cluster to support 1,000,000 MOM metrics on page 61.

MOM to Collectors connection limits

CA Wily recommends using the smallest number of Collectors needed to accommodate the number of agents providing metrics. The maximum load depends on the hardware used, as shown in the Sample Introscope 8.0 MOM sizing limits table on page 122. The more Collectors a MOM is connected to, the more complicated the system becomes and the greater the likelihood of instability or failure. Performance issues that arise as more Collectors are connected to a MOM include the following:

MOM to Collector clock sync issues may be more difficult to manage. For more information, see Collector to MOM clock drift limit on page 71.

» Important You must run time server software to synchronize at regular intervals the clocks of all the machines in the cluster. Time server software synchronizes the time on a machine with either internal time servers or internet time servers.

The system may take longer to start up.

There is an increased likelihood that a misbehaving Collector affects the entire cluster.

» Note In a clustered environment, a single Collector that is performing poorly can make it appear as if the entire cluster is performing poorly.

For these reasons, a single MOM should be connected to a maximum of 10 Collectors.

It is important to ensure that every Collector is running smoothly, because any individual nonresponsive Collector causes the entire system to lock up until the Collector responds, drops its connection, or is timed out by the MOM (see Local network requirement for MOM and Collectors on page 51). This is because SmartStor data is held on the Collectors, not on the MOM. To retrieve query or alert information, the MOM must wait for every Collector to respond with its portion of the result before sending the combined query or alert data response back to the Workstation. The Workstations, in turn, are delayed waiting for the MOM's compiled data to display. The responsiveness of a cluster is therefore limited by the responsiveness of its slowest connected Collector. In contrast, a single standalone Enterprise Manager has no outside dependencies.

MOM to Workstation connection limits

For information about this topic, see Workstation to MOM connection capacity on page 102.

Metric load limit on MOM-Collector systems

The maximum load for a MOM-Collector system is 5,000,000 metrics. Although a MOM can connect to Collectors representing a total of 5,000,000 metrics, in each 15-second time slice the MOM itself can only receive 1,000,000 metrics total (hardware dependent). Any or all of these 1,000,000 MOM metrics can be associated with calculators and alerts. (Metrics associated with calculators and alerts are called subscribed metrics. For more information about subscribed metrics, see About metrics groupings and metric matching on page 78.) If that number of calculators and alerts is exceeded, Collector cluster startup time can become slow.

In planning for your system’s metric load, CA Wily recommends that a MOM not be connected to more than 10 Collectors. For more information, see MOM to Collectors connection limits on page 59. In addition, each MOM can handle a maximum metrics load no greater than that recommended in the Sample Introscope 8.0 MOM sizing limits table on page 122.

For example, suppose you are planning an environment comprising a MOM and nine Windows-based Collectors. According to the Sample Introscope 8.0 Collector sizing limits table on page 119, each Collector can handle a maximum of 500,000 metrics. If you maxed out every Collector at 500,000 metrics, at most ten Collectors could be connected to the MOM, for the 5,000,000-metric MOM-Collector system maximum.

If, however, your environment will typically generate 1,000,000 metrics, you could set up one Collector to handle 200,000 metrics and the remaining eight Collectors to handle 100,000 each (totaling 800,000 metrics), for a MOM-Collector system total of 1,000,000 metrics.

Or you could set up four Collectors to handle 200,000 metrics each (totaling 800,000 metrics) and the remaining five Collectors to handle 40,000 metrics each (totaling 200,000 metrics), for a MOM-Collector system total of 1,000,000 metrics.
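The two example distributions above can be checked quickly. The plan names below are hypothetical labels for the two layouts described, not Introscope terminology:

```python
# Two ways to spread a 1,000,000-metric load across nine Collectors
# (the example plans from the text; "plan_a"/"plan_b" are just labels).
plan_a = [200_000] + [100_000] * 8     # one larger Collector + eight smaller ones
plan_b = [200_000] * 4 + [40_000] * 5  # four larger + five smaller Collectors

# Both plans use nine Collectors and carry the same system total.
assert len(plan_a) == len(plan_b) == 9
assert sum(plan_a) == sum(plan_b) == 1_000_000
print(sum(plan_a))  # 1000000
```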

In order for Introscope to support 1,000,000 metrics on the MOM, you need to configure the MOM and meet specific JVM requirements on each clustered Collector. See Configuring a cluster to support 1,000,000 MOM metrics on page 61.

Configuring a cluster to support 1,000,000 MOM metrics

In order for Introscope to support 1,000,000 metrics on the MOM, you need to configure the IntroscopeEnterpriseManager.properties files on each Collector in the cluster. In addition you need to meet specific JVM requirements on the MOM and all the clustered Collectors.

To configure the cluster to support 1,000,000 MOM metrics

1 Run the MOM on a 64-bit JVM with 12 GB heap size. The machine must have physical RAM of at least 14 GB.

2 On each Collector in the cluster, configure the IntroscopeEnterpriseManager.properties file.

a On each Collector machine in the cluster, go to the <Introscope_Home>/config directory and open the IntroscopeEnterpriseManager.properties file.

b Add this transport property value as shown:

transport.outgoingMessageQueueSize=4000

c Save and close the IntroscopeEnterpriseManager.properties file.

3 Run each Collector in the cluster on a 32-bit JVM with 1.5 GB heap size.

The required Collector configuration as well as MOM and Collector JVM sizing requirements are complete. For MOM sizing examples, see Sample Introscope 8.0 MOM sizing limits table on page 122. For Collector sizing examples, see Sample Introscope 8.0 Collector sizing limits table on page 119.

MOM hot failover

If the MOM becomes disconnected or goes down due to, for example, a hardware or network failure, you can configure a second MOM to take over using hot failover. When configuring MOM hot failover, you can either set up two MOMs that both act in a primary role (peers), or configure one MOM to act in a primary role and a second in a secondary (backup) role in case of failure.

In this network topology, the two MOMs share a single Introscope installation, and the MOMs and the Introscope installation are on three different machines. The Introscope installation can reside on a Storage Area Network (SAN) file system or can be shared using a Network Attached Storage (NAS) protocol such as Network File System (NFS) or Server Message Block (SMB).

For information on configuring MOM hot failover, see the Introscope Configuration and Administration Guide.

» Note Hot failover is intended primarily for the MOM because the MOM is a single point of failure in the Introscope clustering architecture. The agent load balancing feature provides fault tolerance for the Collectors in a cluster. In the case of a Collector failure, agents reconnect to the MOM and are redirected to other Collectors. See Agent load balancing on MOM-Collector systems on page 63.

Alternative method of configuring agent to MOM hot failover

When trying to connect to an Enterprise Manager, the agent tries all the IP addresses for a given host name. Therefore, if you have defined a host in DNS with the IP addresses of the primary and secondary MOMs, then you don’t need to configure failover in the agent. You can instead specify the host name and the agent connects to whichever MOM is running.

Configuring Workstation log-in for hot failover

When trying to connect to an Enterprise Manager, the Workstation tries all the IP addresses for a given host name. Therefore, if you have defined a host in DNS with the IP addresses of the primary and secondary MOMs, then you can specify the host name and the Workstation connects to whichever MOM is running. For information about administering Workstation, see the Introscope Workstation User Guide.

Agent load balancing on MOM-Collector systems

In Introscope 8.0, the MOM uses agent load balancing to balance the metric load between Collectors in a clustered environment. The MOM equalizes the metric count among the Collectors by ejecting participating 8.0 agents from over-burdened Collectors. The ejected agents reconnect to the MOM, then are reallocated to under-burdened Collectors.

» Note Agent load balancing is not the same as a metric clamp. Agent load balancing is carried out by the MOM, which disconnects and connects agents to specific Collectors based on the current metric load. A metric clamp is a limit on the number of metrics on the agent and the Enterprise Manager that helps to prevent spikes in the number of reported metrics (metric explosions) on the Enterprise Manager. For more information on metric clamps, see Metric clamping on page 96.

Setting up agent load balancing involves the following high-level steps, which cover three key configurations: the metric weighting factor, the metric load threshold, and how often the MOM rebalances agents:

Step 1 Determining how MOMs assign agents to Collectors on page 64

Step 2 Setting up the agent load balance metric weight load on page 64

Step 3 Setting the agent load balance metric threshold on page 65

Step 4 Setting the agent load balance interval property on page 66

To configure agent load balancing, see the Introscope Configuration and Administration Guide.

Determining how MOMs assign agents to Collectors

Several factors determine how the MOM assigns an agent to a Collector.

Connection type

An agent is only assigned to a Collector that supports the same connection type that the agent uses to connect to the MOM. For example, if the agent connects to the MOM using HTTP, then the Collector must have HTTP connections enabled.

Configuration done in the loadbalancing.xml file

You fill out the loadbalancing.xml file to restrict agents to a specific set of Collectors, or to exclude agents from a specific set of Collectors. For more information, see the Introscope Configuration and Administration Guide.

Agent connection history with a specific Collector

To prevent an explosion of SmartStor data as 8.0 agents are transferred between Collectors in the cluster, the MOM favors a Collector that an 8.0 agent has connected to previously, unless an alternative Collector is underloaded or the favored Collector is overloaded.

» Note Pre-8.0 agents do not connect to the MOM; instead, they must connect directly to a Collector.

Setting up the agent load balance metric weight load

When agent load balancing is configured, the MOM allots 8.0 agents to Collectors based on weight-adjusted load. You can adjust the weighting factor of individual Collectors in a cluster to improve performance. The number of metrics that a Collector can handle, which determines a Collector's relative power in a cluster, is determined by a number of factors including CPU power and memory, number of applications, and network speed. Therefore, when setting up agent load balancing you can use the weighting factor to ensure that the MOM assigns fewer metrics to less powerful Collectors. You can help avoid cluster performance problems by setting up the metric weight load so that the more powerful Collectors handle the bigger metric loads.

You set the weight load using the introscope.enterprisemanager.clustering.login.em1.weight property in the IntroscopeEnterpriseManager.properties file, as described in the Introscope Configuration and Administration Guide.

» Note In the introscope.enterprisemanager.clustering.login.em1.weight property name, em1 is an arbitrary identifier. Each Collector has a unique identifier. Provide an appropriate identifier for your environment.

The value of the introscope.enterprisemanager.clustering.login.em1.weight property is a positive number that controls the relative load of the Collector. If the factors affecting how the MOM assigns agents to a Collector (see Determining how MOMs assign agents to Collectors on page 64) do not dictate a different agent connection decision, then the weight of a specific Collector divided by the total weight of the cluster is the percentage of the metric load assigned to that Collector.

The MOM then uses weight-adjusted metric counts when assigning agents to Collectors and when rebalancing the agent metric load. For example, a MOM connects to three Collectors that all have zero metrics currently being reported. If Collector A has a weight of 150, Collector B has a weight of 100 and Collector C has a weight of 50, then the MOM assigns metrics to Collectors A, B, and C approximately in the ratio of 3:2:1.
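The 3:2:1 allocation in this example follows directly from each Collector's weight divided by the cluster's total weight. A minimal sketch of that arithmetic:

```python
# Weight-adjusted share of the metric load per Collector
# (weights from the example: A=150, B=100, C=50).
weights = {"A": 150, "B": 100, "C": 50}
total_weight = sum(weights.values())  # 300

# Each Collector's fraction of the cluster's metric load.
shares = {name: w / total_weight for name, w in weights.items()}

print(shares["A"])  # 0.5  -> with B at ~1/3 and C at ~1/6, the ratio is 3:2:1
```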

Setting the agent load balance metric threshold

To configure the cluster's tolerance, or threshold, for imbalance, you must set the introscope.enterprisemanager.loadbalancing.threshold property in the IntroscopeEnterpriseManager.properties file. This is another agent load balancing property that affects performance.

The default introscope.enterprisemanager.loadbalancing.threshold property setting is 20,000 metrics, which means that a Collector would have to be 20 K metrics out of balance before the MOM rebalances the agents. A Collector is out of balance if it is either under or over the weight-adjusted cluster average by more than the threshold. See the Agent load balancing usage scenarios on page 66 for some examples.

To properly configure the introscope.enterprisemanager.loadbalancing.threshold property, choose a number of metrics that prevents the MOM from constantly reallocating agents. When the MOM disconnects an agent from one Collector and assigns it to another, overhead is added to the cluster. When load rebalancing is needed, that overhead is justified; unnecessary rebalancing, however, adds needless overhead to the cluster, which can diminish system performance. When a cluster is a little unbalanced, there is no negative effect on performance, because a certain amount of flux is normal. An appropriate threshold value is one at which the MOM brings agents into balance by making fewer but larger adjustments, which is better for system performance than making more frequent, smaller ones.

Setting the agent load balance interval property

You must set the introscope.enterprisemanager.loadbalancing.interval property in the IntroscopeEnterpriseManager.properties file to tell the MOM how often to check the cluster for possible rebalancing. The default is 600 seconds (ten minutes) and the minimum is 120 seconds (two minutes). Consider the actual needs of your system when setting the interval value. Every time the MOM checks for rebalancing, it must take action, and the system then needs time to adjust to the changes that the rebalancing caused. If the MOM is set to rebalance again too soon, the system won't have adapted to the previous changes. For this reason, and because the MOM must handle a basic workload that takes time to carry out, CA Wily does not recommend using the minimum load balance interval of 120 seconds.

Agent load balancing usage scenarios

Here are some examples to help you understand how agent load balancing operates in a clustered environment.

Agent load balancing when a Collector is added

A MOM connects to Collectors A and B. There are 36,000 metrics being reported to Collector A and 30,000 metrics reported to Collector B. You set the metric threshold to 10,000 metrics using the introscope.enterprisemanager.loadbalancing.threshold property in the IntroscopeEnterpriseManager.properties file. Collector C, which has 24,000 metrics being reported to it, is added to the cluster. The MOM does not rebalance the metric load, since there is an average of 30,000 metrics per Collector (36,000 + 30,000 + 24,000 = 90,000 metrics / 3 Collectors) and none of the Collectors differs from 30,000 by more than the 10,000 metric threshold.

Agent load balancing when a Collector fails

A MOM connects to Collectors A, B, and C. There are 36,000 metrics being reported to Collector A, 30,000 metrics to Collector B, and 24,000 metrics to Collector C. Collector A fails and the agents that reported to Collector A reconnect to the MOM. The MOM redirects approximately 15,000 metrics to Collector B and 21,000 to Collector C. Now Collectors B and C both have 45,000 metrics being reported to them.

Agent load balancing when a Collector restarts

When Collector A recovers after failure, the cluster is unbalanced. There should be an average of 30,000 metrics per Collector, yet Collectors B and C each have 45,000 metrics being reported to them, which is 15,000 metrics above average. This exceeds the threshold of 10,000 metrics, so the MOM ejects agents totaling 15,000 metrics each from Collectors B and C and redirects all 30,000 metrics to Collector A. This results in 30,000 metrics being reported to each of Collectors A, B, and C.

Agent load balancing with weights

A MOM connects to Collectors A, B, and C. The threshold is set to 10,000. There are 24,000 metrics reporting to Collector A, 30,000 to Collector B, and 36,000 to Collector C, making a total of 90,000 metrics. The cluster has an average of 30,000 metrics per Collector. You have set the weight of Collector A to 150, Collector B to 100, and Collector C to 50. The average weight is 100: the sum of the weights divided by the number of Collectors, or (150 + 100 + 50) / 3.

Each Collector should have a metric load proportional to its relative weight. Since Collector A has a weight of 150, it should therefore have 45,000 metrics (its weight is 50% above average so its metric load should be 50% above the 30,000 metric average). Collector B has an average weight and therefore should have the average metric load, or 30,000 metrics. Collector C has a weight 50% of the average and therefore should have 50% of the average metric load, or 15,000 metrics.

Based on these relative weights and metric averages, the cluster is unbalanced. Collector A is underloaded because it is under the weight-adjusted average by more than the threshold (24,000 - 45,000 = -21,000). Collector B is perfectly balanced because at 30,000 metrics, its metric load is equal to its weight-adjusted average. Collector C is overloaded because it is over the weight-adjusted average by more than the 10,000 metric threshold (36,000 - 15,000 = 21,000). The MOM rebalances the cluster by ejecting agents with 21,000 metrics from Collector C and redirecting them to Collector A.
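The imbalance check in this scenario can be sketched as follows. This is an illustrative reconstruction of the arithmetic only, not the MOM's actual implementation:

```python
# Weight-adjusted balance check (figures from the scenario above:
# threshold 10,000; weights A=150, B=100, C=50).
loads   = {"A": 24_000, "B": 30_000, "C": 36_000}
weights = {"A": 150,    "B": 100,    "C": 50}
threshold = 10_000

total_load = sum(loads.values())      # 90,000 metrics in the cluster
total_weight = sum(weights.values())  # 300

# Each Collector's weight-adjusted target and its deviation from that target.
targets = {n: total_load * weights[n] / total_weight for n in loads}
deviations = {n: loads[n] - targets[n] for n in loads}

overloaded  = [n for n, d in deviations.items() if d >  threshold]
underloaded = [n for n, d in deviations.items() if d < -threshold]

print(targets)      # {'A': 45000.0, 'B': 30000.0, 'C': 15000.0}
print(underloaded)  # ['A'] -> should receive agents
print(overloaded)   # ['C'] -> ejects agents carrying 21,000 metrics
```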

Agent load balancing when a Collector is added to the cluster

You determine that your cluster of three Collectors A, B, and C is overloaded.

You dynamically add Collector D by adding a set of connection properties to the MOM’s IntroscopeEnterpriseManager.properties file. In addition, you have set the metric threshold for all the Collectors in this cluster to 10,000 metrics using the introscope.enterprisemanager.loadbalancing.threshold property in the IntroscopeEnterpriseManager.properties file.

Collectors A, B, and C each have 30,000 metrics being reported to them, while Collector D has zero metrics being reported to it, so the cluster average is 22,500 metrics per Collector (90,000 metrics / 4 Collectors). The MOM rebalances the cluster because the difference between the average load of 22,500 metrics and Collector D's zero metrics is greater than the threshold of 10,000 metrics. The MOM moves 7,000 metrics each from Collectors A, B, and C and redirects all 21,000 metrics to be reported to Collector D. This results in 23,000 metrics being reported to Collector A, 23,000 metrics to Collector B, 23,000 metrics to Collector C, and 21,000 metrics to Collector D.
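A sketch of this rebalancing, assuming starting loads of 30,000 metrics on each of Collectors A, B, and C (the figures consistent with the end state described above); again, this illustrates the arithmetic only, not the MOM's actual code:

```python
# Rebalancing when an empty Collector D joins three loaded Collectors
# (assumed starting loads: 30,000 metrics each on A, B, C; threshold 10,000).
loads = {"A": 30_000, "B": 30_000, "C": 30_000, "D": 0}
threshold = 10_000

average = sum(loads.values()) / len(loads)  # 22,500 with D included
assert average - loads["D"] > threshold     # D is out of balance -> rebalance

# The MOM moves 7,000 metrics from each loaded Collector to D.
for name in ("A", "B", "C"):
    loads[name] -= 7_000
    loads["D"] += 7_000

print(loads)  # {'A': 23000, 'B': 23000, 'C': 23000, 'D': 21000}
```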

Agent load balancing when an agent prefers a specific Collector

If you have an agent that prefers a particular Collector, connect the agent directly to that favored Collector. This prevents the MOM from ejecting the agent, because the MOM only rebalances agents that were redirected to the Collector by the MOM.

Avoid Management Module hot deployments

» WARNING Do not perform Management Module hot deployments on production Collectors or MOMs.

A Management Module hot deployment locks the system and also prevents metric data from being reported. Hot deployment of virtual agents and Management Modules is very CPU intensive and can lock up Collectors for a couple of minutes, during which the metric harvest does not happen. This can occur if you change the virtual agent definitions or redeploy Management Modules on the MOM or a Collector; the consequence can be that the cluster stops responding to Workstation users for extended periods.

CA Wily strongly recommends not performing Management Module hot deployments on production Collectors and MOMs. You may perform hot deployments during Management Module development. However, if you are working with a large, fully loaded Enterprise Manager or a large cluster, avoid performing a Management Module hot deployment, as the system is likely to freeze.

For more information about virtual agents, see the Introscope Java Agent Guide.

Collector applications limits

In Introscope 8.0 under full metric load, an individual Collector can accommodate the total number of applications from all agents (depending on hardware) as shown in the Sample Introscope 8.0 Collector sizing limits table on page 119. An overloaded Enterprise Manager starts to combine metrics, so once you approach the maximum applications limit, add a new Collector and break up the metric load.

The calculation of application heuristics is very CPU intensive on the Collector. CA Wily recommends x86 architectures for two reasons: the higher clock speed and the ability to execute individual threads faster (faster response times). In contrast, RISC architectures are better at executing threads in parallel (greater throughput).

Introscope architecture changed greatly in version 7. The new architectural paradigm dictates that the Collector limit for applications monitored with the Overview dashboard is now stated as a total number rather than an average per agent. For information about finding applications in your Introscope environment, see Enterprise Manager overview on page 17. For information about the Overview dashboard, see the Introscope Workstation User Guide.

Collector metrics limits

For metrics limit examples by hardware type, see the Sample Introscope 8.0 Collector sizing limits table on page 119.

One indication that an Enterprise Manager is overloaded is that it starts to combine metric time slices. When this happens, a message appears in the Enterprise Manager log at the Warning level saying that SmartStor isn't keeping up with live data. In addition, there is another message in the Enterprise Manager log in verbose mode stating the down-sampled period for any combined time slices.

In Introscope, a downsampled period is a time period that is disproportionately large for the associated SmartStor data storage tiering level. For example, in Data Tier 1 (relatively current data), each data point for reported metric data represents a 15 second period. If SmartStor gets slow, and the Enterprise Manager can’t keep up, instead of saving two points of 15-second data, SmartStor stores only one point every 30 seconds, halving the amount of data it needs to write to disk. At Data Tier 2 (older but not the oldest data), all the 15-second period data is reperiodized to cover 60 seconds. Typically for Data Tier 2 data, reperiodization means that four metric data points, representing 60 seconds, are combined into a single 60-second data point. The same process is done once more to combine Data Tier 2 data into the oldest set of data, which is Data Tier 3 data.
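To make the reperiodization arithmetic concrete, here is a minimal, hypothetical sketch (not SmartStor's actual implementation) of collapsing fixed-period data points into coarser periods by averaging:

```python
def reperiodize(points, factor):
    """Combine fixed-period data points into coarser periods by averaging.

    Illustrative only: SmartStor's storage format and aggregation rules
    are internal to Introscope. This just shows the idea of four
    15-second values collapsing into one 60-second value.
    """
    out = []
    # Drop any trailing partial chunk, then average each full chunk.
    for i in range(0, len(points) - len(points) % factor, factor):
        chunk = points[i:i + factor]
        out.append(sum(chunk) / len(chunk))
    return out

# Four 15-second response-time samples become one 60-second sample.
tier1 = [120.0, 80.0, 100.0, 140.0]
tier2 = reperiodize(tier1, 4)
```

With this sketch, `reperiodize(tier1, 4)` yields a single 60-second point whose value is the average of the four 15-second points.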


So when your Collector approaches the metrics limit and you see the warning messages described above, add a new Collector to your system.

» Note If you are running a standalone Enterprise Manager that is approaching the metrics limit, you may need to implement a clustered environment.

Collector events limits

The Collector treats ChangeDetector events, Transaction Traces, errors, stall events, and so on as event objects, and persists them in object databases attached to the Enterprise Manager. For example, Transaction Traces and errors are usually stored in the traces.db file. The maximum events limit represents the total number of events a Collector can receive and persist from all agents.

There is one limit for steady-state event persistence and another for burst capacity. Steady state means 24/7 ongoing event activity. Burst capacity means that the Collector can sustain a heavy events load for no more than a couple of hours, but not 24 hours. For burst limit examples, see the Sample Introscope 8.0 Collector sizing limits table on page 119.

If you want to know how many events are actually being received in your system, you can count the number of Transaction Traces per time slice. That number is seen in the Investigator tree Enterprise Manager health and supportability metrics as Data Store | Transactions:Number of Inserts Per Interval.

The only Introscope events that are potentially high volume are Transaction Traces and ChangeDetector events. The other event types are less common, and the number of error and stall events should be fairly small, because throttles on the agent side prevent large numbers of errors from being sent to the Enterprise Manager.

As an example (see Sample Introscope 8.0 Collector sizing limits table on page 119), the steady state limit for all events on an AMD Opteron-processor based hardware is about 1,000 events per minute. The burst limit is five times that. So for the Opteron, in this example, the burst limit is 5,000 events per minute.
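The burst calculation above can be expressed as a one-line helper; the 5x factor is taken from the Opteron example and should be treated as hardware-dependent rather than a universal constant:

```python
def burst_limit(steady_state_per_min, burst_factor=5):
    # The 5x burst factor matches the Opteron example in the text;
    # consult the sizing limits table for your actual hardware.
    return steady_state_per_min * burst_factor

# Opteron example: 1,000 events/min steady state -> 5,000 events/min burst
opteron_burst = burst_limit(1000)
```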

Collector agent limits

Introscope 7.x changed the agent connection architecture from multiple threads per connection to a pooled thread non-blocking architecture. The 6.x architecture imposed very strict limits on the number of agents that could be connected to an Enterprise Manager. Introscope 8.x agents use the same connection pool mechanism as 7.x, but 6.x agents do not. The new higher limit, as shown in the Sample Introscope 8.0 Collector sizing limits table on page 119, only applies if ALL agents are 7.x or 8.x.


In Introscope 8.0, the Enterprise Manager can take advantage of additional CPUs to increase the maximum agents limit. The Enterprise Manager must be using 4 CPUs or cores to take advantage of the increased Collector agents capacity (see the Sample Introscope 8.0 Collector sizing limits table on page 119). These limits depend on the specific hardware in use.

The number of currently connected agents is available as the Number of Agents Enterprise Manager health and supportability metric, which you can find in the Investigator tree at this location:

Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Connections | Number of Agents.

An overloaded Enterprise Manager starts to combine metrics, so once you approach the agent limit, add a new Collector.

An inappropriately configured agent can create thousands of metrics in quick succession and overload the Enterprise Manager. To prevent this, the Enterprise Manager uses a metric clamp. For information about metric clamping, see Metric clamping on page 96.

Collector hardware requirements

The hardware required to run a Collector at maximum load is primarily dependent on the machine's CPU speed and dedicated disk I/O subsystem. Additional CPUs beyond 2 to 4 CPUs (hardware platform dependent) will not increase the capacity of a given Collector on specific hardware. Faster CPUs will. Check the Sample Introscope 8.0 Collector sizing limits table on page 119 for examples of appropriate hardware platform, OS, and CPU.

Collector with metrics alerts limits

For information about this topic, see About alerted metrics and slow Workstation startup on page 81.

Collector to MOM clock drift limit

MOM and Collector clocks need to be synchronized to within AT MOST three seconds. If the clocks drift by more than that amount, the MOM releases the connection with the Collector. Then the MOM attempts to reconnect periodically (every minute) and checks to see if the clocks are in sync. If not, the connection fails. In addition, any clock drift between the Collectors and the MOM, even within the required 3-second limit, will cause disproportionate delays in Workstation responses.
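As a sketch of the drift rule only (not how the MOM actually measures drift), a check against the 3-second limit looks like this:

```python
def clocks_in_sync(mom_epoch_s, collector_epoch_s, max_drift_s=3.0):
    # 3 seconds is the maximum drift the text allows before the MOM
    # releases the Collector connection. The comparison is illustrative;
    # the MOM's internal drift measurement is not documented here.
    return abs(mom_epoch_s - collector_epoch_s) <= max_drift_s

# Within the limit: connection stays up; beyond it: the MOM drops it
# and retries every minute until the clocks are back in sync.
ok = clocks_in_sync(1_000_000.0, 1_000_002.5)
dropped = not clocks_in_sync(1_000_000.0, 1_000_004.0)
```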


» Important You must run time server software to synchronize the clocks of all the machines in the cluster at regular intervals. Time server software synchronizes the time on a machine with either internal time servers or Internet time servers.

Workstation sluggishness or unresponsiveness is rarely caused by a problem in the Workstation or MOM. It is usually caused by a single unresponsive Collector, which propagates to the MOM and then the Workstation, and is magnified when Collectors are clustered.

One way to determine which Collector is slowing the system down is to look at the round-trip response time from the MOM to each Collector. Each Collector has a ping metric that represents the MOM-to-Collector round-trip response time; for optimal Workstation response time, it should average less than 500 ms. This is equivalent to the GetEvent metric in Introscope 7.0. The ping metric shows how quickly the Collectors are responding to messages from the MOM.

» Note The Introscope ping metric monitors only the lower boundary of the round-trip response time from the MOM to each Collector. This ping time is not the same as the network ping time, which is the sending of an ICMP echo request and getting an echo response.

The ping metric is a good way to diagnose which Collector is responding slowly. Ping times of 10 seconds or longer indicate a slow Collector that may be overloaded. If the ping time is above the 10 second threshold for extended periods of time, investigate the overall health of the Collector that is reporting the slower ping time. Check for obvious signs that this Collector is overloaded, such as the Collector is combining time slices or receiving very large numbers of events. For more information about Collector health, the ping time threshold, and the ping metric, see Local network requirement for MOM and Collectors on page 51.
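The thresholds above can be sketched as a simple classifier; the category names are illustrative, not Introscope terminology:

```python
def classify_ping(avg_ping_ms):
    # Thresholds from the text: under 500 ms average is fine for
    # Workstation responsiveness; sustained readings of 10 seconds or
    # more point to an overloaded Collector worth investigating.
    if avg_ping_ms < 500:
        return "healthy"
    if avg_ping_ms >= 10_000:
        return "investigate collector"
    return "degraded"
```

For example, a Collector averaging 12,000 ms ping times for an extended period would be classified as one to investigate for combined time slices or very high event volumes.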

Reasons Collectors combine slices

If a Collector combines time slices throughout the day and appears to respond slowly despite being at or below the maximum recommended capacity limits, one of these four reasons is likely to be causing the problem:

Other processes are running on the Collector.


The sizing guideline provided for any hardware configuration assumes that no other processes are running on the host. If, for example, the Sample Introscope 8.0 Collector sizing limits table on page 119 states that a Collector running on a 2 CPU Xeon can handle 500,000 metrics, that assumes no other server or database process is running on the machine. This is true for any background process, but is especially important for processes that might affect disk I/O performance or have a large memory footprint. The Collector does not tolerate contention for its disk or memory resources, and such contention is a significant factor in many performance problems.

There is I/O contention between SmartStor and other processes, including the Enterprise Manager itself, because SmartStor is not located on a separate disk or I/O subsystem.

The virtual agent is poorly configured.

Very large virtual agents or poorly configured virtual agents with a lot of metrics will start to use up the CPU resources. The two biggest CPU drains are metrics baseline (heuristics) and virtual agents because of the large amount of calculation involved in both.

Large Transaction Traces are running continuously.

The process of accepting and persisting events like Transaction Traces involves deserialization and indexing, which are very CPU intensive. A very large number of Transaction Traces uses a lot of Collector CPU resources.

Increasing Collector capacity with more and faster CPUs

Adding two CPUs beyond the minimum 2 CPUs required increases the maximum agents per Collector limit, doubles the maximum number of applications, and increases the number of metrics that can be placed in metric groupings. The maximum metrics limit, however, remains the same. Faster CPUs may also improve Enterprise Manager performance, for example with faster Workstation query response times. See the limits shown in the Sample Introscope 8.0 Collector sizing limits table on page 119.

» Note In this guide, 2 CPUs is interchangeable with Dual Core and 4 CPUs is interchangeable with Quad Core.


Standalone EM hardware requirements example

Here is a standalone Enterprise Manager hardware requirements example. This should help you understand the various components and requirements you'll need to consider if you are deploying a standalone Enterprise Manager in your Introscope environment.

» Important The Enterprise Manager described below is only an example; it is NOT provided as the only recommended Enterprise Manager for any or all Introscope environment(s).

Running multiple Collectors on one machine

All Collectors need a minimum of two CPUs to perform their key operations. Adding an additional 2 CPUs for a total of 4 CPUs helps increase these limits:

number of applications per Collector

number of agents per Collector

number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager).

You can run multiple Collectors on one machine as long as you follow these requirements:

Run the OS in 64-bit mode to take advantage of a large file cache.

The file cache is important for the Collectors when doing SmartStor maintenance, for example spooling and reperiodization. File cache resides in the physical RAM, and is dynamically adjusted by the OS during runtime based on the available physical RAM. CA Wily recommends having 3 to 4 GB RAM per Collector.

EM Component: Example Requirement

Number of EM instances/Server: 1

Server Type and Model: Windows Server 2003

Operating System: Windows (running in 64-bit mode for optimum file cache size)

CPU: Two to four Intel Xeon CPUs @ 2.8 GHz

Physical RAM: 4 GB

Disk I/O Subsystem: The OS resides on a separate physical disk. RAID 0 or RAID 5 configuration. Drive speed: 10k RPM or greater.


There should not be any disk contention for SmartStor, meaning you use a separate physical disk for each SmartStor instance.

If there is contention for SmartStor write operations, the whole system can start to fall behind, which can result in poor performance such as combined time slices and dropped agent connections.

The baseline.db and traces.db files from up to four Collectors can reside on a single separate disk. In other words, up to four Collectors can share the same physical disk to store all of their baseline.db and traces.db files.


CHAPTER 3

Metrics Requirements and Recommendations

This chapter provides background and specifics to help you understand sizing and performance-related metrics requirements, settings, and limits for your Introscope system. In this chapter you’ll find the following topics:

Metrics background . . . . . . . . . . . . . . . . . . 78

About metrics groupings and metric matching. . . . . . . . . . 78

8.0 metrics setup, settings, and capacity . . . . . . . . . . . 79

Matched metrics limits . . . . . . . . . . . . . . . . . 79

Inactive and active metric groupings and EM performance . . . . . . 80

Performance and metrics groupings using the wildcard (*) symbol . . . 80

SmartStor metrics limits . . . . . . . . . . . . . . . . 80

Virtual agent metrics match limits . . . . . . . . . . . . . 80

About alerted metrics and slow Workstation startup . . . . . . . . 81

About aggregated metrics and Management Module hot deployments . . 81

Detecting metrics leaks . . . . . . . . . . . . . . . . . 81

Metrics leak causes . . . . . . . . . . . . . . . . . . 82

Finding a metrics leak . . . . . . . . . . . . . . . . . 82

Metrics for diagnosing a metrics leak . . . . . . . . . . . . 83

Detecting metric explosions . . . . . . . . . . . . . . . 84

Metric explosion causes . . . . . . . . . . . . . . . . . 84

Finding a metric explosion . . . . . . . . . . . . . . . . 85

How Introscope prevents metric explosions . . . . . . . . . . 91

SQL statements and metric explosions . . . . . . . . . . . . 92

SQL statement normalizers . . . . . . . . . . . . . . . 94

Enterprise Manager dead metric removal . . . . . . . . . . . 96

Metric clamping . . . . . . . . . . . . . . . . . . . 96

SmartStor metadata files are uncompressed . . . . . . . . . . 98


Metrics background

Every 15 seconds, the metrics harvest cycle takes place on the Enterprise Manager. During this process, the metrics data reported by agents is aggregated by the Enterprise Manager. This time slice data is processed to perform calculations, check alerts, update heuristics, and update Workstation views, and is persisted to disk by SmartStor. Typically, at load levels close to the limits recommended in the Sample Introscope 8.0 Collector sizing limits table on page 119, the harvest duration is no more than about 3 to 4 seconds.

The number of metrics an individual Collector can handle is influenced by CPU speed. As discussed in EM basic requirements on page 20, CA Wily recommends two to four dedicated CPUs per Collector (depending on hardware platform). Additional dedicated, physical CPUs won't increase the number of metrics and agents a Collector can handle; however, faster CPUs may help increase the Collector's maximum capacity.

Introscope business logic load is determined by the following:

total number of metrics groupings

maximum number of metrics in a metrics grouping

number of metrics persisted per minute.

Understanding metric groupings and metric matching, then following the guidelines discussed in Matched metrics limits on page 79 can be helpful in avoiding performance problems.

About metrics groupings and metric matching

Introscope metrics, which measure the application performance that Introscope tracks and records, are identified by strings. Every metric in Introscope has a string identifier that includes its host, process, agent, and resource name, such as JSP|_Shopping_Cart_JSP:Average Response Time (ms). The structure of your Investigator tree reflects the resource name.
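As an illustration, the identifier format implied by the example above can be taken apart like this (a sketch based only on that example, not on a documented grammar):

```python
def split_metric(name):
    # Format inferred from the example in the text: resource path
    # segments separated by "|", with ":" before the metric name.
    path, _, metric = name.rpartition(":")
    return path.split("|"), metric

segments, metric = split_metric(
    "JSP|_Shopping_Cart_JSP:Average Response Time (ms)")
# segments holds the resource path, metric holds the measurement name
```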

Introscope's business rules are built around the concept of metrics groupings. A metrics grouping is a logical grouping of metric resource names or identifiers that you’ve defined using a regular expression. Whenever you create a regular expression, you are creating a metrics grouping. If a regular expression returns one or more results, those results are the metrics grouping. Metrics groupings are stored in Management Modules. A metric can match no groups, one group, or many groups.

Metrics groupings are used by Introscope as described in the following process:


1 Introscope business logic monitors, including Alerts, Dashboards, and the Workstation Investigator tree, want data from the Enterprise Manager.

2 Introscope business logic monitors request metric data using a metric group. For example, an Enterprise Manager gets the Workstation request, "Give me the data for the Servlets metric group."

3 When the data query is submitted, the Enterprise Manager scans all metrics to see which match the metric group “Servlets”. Those metrics are then subscribed to.

4 Every 15-second harvest cycle, the metrics that are subscribed to have their 15-second time slice data routed to the subscribing Introscope business logic monitor.

The total number of metrics that the Collector must assess during each time slice can easily become so large that it can't process all the business logic you've defined for all your metrics within the 15 second harvest cycle. This situation can lead to performance problems. Therefore, CA Wily recommends that the total number of metrics placed in metrics groupings be no more than 15% of the metrics limit if you are running a 2 CPU Collector, and no more than 30% of the metrics limit if you are running a 4 CPU Collector. For example metrics limits, see the Sample Introscope 8.0 Collector sizing limits table on page 119.
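The 15%/30% recommendation can be turned into a small sizing helper; this is a sketch of the arithmetic only, using integer math to avoid rounding surprises:

```python
def grouping_budget(metrics_limit, cpus):
    # 15% of the Collector metrics limit with 2 CPUs, 30% with 4 CPUs,
    # per the recommendation above; other CPU counts are not covered
    # by the text.
    percent = 30 if cpus >= 4 else 15
    return metrics_limit * percent // 100

# A Collector sized for 500,000 metrics on 2 CPUs should keep no more
# than 75,000 metrics in metrics groupings; on 4 CPUs, 150,000.
two_cpu_budget = grouping_budget(500_000, 2)
four_cpu_budget = grouping_budget(500_000, 4)
```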

8.0 metrics setup, settings, and capacity

The topics below describe metrics-related settings and capacity limits required to set up, maintain, or configure your Introscope 8.0 environment.

Matched metrics limits

The limiting factors for metrics are:

SmartStor I/O activity

the number of metric groups defined

It doesn't matter how many alerts you've set; what matters is how many metrics are matched. CA Wily recommends that the total number of metrics placed in metrics groupings be no more than 15% of the metrics limit if the Enterprise Manager is using its minimum requirement of 2 CPUs. If an Enterprise Manager is using 4 CPUs, this limit increases to 30%. For example limits, see the Sample Introscope 8.0 Collector sizing limits table on page 119.

» Note If you are using standalone Enterprise Managers, you define metrics groupings on the Enterprise Manager. However, if you are using clustered Collectors, set up metrics groupings on the MOM.


Inactive and active metric groupings and EM performance

Enterprise Manager performance can be dependent on whether the Enterprise Manager is handling inactive or active metrics groupings.

An inactive metrics grouping is metric grouping data that is not being requested by a Workstation. In this case, the Enterprise Manager looks at the calculations it needs to perform and the alerts it needs to carry out (such as sending out an e-mail message), and doesn't need to do the extra work of sending metric grouping information to a Workstation.

An active metrics grouping is metric grouping data that is being requested by a user logged in at a Workstation. For example, an Introscope Administrator wants to look at a graph that is representing a metric grouping. In this case, the Enterprise Manager has to provide all the data for the graph to the Workstation in addition to performing the calculations and handling the alerts.

Performance and metrics groupings using the wildcard (*) symbol

Do not create metrics groupings or regular expressions that use only the wildcard asterisk (.*) expression and no other specifiers. The search term (.*) creates a metric grouping that matches all metrics in the system. Subsequent business operations on that grouping, such as adding alerts, then put unnecessary overhead on the Enterprise Manager.
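A short illustration of why a bare (.*) grouping is costly, using hypothetical metric names in the style shown earlier in this guide:

```python
import re

# Hypothetical metric identifiers, for illustration only.
metrics = [
    "JSP|_Shopping_Cart_JSP:Average Response Time (ms)",
    "Servlets|LoginServlet:Responses Per Interval",
    "Backends|OrdersDB:Stall Count",
]

# A bare ".*" grouping subscribes to every metric in the system,
# so any alert built on it touches all of them each harvest cycle...
matched_all = [m for m in metrics if re.search(r".*", m)]

# ...while anchoring the expression to a resource prefix keeps the
# grouping, and any alerts built on it, narrowly scoped.
servlets_only = [m for m in metrics if re.match(r"Servlets\|", m)]
```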

SmartStor metrics limits

Starting with Introscope 7.1, the metric capacity for a SmartStor increased by 25% over that provided in Introscope 7.0. The new limits are imposed by I/O throughput rather than CPU (as was the case in Introscope 7.0).

The recommended SmartStor metrics limit is the same for both the MOM and Collector as shown in the Sample Introscope 8.0 Collector sizing limits table on page 119.

Virtual agent metrics match limits

See Virtual agent metrics match limits on page 109 and examples in the Sample Introscope 8.0 Collector sizing limits table on page 119.


About alerted metrics and slow Workstation startup

If you launch your MOM and are logged in, but don't see any metrics in the Workstation Investigator tree for a long time, the MOM is taking a long time to begin sending data to the Workstation. Large numbers (100,000 or more) of alerted metrics (a metric group to which you've attached an alert) in individual Collectors cause a great deal of network and CPU overhead in the MOM as the Collectors register these alerts with the MOM. During startup, the MOM consumes the CPU for 15 to 20 minutes, during which you can see the Investigator tree but none of the metrics are reported back to you. Since the slow processing is due to the MOM handling the query exchange, this situation has very little impact on the Collectors, and data collection progresses normally.

Limiting the number of Collectors and alerts should make the startup process less time consuming.

If the startup time is unacceptable, reduce the number of alerted metrics, or consider a machine with faster individual CPUs.

About aggregated metrics and Management Module hot deployments

» WARNING Do not perform Management Module hot deployments on production Collectors or MOMs.

The cost of Management Module hot deployments can be 48 seconds per 1500 aggregated metrics (on virtual agents), and during the Management Module hot deployment, the Collector CPU utilization is almost 100% for the entire duration. The effect on a MOM is even more pronounced. For more information about Management Module hot deployments, see Avoid Management Module hot deployments on page 68.

Detecting metrics leaks

If the number of agents connected to your Enterprise Manager is within the guidelines recommended in the Sample Introscope 8.0 Collector sizing limits table on page 119, yet the Enterprise Manager behaves as if it is handling a much larger load, the cause may be a metrics leak. Symptoms include, for example, historical queries for even relatively small numbers of metrics taking far too long to return, causing dashboards to render in slow motion and Investigator browsing to be extremely slow or impossible.


A metrics leak happens when a metric produces data for a very short period of time, and then never produces data again. This happens when part of the metric name includes something transient, like a session key or a SQL parameter.

» Note A metric explosion happens when an agent is inadvertently set up to report more metrics than the system can handle. In this case, Introscope is bombarded with such a large number of metrics that performance gets very slow or the system cannot function at all. For more information, see Detecting metric explosions on page 84.

Metrics leak causes

The Enterprise Manager’s SmartStor keeps both a skeleton of metric names (metrics metadata) and the actual meat or content of the historical data (which is recorded every 15 seconds) behind the metric names. Each time a new metric is introduced to Introscope, the metadata is updated in SmartStor so that subsequent historical queries can get to the data about that metric later.

If an agent is misconfigured such that it builds a new metric for each transaction against the application (for example, broken SQL statement normalization), or if new metrics are created at each restart for a large body of metrics that should be identical across restarts (for example, a unique ID in the JMX or WebSphere PMI metrics collected by the agent), then the metric metadata grows continuously. If the growth is fast enough, the Enterprise Manager might shut down the agent at its metric limit (the default limit is 50,000 metrics). If the metrics leak growth is slow, however, the problem may be invisible initially.

When the skeleton grows large enough, however, routine operations with the SmartStor such as querying historical data for a metric become extremely slow. Counting metrics in the agent will likely highlight the problem, but if the metric growth occurs slowly enough between agent restarts, then the problem may not be visible except from the count of metrics in the metadata skeleton.

Finding a metrics leak

The most obvious symptom of a metrics leak is messages about saving metadata appearing continuously, and with extremely high latencies, in the IntroscopeEnterpriseManager.log file.

The typical amount of time SmartStor needs to save metadata ranges from about one tenth of a second to one half second (100 to 500 ms). The saving of metadata should generally only happen when new metrics are being added to an agent (for example during startup or when a new portion of the application has been exercised for the first time). If an agent has been in production at length under load, SmartStor is not expected to continue to carry out the metadata save operation.


The SmartStor metadata save time is recorded and stored in the Enterprise Manager log, as shown in the log snippet below. In this case, it took 86209 ms (86 seconds) to save this piece of metadata. This long saved metadata time is a strong indication of a metrics leak problem.

[INFO] [Manager.SmartStor] Saved metadata - took 86209
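A minimal sketch of scanning an Enterprise Manager log for slow metadata saves, assuming only the log line format shown above:

```python
import re

SAVE_RE = re.compile(r"Saved metadata - took (\d+)")

def slow_metadata_saves(log_lines, threshold_ms=30_000):
    # Normal saves take roughly 100-500 ms; the 30-second threshold is
    # the level the text treats as a strong sign of a metrics leak or
    # metric explosion. The parsing matches the snippet shown above.
    times = []
    for line in log_lines:
        m = SAVE_RE.search(line)
        if m:
            times.append(int(m.group(1)))
    return [t for t in times if t >= threshold_ms]

log = [
    "[INFO] [Manager.SmartStor] Saved metadata - took 310",
    "[INFO] [Manager.SmartStor] Saved metadata - took 86209",
]
flagged = slow_metadata_saves(log)
```

In this example, only the 86209 ms save is flagged; frequent flagged saves on a long-running agent point to leaking metrics.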

Metrics for diagnosing a metrics leak

Introscope provides a number of metrics to help you perform a high level diagnosis of a metrics leak problem. These metrics are described in the table below.

You will not solve your metrics leak until you identify the cause of the leaking metrics and plug it. Contact CA Wily Support if you are unsure about how to proceed with fixing your metrics leak.

Metric: Enterprise Manager | Data Store | SmartStor | MetaData:Metrics with Data
Description: Replaces the previous metadata metric; renamed to better convey that this is the number of metrics known to SmartStor.

Metric: Enterprise Manager | Data Store | SmartStor | MetaData:Agents with Data
Description: The number of agents that the metadata knows about that have data in SmartStor.

Metric: Enterprise Manager | Data Store | SmartStor | MetaData:Agents without Data
Description: The number of agents that the metadata knows about that have no data in SmartStor.

Metric: Enterprise Manager | Data Store | SmartStor | MetaData:Partial Metrics with Data
Description: The number of partial metrics (metrics under an agent node in the Investigator) that the metadata knows about that have data in SmartStor.

Metric: Enterprise Manager | Data Store | SmartStor | MetaData:Partial Metrics without Data
Description: The number of partial metrics (metrics under an agent node in the Investigator) that the metadata knows about that have no data in SmartStor.


Detecting metric explosions

A metric explosion happens when an agent is inadvertently set up to report more metrics than the system can handle. In this case, Introscope is bombarded with such a large number of metrics that performance becomes very slow or the system cannot function at all. Contact CA Wily Technical Support if you think your Introscope system performance issues may be due to a metric explosion.

Metric explosion causes

Metric explosions can be caused by a number of factors including:

broken SQL Agent normalization. See SQL statements and metric explosions on page 92.

a large number of unique SQL statements. See How poorly written SQL statements create metric explosions on page 92.

sockets being opened on random ports. See Finding a metric explosion on page 85.

JMX serverid

Metric explosion due to JMX serverid occurs when JMX filter strings given to WebLogic produce metric names that include serverid=<int>, where the integer is a unique number for each WebLogic run. This can result in thousands of new metrics with each server restart. In this situation, for example, after several weeks the SmartStor metadata can contain in excess of 500K dead metrics, although the actual metric count should have been no more than 25K. See Metrics leak causes on page 82 and Finding a metric explosion on page 85.

the JDBC URL being formatted into the SQL metric names (database name formatting). For more information, see Knowledgebase Article 1240.

URL groupings (the introscope.agent.urlgroup properties) not being used, so that every unique URL generates a different node of metrics. See Knowledgebase Article 1112.


Finding a metric explosion

The most obvious indication of a metric explosion is poor Enterprise Manager performance with these conditions and symptoms:

A low number of agents and a reasonable metric count, but sluggish Enterprise Manager performance, similar to an overloaded application

Small historical queries that are extremely slow, taking many seconds or even minutes

High CPU utilization (often above 50%)

Disk usage that is not necessarily higher than usual

A very large number of agent metrics being generated, for example more than 7,000 metrics per JVM

Extremely long (longer than 30 seconds) SmartStor metadata save times

A warning in the Enterprise Manager log that the agent metrics limit has been reached and no more metrics will be accepted

If you have these symptoms, chances are that you have a metric explosion situation. The SmartStor metadata save time is recorded and stored in the Enterprise Manager log, as shown in the log snippet below. In this case, it took 31701 ms (31 seconds) to save this piece of metadata. This long saved metadata time is a strong indication of a metric explosion problem.

7/13/06 09:31:08 AM PDT [INFO] [Manager.SmartStor] Saved metadata - took 31701

When it takes 30 seconds or longer to save metadata, you are probably storing a massive number of metric names (250K or more). If you also see the metadata being saved often (every few minutes or less), you are leaking metrics. When too many metrics leak very quickly, the result is a metric explosion.

Investigator metrics and tab for diagnosing metric explosions

You can view several metrics and use the Enterprise Manager Overview tab to determine current and historical metrics counts. When you see excessively high metric count numbers, this indicates you have a metric explosion situation.

Metric Count metric

The Metric Count metric tracks the number of metrics that the Enterprise Manager currently thinks are live, meaning actively reporting data from a specific agent. If this value is exceptionally high, it means an agent is reporting too many metrics.

You can find this metric under the Custom Metric Agent (Virtual) node in the Investigator tree; it will look similar to this:

Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)| Agent | Metric Count

» Note Before Introscope 8.0, the Metric Count metric node was located under the Agent Stats node.

You can also configure the EM Capacity Dashboard to access the current metric count and the metric count from the top five agents.

» Note You must configure the EM Capacity dashboard before use, as it does not automatically contain links to underlying data. For information about creating and editing custom links, see the Introscope Configuration and Administration Guide.

Example EM Capacity dashboard configuration

To view the current metric count and Top 5 agent metric counts from the EM Capacity dashboard:

1 Choose Workstation > New Console.

2 When the Console window opens, select EM Capacity from the Dashboard: list.

3 In the Metrics graph, click on a metric bar.

An Investigator window opens displaying the Number of Metrics metric.

4 Return to the EM Capacity dashboard.

5 In the Agents graph, click on a metric bar.

An Investigator window opens displaying the metric count for one of the top five agents.

Historical Metric Count metric

The Historical Metric Count metric shows the total number of metrics from an agent that are either live or recently active. The Enterprise Manager uses this metric to decide whether to start clamping more metrics from the agent.

If the Historical Metric Count is high while the Metric Count metric is in range, the agent has too many metrics on which it reports data only intermittently, or it is constantly renaming its metrics.

You can find this metric under the Custom Metric Agent (Virtual) node in the Investigator tree; it will look similar to this:

Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)| Agent | Historical Metric Count.

Number of Historical Metrics metric

The Number of Historical Metrics metric returns the total number of metrics the Enterprise Manager is tracking across all agents. This number defines the true limit of the Enterprise Manager’s performance. While there is no specific limit on the number of agents, or on the number of metrics per agent, if the combined total becomes too high, the Enterprise Manager starts to perform poorly. You can find this metric at the following location in the Investigator tree:

Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual) (*SuperDomain*)| Enterprise Manager | Connections | Number of Historical Metrics.

Enterprise Manager Overview tab

In the Enterprise Manager Overview tab Number of Metrics graph, shown in the figure below, you can view the number of live metrics as well as historical metrics reporting to the Enterprise Manager. This can help you determine if an Enterprise Manager is experiencing metric overload. For more information, see About the Enterprise Manager Overview tab on page 27.

When you hover the cursor over the data points, you can get more metric information, as shown in this Historical metrics example.

How Introscope prevents metric explosions

Introscope includes several capabilities to prevent metric explosions:

Agent metric aging, in which unused metrics are regularly removed. This results in little or no build-up of unused metrics on the agent and Enterprise Manager. See About agent metric aging on page 91.

SQL statement normalizers. See SQL statements and metric explosions on page 92.

Unused metrics are regularly removed from the Enterprise Manager. See Enterprise Manager dead metric removal on page 96.

SmartStor metadata files are uncompressed. See SmartStor metadata files are uncompressed on page 98.

Metric clamping. See Metric clamping on page 96.

About agent metric aging

Starting in Introscope 8.0, agent metric aging periodically removes dead metrics from the agent memory cache by default. A dead metric is a metric that has reported no new data in a given amount of time. Agent metric aging runs on a heartbeat in the agent; the heartbeat is the interval, in seconds, at which metrics are checked for removal. During each heartbeat, a certain set of metrics is checked, and a metric becomes a candidate for removal if it has not received new data after a certain period of time. For more information and instructions on configuring the agent metric aging properties, including the number of metrics checked each heartbeat and the metric removal time period, see the Java Agent Guide or the .NET Agent Guide.

When dead metrics are not removed, performance issues can occur due to an excessive number of metrics being sent to the Enterprise Manager. This can result in both greater CPU utilization and a slower response time as the Enterprise Manager works harder to perform its tasks.

Agent metric aging can improve Workstation and Enterprise Manager response times when many dead metrics would otherwise accumulate in the agent memory cache. To see the current metric count for an agent, see the Metric Count node under the Custom Metric Agent node.

Agent metric aging can also reduce performance in two ways. First, if agent metric aging happens too frequently, that is, if metrics are removed and then turned on again, you may see an increase in CPU utilization and response times. For example, if every hour the same metrics are removed from the cache and then added back, there is increased performance overhead because the metrics and accumulators are cached repeatedly.

In this case, you can update your agent metric aging properties so that they use less system overhead. Update the introscope.agent.metricAging.numberTimeslices property and increase its value. In addition, avoid reporting metrics that need to be removed and then turned on again. For example, you could stop reporting a SQL statement metric that gets invoked every two hours when the associated dead metric ages out every hour.

Second, if Introscope checks too many metrics during each heartbeat, performance can suffer. In this case, you might not see agent metrics being aged and removed; however, during each heartbeat metric review, Introscope still checks metrics for possible removal, which adds overhead. Update the introscope.agent.metricAging.dataChunk property to a lower number so that Introscope checks fewer metrics for removal during each heartbeat review. You can also decrease the heartbeat frequency by increasing the value of the introscope.agent.metricAging.heartbeatInterval property, so that Introscope checks for metric removal less often.
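
When tuning these values, it can help to estimate how long one full pass over the agent's metric cache takes. The formula below is an assumption for illustration only (the guide does not publish the agent's exact scheduling algorithm): heartbeats needed to cover every metric, multiplied by the heartbeat interval.

```python
import math

# Hypothetical estimate: heartbeats needed to review every cached
# metric once, multiplied by the heartbeat interval (seconds).
def full_sweep_seconds(metric_count, data_chunk, heartbeat_interval_s):
    heartbeats = math.ceil(metric_count / data_chunk)
    return heartbeats * heartbeat_interval_s

# Example: 7,000 cached metrics, 500 checked per heartbeat,
# heartbeat every 1,800 seconds
print(full_sweep_seconds(7000, 500, 1800))  # 25200 (7 hours)
```

Under this model, lowering dataChunk or raising heartbeatInterval both stretch the sweep out, trading slower aging for lower per-heartbeat overhead.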

For information about configuring agent metric aging properties, see the Java Agent Guide or the .NET Agent Guide.

SQL statements and metric explosions

Metric explosions can be caused by a number of factors, including a large number of unique SQL statements.

How poorly written SQL statements create metric explosions

If your SQL Agent is showing a large and increasing number of unique SQL metrics even though your application uses a small set of SQL statements, the problem could be in how the SQL statement was written.

In general, the number of SQL Agent metrics should approximate the number of unique SQL statements. A common reason this breaks down is how comments are used in SQL statements. For example, in this statement,

"/* John Doe, user ID=?, txn=? */ select * from table..."

the SQL Agent creates the following metric:

"/* John Doe, user ID=?, txn=? */ select * from table..."

Note that the comment is part of the metric name. While the comment is useful for the database administrator to see who is executing what query, the SQL Agent does not parse the comment in the SQL statement. Therefore, for each unique user ID, SQL Agent creates a unique metric, potentially causing a metric explosion. The database that executes the SQL statements does not see these metrics as unique because it ignores the comments.

You can avoid this problem by putting the SQL comment in single quotes, as shown:

"/*' John Doe, user ID=?, txn=? '*/ select * from table..."

The SQL Agent then creates the following metric where the comment no longer causes a unique metric name:

"/* ? */ select * from table..."

Some applications may generate an extremely large number of unique SQL statements. If technologies like EJB 3.0 or Hibernate are in use, the likelihood of long unique SQL statements increases. For more information about Hibernate, see http://www.hibernate.org/.

Example 1

In looking in the Investigator at this path under an agent node

Backends|{backendName}|SQL|{sqlType}|sql

you notice that temporary tables are being accessed like this:

SELECT * FROM TMP_123981398210381920912 WHERE ROW_ID = ?

The additional digits on each TMP_ table name are unique and steadily growing, causing a metric explosion.

Example 2

You have been alerted to a potential metric explosion and your investigation brings you to a review of this SQL statement:

INSERT INTO COMMENTS (COMMENT_ID, CARD_ID, CMMT_TYPE_ID, CMMT_STATUS_ID, CMMT_CATEGORY_ID, LOCATION_ID, CMMT_LIST_ID, COMMENTS_DSC, USER_ID, LAST_UPDATE_TS) VALUES (?, ?, ?, ?, ?, ?, ?, "CHANGE CITY FROM CARROLTON, TO CAROLTON, _ ", ?, CURRENT)

In studying the code, you notice that "CHANGE CITY FROM CARROLTON, TO CAROLTON, _ " recurs as a dizzying array of cities.

Example 3

You have been alerted to a potential metric explosion and your investigation brings you to a review of this SQL statement:

CHANGE COUNTRY FROM US TO CA _ CHANGE EMAIL ADDRESS FROM TO BRIGGIN @ COM _ "

In studying the code, you notice CHANGE COUNTRY results in an endless list of countries. In addition, the placement of the quotes results in people's e-mail addresses getting inserted into the SQL statements. This is a source of metric explosions as well as other negative consequences.

SQL statement normalizers

To address many unique SQL statements, the SQL Agent includes these statement normalizers:

Default SQL statement normalizer: Normalizes text within single quotation marks ('xyz').

Regular expression SQL statement normalizer: A SQL Agent extension that normalizes SQL statements based on configurable regular expressions (regex). Note: CA Wily recommends that you use this normalizer first, as it allows you to configure regular expressions and normalize any characters or sequence of characters in the SQL statement.

Command-line SQL statement normalizer: If the regular expression SQL normalizer is not in use, and code includes SQL statements that enclose values in the where clause with double quotes (" "), this normalizer allows a command-line command to normalize the SQL statements.

Custom SQL statement normalizer: Allows users to add extensions for performing custom normalization.

For more information about working with Introscope SQL statement normalization capabilities, see the Java Agent Guide or the .NET Agent Guide.

The two examples below can help you understand how to implement the regular expression SQL statement normalizer.

Example 1

Here’s a SQL query before regular expression SQL statement normalization:

INSERT INTO COMMENTS (COMMENT_ID, CARD_ID, CMMT_TYPE_ID, CMMT_STATUS_ID, CMMT_CATEGORY_ID, LOCATION_ID, CMMT_LIST_ID, COMMENTS_DSC, USER_ID, LAST_UPDATE_TS) VALUES (?, ?, ?, ?, ?, ?, ?, "CHANGE CITY FROM CARROLTON, TO CAROLTON, _ ", ?, CURRENT)

Here’s the desired normalized SQL statement:

INSERT INTO COMMENTS (COMMENT_ID, ...) VALUES (?, ?, ?, ?, ?, ?, ?, CHANGE CITY FROM ( )

Here’s the configuration needed in the IntroscopeAgent.profile file to produce the normalized SQL statement shown above:

introscope.agent.sqlagent.normalizer.extension=RegexSqlNormalizer

introscope.agent.sqlagent.normalizer.regex.matchFallThrough=true

introscope.agent.sqlagent.normalizer.regex.keys=key1,key2

introscope.agent.sqlagent.normalizer.regex.key1.pattern=(INSERT INTO COMMENTS \\(COMMENT_ID,)(.*)(VALUES.*)''(CHANGE CITY FROM \\().*(\\))

introscope.agent.sqlagent.normalizer.regex.key1.replaceAll=false

introscope.agent.sqlagent.normalizer.regex.key1.replaceFormat=$1 ...) $3 $4 $5

introscope.agent.sqlagent.normalizer.regex.key1.caseSensitive=false

introscope.agent.sqlagent.normalizer.regex.key2.pattern='[a-zA-Z1-9]+'

introscope.agent.sqlagent.normalizer.regex.key2.replaceAll=true

introscope.agent.sqlagent.normalizer.regex.key2.replaceFormat=?

introscope.agent.sqlagent.normalizer.regex.key2.caseSensitive=false

Example 2

Here’s a SQL query before regular expression SQL statement normalization:

SELECT * FROM TMP_123981398210381920912 WHERE ROW_ID =

Here’s the desired normalized SQL statement:

SELECT * FROM TMP_ WHERE ROW_ID =

Here’s the configuration needed in the IntroscopeAgent.profile file to produce the normalized SQL statement shown above:

introscope.agent.sqlagent.normalizer.extension=RegexSqlNormalizer

introscope.agent.sqlagent.normalizer.regex.matchFallThrough=true

introscope.agent.sqlagent.normalizer.regex.keys=key1

introscope.agent.sqlagent.normalizer.regex.key1.pattern=(TMP_)[1-9]*

introscope.agent.sqlagent.normalizer.regex.key1.replaceAll=false

introscope.agent.sqlagent.normalizer.regex.key1.replaceFormat=$1

introscope.agent.sqlagent.normalizer.regex.key1.caseSensitive=false
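
The effect of the key1 pattern can be checked with a quick sketch. Note that this sketch uses the character class [0-9] rather than the [1-9] shown in the profile above (an assumption made here so that digits containing zero are also collapsed):

```python
import re

# Collapse the unique digits appended to temporary-table names so every
# TMP_<digits> query normalizes to a single metric name. [0-9] is used
# here (an assumption) so zeros are matched too.
TMP_TABLE = re.compile(r"(TMP_)[0-9]*")

def normalize_tmp(sql):
    # \1 keeps the "TMP_" prefix and drops the trailing digits,
    # mirroring replaceFormat=$1 in the profile above.
    return TMP_TABLE.sub(r"\1", sql)

print(normalize_tmp("SELECT * FROM TMP_123981398210381920912 WHERE ROW_ID = ?"))
# SELECT * FROM TMP_ WHERE ROW_ID = ?
```

After normalization, every temporary-table query maps to one metric name instead of one per table.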

Enterprise Manager dead metric removal

Starting with Introscope 8.0, when a metric has not produced data for more than eight minutes (the default), it is removed from the Investigator tree. This dead metric removal reduces the potential for metric explosions and also improves performance.

You may notice the following performance improvements due to dead metric removal:

Reduced Enterprise Manager RAM consumption

Increased responsiveness of the Enterprise Manager while tracking live metrics

Metric clamping

Several properties that limit, or clamp, the number of metrics on the agent and the Enterprise Manager help to prevent spikes in the number of reported metrics (metric explosions) on the Enterprise Manager. No new metrics are displayed in the Workstation after a clamp has occurred.

Metric clamping is enabled through four new properties:

» Note The Enterprise Manager uses the default value for each property if the property's line is commented out in the IntroscopeEnterpriseManager.properties file.

introscope.enterprisemanager.agent.metrics.limit: Limits the number of live and historical metrics an agent will report. The default is 50,000.

introscope.enterprisemanager.metrics.live.limit: Limits the number of live metrics reporting from agents per Enterprise Manager. The default is 500,000.

introscope.enterprisemanager.metrics.historical.limit: Limits the total number of metrics (both live and historical) per Enterprise Manager. The default is 1,200,000.

introscope.enterprisemanager.query.datapointlimit: Limits the maximum number of metric data points each Collector or standalone Enterprise Manager returns from any one query. The clamp is per query, not across all concurrent queries. Queries to MOMs are only indirectly clamped by the data limit on each Collector. Default = 0 (no limit). Note: The value you choose for this property depends on a number of factors, such as heap size, the total number of metrics on the Enterprise Manager, and whether the Enterprise Manager is a MOM or a Collector. Determine the limit based on the load and hardware in your Introscope environment.

For more information about these properties, see the Introscope Configuration and Administration Guide.

When the Enterprise Manager starts up, the values of these properties are logged. When an Enterprise Manager hits a clamp value based on the total number of metrics it can process, or when an agent hits the agent clamp, a message appears in the Enterprise Manager log. If clamping is no longer necessary due to a change in the limits, another message is logged in the Enterprise Manager log. All supported agents obey these clamps, though the custom metric agent and agent clusters (virtual agents) are not subject to them.

Metric clamp supportability metrics

The Metric Count metric, an Enterprise Manager supportability metric seen in the Investigator tree under the Enterprise Manager node for each agent, still reports live metrics only (as opposed to both live and historical metrics). For more information, see Metric Count metric on page 85.

In addition, under the Enterprise Manager node for each agent, there is a new Is Clamped supportability metric that has the value of 0 when an agent is not clamped and 1 when an agent is clamped.

Metric clamp scenario

Here’s a scenario describing what happens if you set both the Enterprise Manager and agent metric clamps. Let’s say in the IntroscopeEnterpriseManager.properties file you set the following values for these properties:

introscope.enterprisemanager.agent.metrics.limit=10000

introscope.enterprisemanager.metrics.live.limit=800

Then you start the Enterprise Manager and two agents.

You’d see that the Enterprise Manager gets clamped when 800 metrics have been reported, even though the agent clamp of 10,000 metrics has not yet been reached. This means no new metrics from the agents are reported. In addition, the agent logs state that the Enterprise Manager clamp has been reached and no more metrics will be reported to the Enterprise Manager.

If you increase the Enterprise Manager clamp value, you’d see that new metrics from the agent start to be reported.
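
The interplay can be summarized as: the number of metrics an agent can actually report is bounded by whichever clamp is reached first. A minimal sketch of that rule (an illustration, not Enterprise Manager code):

```python
# Whichever limit is hit first wins: the per-agent clamp, or whatever
# headroom remains under the Enterprise Manager's live-metric clamp.
def effective_limit(agent_limit, em_live_limit, metrics_already_live):
    em_headroom = max(em_live_limit - metrics_already_live, 0)
    return min(agent_limit, em_headroom)

# Scenario above: agent clamp 10,000, EM live clamp 800, empty EM
print(effective_limit(10000, 800, 0))  # 800
```

With two agents sharing the Enterprise Manager, the second agent's headroom shrinks by whatever the first has already reported, which is why the EM-level clamp is the one that triggers in the scenario above.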

SmartStor metadata files are uncompressed

Before Introscope 8.0, SmartStor stored metadata using built-in Java compression. Starting in Introscope 8.0, to increase SmartStor’s speed in reading stored metadata files, all new metadata files are written in an uncompressed format. However, Introscope 8.0 still retains the capability of reading compressed data generated by previous versions.

The uncompressed metadata files are bigger than the compressed files; however, they are still extremely small compared to the metric data (.data) files. The amount of disk space used by the metadata files is about eight to ten times greater than with the compressed format. In exchange, not using compression speeds up SmartStor access time by about five times.

CHAPTER 4

Workstation and WebView Requirements and Recommendations

This chapter provides background and specifics to help you understand sizing and performance-related Workstation and WebView requirements, settings, and limits for your Introscope system. In this chapter you’ll find the following topics:

Workstation and WebView background . . . . . . . . . . . . 100

8.0 Workstation and WebView requirements . . . . . . . . . . 100

OS RAM requirements for Workstations running in parallel . . . . . . 100

WebView and Enterprise Manager hosting requirement . . . . . . . 100

8.0 Workstation and WebView setup, settings, and capacity . . . . . 101

Workstation to standalone EM connection capacity . . . . . . . . 101

Workstation to MOM connection capacity . . . . . . . . . . . 102

WebView server capacity . . . . . . . . . . . . . . . . 103

WebView server guidelines . . . . . . . . . . . . . . . . 103

Top N graph metrics limit per Workstation . . . . . . . . . . . 103

Workstation and WebView background

You control Introscope and access performance metrics through the Introscope Workstation. You can set alerts for individual metrics or logical metric groups, view performance metrics, and customize views to represent your unique environment.

Introscope WebView presents Introscope's customizable dashboards and Investigator tree views to authorized users in a browser interface so that critical information can be viewed without the aid of the Workstation. From the MOM's point of view, WebView simply looks like another busy Workstation client.

The Workstation in Introscope 8.0 uses less memory on average than in 7.x releases. It operates within the heap footprint specified in the Workstation.lax file. For information about the Workstation.lax file, see the Introscope Configuration and Administration Guide.

This applies to the Command Line Workstation (CLW), as well. For CLW queries that do not return large amounts of data, the 128 MB heap size specified in the Introscope Configuration and Administration Guide is adequate. However, larger queries require setting the heap size to 256 MB or greater.

Workstation sluggishness or unresponsiveness is rarely caused by a problem in the Workstation or MOM. It is usually caused by a single unresponsive Collector, which propagates to the MOM and then the Workstation. For more information, see MOM to Collectors connection limits on page 59.

8.0 Workstation and WebView requirements

The topics below describe basic Workstation and WebView-related requirements.

OS RAM requirements for Workstations running in parallel

To run multiple Workstations on the same machine, the OS must have physical RAM for each Workstation running in parallel, above the memory required for the OS itself.

For example, in order to run three Workstations on a single Windows machine, the machine must have at least 512 MB + (3 x 256 MB) = 1.25 GB of physical memory.

WebView and Enterprise Manager hosting requirement

WebView should not be running on the same host as the Enterprise Manager to avoid contention for CPU resources.

8.0 Workstation and WebView setup, settings, and capacity

The topics below describe Workstation and WebView-related settings and capacity limits required to set up, maintain, and configure your Introscope 8.0 environment.

WebView generates a performance load about equal to that of three Workstations. Take this into account when calculating the total number of Workstations and WebView instances you deploy in your Introscope environment.

Workstation to standalone EM connection capacity

A standalone Enterprise Manager should not have more than the number of connected Workstations shown in the Sample Introscope 8.0 Collector sizing limits table on page 119. Each standalone EM can have multiple channels to provide data to and from Workstations and WebView instances, as shown in the figure below.

Workstation to MOM connection capacity

A MOM should not have more than the number of Workstation connections shown in Sample Introscope 8.0 MOM sizing limits table on page 122. In a clustered situation, each Collector has one channel of data flow to the MOM. Collectors, when connected to a MOM, should not have any direct Workstation or WebView connections. Instead, all Workstation and WebView connections should be made to the MOM, as shown in the figure below.

» Important Although in a MOM environment, data collection is spread across a number of Collectors, there is a case where Workstation performance problems can occur in a clustered environment. This happens if all the Workstation connections involve active users, and all their queries are based on data coming from a single Collector. In that case, the users may experience sluggish performance due to the Collector’s own internal limitations on simultaneous historical queries.

WebView server capacity

A single WebView server instance cannot serve more than 10 to 15 concurrent users or 25 passive users. A passive user is someone who issues a query, then walks away from the browser window or doesn’t close the window when finished. In this case, the Enterprise Manager must keep sending data to the browser, which refreshes the web page every 15 seconds whether or not anyone actually needs or uses the data.

Exceeding the number stated above generally results in slower response times for all browser clients. Use additional WebView instances that are co-located if your user requirement is larger.

WebView server guidelines

WebView servers do not require a dedicated I/O subsystem.

The CPU resource load depends primarily on the activity of each WebView server instance, including the concurrent user count and the number of dashboards that can be viewed through the instance. The more dashboards there are, the more metrics must be requested from the Enterprise Manager and processed to create graphs. Dashboards with large metric counts (over 1,000) slow down WebView processing.

Example WebView server configuration

A 4-way 3 GHz AMD Opteron or Intel Xeon with 8 GB RAM server running Linux can be used to host multiple WebView server instances on the same machine. This hardware should be able to house about three WebView servers.

Top N graph metrics limit per Workstation

Top N is a way to qualify a graph on an Introscope dashboard so that only the Top N (where you pick the N) metrics show up. It's a way to further filter data based on the actual content of the data. For example, you can set up a metric group that matches all servlets. Say there are 100,000 servlets in your system. On a dashboard, you have a graph display to show the top five slowest servlets. The Enterprise Manager has to subscribe to and process the data for all 100,000 servlets in order to determine the top five slowest. That's why processing Top N graphs is resource expensive for the Enterprise Manager.

At all times, the sum of all metrics (metrics and metric groupings) for every Top N graph viewed by every Workstation instance (all Workstations combined) should not exceed 100,000 metrics. Use Top N sparingly: whenever a Top N request is made, all the data is provided in real time, which puts a large resource demand on your Introscope system. When you do use it, have as few viewers as possible actively view Top N graphs.

If at a single moment in time Introscope users are actively viewing dashboards and graphs representing more than 100,000 metrics, performance problems can occur. For example, dashboards can have very slow refresh times. This can happen when a number of users log in at the same time to view a dashboard containing a Top N graph.

For example, imagine that there are ten dashboards defined in a system, and two of the ten dashboards each include 10 Top N graphs. The other eight dashboards have 10 standard (not Top N) graphs. And let’s say that each of the ten Top N graphs has a metric grouping that matches 1,000 metrics. This means a total of 10,000 metrics is requested whenever a dashboard containing the Top N graphs is displayed.

Now imagine that 10 Introscope users at different machines decide to log in and all at the same time look at one of the dashboards containing the Top N graphs. This requires the system to request and handle 10,000 metrics x 10 user instances as output to Workstations = 100,000 metrics requested at once. In this situation, it’s highly likely the users would experience slow Workstation performance as they click on the dashboard elements.
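
The arithmetic in this example can be laid out explicitly:

```python
# Worked numbers from the example above.
graphs_per_dashboard = 10   # Top N graphs on the dashboard
metrics_per_graph = 1000    # metrics matched by each graph's grouping
concurrent_viewers = 10     # users opening the dashboard at once

per_view = graphs_per_dashboard * metrics_per_graph  # 10,000 metrics per user
total = per_view * concurrent_viewers
print(total)  # 100000 -- right at the recommended ceiling
```

Any growth in graphs, groupings, or simultaneous viewers pushes the total past the 100,000-metric guideline.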

CHAPTER 5

Agent Requirements and Recommendations

This chapter provides background and specifics to help you understand sizing and performance-related agent requirements, settings, and limits for your Introscope system. In this chapter you’ll find the following topics:

Agent background . . . . . . . . . . . . . . . . . . 106

Agent sizing setup, settings, and capacity . . . . . . . . . . . 107

Agent metrics reporting limit . . . . . . . . . . . . . . . 107

Transaction Trace component clamp . . . . . . . . . . . . . 108

Agent maximum load when disabling Boundary Blame . . . . . . . 109

Configuring agent heuristics subsets. . . . . . . . . . . . . 109

Virtual agent metrics match limits . . . . . . . . . . . . . 109

Virtual agent reported applications capacity . . . . . . . . . . 110

Agents limits per Collector . . . . . . . . . . . . . . . . 110

Agent heap sizing. . . . . . . . . . . . . . . . . . . 110

High agent CPU overhead from deep nested front-end transactions . . . 111

Dynamic instrumentation . . . . . . . . . . . . . . . . 112

Agent background

In an Introscope deployment, the agent collects application and environmental metrics and relays them to the Enterprise Manager. Agent features that affect overhead are Boundary Blame, Transaction Trace sampling, and URL normalization.

The agent allows Introscope to collect minute details about how your applications are performing. What types of data the agent collects depends on which ProbeBuilder Directives (PBD) files you choose to implement. Several standard PBDs are included when you install the Java or .NET agent, as well as specific PBDs for your application server. Instrumentation is performed using CA Wily’s ProbeBuilding technology, in which tracers, defined in ProbeBuilder Directives (.pbd) files, identify the metrics an agent gathers at run time from applications and the Java virtual machines. ProbeBuilder Directive files tell the ProbeBuilder how to add Probes, such as timers and counters, to the .NET or Java components that Introscope-enable the application, and they govern what metrics agents report to the Introscope Enterprise Manager. Custom directives can also be created to track classes and methods unique to specific applications.

About virtual agents

You can configure multiple physical agents into a single virtual agent, which enables an aggregated, logical view of the metrics reported by multiple agents.

A virtual agent is useful if you manage clustered applications with Introscope. A virtual agent composed of the agents that monitor different instances of the same clustered application appears in Introscope as a single agent. This allows metrics from multiple instances of a clustered application to be presented at a logical, application level, as opposed to separately for each application instance.

For more information about virtual agents, see the Introscope Java Agent Guide.

You can check the total number of metrics matched by virtual agents by navigating to the following point in the Investigator tree:

SuperDomain*|Custom Metric Host (Virtual)|Custom Metric Process (Virtual)|Custom Metric Agent (Virtual)|Agents|Custom Metric Host (Virtual)|Custom Metric Process (Virtual)

Each of the virtual agents has a metric count. Sum all of these counts to determine the total number of metrics matched.
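The summation above is trivial but worth making concrete; in this sketch, the per-virtual-agent counts are hypothetical values read from the Investigator tree:

```python
# Hypothetical per-virtual-agent metric counts from the Investigator tree;
# the total number of matched metrics is simply their sum.
virtual_agent_counts = {
    "OrdersCluster (Virtual)": 1200,
    "PaymentsCluster (Virtual)": 900,
}
total = sum(virtual_agent_counts.values())
print(total)  # 2100
```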


Virtual agents are a significant drain on the CPU. For example, a 1,500-metric virtual agent can result in a 10% increase in CPU usage, and exceeding the recommended number of metrics matched by virtual agents has a significant CPU impact. There is some trade-off between the total number of applications (baselined heuristics) and virtual agents, since both depend exclusively on CPU resources. In general, if the total number of monitored applications is significantly less than the limit, the metric match limit for virtual agents can be increased. However, the metric match limit for virtual agents should never exceed 150% of the limit set in the guidelines.

A virtual agent deployed on a MOM only creates load on the Collectors, which do the aggregation and pass the result back to the MOM.

» Note Be aware that the Collector does most of the work in performing the calculations needed for virtual agents; the MOM is not performing the calculations.

Agent sizing setup, settings, and capacity

The topics below describe agent-related settings and capacity limits required to set up, maintain, or configure your Introscope 8.0 environment.

Agent metrics reporting limit

An agent should not report more than 15,000 metrics. To view a pie chart and table showing the metric counts for an agent, see About the Metric Count tab, below.

An inappropriately configured agent can create thousands of metrics in quick succession and overload the Enterprise Manager. To prevent this, the Enterprise Manager uses a metric clamp. For information about metric clamping, see Metric clamping on page 96.

About the Metric Count tab

By viewing the Metric Count tab you can assess the number and distribution of agent metrics in one centralized location.

To view the Metric Count tab

1 Select any node under the agent node.

2 Click the Metric Count tab in the right pane.

Study the pie chart and table of Resources metric count data. Mouse over an area of the pie chart to display a tool tip with the metric count and percentage.


Transaction Trace component clamp

In the case of an infinitely expanding transaction—for example when a servlet executes hundreds of object interactions and backend SQL calls—Introscope clamps the Transaction Trace, resulting in a truncated trace. This helps prevent the JVM from running out of memory. By default, the Transaction Trace component clamp is set to limit the Transaction Trace at 5,000 components. When this limit is reached, warnings appear in the log, and the trace stops. In addition, clamped Transaction Traces are marked as truncated in the Workstation Transaction Trace Viewer. See the Introscope Workstation User Guide.

You can change the Transaction Trace component clamp value in the introscope.agent.transactiontrace.componentCountClamp property, which is found in the IntroscopeAgent.profile file. See the Introscope Configuration and Administration Guide or the Java Agent Guide.

» WARNING If the Transaction Trace component clamp size is increased, the memory required for Transaction Traces may increase. Therefore, the maximum heap size for the JVM may need to be adjusted accordingly, or else the managed application may run out of memory. See Agent heap sizing on page 110 and the Introscope Configuration and Administration Guide.
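A minimal sketch of the relevant setting in the agent profile (the raised value is illustrative only):

```
# IntroscopeAgent.profile -- Transaction Trace component clamp
# Default is 5,000 components; raising it increases agent heap usage,
# so the managed JVM's maximum heap (-Xmx) may need to grow as well.
introscope.agent.transactiontrace.componentCountClamp=10000
```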


Agent maximum load when disabling Boundary Blame

If you disable Boundary Blame on your 7.x or 8.0 agents, the agents will generate more metrics than before. For example, a system that generates 200,000 total metrics from Boundary Blame-enabled agents may generate 300,000 metrics after disabling Boundary Blame. The resulting metrics may not incur the same processing cost per metric that Boundary Blame metrics do (Boundary Blame metrics incur additional baseline calculation overhead in the Enterprise Manager, while non-Boundary Blame metrics do not), but you should redistribute the increased metric load so that no Enterprise Manager exceeds its metric limit.

» Important If a 6.x agent is connected to a 7.x or 8.0 Enterprise Manager, Boundary Blame is not enabled for that agent, because 6.x agents do not support the feature. In this case, the maximum number of agents connected to the 7.x/8.0 Enterprise Manager, regardless of agent version, must adhere to the 6.x Enterprise Manager agent limits.

For more information about Boundary Blame, see the Introscope Workstation User Guide.

Configuring agent heuristics subsets

You can alter the property introscope.enterprisemanager.heuristics.agentspecifier, a regular expression that matches the agents for which heuristics are enabled. The default value ".*" matches all agent names. Limiting this property to the subset of agents you are interested in can improve performance, largely without limiting your ability to analyze the Enterprise Manager. For more information, see the Introscope Configuration and Administration Guide.
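The property value is a regular expression matched against agent names. As a sketch of the matching behavior (using Python's re module and hypothetical agent names; the actual property is evaluated by the Enterprise Manager, not by a script like this):

```python
import re

# Hypothetical specifier enabling heuristics for only two application groups.
specifier = r"PetShop.*|Billing.*"
agents = ["PetShopAgent1", "BillingAgent", "ReportingAgent"]

# Java regex matching is anchored, so fullmatch is the closest analogue.
enabled = [name for name in agents if re.fullmatch(specifier, name)]
print(enabled)  # ['PetShopAgent1', 'BillingAgent']
```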

Virtual agent metrics match limits

Given the very high impact of even a small virtual agent cluster, CA Wily recommends that for high-load Collectors, the total number of matched metrics in virtual agents be less than what’s shown in Sample Introscope 8.0 Collector sizing limits table on page 119 for a fully-loaded system. If your system isn't fully loaded (metrics or applications), you can have a higher number of matched metrics.


Virtual agent reported applications capacity

If the number of reported applications is significantly lower than the capacity limit for your platform, there should be enough CPU resources to increase this number. However, CA Wily does not recommend increasing this number by more than 50%.

Agents limits per Collector

The recommended number of agents per Collector is hardware dependent, as shown in the Sample Introscope 8.0 Collector sizing limits table on page 119.

If Introscope 6.0 agents are connected to an 8.0 Collector, the maximum number of agents should be kept at the Introscope 6.0 Enterprise Manager limits. For more information, see Agent maximum load when disabling Boundary Blame on page 109.

In Introscope 8.0, the MOM more effectively balances the metric load between Collectors in a clustered environment. To configure agent load balancing, see the Introscope Configuration and Administration Guide. To understand how agent load balancing affects Introscope performance, see Agent load balancing on MOM-Collector systems on page 63.

Agent heap sizing

You can view the agent GC heap usage in the GC Heap overview. See the Introscope Workstation User Guide.

The agent uses Java heap memory to store collected data. If your application's heap is highly utilized, you may need to increase the heap allocation when you install the agent. See the Introscope Configuration and Administration Guide. The 8.0 agent, on average, uses slightly more memory than the 7.x agent because of improvements that reduce CPU and response-time overhead. You may see an increase of up to 100 MB over your 7.x average runtime heap usage.

If monitored applications are characterized by very deep or long lasting transactions, the agent’s Transaction Trace sampling may require more heap memory than previous Introscope versions. See Transaction Trace component clamp on page 108.

If you are operating a high-performance Introscope environment, contact CA Wily Professional services for the appropriate agent JVM heap settings.


High agent CPU overhead from deep nested front-end transactions

Servlets are configured by Introscope to be seen as front-ends. A typical transaction starts with a servlet, which may call an EJB, which calls a back-end. It’s possible for servlets to call other servlets in a nested way, which Introscope sees as nested front-ends. In most cases, this does not add to agent CPU overhead.

However, deep transactions with many nested front-end levels (for example, 40 levels deep) may result in high CPU overhead. In the Transaction Trace Tree View, such a transaction appears as a servlet repeatedly calling itself (a recurring call); in one observed example, the recurring servlet calls resulted in a 2125 ms transaction time for the deep nested transaction. The same problem can also occur when a servlet continuously calls other servlets. In either case, you may see an increase in agent CPU overhead. If the overhead is unacceptable, contact CA Wily Technical Support.


Dynamic instrumentation

Introscope uses dynamic instrumentation (also called dynamic ProbeBuilding) to implement new and changed PBDs without restarting managed applications or the Introscope agent. This is useful for making corrections to PBDs, or to temporarily change data collection levels during triage or diagnosis without interrupting application service. For more information about dynamic instrumentation, see the Java Agent Guide or the .NET Agent Guide.

Dynamic instrumentation affects CPU utilization, memory, and disk utilization. This is because dynamic instrumentation includes redefining the monitored classes, which is a resource intensive process.

To avoid performance problems after you enable dynamic instrumentation, CA Wily highly recommends that you:

- use configuration to minimize the classes that are being redefined (see the Java Agent Guide or the .NET Agent Guide)
- change PBDs incrementally (don't change a large number at one time)
- do not change a large number of PBDs that affect many classes.


APPENDIX A

Introscope 8.0 Sizing and Performance FAQs

Frequently asked questions about Introscope sizing and performance are listed in the table below. Typical answers or solutions are provided for each question, listed from most common to least common.

Question Most Common Answers/Solutions

General Performance Questions

Can I handle the same number of metrics that I used to in 7.x versions of Introscope?

What about 6.x versions?

If you are upgrading from 7.x to 8.0, the number of metrics that Introscope 8.0 can handle is double the 7.x limits. For example, if a given 7.x system handled 250 K metrics, that limit is now 500 K without requiring any changes to the hardware.

For more information, see 8.0 metrics setup, settings, and capacity on page 79 and Virtual agent metrics match limits on page 80.

My Collector is at maximum recommended capacity. I'm looking at the CPU, and the system doesn't appear busy. Why can't I add more metrics or agents to this Collector?

CPU monitoring tools show a snapshot. The behavior of the Collector is 100% CPU usage for 3-4 seconds (at full load), and then idle until the next metric harvest from the agents. This happens every 7.5 seconds, which is how CA Wily arrives at the 45% average CPU utilization recommendation.

For more information, see Collector metric capacity and CPU usage on page 45.
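The arithmetic behind that recommendation can be checked directly; a busy burst of 3-4 seconds out of every 7.5-second harvest period averages out to roughly 40-53% CPU, consistent with the 45% guideline:

```python
# Figures from the guide: ~100% CPU for 3-4 s out of each 7.5 s harvest period.
harvest_period_s = 7.5
busy_low_s, busy_high_s = 3.0, 4.0

avg_low = busy_low_s / harvest_period_s    # 0.40
avg_high = busy_high_s / harvest_period_s  # ~0.53
print(f"{avg_low:.0%} to {avg_high:.0%}")  # 40% to 53%
```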


What were the Introscope 8.0 sizing and performance improvements?

Significant scalability and performance improvements were made including the following:

1 In Introscope 8.0, SmartStor improvements resulted in the Collector metric limits doubling from the Introscope 7.x limits (based on the same hardware).

For examples, see the Sample Introscope 8.0 Collector sizing limits table on page 119.

2 Significant improvements to the MOM allow each MOM to connect to a five million metric cluster (10 Collectors, 500 K metrics per Collector), which is a five-fold increase in clustered Enterprise Manager scale.

For examples, see Sample Introscope 8.0 MOM sizing limits table on page 122.

3 Adding an additional 2 CPUs to a Collector, for a total of 4 CPUs, helps increase these limits:

- number of applications per Collector
- number of agents per Collector
- number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager).

For more information, see Increasing Collector capacity with more and faster CPUs on page 73.

4 Support for 50 concurrent Workstation connections.

For examples, see Sample Introscope 8.0 Collector sizing limits table on page 119.

Note: The limits may differ substantially depending on the specific platform and hardware used in your environment.


Component-related Questions

My Collector is combining time slices throughout the day and appears to respond slowly, but I'm at or below the maximum capacity limits. What could be wrong?

1 Other processes are running on the machine.

2 I/O contention with SmartStor and other processes. SmartStor is not located on a separate disk or I/O subsystem.

3 Poorly configured virtual agent.

4 Large Transaction Traces are running continuously.

For more information, see Reasons Collectors combine slices on page 72.

What hardware is required to run the Collector at maximum load?

1 This is primarily dependent on the CPU speed and dedicated disk I/O subsystem.

For more information, see Collector hardware requirements on page 71.

2 See examples of the appropriate hardware platform, OS, and CPU.

For more information, see Sample Introscope 8.0 Collector sizing limits table on page 119.

Does the Workstation use more memory than previous releases?

No, the Workstation uses less memory on average. It operates within the heap footprint specified in the .lax file.

For more information, see Workstation and WebView background on page 100.

Can I run multiple Workstations on the same machine?

Yes. However, you must be certain that the OS has dedicated physical RAM for each Workstation running in parallel, above the memory required for the OS itself.

For more information, see OS RAM requirements for Workstations running in parallel on page 100.

Can I run multiple Enterprise Managers on the same machine?

Yes. However, you must be certain to follow the CA Wily requirements when setting this up.

For more information, see Running multiple Collectors on one machine on page 74.


I launched my MOM and logged in, but I'm not seeing any metrics in the Investigator tree for a long time. Why does the MOM take a long time to begin sending data?

Large numbers of metric alerts in individual Collectors will cause a great deal of overhead in the MOM as the Collectors register these alerts in the MOM at startup. If the startup time is unacceptable, you will have to reduce the number of alerted metrics, or get a machine with faster individual CPUs.

For more information, see About alerted metrics and slow Workstation startup on page 81.

Can I connect more agents to an 8.x Collector than a 6.x or 7.x Collector?

Yes, but only if 4 CPUs/cores per EM are available. See the Sample Introscope 8.0 Collector sizing limits table on page 119 for some examples.

For more information, see Agents limits per Collector on page 110.

Why can't my MOM connect to more than 10 collectors?

The more Collectors that the MOM connects to, the more complicated the system becomes and the greater likelihood for instability or failure. For example, clock sync issues may be more difficult to manage, the system can take longer to start, and there's a higher likelihood that a misbehaving Collector can affect the entire cluster.

For more information, see MOM to Collectors connection limits on page 59.

Why is it so important to ensure that every Collector is running smoothly?

Any individual Collector can cause the entire system to appear slow and lock up, due to the synchronous mechanism the MOM uses to poll information from Collectors.

For more information, see Collector to MOM clock drift limit on page 71.

The requirements state that I can have X metrics in the virtual agents. Can I exceed that number?

Yes, however, this impacts CPU significantly (not I/O or memory), so you must decrease the Collector's capacity.

Note: The Collector does the processing work needed for virtual agent operations, not the MOM.

For more information, see Agent background on page 106.


Will additional dedicated physical CPUs increase the number of metrics and agents that my Collector can handle?

Adding an additional 2 CPUs to a Collector, for a total of 4 CPUs, helps increase these limits:

- number of applications per Collector
- number of agents per Collector
- number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager).

In addition, faster CPUs may help increase the Collector's maximum capacity and improve performance.

For more information, see Collector hardware requirements on page 71 and the examples in Sample Introscope 8.0 Collector sizing limits table on page 119.

My system has 16 SPARC CPUs. Why can't a single Collector on this platform handle any more load than a 4 CPU Xeon machine?

Although the Collector is heavily multi-threaded, there are certain operations that require synchronization and cannot effectively leverage more than 4 CPUs. The Collector, therefore, does not scale well with additional CPUs beyond 4, depending on the hardware platform. Individual processor speed is the most important success factor for a Collector.

For more information, see Increasing Collector capacity with more and faster CPUs on page 73.

What are the main performance considerations for the MOM?

The MOM requires more powerful CPUs and better network connections than Collectors, but does not require fast disk access (the MOM performs little disk I/O).

For more information, see MOM disk subsystem sizing requirements on page 58.

I changed the virtual agent definitions in my MOM/Collector and everything came to a halt. What happened?

Note: In a clustered environment, deploy Management Modules and virtual agents only on the MOM, not on a Collector.

Hot deployment of virtual agents and Management Modules is very CPU intensive and can lock up the MOM for a couple of minutes during which metrics harvesting doesn't happen. CA Wily strongly recommends not performing Management Module hot deployments on production Collectors and MOMs.

For more information, see Avoid Management Module hot deployments on page 68.


Do Collectors and MOM have to be on the same subnet?

For best Workstation responsiveness, when a MOM requests data from a Collector, the round-trip response must be less than 500 ms. Whenever possible, a MOM and its Collectors should be in the same data center; preferably in the same subnet.

For more information, see Local network requirement for MOM and Collectors on page 51.

What is the limit for ChangeDetector events, Transaction traces, errors, stall events, and so on? How do I determine that limit? Is that for each?

The Collector effectively treats all of these as event objects. As of Introscope 7.1, the Maximum Number of Events limit represents the total number of events a Collector can receive and persist from all agents. There is one limit for steady state event persistence and another for burst capacity. Steady state means 24/7. Burst capacity means that the Collector can sustain this load for no more than a couple of hours.

For more information, see Collector events limits on page 70.


APPENDIX B

Sample Introscope 8.0 Collector and MOM Sizing Limits by OS

Sample Introscope 8.0 Collector sizing limits table

The following table shows example Introscope environment configurations and sizing requirements by operating system. This should help you understand the various components and requirements you'll need to consider if you are deploying a Collector, either as a standalone machine or in a clustered Introscope environment.

» Important The machine configurations shown below are only examples; they are NOT provided as the only recommended Collector/s for any or all Introscope environment(s).

» Note * The Maximum Number of Metrics numbers in the table include the metrics reported by agents as well as the metrics created by calculators and virtual agents, which should be a reasonably small number. You can also find the total number reported as Enterprise Manager | Connection: Number of Metrics in the Enterprise Manager supportability metrics. See Additional supportability metrics on page 38.

Solaris; 2 CPU UltraSPARC III, clock speed ~ 1.2 GHz; 4 GB physical RAM; 1.5 GB JVM heap
  Max # Agents/EM: 200; Max # Metrics*: 400,000; Max # Applications/EM: 900
  Max # Events/minute: 700 steady state, 3,500 burst; Max # Virtual Agent Matched Metrics: 3,000
  Max # Workstation connections per standalone EM: 20; Max # EMs/machine: 1
  Max metrics in metric groupings/standalone EM: 15% of Max # Metrics (60,000)

Solaris; 4 CPU UltraSPARC III, clock speed ~ 1.2 GHz; 4 GB physical RAM; 1.5 GB JVM heap
  Max # Agents/EM: 250; Max # Metrics*: 400,000; Max # Applications/EM: 1,800
  Max # Events/minute: 700 steady state, 3,500 burst; Max # Virtual Agent Matched Metrics: 3,000
  Max # Workstation connections per standalone EM: 40; Max # EMs/machine: 1
  Max metrics in metric groupings/standalone EM: 30% of Max # Metrics (120,000)

Solaris; 4 CPU UltraSPARC III, clock speed ~ 1.2 GHz; 8 GB physical RAM; 1.5 GB JVM heap
  Max # Agents/EM: 200; Max # Metrics*: 400,000; Max # Applications/EM: 900
  Max # Events/minute: 700 steady state, 3,500 burst; Max # Virtual Agent Matched Metrics: 3,000
  Max # Workstation connections per standalone EM: 20; Max # EMs/machine: 2
  Max metrics in metric groupings/standalone EM: 15% of Max # Metrics (60,000)

Red Hat Linux; 2 CPU Xeon or Opteron, clock speed ~ 3 GHz; 4 GB physical RAM; 1.5 GB JVM heap
  Max # Agents/EM: 300; Max # Metrics*: 500,000; Max # Applications/EM: 1,500
  Max # Events/minute: 1,000 steady state, 5,000 burst; Max # Virtual Agent Matched Metrics: 5,000
  Max # Workstation connections per standalone EM: 50; Max # EMs/machine: 1
  Max metrics in metric groupings/standalone EM: 15% of Max # Metrics (75,000)

AIX 5.3; 4 CPU Power 5, clock speed ~ 2.2 GHz; 4 GB physical RAM; 1.5 GB JVM heap
  Max # Agents/EM: 250; Max # Metrics*: 400,000; Max # Applications/EM: 1,500
  Max # Events/minute: 850 steady state, 4,000 burst; Max # Virtual Agent Matched Metrics: 3,500
  Max # Workstation connections per standalone EM: 50; Max # EMs/machine: 1
  Max metrics in metric groupings/standalone EM: 15% of Max # Metrics (60,000)

Windows 2000/2003; 2 CPU Xeon or Opteron, clock speed ~ 3 GHz; 4 GB physical RAM; 1.5 GB JVM heap
  Max # Agents/EM: 300; Max # Metrics*: 500,000; Max # Applications/EM: 1,500
  Max # Events/minute: 1,000 steady state, 5,000 burst; Max # Virtual Agent Matched Metrics: 5,000
  Max # Workstation connections per standalone EM: 50; Max # EMs/machine: 1
  Max metrics in metric groupings/standalone EM: 15% of Max # Metrics (75,000)

Windows 2000/2003; 4 CPU Xeon or Opteron, clock speed ~ 3 GHz; 4 GB physical RAM; 1.5 GB JVM heap
  Max # Agents/EM: 400; Max # Metrics*: 500,000; Max # Applications/EM: 3,000
  Max # Events/minute: 1,000 steady state, 5,000 burst; Max # Virtual Agent Matched Metrics: 5,000
  Max # Workstation connections per standalone EM: 50; Max # EMs/machine: 1
  Max metrics in metric groupings/standalone EM: 30% of Max # Metrics (150,000)

Windows 2000/2003; 4 CPU Xeon or Opteron, clock speed ~ 3 GHz; 8 GB physical RAM; 1.5 GB JVM heap
  Max # Agents/EM: 300; Max # Metrics*: 500,000; Max # Applications/EM: 1,500
  Max # Events/minute: 1,000 steady state, 5,000 burst; Max # Virtual Agent Matched Metrics: 5,000
  Max # Workstation connections per standalone EM: 50; Max # EMs/machine: 2
  Max metrics in metric groupings/standalone EM: 15% of Max # Metrics (75,000)

Sample Introscope 8.0 MOM sizing limits table

The following table shows example Introscope environment configurations and sizing requirements by operating system. This should help you understand the various components and requirements you'll need to consider if you are deploying a MOM in a clustered Introscope environment.

» Important The machine configurations shown below are only examples; they are NOT provided as the only recommended Collector/s for any or all Introscope environment(s).

Solaris; 2 CPU UltraSPARC III, clock speed ~ 1.2 GHz; 14 GB physical RAM; 12 GB JVM heap
  Max # Metrics with associated calculators and alerts per MOM-Collector cluster: 250,000
  Max # Workstation connections per MOM-Collector cluster: 20

Red Hat Linux; 4 CPU Xeon or Opteron, clock speed ~ 3 GHz; 14 GB physical RAM; 12 GB JVM heap
  Max # Metrics with associated calculators and alerts per MOM-Collector cluster: 1,000,000
  Max # Workstation connections per MOM-Collector cluster: 50

AIX 5.3; 4 CPU Power 5, clock speed ~ 2.2 GHz; 14 GB physical RAM; 12 GB JVM heap
  Max # Metrics with associated calculators and alerts per MOM-Collector cluster: 500,000
  Max # Workstation connections per MOM-Collector cluster: 50

Windows 2000/2003; 2 CPU Xeon or Opteron, clock speed ~ 3 GHz; 14 GB physical RAM; 12 GB JVM heap
  Max # Metrics with associated calculators and alerts per MOM-Collector cluster: 500,000
  Max # Workstation connections per MOM-Collector cluster: 25

Windows 2000/2003; 4 CPU Xeon or Opteron, clock speed ~ 3 GHz; 14 GB physical RAM; 12 GB JVM heap
  Max # Metrics with associated calculators and alerts per MOM-Collector cluster: 1,000,000
  Max # Workstation connections per MOM-Collector cluster: 50
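As a quick consistency check on the metric-groupings column, which is quoted as a percentage of each Collector's metric ceiling, sketched in Python:

```python
# Metric-groupings capacity is a percentage of each Collector's
# metric ceiling (figures from the Collector sizing table).
max_metrics = {"Solaris": 400_000, "Red Hat Linux": 500_000}

assert round(max_metrics["Solaris"] * 0.15) == 60_000         # 15% rows
assert round(max_metrics["Red Hat Linux"] * 0.15) == 75_000
assert round(max_metrics["Red Hat Linux"] * 0.30) == 150_000  # 30% row
print("grouping limits consistent with the table")
```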


IndexSymbols*, See wildcard symbol

Aactive

metrics groupings, defined 80users and WebView servers limit 103

agentsCollector connection history and future con-nections 64

connection architecture 70heartbeat, defined 91how MOM assigns to Collectors 64increased CPU overhead from deep transac-tions involving multiple front-ends 111

load balancingand cluster fault tolerance for Collectors

62configuring frequency on MOM 66defined 63differentiated from metric clamping 63example scenarios 66metric counts after weight adjusting 65setting metric weight load 64setting threshold for imbalance 65

memory cache, removing dead metrics 91metric aging

defined 91performance problems related to 91properties to configure 92

Agents with Data metric 83Agents without Data metric 83alerts, and slow cluster start-up time 60applications, clustered and usefulness of virtual

agents 106asterisk, See wildcard symbol

Bbaselines.db

about 53calculating disk space needed 54

burst limitdefined 70events 70

business logic, Introscopehandled by 78monitors 79

Ccalculators, and slow cluster start-up time 60causes of slow start-up time, in cluster 60clamp

metrics See metrics clampTransaction Trace component 108

clock drift, performance problems due to 71cluster

agent load balancing examples 66applications and virtual agents 106cause of slow start-up time 60configuring

to support 1 million metrics on MOM 61tolerance for imbalance 65

determining when to implement if using astandalone EM 70

environment, explained 18fault tolerance for Collectors 62hanging prevented by MOM disconnecting un-der performing Collector 52

how MOM balances the metric load 63improving performance by adjusting Collectorweighting factors 64

likely cause of Workstation sluggishness in72

location of MOM and Collector metrics 28metric for total number of metrics currentlytracked in 30

overhead 65performance problems due to hot Manage-ment Module deployment 68

performance, based on Collectors 44poor performance due to a single Collector

59setting up metrics groupings in 79slow response time due to network bandwidthproblems 21

time synchronization 59, 72when MOM drops a Collector 51Workstation and WebView connections 102Workstation performance problems in 102

CLW, See Command Line Workstation

Index 125

Page 126: Intro Scope Sizing Guide

CA Wily Introscope

Collectorcluster performance 44CPU

requirements 74speed and disk I/O system 71steady state usage 45usage for high resource operations 45viewing high usage, 45

CPU usage, described 45diagnosing slow response to MOM using pingmetric 72

effect of faster CPUs 71file cache requirements 74good performance in individual 60hardware requirements 71how agents are assigned to by MOM 64increasing capacity with faster CPUs 73JVM requirements in 500 K metric MOM clus-ter 61

limits, increasing with more CPUs 73location of supportability metrics 28migrating from 6.x to 8.0 44persisting event objects 70ping time from MOM 51reperiodization 45run only Introscope process 72running multiple on one machine 74sign of overloaded 51sizing limits examples 119SmartStor minimum requirement 47synchronizing clocks with MOM 71under performing and cluster performance

59unresponsive 72upgrading 44using loadbalancing.xml to restrict agents tospecific 64

Workstation/WebView connections to in clus-tered environment 102

Command Line Workstation, heap size needed100

concurrent queriesrecommended number of historical 43

configuringagent failover when host defined in DNS 62agent metric aging 92how often MOM rebalances cluster agent load

66

MOM-Collector cluster 61RAID setting 57Workstation log-in when host defined in DNS

62connection

history, agent to Collector and future connec-tions 64

type MOM uses to assign agents to Collectors64

converting spool to data metric, defined 32CPU Overview tab 46CPUs

Collector requirements 74Enterprise Manager

basic requirement 47guidelines 25

fasterand Collector capacity 71using to increase Collector capacity 73

high usageCollector 45

increasing and Collector limits 73large MOM overhead and alerted metrics 81resource contention with WebView and EM

100speed, Collector 71usage

Collector 45Collector, described 45Collector, steady state 45during heuristics calculations 69reports 43scheduling of heavy processing 52

virtual agents using large resources 73WebView server load 103

custom scripts, scheduling 52

126 Index

Page 127: Intro Scope Sizing Guide

Sizing and Performance Guide

D
dashboard, using EM Capacity to determine metric explosions 87
dashboards, cause of slow WebView processing 103
data, about historical queries and performance problems 25
dead metrics See metrics, dead
dedicated controller property for SmartStor 55
deployments, hot, Management Module cost 81
disk
  drive, determining number of controllers 57
  file cache size, SmartStor 48
  OS file cache 32
  space
    estimating for baselines.db 54
    estimating for traces.db 54
DNS
  agent config for MOM failure 62
  Workstation log-in config for MOM failure 62
dynamic
  instrumentation
    defined 112
    performance problems related to 112
  ProbeBuilding See dynamic instrumentation

E
EM Capacity dashboard, using to determine metric explosions 87
em.db, See baselines.db
Enterprise Manager 48
  capacity and metrics limits 21
  configuring for heap memory 32
  CPU
    basic requirement 47
    resources and running WebView 100
    utilization guidelines 25
  determining capacity 20
  finding problems using specific metrics 29
  heap settings 48
  metrics
    grouping limits 79
    location 28
  migrating from 6.x to 8.0 44
  OS disk file cache requirements 47
  overloaded and combining
    metrics 69
    time slices 69
  Overview tab 27
  processing of time slice data 78
  RAM minimum requirement 47
  running multiple Collectors on one machine 74
  standalone
    connections to Workstations 101
    hardware requirements example 74
  supportability metrics 51
  symptoms of metric explosion 85
  using EM Capacity dashboard 87
  when to grow from standalone to cluster 70
events
  determined number received 37
  high volume 70
  limit
    burst 70
    maximum, defined 70
    steady state 70
  objects, in Collector databases 70
explosion, metrics See metrics explosion

F
failover, planning for MOM 62
failure, planning for using MOM, See failover
file
  cache, requirements for Collector 74
  system, general requirements 47
flat file archiving, using with SmartStor 44
front-ends, multiple and transaction problems 111

G
GetEvent metric, See ping metric
graph, Top N, defined 103
groupings, metrics, defined 78

H
hardware requirements
  Collector 71
  MOM 58, 59, 60
harvest cycle, metrics 78
Harvest Duration metric 29, 35, 45


heap
  capacity (%) metric 34
  Command Line Workstation size needs 100
  settings, Enterprise Manager 48
  size
    Enterprise Manager 32
    Workstation 100
heartbeat, agent, defined 91
heuristics
  CPU usage for calculations 69
  database, See baselines.db
Historical mode
  using for viewing data in Workstation 43
historical queries
  and EM agent data storage 25
  and MOM overloading 31
  poor performance caused by 40
  recommended number of concurrent 43
  running 43
host
  defined in DNS
    and agent failover 62
    and Workstation log-in 62
hot deployments, of Management Modules, performance problems 81

I
I/O
  contention, reason for SmartStor problems 73
  disk system for Collector 71
  throughput, SmartStor 49
inactive metrics groupings, defined 80
instrumentation, dynamic
  and performance problems 112
  defined 112
Introscope
  agent connection architecture 70
  business logic 78
    defined 26
    monitors 79
  improving slow startup time 81
  metric explosion prevention 91
  no other processes may run on Collector 72
  workload, defined 26
Is Clamped metric, about 98

J
JVM
  Collector requirements in cluster with 500 K metric MOM 61
  heap settings, Enterprise Manager 48

L
leaks, metrics, symptoms 81
limits
  Collector, examples 119
  MOM, examples 122
limits, metrics, definition 20
Live mode
  viewing Workstation data in 43
load
  balancing for agent, defined 63
  reducing metrics 30
loadbalancing.xml, using to restrict agents to specific Collectors 64

M
Management Module
  cost of hot deployments 81
  hot deployment
    and cluster problems 68
    and virtual agents 68
  problems with hot deployment 68
maximum events limit, defined 70
memory, Workstation requirement 100
metadata
  file, about uncompressed 98
  SmartStor, using to find metric explosion 85
metric
  aging, agent, defined 91
  clamp, differentiated from agent load balancing 63
  count metric, defined 98
Metric Count metric 85
Metric Count tab 107
metrics
  Agents with Data 83
  Agents without Data 83
  alerts, large numbers and performance problems 81
  baselining database, See baselines.db
  checked during agent heartbeat 91


  clamp
    about related supportability metrics 98
    defined 96
    properties to enable 96
    scenario 98
  cluster load balancing 63
  combined as symptom of overloaded EM 69
  converting spool to data 32
  counts, weight-adjusted for agent load balancing 65
  dead
    about 91
    defined
    performance problems related to 91
    removal 96
  Enterprise Manager supportability 28
  explosion
    and SmartStor metadata save time 85
    causes 84
    configuring 87
    defined 82, 84
    due to poorly-written SQL statements 92
    how Introscope prevents 91
    symptoms of 85
  explosion, defined 84
  groupings
    active, defined 80
    defined 78
    Enterprise Manager limits 79
    inactive, defined 80
    performance problems when using wildcard symbol 80
    relationship to regular expression 78
  groupings, setting up in a cluster 79
  harvest cycle 78
  Harvest Duration 29, 35, 45
  Heap Capacity (%) 34
  Is Clamped 98
  leaks
    defined 82
    diagnosing using SmartStor metadata save time 83, 85
    symptoms 81, 82
  limits
    and Enterprise Manager capacity 21
    definition 20
    MOM-Collector system 60
    related to Top N graphs 104
  load
    reducing 30
  metadata
    about 82
    problems with continuous growth 82
    symptoms of metrics leaks 82
  Metric Count 85, 98
  Metrics with Data 83
  Number of Agents 71
  Number of Inserts Per Interval 70
  Overall Capacity (%) 33
  Partial Metrics with Data 83
  Partial Metrics without Data 83
  ping 51, 72
  SmartStor
    capacity 80
    management 41
  SmartStor Duration 36, 50
  subscribed
    defined 79
    limits 59
    MOM limits 60
  supportability
    Is Clamped 98
    Metric Count 98
    using to find Enterprise Manager problems 29
  weight load
    setting for agent load balancing 64
Metrics with Data metric 83
migrating, 6.x Enterprise Managers to 8.0 Collectors 44
MOM
  alerted metrics and large CPU overhead 81
  configuring cluster to handle 1 million MOM metrics 61
  disconnected due to ping time threshold 52
  failure planning 62
  hardware requirements 58, 59, 60
  hardware requirements and subscribed metrics limit 59
  hot failover 62
  how assigns agents to Collectors 64
  limits on subscribed metrics 60
  location of supportability metrics 28
  ping time to Collector 51
  reasons for overload 31
  secondary backup for hot failover 62


  sizing limits examples 122
  SmartStor instance, about 58
  synchronizing clock with Collectors 71
  to Collector
    connection limit 59
    system metrics limit 60
  WebView appears as Workstation client 100
  Workstation connections allowed 102

N
network bandwidth problem, and slow cluster response times 21
Number of Agents metric 71
Number of Inserts Per Interval metric 70

O
OS
  disk file cache requirements, EM 47
  memory requirements for Workstation 100
  RAM and disk file cache 32
Overall Capacity (%) metric
  defined 33
  spiking 34

P
Partial Metrics with Data metric 83
Partial Metrics without Data metric 83
passive users
  defined 103
  WebView servers limit 103
performance
  cluster, and single underperforming Collector 59
  dedicated controller property 55, 56
  improving cluster by adjusting Collector weighting factor 64
  in cluster causing MOM to drop Collectors 51
  individual Collector responsiveness 60
  load, WebView 101
  poor due to large historical queries 40
  problems
    due to MOM to Collector clock drift 71
    due to recurring servlet calls 111
    from large continuous Transaction Traces 73
    in cluster due to Management Module hot deployment 68
    metrics metadata continuous growth 82
    related to agent metric aging 91
    related to large numbers of metrics alerts 81
    with Management Module hot deployment 68
  related to MOM-Collector connections 58
  sluggish in Workstation, typical cause 100
  WebView slow response times, cause of 103
  Workstation problems in cluster 102
ping
  metric 51
    about 72
    diagnosing a slow-responding Collector 72
  time
    Introscope 72
    network 72
    threshold for Collector overload 51
    threshold that disconnects MOM 52
production Collector and MOMs, Management Module hot deployments in 68

Q
queries
  historical See historical queries
  scheduling large 52

R
RAID
  configuration
    recommended 57
    setting 57
RAID 0 57
RAID 5 57
RAM
  adding to
    improve spooling time 32
    increase OS disk file cache 32
  EM minimum requirement 47
regular expression, relationship to metrics groupings 78
reperiodization 52
  about 41
  Collector 45
  SmartStor, defined 40


reports
  CPU usage 43
  scheduling large or long 43
  when to schedule 52

S
SAN
  using for SmartStor storage 57
SAS controllers
  using for SmartStor storage 57
scheduling
  custom scripts 52
  large queries 52
  reports 52
secondary backup MOM for hot failover 62
servlets
  performance problems from recurring calls 111
  recurring calls and high agent CPU overhead 111
  seen as Introscope frontends 111
sizing
  Collector limits examples 119
  MOM limits examples 122
SmartStor
  about 40
  Collector minimum requirement 47
  dedicated controller property
    about 55
    and performance 55, 56
  default installation directory 20
  determining if drives are physically different 57
  flat file archiving recommendations 44
  I/O throughput 49
  management metrics, about 41
  metadata
    files, about uncompressed 98
    save time and metric explosion 85
  metrics
    about metadata 82
    capacity 80
    metadata save time and metrics leaks 85
    metadata save time related to metrics leaks 83
  MOM instance, about 58
  problems
    indications of 49
    with I/O contention 73
  recommended RAID configuration 57
  reperiodization
    defined 40
    verifying 41
  requirements 48
  setting
    RAID configuration 57
    up 49
  spooling 52
    about 40
    disk file cache size requirements 48
    verifying 41
  storage
    SAN guidelines 57
    SAS controllers guidelines 57
SmartStor Duration
  metric 36, 50
  metric value 36
spool to data conversion task 40
spooling
  SmartStor 40
  time, lengthening 32
SQL
  Agent
    Introscope statement normalizers 94
    showing many unique SQL metrics 92
  statements
    causing metric explosions 92
    normalizers 94
standalone Enterprise Manager
  hardware requirements example 74
  Workstation connections allowed to 101
startup time, improving slow Introscope 81
steady-state, events limit 70
subscribed metrics See metrics, subscribed
supportability metrics
  Is Clamped 98
  Metric Count 98
  related to metric clamp 98
synchronizing, clock on clustered machines 59, 72
system performance, determining general 47


T
tabs
  CPU Overview 46
  Enterprise Manager Overview 27
  Metric Count 107
threshold, ping time
  for Collector overload 51
  that disconnects MOM 52
time server software, use to synchronize machine clocks in cluster 59, 72
time slices
  combined, symptom of overloaded EM 69
  data processing in Enterprise Manager 78
tool tip 107
Top N graph
  defined 103
  metrics limit 104
traces.db
  about 53
  calculating disk space needed 54
Transaction Event database, See traces.db
Transaction Trace
  component clamp 108
  dropped events metric 36
  events 36
  insert queue 36
  performance problems related to 73
  queue size 36
transactions, deep, involving multiple front-ends 111

U
upgrading, Collector 44
using time server software 59, 72

V
virtual agents
  and Management Module hot deployments 68
  useful for clustered applications 106
  using large CPU resources 73

W
WebView
  cause of slow client response times 103
  connections in clusters 102
  dashboards and slow processing 103
  how MOM sees as Workstation 100
  performance load 101
  running on EM and CPU resource contention 100
  servers
    CPU resource load 103
    user limits 103
wildcard symbol, performance issues in metrics groupings 80
Workstation
  connections
    allowed to MOM 102
    allowed to standalone Enterprise Manager 101
    in clusters 102
  heap footprint 100
  memory requirement 100
  OS memory requirements 100
  performance problems in cluster 102
  sluggishness
    cause in a cluster 72
    typical cause 72, 100
  viewing data in Live mode 43
