
Hitachi Infrastructure Analytics Advisor
Data Analytics and Performance Monitoring Overview

MK-96HIAA004-01
October 2016

© 2016 Hitachi, Ltd. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including copying and recording, or stored in a database or retrieval system for commercial purposes without the express written permission of Hitachi, Ltd., or Hitachi Data Systems Corporation (collectively “Hitachi”). Licensee may make copies of the Materials provided that any such copy is: (i) created as an essential step in utilization of the Software as licensed and is used in no other manner; or (ii) used for archival purposes. Licensee may not make any other copies of the Materials. “Materials” mean text, data, photographs, graphics, audio, video and documents.

Hitachi reserves the right to make changes to this Material at any time without notice and assumes no responsibility for its use. The Materials contain the most current information available at the time of publication.

Some of the features described in the Materials might not be currently available. Refer to the most recent product announcement for information about feature and product availability, or contact Hitachi Data Systems Corporation at https://support.hds.com/en_us/contact-us.html.

Notice: Hitachi products and services can be ordered only under the terms and conditions of the applicable Hitachi agreements. The use of Hitachi products is governed by the terms of your agreements with Hitachi Data Systems Corporation.

By using this software, you agree that you are responsible for:
1. Acquiring the relevant consents as may be required under local privacy laws or otherwise from authorized employees and other individuals to access relevant data; and
2. Verifying that data continues to be held, retrieved, deleted, or otherwise processed in accordance with relevant laws.

Notice on Export Controls. The technical data and technology inherent in this Document may be subject to U.S. export control laws, including the U.S. Export Administration Act and its associated regulations, and may be subject to export or import regulations in other countries. Reader agrees to comply strictly with all such regulations and acknowledges that Reader has the responsibility to obtain licenses to export, re-export, or import the Document and any Compliant Products.

Hitachi is a registered trademark of Hitachi, Ltd., in the United States and other countries.

AIX, AS/400e, DB2, Domino, DS6000, DS8000, Enterprise Storage Server, eServer, FICON, FlashCopy, IBM, Lotus, MVS, OS/390, PowerPC, RS/6000, S/390, System z9, System z10, Tivoli, z/OS, z9, z10, z13, z/VM, and z/VSE are registered trademarks or trademarks of International Business Machines Corporation.

Active Directory, ActiveX, Bing, Excel, Hyper-V, Internet Explorer, the Internet Explorer logo, Microsoft, the Microsoft Corporate Logo, MS-DOS, Outlook, PowerPoint, SharePoint, Silverlight, SmartScreen, SQL Server, Visual Basic, Visual C++, Visual Studio, Windows, the Windows logo, Windows Azure, Windows PowerShell, Windows Server, the Windows start button, and Windows Vista are registered trademarks or trademarks of Microsoft Corporation. Microsoft product screen shots are reprinted with permission from Microsoft Corporation.

All other trademarks, service marks, and company names in this document or website are properties of their respective owners.


Contents

Preface
    Product version
    Intended audience
    Related documents
    Document conventions
    Conventions for storage capacity values
    Accessing product documentation
    Getting help
    Comments

1 Introduction
    Product overview
    Key features
        Unified infrastructure monitoring dashboard
        Advanced reporting
        SLO management
        System and Resource Events
        End-to-end monitoring
        Problem identification and root cause analysis
        Storage IO controls
    Logging on to Infrastructure Analytics Advisor
    Accessing Data Center Analytics

2 Performance monitoring using advanced threshold settings
    Threshold profiles
    Advanced threshold settings
    Determining the threshold type for your environment
    Dynamic thresholds
        Advantages of dynamic thresholds
        Determining if the computed value is correct
        Automatic calculation of baseline values
        Setting dynamic thresholds using monitoring profiles
    Static thresholds
        Setting static thresholds using monitoring profiles
            For system resources

3 End-to-end performance troubleshooting
    Identifying performance problems
    Infrastructure components and key performance metrics
    Troubleshooting high response times
    Troubleshooting workflow
    Detecting performance problems
    Analyzing performance bottleneck
        Analyzing in E2E view
        Analyzing in Verify Bottleneck window
        Analyzing in Sparkline view
        Analyzing in Detail view
    Analyzing the root cause of the bottleneck
        Identify affected resources
        Analyze shared resources
        Analyze related changes
    Solving performance problems

4 Optimizing infrastructure resources with storage IO controls
    IO controls for optimization of Infrastructure resources
    IO control settings for a SLO
    IO controls for optimizing IO performance after the bottleneck analysis

5 Flexible reporting and analysis using Data Center Analytics

6 Monitoring and quick troubleshooting with Data Center Analytics

7 Strategic planning using trend analysis in Data Center Analytics


Preface

This preface includes the following information:

□ Product version

□ Intended audience

□ Related documents

□ Document conventions

□ Conventions for storage capacity values

□ Accessing product documentation

□ Getting help

□ Comments


Product version

This document revision applies to Infrastructure Analytics Advisor 2.1 or later.

Intended audience

This document provides an overview of the Hitachi Infrastructure Analytics Advisor software. This document is intended for storage administrators and infrastructure administrators.

Related documents

The following documents are referenced or contain more information about the features described in this overview.

• Hitachi Infrastructure Analytics Advisor Installation and Configuration Guide, MK-96HIAA002
• Hitachi Infrastructure Analytics Advisor User Guide, MK-96HIAA001
• Hitachi Infrastructure Analytics Advisor REST API Reference Guide, MK-96HIAA003
• Hitachi Data Center Analytics User Guide, MK-96HDCA002
• Hitachi Data Center Analytics REST API Reference Guide, MK-96HDCA006
• Hitachi Data Center Analytics Query Language User Guide, MK-96HDCA005

Document conventions

This document uses the following typographic conventions:

Convention            Description

Bold                  • Indicates text in a window, including window titles, menus, menu options, buttons, fields, and labels. Example: Click OK.
                      • Indicates emphasized words in list items.

Italic                • Indicates a document title or emphasized words in text.
                      • Indicates a variable, which is a placeholder for actual text provided by the user or for output by the system. Example: pairdisplay -g group
                        (For exceptions to this convention for variables, see the entry for angle brackets.)

Monospace             Indicates text that is displayed on screen or entered by the user. Example: pairdisplay -g oradb

< > angle brackets    Indicates variables in the following scenarios:
                      • Variables are not clearly separated from the surrounding text or from other variables. Example: Status-<report-name><file-version>.csv
                      • Variables in headings.

[ ] square brackets   Indicates optional values. Example: [ a | b ] indicates that you can choose a, b, or nothing.

{ } braces            Indicates required or expected values. Example: { a | b } indicates that you must choose either a or b.

| vertical bar        Indicates that you have a choice between two or more options or arguments. Examples:
                      [ a | b ] indicates that you can choose a, b, or nothing.
                      { a | b } indicates that you must choose either a or b.

This document uses the following icons to draw attention to information:

Label      Description

Note       Calls attention to important or additional information.

Tip        Provides helpful information, guidelines, or suggestions for performing tasks more effectively.

Caution    Warns the user of adverse conditions and/or consequences (for example, disruptive operations, data loss, or a system crash).

WARNING    Warns the user of a hazardous situation which, if not avoided, could result in death or serious injury.

Conventions for storage capacity values

Physical storage capacity values (for example, disk drive capacity) are calculated based on the following values:

Physical capacity unit    Value

1 kilobyte (KB)           1,000 (10³) bytes
1 megabyte (MB)           1,000 KB or 1,000² bytes
1 gigabyte (GB)           1,000 MB or 1,000³ bytes
1 terabyte (TB)           1,000 GB or 1,000⁴ bytes
1 petabyte (PB)           1,000 TB or 1,000⁵ bytes
1 exabyte (EB)            1,000 PB or 1,000⁶ bytes

Logical capacity values (for example, logical device capacity) are calculated based on the following values:

Logical capacity unit    Value

1 block                  512 bytes
1 cylinder               Mainframe: 870 KB
                         Open-systems:
                         • OPEN-V: 960 KB
                         • Others: 720 KB
1 KB                     1,024 (2¹⁰) bytes
1 MB                     1,024 KB or 1,024² bytes
1 GB                     1,024 MB or 1,024³ bytes
1 TB                     1,024 GB or 1,024⁴ bytes
1 PB                     1,024 TB or 1,024⁵ bytes
1 EB                     1,024 PB or 1,024⁶ bytes
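The difference between the two conventions grows with each unit, which can matter when comparing a drive's physical capacity with a logical device's capacity. The following is a minimal sketch of the conversion rules in the two tables above. The function and constant names are illustrative assumptions and are not part of any Hitachi tool.

```python
# Illustrative helper only: names are assumptions, not a Hitachi API.
PHYSICAL_BASE = 1000   # physical capacity: 1 KB = 1,000 bytes
LOGICAL_BASE = 1024    # logical capacity: 1 KB = 1,024 bytes
BLOCK_BYTES = 512      # 1 block = 512 bytes
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB"]


def unit_to_bytes(value, unit, logical):
    """Convert a capacity value in the given unit to bytes."""
    base = LOGICAL_BASE if logical else PHYSICAL_BASE
    exponent = UNITS.index(unit) + 1   # KB -> 1, MB -> 2, and so on
    return value * base ** exponent


def blocks_to_bytes(blocks):
    """Convert a block count to bytes using the 512-byte block size."""
    return blocks * BLOCK_BYTES


print(unit_to_bytes(1, "TB", logical=False))  # 1000000000000
print(unit_to_bytes(1, "TB", logical=True))   # 1099511627776
```

In this sketch, 1 TB of logical capacity is roughly 10 percent larger than 1 TB of physical capacity.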

Accessing product documentation

Product user documentation is available on Hitachi Data Systems Support Connect: https://knowledge.hds.com/Documents. Check this site for the most current documentation, including important updates that may have been made after the release of the product.

Getting help

Hitachi Data Systems Support Connect is the destination for technical support of products and solutions sold by Hitachi Data Systems. To contact technical support, log on to Hitachi Data Systems Support Connect for contact information: https://support.hds.com/en_us/contact-us.html.

Hitachi Data Systems Community is a global online community for HDS customers, partners, independent software vendors, employees, and prospects. It is the destination to get answers, discover insights, and make connections. Join the conversation today! Go to community.hds.com, register, and complete your profile.

Comments

Please send us your comments on this document to [email protected]. Include the document title and number, including the revision level (for example, -07), and refer to specific sections and paragraphs whenever possible. All comments become the property of Hitachi Data Systems Corporation.

Thank you!


1 Introduction

This module introduces Infrastructure Analytics Advisor.

□ Product overview

□ Key features

□ Logging on to Infrastructure Analytics Advisor

□ Accessing Data Center Analytics


Product overview

With Infrastructure Analytics Advisor, you can define and monitor storage service level objectives (SLOs) for resource performance. You can identify and analyze historical performance trends to optimize storage system performance and plan for capacity growth.

Using Infrastructure Analytics Advisor, you register resources (storage systems, hosts, servers, and volumes) and set service-level thresholds. You are alerted to threshold violations and possible performance problems (bottlenecks). Using analytics tools, you find which resource has a problem and analyze its cause to help solve the problem.

The following figure describes how the Infrastructure Analytics Advisor ensures the performance of your storage environment based on real-time service level objectives (SLOs).

The system administrator uses Hitachi Infrastructure Analytics Advisor (HIAA) to manage and monitor the IT infrastructure based on SLOs, which match the service-implementation guidelines that are negotiated under a service level agreement (SLA) with consumers.

Infrastructure Analytics Advisor monitors the health of the IT infrastructure using performance indicators and generates alerts when SLOs are at risk.

Having data center expertise, the service administrator uses Infrastructure Analytics Advisor to assign resources, such as VMs and storage capacity from registered storage systems, to consumer applications. The purpose of doing this is to manage critical SLO violations and to ensure that service performance meets the service level agreements.


Key features

The key features of Infrastructure Analytics Advisor are described in this section.

Unified infrastructure monitoring dashboard

Hitachi Infrastructure Analytics Advisor dashboards are visual representations of the performance metrics of your infrastructure resources. The consolidated view allows you to quickly interpret the performance metrics and identify performance problems.

The consolidated dashboard view allows for the unified management of the server, storage, and network infrastructure resources. You can ensure the health of your data center by proactively monitoring the consumer groups, storage components, volumes, VMs, servers, and network devices. The advanced visual analytics aids in visualizing the performance data in easy-to-use graphs and charts. The visual cues allow for intuitive performance management.

The functions of the Infrastructure Analytics Advisor dashboard are as follows:
• Displays performance metrics summaries for the monitored resources.
• Displays warnings and critical alerts that need immediate action.
• Displays performance trends.
• Lets you drill down from summary reports to detailed reports.
• Lets you navigate to the E2E topology view for detailed analysis.


Advanced reporting

Infrastructure Analytics Advisor reporting capabilities enable you to monitor the infrastructure resources and assess their current performance, capacity and utilization. Reporting data provides you the information you need to make informed business decisions and plan for future growth.

Infrastructure Analytics Advisor supports both standard and custom reporting capabilities.

Standard reports:
• Default reports. The first time you log on to Infrastructure Analytics Advisor, the Dashboard shows the following reports by default: System Status Summary, Event Trends, System Resource Status, and Resource Events. You can customize which reports display by default.
• Critical reports. Critical reports show resources in your storage infrastructure that exceeded their thresholds. Critical reports are available for consumers, VMs, volumes, hosts, and system resources.
• Summary reports. Summary reports give you a high-level view of storage infrastructure resources. These reports are available for consumers, VMs, volumes, and system resources. Each summary report shows the number of resources with critical and warning alerts.
• Other reports. Infrastructure Analytics Advisor provides additional reports about hypervisors, switches, and system and resource events.

Custom reports:

By integrating with Data Center Analytics, you can create custom reports by running queries on performance data that is collected from monitored resources. You can also create real-time and historical reports that are specific to your business needs.

SLO management

SLOs are measurable parameters that are defined for monitoring the performance of user resources. With Infrastructure Analytics Advisor, you can evaluate, define, and customize the service level objectives defined for the monitored resources, such as volumes and VMs. By monitoring the SLOs, you can determine if your infrastructure provides enough performance to meet the end user requirements specified in the SLA.

Infrastructure Analytics Advisor offers the capability to establish and monitor storage service level objectives for business-critical applications and logical storage devices. When a service level threshold is exceeded, integrated diagnostic aids help identify the root cause. For storage operations, you can use the IO Control settings feature to set upper limits across a range of consumers by grade based on an SLO.


System and Resource Events

You can view the latest events in one place and manage the events based on the status.

The Events tab allows you to display details about significant events in your monitored environment.

There are two categories of events:
• System Events
  The System Events tab displays Management and Event Action events generated when system settings must be verified or configured.
• Resource Events
  The Resource Events tab displays Performance events generated when a device or component (server, storage system, network device, and so on) does not perform optimally.
  You can analyze the Resource events by using the end-to-end network topology view to identify the resource that generated the event.

The All Events tab displays both System and Resource events. Each event indicates the level of the alert, the date and time of the alert message, category, device name, and component name. Click a message in the Message column to display the Event Detail window.

Use the Event Detail window to display more event details, such as the device type and component type. You can scroll through the list for more events. For Resource events, you can click Show E2E View to view the network topology.

The event level classifications are as follows:
• Critical: Event that requires immediate attention
• Warning: Event that might become critical in the future
• Informational: No immediate action required

End-to-end monitoring

The E2E topology view provides detailed configuration of the infrastructure resources and lets you view the relationship between the infrastructure components. You can manually analyze the dependencies between the components in your environment and identify the resource causing performance problems. By using the topology maps, you can easily monitor and manage your resources. You can use this view to monitor resources in your data center from applications, virtual machines, servers, and the network through to storage.

In the E2E view, each node represents a resource and the connecting links represent the relationship between the infrastructure components. You can analyze a resource that is the target of analysis and all the associated resources. You can also view the alerts associated with all the related resources and trace the problem at the root level. The node-based E2E view helps you analyze the problem on the affected node and its impact on the rest of the infrastructure resources.

Problem identification and root cause analysis

Performance problems might occur because of varying system loads, application updates, capacity upgrades, configuration changes, and inefficient management of resources in the shared infrastructure.

The Infrastructure Analytics Advisor advanced diagnostic engine aids in rapidly diagnosing, troubleshooting, and finding the root cause of performance bottlenecks.

Storage IO controls

Storage IO controls allow you to set and modify limits on volumes.

In the data center, some types of resources often require higher performance than others. For example, production servers such as database and application servers used to perform daily tasks of business organizations usually require high performance. However, if production servers experience decreased performance, productivity in business activities is negatively affected.

To prevent this from happening, the storage administrator needs to maintain the high performance of production servers. A drop in development server performance does not have as much of a negative effect on the entire organization as a drop in production server performance. In this case, you set upper limits to give higher priority to IO activity from the production server over IO activity from the development server to manage and control the impact of development activities.

Storage IO controls are available in Infrastructure Analytics Advisor when Server Priority Manager is installed on your storage systems. You can invoke this function through Hitachi Automation Director after establishing a connection between the two servers. Alternatively, if Hitachi Automation Director is not installed on your storage system, you can use the CM REST API to create a script, which serves as a template that you modify for selected volumes to run the Server Priority Manager operation.

Using the IO control setting, you can enable upper limits for the storage IO activity of volumes that belong to consumers. The storage administrator disables the IO control setting when the traffic between the server and storage system drops to acceptable levels. Furthermore, you have the option of limiting the data transfer rate on volumes affecting critical resources.

Set IO control limits for the following:
• To achieve overall optimization of infrastructure resources during periods of IO-intensive activity
• To maintain a quality of service benchmark for an SLO
• To prioritize IO activity to optimize performance

Logging on to Infrastructure Analytics Advisor

Access the Infrastructure Analytics Advisor web interface from a supported browser.

Procedure

1. Open a web browser.
2. Enter the URL for Infrastructure Analytics Advisor in the address bar:

   http://host-name-or-ip-address-of-the-server-where-Infrastructure Analytics Advisor-is-installed:port-number/Analytics/login.htm

   where port-number is the port number of the Infrastructure Analytics Advisor management server. The default port number is 22015.

   To access Infrastructure Analytics Advisor in secure mode, enter:

   https://host-name-or-ip-address-of-the-server-where-Infrastructure Analytics Advisor-is-installed:port-number/Analytics/login.htm

   The default port number for secure mode is 22016.
3. Type a user ID and password to log on.
4. Click Log In.
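If you keep bookmarks or scripts for several management servers, the URL pattern in step 2 can be assembled programmatically. The following is a minimal sketch under stated assumptions: the function name and the example host name are hypothetical. Only the default ports (22015 and 22016) and the /Analytics/login.htm path come from the procedure above.

```python
# Hypothetical helper: the function name and host name are illustrative only.
DEFAULT_HTTP_PORT = 22015    # default port for standard access
DEFAULT_HTTPS_PORT = 22016   # default port for secure mode


def login_url(host, secure=False, port=None):
    """Build the logon URL for a management server host name or IP address."""
    scheme = "https" if secure else "http"
    if port is None:
        port = DEFAULT_HTTPS_PORT if secure else DEFAULT_HTTP_PORT
    return f"{scheme}://{host}:{port}/Analytics/login.htm"


print(login_url("iaa.example.com"))
print(login_url("iaa.example.com", secure=True))
```

Pass an explicit port value if the management server was configured with a non-default port.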

Accessing Data Center Analytics


Use Data Center Analytics to conduct historical trend analysis across a wide set of infrastructure statistics, create advanced monitoring custom reports, and interactively perform additional troubleshooting and diagnostics.

Access Data Center Analytics from the Tools menu.

Use the Data Center Analytics online help to view details about reporting tasks and features.


2 Performance monitoring using advanced threshold settings

Infrastructure Analytics Advisor ensures the health of your data center by measuring, monitoring, and optimizing the performance of your infrastructure resources.

□ Threshold profiles

□ Advanced threshold settings

□ Determining the threshold type for your environment

□ Dynamic thresholds

□ Static thresholds


Threshold profiles

You can define the monitoring parameters for target resources in the threshold profiles. Monitoring parameters vary depending on the type of resources being monitored.

The profile details page contains information about the profile name, description, and whether the profile uses the preset parameters defined in the monitoring template.

Two types of threshold profiles are available for monitoring purposes:
• User Resource Threshold Profiles
  You can define monitoring parameters for user resources, such as volumes and VMs, using User Resource Threshold profiles.
  You can perform the following tasks for monitoring user resources:
  • Monitoring plans: You can create plans for monitoring performance of resources whose workloads vary at different times of the day or week. For example, you can create separate monitoring plans for managing varying workloads that occur at different time periods, such as weekdays and weekends, peak workload periods and off workload periods, and so on.
  • Threshold settings: You can configure the threshold settings for user resources. The threshold settings determine when an alert should be triggered. You can monitor user resources using dynamic or static thresholds.
  • Automated resource assignment: You can create rules and conditions to automate resource assignment to monitoring profiles. Using these rules, the newly discovered user resources are automatically assigned to the existing user resource threshold profiles. The resources associated with a threshold profile are monitored based on the parameters defined in the profile. You can also manually assign resources to the monitoring profiles.
• System Resource Threshold Profiles
  You can define monitoring parameters for system resources, such as Switches, Hypervisors, and Storage Systems, using System Resource Threshold profiles.
  You can perform the following tasks for monitoring system resources:
  • Threshold settings: You can configure the threshold settings for system resources. The threshold settings determine when an alert should be triggered. You can monitor system resources using static thresholds.
  • Manual resource assignment: You can manually assign resources to the system resource monitoring profiles. The resources associated with a threshold profile are monitored based on the parameters defined in the profile.
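To illustrate the idea behind automated resource assignment, the following sketch matches a newly discovered resource against an ordered list of rules and returns the first matching profile. This is a conceptual model only. The class, the attribute names, the substring-matching semantics, and the profile names are all assumptions for illustration and do not describe how Infrastructure Analytics Advisor stores or evaluates its rules.

```python
from dataclasses import dataclass


@dataclass
class AssignmentRule:
    """Hypothetical rule: every condition must match for the rule to apply."""
    profile: str
    conditions: dict  # attribute name -> substring that must appear in the value

    def matches(self, resource):
        return all(
            needle in str(resource.get(attr, ""))
            for attr, needle in self.conditions.items()
        )


def assign_profile(resource, rules, default=None):
    """Return the profile of the first matching rule, or the default."""
    for rule in rules:
        if rule.matches(resource):
            return rule.profile
    return default


rules = [
    AssignmentRule("Gold volumes", {"type": "volume", "name": "prod"}),
    AssignmentRule("Default VMs", {"type": "vm"}),
]
new_resource = {"type": "volume", "name": "prod-db-01"}
print(assign_profile(new_resource, rules))  # Gold volumes
```

A resource that matches no rule returns the default, which corresponds to a resource that still needs manual assignment.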


Advanced threshold settings

Infrastructure Analytics Advisor supports monitoring of the performance metrics defined for your infrastructure resources using dynamic and static thresholds.

Determining the threshold type for your environment

Determining the appropriate threshold is essential for monitoring performance and ensuring compliance.

As a system administrator, you must configure your environment to meet the SLO requirements. However, over time, system performance can change. To adapt to these changes while continuing to maintain the SLOs, you must monitor the system closely, and you might periodically need to change the performance thresholds.

Two types of performance metric thresholds are available for monitoring. The type you choose depends on various factors, such as the monitored environment, monitored resources, business objectives, and others.
• Dynamic thresholds: Dynamic thresholds are system-computed values, which keep evolving depending on the performance of your system. The system analyzes the historical performance trends and computes an appropriate baseline value.
• Static thresholds: Static threshold values are user-defined static values that are used for monitoring a system with a predictable performance pattern.
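As a rough illustration of how a static threshold works, the following sketch compares metric samples against fixed, user-defined warning and critical values. The function name, the sample values, and the assumption that a higher value is worse (as with response time) are illustrative only and do not represent the product's evaluation logic.

```python
def classify(value, warning, critical):
    """Classify a sample against fixed thresholds (assumes higher is worse)."""
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if value > critical:
        return "Critical"
    if value > warning:
        return "Warning"
    return None  # within thresholds, no alert


# Hypothetical response-time samples in milliseconds
samples_ms = [4.2, 11.0, 25.3]
levels = [classify(v, warning=10.0, critical=20.0) for v in samples_ms]
print(levels)  # [None, 'Warning', 'Critical']
```

Because the values never change on their own, this approach suits workloads with a predictable pattern, as noted above.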

Dynamic thresholds

Dynamic thresholds are calculated automatically by analyzing the load pattern from the historical data. These values are adaptive in nature and change over a period of time depending on the performance of your resources, workload changes, and so on. You can use dynamic thresholds to monitor user resources only, such as volumes, VMs, and hosts.

The scenarios when you would use dynamic threshold values for monitoring your environment are as follows:
• When SLOs and other performance parameters are not established with the customer
• When you want to monitor your environment for stable performance and detect irregular behavior

Advantages of dynamic thresholds


With changing business requirements and performance goals, monitoring performance of your environment using predefined static thresholds might not be a feasible solution. The static values are calculated through trial and error, which is often time-consuming. These values become out of context in the long term, and the settings must be re-evaluated to ensure compliance.

Manually altering the thresholds each time there is a change in the system dynamics is a futile effort. By automating the threshold setting, you gain better visibility into your environment and performance trend patterns. Dynamic thresholds adapt to your environment and proactively send alerts before the performance bottleneck occurs.

Determining if the computed value is correct

If the computed values match your requirements, you can continue to use the dynamic thresholds for monitoring your environment. If you receive too many false alerts, you can manually edit the dynamic threshold values. For example, during a migration process, a resource might temporarily have a large number of disk IOs, and you might receive a number of false alerts. In this situation, you can manually edit the baseline value to account for the temporary increase in the load, and then allow the system to adjust the baseline values dynamically when stable operation is restored.

Automatic calculation of baseline values

Determining an appropriate threshold is essential when monitoring business-critical applications. Infrastructure Analytics Advisor analyzes the peak, normal, and low volume phases based on the historical data and adjusts the monitoring thresholds accordingly. Automating the threshold calculation reduces false alerts and the number of alerts to investigate, which might otherwise become a management overhead.

The application workloads might vary at different times of the day or week. For example, the workload pattern of an OLTP application might be different on weekdays and weekends. You can manage varying workloads that occur at different time periods for an application by creating monitoring plans. The system analyzes the performance data accumulated in the scheduled baseline period to compute the dynamic threshold values.
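As an illustration of how separate monitoring plans partition time, the following minimal sketch maps a timestamp to a plan. The plan names and the time boundaries (weekends, a 22:00 to 06:00 batch window) are hypothetical assumptions and do not reflect how Infrastructure Analytics Advisor stores or evaluates plans.

```python
from datetime import datetime

def select_plan(ts):
    """Return the name of the (hypothetical) monitoring plan that covers ts."""
    if ts.weekday() >= 5:                # Saturday = 5, Sunday = 6
        return "weekend"
    if ts.hour >= 22 or ts.hour < 6:     # assumed nightly batch window
        return "night-batch"
    return "weekday"

# Samples recorded at these times would feed different baselines.
print(select_plan(datetime(2016, 10, 3, 10, 0)))   # a Monday morning
print(select_plan(datetime(2016, 10, 8, 10, 0)))   # a Saturday morning
```

Grouping samples by plan in this way keeps a quiet weekend from lowering the baseline that is used for busy weekdays.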

The following example shows the response time metrics of a business-critical application monitored over time and how the system derives the automatic threshold values from past performance. The high-level steps that the system uses to calculate the automatic baseline values are as follows:


• Analyzes the historical data to identify the performance patterns in the specified baseline period.
• Detects and removes occasional outliers: In the following example, the data points that deviate from the norm represent the outliers. The system ignores outliers that appear at irregular intervals so that it can calculate an appropriate threshold value.
• Calculates the maximum value: The upper limit of the values in the normal range is used to calculate the maximum value. After determining the maximum value, the system adds the margin of error to the computed value.
• Determines the weighted average: The weighted average derives the threshold values based on the past performance trends over a specified time period.
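The steps above can be sketched in a few lines of Python. This is an illustration only, not the product's algorithm: the outlier rule (median absolute deviation with a cutoff of 3), the 10% margin of error, and the weights that favor recent periods are all assumptions chosen for the example.

```python
from statistics import median

def remove_outliers(samples, cutoff=3.0):
    """Drop points far from the median (assumed rule: median absolute deviation)."""
    med = median(samples)
    mad = median(abs(x - med) for x in samples)
    if mad == 0:
        return list(samples)
    return [x for x in samples if abs(x - med) / mad <= cutoff]

def period_maximum(samples, margin=0.10):
    """Upper limit of the normal range plus an assumed 10% margin of error."""
    return max(remove_outliers(samples)) * (1.0 + margin)

def weighted_threshold(period_maxima, weights):
    """Weighted average of the per-period maxima."""
    return sum(m * w for m, w in zip(period_maxima, weights)) / sum(weights)

# Hypothetical response times (ms) for three baseline periods, oldest first.
history = [
    [10, 11, 12, 11, 10, 48],   # 48 is an occasional spike
    [11, 12, 13, 12, 11, 12],
    [12, 13, 14, 13, 12, 13],
]
maxima = [period_maximum(p) for p in history]
threshold = weighted_threshold(maxima, weights=[1, 2, 3])
print(round(threshold, 2))
```

In this sketch the spike of 48 ms is discarded before the maximum is taken, so a single irregular event does not inflate the threshold.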

Setting dynamic thresholds using monitoring profiles

You can create monitoring profiles with dynamic thresholds for managing user resources only. System resources cannot be monitored using dynamic thresholds.

Using the user resource threshold profile, you can apply dynamic thresholds across user resources within your environment. For example, using a user resource threshold profile, you can apply a dynamic threshold setting for all volumes in an application.

You can create monitoring plans for an OLTP application whose workloads vary during weekdays and weekends. You can also create a separate plan for monitoring batch jobs that run at night. The procedure for enabling dynamic thresholds is as follows:

Procedure

1. On the Administration tab, from the navigation pane, select Monitoring Settings > User Resource Threshold Profiles > Create Threshold Profile.

2. In the Create User Resource Threshold Profile window, enter the profile name and description, select the resource type, and select the acceptable margin of error for sending alerts (Severe, Normal, or Rough).

3. Under Monitoring Plans, click Create Plan to create new monitoring plans. You can either edit the base plan or create a new plan.

4. In the Create Plan window, enter the plan name and set the target period. Under Target metric, you see a list of performance metrics related to the selected resource.

5. Click Dynamic to enable dynamic monitoring mode, and then click OK.

6. To save the profile, click OK.
   After you save the profile, the profile detail window opens, where you can assign target resources or create resource assignment rules.

7. In the profile detail window, you can do the following:
   • On the Assignment Rules tab, you can create rules for assigning resources to the monitoring profile automatically.
   • On the Target Resources tab, you can assign the resources to the profile manually. You can also view the existing target resources associated with the monitoring profile.

Static thresholds

Static thresholds are user-defined thresholds that you can manually configure for use at different times of the day or week, depending on the workload in your environment.

You can use predefined static threshold values in the following scenarios:
• When you have a well-defined service level objective that clearly establishes the performance goals.
  For example, if you have a service level agreement with the customer to support online transactions at a response time of less than 1 second for a business-critical application, you can create a user resource threshold profile to establish the response time and other performance requirements for the application, and then assign the target resources for monitoring. If there is an SLO violation, the system sends a critical alert or a warning and notifies the user before the problem becomes serious. You can also generate a report that compares the actual response time of the business-critical application to the SLO, see whether your objectives are being met, and take the necessary measures to fix the problem.
• When you can assess the workload patterns in your environment and know what values to assign.
  For example, define the threshold for a system resource based on the architecture of the storage system. If the storage system is a VSP G1000, the recommended MPB (MP Blade) usage is under 60%.
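The two scenarios above amount to comparing each sample with fixed limits. The following minimal sketch shows that comparison. The critical limits come from the examples in the text (1 second response time, 60% MPB usage); the warning limits, the metric names, and the `evaluate` function are hypothetical and are not part of the product.

```python
def evaluate(value, warning, critical):
    """Classify a sample against static thresholds (higher values are worse)."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "normal"

# Hypothetical profile; warning levels are assumed for illustration.
profile = {
    "response_time_ms": {"warning": 800, "critical": 1000},
    "mpb_usage_pct": {"warning": 50, "critical": 60},
}
sample = {"response_time_ms": 850, "mpb_usage_pct": 42}

for metric, value in sample.items():
    print(metric, evaluate(value, **profile[metric]))
```

Placing the warning limit below the SLO value is one way to receive a notification before the objective itself is violated.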

Setting static thresholds using monitoring profiles

You can create monitoring profiles with static thresholds for managing user and system resources. The performance parameters defined in the threshold profile determine when an alert is triggered.

Create threshold profiles for user or system resources based on the resource type, and then assign the resources you want to monitor.

For user resources

Procedure

1. On the Administration tab, from the navigation pane, select Monitoring Settings > User Resource Threshold Profiles > Create Threshold Profile.

2. In the Create User Resource Threshold Profile window, enter the profile name and description, and select the resource type.

3. On the Monitoring Plans tab, click Create Plan to create new monitoring plans. You can either edit the base plan or create a new plan.

4. In the Create Plan window, set the target period for monitoring. Under Target metric, click Static to enable static monitoring mode. When you enable static monitoring mode, you must manually enter the threshold values for the target metrics.

5. To save the profile, click OK.
   After you save the profile, the profile detail window opens, where you can assign target resources or create resource assignment rules.

6. In the profile detail window, you can do the following:
   • On the Assignment Rules tab, you can create rules for assigning resources to the monitoring profile automatically.
   • On the Target Resources tab, you can assign the resources to the profile manually. You can also view the existing target resources associated with the monitoring profile.


For system resources

The procedure for setting static thresholds for system resources is as follows:

Procedure

1. On the Administration tab, from the navigation pane, select Monitoring Settings > System Resource Threshold Profiles > Create Threshold Profile.

2. In the Create System Resource Threshold Profile window, enter the profile name and description, and select the resource type. If required, copy the settings from the default profile or from existing system resource profiles.

3. Under threshold values, manually enter the threshold values for the performance metrics.

4. Under Target Resources, click Add Resources to manually assign resources to the system resource threshold profile.


3 End-to-end performance troubleshooting

Infrastructure Analytics Advisor provides analytical diagnostics to quickly identify, isolate, and determine the root cause of problems.

The traditional approach to troubleshooting performance problems in a unified infrastructure poses several challenges. For example, it can be difficult to identify a performance problem in a storage infrastructure environment that includes various virtual machines, servers, networks, and storage systems.

Infrastructure Analytics Advisor offers an out-of-the-box analytics solution that lets you identify and troubleshoot performance problems at the node level. The topology view shows a graphical representation of the infrastructure components and their dependencies, which is crucial for troubleshooting infrastructure performance problems. The troubleshooting aids help you perform efficient root cause analysis.

□ Identifying performance problems

□ Infrastructure components and key performance metrics

□ Troubleshooting high response times

□ Troubleshooting workflow

□ Detecting performance problems

□ Analyzing performance bottleneck

□ Analyzing the root cause of the bottleneck

□ Solving performance problems


Identifying performance problems

IT infrastructure is becoming more complex each day with rapidly emerging converged infrastructures. Performance problems occur because of various factors in your environment. Identifying and troubleshooting these problems quickly is crucial.

As part of your performance management strategy, you define performance goals and criteria for monitoring your environment. Performance problems occur when these predefined goals are not met. Use the advanced analytics and troubleshooting features of Infrastructure Analytics Advisor to fix problems quickly.

The following indications make you aware of a performance problem in your environment:
• When an SLO violation occurs
  Typically, SLAs define the SLOs used to evaluate the quality of service. SLO profiles define the threshold values for the performance parameters that you use to evaluate the quality of service. When the threshold values are exceeded, an SLO violation occurs.
• When a sharp deviation from the baseline data occurs
  When no SLOs are defined for your environment, you can use the baseline values to evaluate your system performance. The current performance is compared to the past performance trends, and when there is a significant deviation from the baseline values, Infrastructure Analytics Advisor sends an alert to notify you of a potential performance problem so that you have enough time to troubleshoot.
• When the customer notifies you of application performance degradation and a slowdown of the infrastructure.
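The baseline comparison described above can be illustrated with a simple statistical check. This is a sketch under stated assumptions, not the product's detection logic: it treats a sample as a sharp deviation when it exceeds the baseline mean by more than three sample standard deviations, and both the rule and the tolerance are chosen only for illustration.

```python
from statistics import mean, stdev

def deviates_from_baseline(baseline, current, tolerance=3.0):
    """True if current exceeds the baseline mean by more than
    tolerance sample standard deviations (assumed rule)."""
    limit = mean(baseline) + tolerance * stdev(baseline)
    return current > limit

# Hypothetical response times (ms) observed during the baseline period.
baseline = [10, 11, 12, 11, 10, 12]

print(deviates_from_baseline(baseline, 12.5))
print(deviates_from_baseline(baseline, 20))
```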

The common causes of performance problems are as follows:
• Increased load in an otherwise stable operating environment
• Inefficient load balancing strategy, which might cause underutilization of resources
• Changes in the system configuration
• Resource management in a shared infrastructure

Infrastructure components and key performance metrics

You must analyze the key performance metrics relevant to the problem and the workload being analyzed.

The components and key performance metrics available in Infrastructure Analytics Advisor for monitoring performance are listed in the following table:


Table columns: Component; Performance problem; Key performance metrics. The key performance metrics are grouped under three headings:
• Identify the resources with SLO violations or performance problems
• Identify the related resources used by the affected resources
• Identify the resources that might be the root cause

For each performance problem, the resource types and their metrics are listed in the order in which they appear in the table.

Server
  CPU contention
    • VM: vCPU Ready, vCPU usage
    • ESX: pCPU usage, Host CPU Ready¹
    • ESX: pCPU usage, Host CPU Ready¹
    • VM: vCPU Ready, vCPU usage
  Memory swap
    • VM: Usage %, Active memory, Swap in/out rate
    • ESX: Swap in/out rate¹
    • ESX: Usage %, Active memory¹, Swap in/out rate¹
    • VM: Usage %, Active memory, Swap in/out rate¹
  Memory contention
    • VM: Balloon
    • VM: Balloon¹
    • ESX: Usage %, Balloon¹
    • VM: Usage %, Active memory, Balloon
  Response time decrement
    • ESX: pCPU usage, Device Latency (R/W)¹

Storage
  Response time decrement
    • VM: Latency (R/W)
    • Hypervisor: Latency (T)¹
    • LU (Volume): Response time (R/W/T)
    • Port: Usage
    • Processor: MPB utilization¹
    • Cache: Write Pending %, Side file %
    • Pool: Utilization
    • Parity Group: Utilization %, Read Hit %
    • VM: Read (KBps), Write (KBps), Read Operations, Write Operations
    • LU (Volume): IOPS (R/W/T)

Network
  Error packet
    • VM: droppedRx, droppedTx, Transmitted/received (KBps)
    • VM: Transmitted/received (KBps), PacketsTx¹, PacketsRx¹

¹ The performance metric is available in Data Center Analytics.


Troubleshooting high response times

This section describes the use case flow for troubleshooting high response times for an OLTP application by using the advanced analytics and troubleshooting features of Infrastructure Analytics Advisor.

The most significant metric to watch while monitoring online transactions is the I/O rate. The application can process a large number of transactions when the I/O rates are higher. To maintain good response times in an OLTP environment, which mostly generates random access I/O, the read I/O response times should be low. For response-time-centric applications, such as OLTP applications, you must maintain low utilization values to ensure CPU availability and low Q-depth values to ensure no wait time.

Troubleshooting workflow

The basic workflow for analyzing and troubleshooting performance problems using Infrastructure Analytics Advisor is as follows:

1. Detecting performance problems
2. Analyzing in E2E view
3. Analyzing in Sparkline view
4. Identify affected resources
5. Analyze shared resources
6. Analyze related changes
7. Solving performance problems


Detecting performance problems

You can view the threshold violations using the Dashboard tab and Events tab. You can configure the system to send email notifications when the threshold values are exceeded. You can also use the search feature in the Analytics tab to find the target resources for performance analysis.

Dashboard

The dashboard displays when you log on to Infrastructure Analytics Advisor. You can create a custom dashboard and choose to view the reports of monitored resources.

The dashboard displays summary reports for the monitored resources, system and resource events, event trends, and consumer groups. The report widgets display the threshold violations and critical alerts detected on all monitored resources when threshold values are exceeded.

In the following figure, the warnings display on the monitored VMs and volumes. From the report widgets, you can click links to access the E2E view to analyze the cause of the threshold violations.

Events tab

The Events tab displays a list of resource and system events. You can view the severity of each event, the date and time of the occurrence, the category, the device, and the component name. You can navigate from the Events tab to the E2E view for further analysis.

Email notifications

Infrastructure Analytics Advisor allows you to configure email notifications. When the threshold values are exceeded, the system sends an email to notify you of the potential performance problem.

Search

The search feature in the Analytics tab lets you search for a resource in the Consumers, Servers, Switches, Storage Systems, and Volumes categories. From the returned search results, you can select the resources to analyze, and launch the E2E view or Sparkline view for further analysis.

Analyzing performance bottleneck

Performance degradation in the user resources is caused by a performance bottleneck on the server, network, or storage components.

A performance bottleneck can occur for various reasons, such as CPU contention, inefficient load balancing, applications sharing storage pools, port and parity group utilization in a shared infrastructure, cache utilization, changes in dynamic tiering policies, and configuration changes.

You can identify and analyze the component causing the bottleneck in any of the following views:
• E2E view
• Analyze bottleneck > Verify Bottleneck tab
• Sparkline view
• Detail view

Analyzing in E2E view

In the topology view, if a resource has an alert associated with it, error indicators display on the resource icons. The color of the indicator corresponds to the severity of the alert.

The following shows the E2E configuration related to the affected volumes (00:00:03, 00:00:05, and 00:00:06):


You can change the base point of analysis to narrow down the topology associated with the affected volumes. Select the affected volume, right-click, and then select Change Base Point.

The Parity Group is identified as the component causing the performancebottleneck.

Analyzing in Verify Bottleneck window

In the E2E view, right-click a resource icon and then select Verify Bottleneck to launch the Verify Bottleneck window.

In the Verify Bottleneck window, you can compare the performance trends of the potential bottleneck candidate with those of the base point resources. If the performance charts display similar trend patterns in the same time period, you can assume that the selected resource is the bottleneck candidate. If not, you can repeat the analysis for other resources with alerts in the Verify Bottleneck window.

In the following example, the Parity Group is identified as the bottleneck.
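The idea of "similar trend patterns in the same time period" can be quantified with a correlation coefficient. The following sketch is an illustration, not a description of what the Verify Bottleneck window computes: it uses the Pearson correlation, and all series are hypothetical data sampled at the same points in time.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long series (0.0 if either is flat)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

# Hypothetical samples taken at the same six points in time.
volume_response_ms = [5, 6, 9, 15, 14, 7]
parity_group_util = [30, 35, 55, 90, 85, 40]
port_util = [20, 21, 19, 20, 22, 20]

print(round(pearson(volume_response_ms, parity_group_util), 2))
print(round(pearson(volume_response_ms, port_util), 2))
```

A coefficient close to 1 for the parity group and a much lower value for the port would point to the parity group as the more likely bottleneck candidate in this made-up data set.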


Analyzing in Sparkline view

Use Sparkline charts to analyze the performance trends of the monitored resources. In the Sparkline view, you can compare and correlate the performance of the base point resources and the related infrastructure resources to identify the bottleneck.

The Sparkline view displays performance charts for multiple nodes in the same pane to enable quick comparison between different nodes. You can display detailed performance metrics for each node and find the correlation with other nodes.

The following figure shows an example of analyzing the affected volumes (00:00:03, 00:00:05, and 00:00:06) in the Sparkline view. The trend charts confirm that the performance bottleneck is caused by the parity group.

The volumes (00:00:03, 00:00:05, and 00:00:06) belong to the same parity group. If the volumes (logical resources) share the same parity group (physical resource) and one of the logical volumes utilizes the parity group more than the others in the shared infrastructure, the total efficiency of the physical resource is degraded and the parity group utilization rate increases.

A high parity group utilization rate delays reads from and writes to the disks in the parity group, which increases the response time of the application. You can consider allocating the affected volumes to a different parity group for load balancing. To troubleshoot the bottleneck, you can also check the IO performance of the parity group and see whether any other servers access the same parity group.


Analyzing in Detail view

In the Sparkline view, you can select multiple graphs and then click Show Performance to navigate to the Detail view. In the Detail view, you can closely analyze the performance trends of the base point resources (00:00:03, 00:00:05, and 00:00:06) and the bottleneck candidate, the Parity Group. In this analysis, the affected volumes show trend patterns similar to those of the parity group during the same time period, confirming that the parity group is the bottleneck candidate. You can continue to analyze and find the root cause in the Analyze Bottleneck window.


Analyzing the root cause of the bottleneck

The integrated troubleshooting aids in Infrastructure Analytics Advisor provide guidance about how to find the root cause of performance problems.

Identify affected resources

In the Analyze Bottleneck window, click the Identify affected resources tab. In this window, you can identify the consumers, hosts, VMs, and volumes that use the bottleneck candidate. You can also verify the status of each resource. Based on the severity level displayed, you can troubleshoot the performance problems associated with the resources.

Analyze shared resources

A performance problem arises in a shared infrastructure when an application or a resource uses the majority of the available resources and causes performance issues for other resources in the shared infrastructure. Infrastructure Analytics Advisor supports efficient optimization of the shared infrastructure by quickly identifying resource contention issues.

In a shared infrastructure, heavy resource use by one component can negatively impact the performance of the other components. The main scope of the analysis is to find the resource in the shared infrastructure that might be causing the performance bottleneck.

The high-level steps used to analyze the root cause in the Analyze Shared Resources window are as follows:


1. In the Analyze Bottleneck window, click the Analyze Shared Resources tab.

2. In the Analyze Shared Resources window, compare the performance trends of the bottleneck candidate with those of the related resources to find out whether any of these resources are overutilizing the bottleneck candidate.

3. If the performance trends of the compared resources show similar trend patterns in the same time period, you can assume that the performance bottleneck is caused by resource contention in the shared infrastructure.

In the following example, the Parity Group is identified as the bottleneck candidate. In the Analyze Shared Resources window, compare the performance trends of the Parity Group with the trend patterns of the Volumes and VMs that use that Parity Group in the shared infrastructure. The performance trends of the Parity Group closely match the trend patterns of one of the VMs, which confirms that this VM is the resource that is overutilizing the Parity Group.
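One simple way to reason about which consumer is overutilizing a shared resource is to compute each consumer's share of the total IO over the analysis period. The sketch below uses hypothetical VM names and IOPS samples, and the `io_share` helper is an assumption made for illustration. It is not a function of Infrastructure Analytics Advisor.

```python
def io_share(consumers):
    """Return each consumer's fraction of the total IO over the period."""
    totals = {name: sum(series) for name, series in consumers.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    return {name: total / grand_total for name, total in totals.items()}

# Hypothetical IOPS samples for three VMs that use the same parity group.
vm_iops = {
    "vm-a": [200, 220, 210, 205],
    "vm-b": [250, 900, 1400, 1300],
    "vm-c": [150, 160, 155, 150],
}
shares = io_share(vm_iops)
heaviest = max(shares, key=shares.get)
print(heaviest, round(shares[heaviest], 2))
```

A consumer whose share grows sharply during the period in which the bottleneck candidate degrades is a natural candidate for load balancing.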

You can resolve a bottleneck caused by shared resources by adopting efficient load balancing methods, which enable optimal utilization of the resources in the shared infrastructure.

Analyze related changes

Configuration changes can sometimes be the source of a performance problem in your environment. Infrastructure Analytics Advisor supports the tracking of infrastructure configuration changes. Analyzing these changes and correlating them with the performance data lets you determine the effects of configuration changes on the system's performance and behavior.


The main scope of the analysis is to examine the configuration changes made in your environment that might be the root cause of the performance bottleneck.

The high-level steps used to analyze the root cause in the Analyze Related Changes window are as follows:

1. In the Analyze Bottleneck window, click the Analyze Related Changes tab.

2. In the Analyze Related Changes window, a combination chart that combines the features of a line chart and a bar chart is displayed. In the combination chart, you can compare the performance data of the bottleneck candidate with the system configuration changes for a specified time period.
   The details of the configuration change events that occurred in the specified time period are displayed in the lower pane. You can analyze the change events to see whether any of these changes caused performance variations in the bottleneck candidate. You can also zoom in on the performance trend chart to select a shorter time period and view the change events that occurred in the selected time range.

In the following example, the Parity Group is identified as the bottleneck candidate. In the Analyze Related Changes window, a combination chart that contains two data series is displayed: the bars represent the change events, and the line represents the performance of the bottleneck candidate. You can correlate the performance data of the Parity Group with the change events that occurred in the specified time period to determine the effects of the configuration changes. Based on the analysis, you can confirm that no configuration change events caused the performance degradation in the Parity Group.
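Correlating change events with performance data can be approximated by comparing the average of a metric shortly before and shortly after each change. The following sketch shows that comparison on hypothetical data that is unrelated to the example above. The window size, the sample values, and the `impact_of_changes` helper are assumptions made for illustration.

```python
def impact_of_changes(samples, change_times, window=3):
    """For each change time, return mean(after) - mean(before),
    using `window` samples on each side.

    samples is a list of (time, value) pairs sorted by time.
    """
    impact = {}
    for t_change in change_times:
        before = [v for t, v in samples if t_change - window <= t < t_change]
        after = [v for t, v in samples if t_change <= t < t_change + window]
        if before and after:
            impact[t_change] = sum(after) / len(after) - sum(before) / len(before)
    return impact

# Hypothetical utilization (%) sampled at t = 0..9, with changes at t = 3 and t = 6.
utilization = list(enumerate([40, 42, 41, 43, 42, 41, 78, 80, 82, 81]))
impact = impact_of_changes(utilization, change_times=[3, 6])
print(impact)
```

A change followed by a large shift in the metric deserves a closer look, whereas a change with almost no shift is unlikely to be the root cause.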


Solving performance problems

This section describes common performance problems and possible solutions. The possible causes and solutions are intended to provide guidance, and might not satisfy your business process performance requirements.

The following table lists the commonly observed storage-related problems and possible solutions.

Bottleneck area Root cause and possible solutions

Parity Group utilization
• Root cause
  The usage rate of the Parity Group increases because of the following possible causes:
  ○ Some volumes might be under heavy load.
  ○ Volumes (logical resources) might belong to the same Parity Group (physical resource), which might cause resource contention issues in the shared infrastructure.
• Possible solutions
  ○ Consider moving some volumes to another Parity Group with a lower usage rate or higher performance.
  ○ Consider increasing the number of drives (by concatenating Parity Groups).
  ○ To manage a Parity Group that is part of a pool, consider adding another Parity Group to the pool.

MPB utilization
• Root cause
  The usage rate of the MP Blade (average usage rate of the MP cores in the MP Blade) increases because of an increased load. Too many busy resources, such as internal volumes, external volumes, or journal groups, accessing the same MP Blade might cause performance degradation.
• Possible solutions
  Consider allocating the busy resources (internal volumes, external volumes, or journal groups) to another MP Blade (changing the ownership).

Port utilization
• Root cause
  The usage rate of the port (amount of data forwarded by the port divided by the amount of data that can be forwarded by the port) increases because of a number of volumes accessing the same port.
• Possible solutions
  Consider allocating some volumes (or host groups) to a different port.
  Note: When the connected port is changed, the host might need to be restarted.

Cache utilization
• Root cause
  Out of the total cache memory allocated to the CLPR, the percentage occupied by the data waiting to be written to the drive increases because of the following possible causes:
  ○ The usage rate of the drive might be high, delaying write processing to the drive.
  ○ The usage rate of the processors might be high, delaying write processing to the drive.
  ○ The capacity of the installed cache memory might be insufficient.
• Possible solutions
  ○ Consider allocating some volumes to another cache partition.
  ○ Consider increasing the cache memory.
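The port usage rate and the cache write pending rate described in the table above are simple ratios. The following sketch expresses both as percentages. The function names, units, and sample values are assumptions made for illustration. The product reports these metrics itself.

```python
def port_usage_rate(forwarded_mbps, capacity_mbps):
    """Data forwarded by the port divided by the data the port can forward, in %."""
    return 100.0 * forwarded_mbps / capacity_mbps

def write_pending_rate(pending_mb, clpr_cache_mb):
    """Share of the cache allocated to the CLPR that holds data waiting
    to be written to the drive, in %."""
    return 100.0 * pending_mb / clpr_cache_mb

# Hypothetical values: a port forwarding 600 MBps out of 800 MBps,
# and 12,288 MB of pending writes in a 65,536 MB cache partition.
print(port_usage_rate(600, 800))
print(write_pending_rate(12_288, 65_536))
```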

The following table lists the commonly observed server-related problems and possible solutions.

CPU utilization
• Root cause
  CPU bottlenecks occur when several VMs run on the same physical machine and end up sharing the same CPU. If the VMs (logical resources) share the same CPU (physical resource) and one of the VMs utilizes the CPU more than the others in the shared infrastructure, the total efficiency of the CPU is degraded and the CPU utilization rate increases. The CPU could become saturated with requests because of resource contention issues.
• Possible solutions
  Consider moving the VMs to another server.

Memory utilization
• Root cause
  Memory bottlenecks occur when several VMs (logical resources) share the available memory (physical resources), which might result in the performance degradation of the physical memory.
• Possible solutions
  Consider allocating additional physical memory, or moving the VMs to another server.


4 Optimizing infrastructure resources with storage IO controls

Infrastructure Analytics Advisor provides storage IO controls to optimize infrastructure resources.

Storage IO controls offer storage administrators the ability to prioritize IO traffic. This feature works in many contexts to improve the efficient usage of resources in your infrastructure.

□ IO controls for optimization of Infrastructure resources

□ IO control settings for a SLO

□ IO controls for optimizing IO performance after the bottleneck analysis


IO controls for optimization of Infrastructure resources

Infrastructure Analytics Advisor features storage IO controls for optimizing the resources in your infrastructure.

You can optimize your resources by setting limits on storage IO for noncritical applications. Setting these limits caps IO usage and frees up more resources in the infrastructure. When you foresee increased IO activity, you can set upper limits on volumes associated with applications or host servers that issue many IO requests.

When development or testing efforts require more resources than usual, you set IO control limits on the volumes associated with these IO-intensive applications for that period. This IO control setting then allows the business-critical applications sufficient access to storage resources.

When IO activity decreases to acceptable levels, you clear the IO control limits from those volumes. By establishing these temporary limits on storage IO, the infrastructure achieves overall optimization during periods of increased IO activity.
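The effect of an upper limit can be illustrated with a small calculation. In this sketch, the volume names, IOPS values, and the `apply_upper_limits` helper are hypothetical. It models only the arithmetic of capping demand and is not the mechanism that the storage system uses to enforce IO control.

```python
def apply_upper_limits(demand_iops, upper_limits):
    """Cap each volume's demand at its upper limit.

    Volumes without an entry in upper_limits are left unchanged.
    """
    return {
        volume: min(iops, upper_limits.get(volume, iops))
        for volume, iops in demand_iops.items()
    }

# Hypothetical demand: one business-critical volume and two noncritical ones.
demand = {"prod-vol": 4000, "dev-vol": 9000, "test-vol": 1500}
limits = {"dev-vol": 3000}   # temporary cap on the IO-intensive dev volume

granted = apply_upper_limits(demand, limits)
freed = sum(demand.values()) - sum(granted.values())
print(granted, freed)
```

Clearing the limit corresponds to removing the entry from `limits`, after which the volume's demand is passed through unchanged.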

IO control settings for a SLO

IO controls enable you to meet the goals of your SLO.

SLAs specify a quality of service benchmark for an SLO. For storage IO throughput, this benchmark is typically measured in IOPS or MBps. As a pre-emptive measure, Infrastructure Analytics Advisor enables you to set limits on storage IO activity for applications on servers that issue too many IO requests, which leaves sufficient resources in the infrastructure to meet the SLOs. After identifying the consumers with a specific SLO, you select the volumes and set an upper limit to guarantee the quality of service benchmark for that SLO. You can set different storage IO upper limits for consumers based on grade.

IO controls for optimizing IO performance after the bottleneck analysis

To prevent an increased workload from affecting critical resources, set upper limits for the servers that issue many IO requests.

Storage administrators must respond quickly to sudden changes in IO traffic every day. Shared infrastructure resources can degrade in performance at unpredictable times. If the bottleneck analysis reveals a spike in total IOPS, as shown in the following figure, the root cause is an insufficient amount of available resources.


Because adding resources cannot be done quickly or might not be possible, the most efficient solution is to manage the IO traffic. For a storage administrator, this situation must be treated as an emergency. In the Set IO Control window, respond to the emergency by setting an upper limit for the volumes that affect resources as soon as you detect them.

You might use the upper limit setting as a temporary measure so that more important tasks have the resources planned for daily operations. When critical resources no longer need IO priority, you might need to remove the upper limit setting. All upper limit settings are saved to the History tab.

You can continue checking the History tab to monitor the upper limit settings by searching by volume, consumer, or task. These three search options give you fine-grained selection when monitoring and controlling IO activity.

Preventing noncritical resources from causing performance degradation

When you are notified of performance degradation through an alert, perform the bottleneck analysis to detect the disruptive resource:

• Review the trend charts through E2E or Sparkline View to compare performance of selected resources.

• Use the Analyze Shared Resources to identify which noncritical resources are disrupting IO traffic.

• When the Resources by bottleneck displays, you see a list of volumes that correspond to the trend chart.

• Identify the target volumes issuing many IO requests.

• Select the target volumes and then apply the upper limit setting. For your reference, give the task an appropriate name in the description field.

• Continue monitoring the History tab. If use of available resources has leveled to the point that IO Control is no longer needed, select the target volumes of the task in IO Control Settings and click Off or modify those limits as needed.
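The step of identifying the target volumes that issue many IO requests can be illustrated outside the product. The following is a minimal Python sketch, not the Infrastructure Analytics Advisor implementation: it assumes you have exported per-volume IOPS figures yourself, and the function name, the volume names, and the 25 percent share threshold are all hypothetical.

```python
from typing import Dict, List


def pick_throttle_candidates(
    iops_by_volume: Dict[str, float], share_threshold: float = 0.25
) -> List[str]:
    """Return volumes whose share of total IOPS is at least share_threshold.

    The result is ordered busiest first. The threshold is an illustrative
    assumption, not a value taken from the product.
    """
    total = sum(iops_by_volume.values())
    if total <= 0:
        return []
    ranked = sorted(iops_by_volume.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, iops in ranked if iops / total >= share_threshold]


# Hypothetical sample data: average IOPS per volume during the spike.
sample = {"vol-01": 1200.0, "vol-02": 300.0, "vol-03": 4500.0, "vol-04": 500.0}
candidates = pick_throttle_candidates(sample)
print(candidates)
```

Volumes returned by a ranking like this would be the ones to review before you apply an upper limit in the Set IO Control window.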


5 Flexible reporting and analysis using Data Center Analytics

In the fast-paced world of online transactions, many companies with global operations have invested in a sophisticated IT infrastructure that provides them a competitive edge. Monitoring and reporting features enable organizations to monitor applications closely and continuously, and to proactively identify problems before they become more severe and require immediate attention. Whether you are an IT manager for a bank, a health care provider, or the government sector, proactive monitoring and reporting are useful in determining the performance trend of your system and addressing ways to improve customer service interactions in advance of customer feedback. To do this thoroughly requires a tool that can help track the health of your system at all hours and display the relevant metrics instantly in a report that you can share with your organization for assessment.

Hitachi Infrastructure Analytics Advisor integrates with Data Center Analytics to provide advanced reporting capability to continuously measure and analyze performance of your monitored resources. The up-to-date visual representation of your system's health enables you to share reports with others. You can create three types of reports:

• Predefined reports: provide high-level details at the application level and also a granular report that shows component-level performance data.

• Ad-hoc reports: enable you to combine related and unrelated metrics of any monitored resource in one report to review the overall performance impact.

• Custom reports: reports that you create with a report builder.

All reports are included in the Reports dock, and are available when you select any storage system object in the storage systems hierarchy. Predefined reports differ based on your selection of the storage system object. An interactive chart and filtering resources enable you to view every detail in any report. You can also filter reports to display the most relevant data, and can print, create a PDF, and export a report to a CSV file.


Overall and granular level reporting using pre-defined reports

Each node in the tree has predefined reports that cover important attributes of a metric to help your analysis of the resource. If you expand and click a node, for example, 609315f7 under Pools in the tree, the performance report displays. In this case, the Pool IOPS Vs. Response Time report displays and shows the metrics data only for 609315f7. No data for other pools appears on the report.

Compare node and metric with ad-hoc reports

On the reports, nodes are resources such as RAID Storage 302c7d0 and RAID Storage 302c6d6, and metrics are measurements such as cache usage and write pending rate. You can compare any nodes, or compare metrics of a single node or of different nodes. In Add Report, type the report name in the field, then add specific metrics by dragging and dropping a node from the tree to either the axis section Y/Left or Y1/Right. The left and right axis boxes display the list of available resources, for example, virtual machines and hosts.


If, for example, you want to see a pattern for a storage node between two time periods, you can compare the reports on Storage IOPS and display them in one view. Each graph line is color-coded, and you can zoom in on reports to get a better view.

You can also compare how one metric affects the other metrics. For example, you can create an ad-hoc report that compares IOPS with Response Time. This most commonly used report shows whether an increasing load on the system (IOPS) affects the performance (response time).
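To make the IOPS versus Response Time comparison concrete, the following Python sketch computes a Pearson correlation coefficient between two exported metric series. This is an illustration only and is not how Data Center Analytics builds its charts; the function name and the sample values are hypothetical. A coefficient near 1 suggests that response time rises together with load.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples taken at the same timestamps.
iops = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
response_ms = [1.0, 1.1, 1.3, 2.0, 3.5]
r = pearson(iops, response_ms)
print(f"correlation between IOPS and response time: {r:.2f}")
```

Correlation only summarizes a linear relationship, so the chart itself remains the better tool for spotting the load level at which response time starts to climb.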


To create ad-hoc reports, you can combine related and unrelated resource metrics by dragging and dropping the metrics into the report from the specific instances in the tree. For example, you can see the metrics for ports and volumes in one chart at any time. Attributes that are directly related, for example, IOPS and Response Time, usually have a built-in report in the Reports dock. Sometimes the attributes are unrelated (or only indirectly related), such as storage system cache usage and the file system transfer rate on a host, where host activity can consume most of the storage from the array. You can add unrelated metrics and create a comparison chart.

Custom reports

If the predefined and ad-hoc reports are not sufficient, you can create custom reports by building your own query. The Custom Reports feature is based on the Data Center Analytics query language. This regex-based expressive query language retrieves and filters the data in the Data Center Analytics database.

The Data Center Analytics query language allows complex analysis on the data in real time with constant run time. The syntax makes it possible to traverse relations, identify the patterns in the data, and establish a comparison between metrics of a single component or multiple nodes.

The Data Center Analytics UI helps you build your custom query in the following three ways:

• Start with a predefined query and customize it as required.

• Build the query using the Build Query feature.

• Write the query directly using Data Center Analytics query language.


6 Monitoring and quick troubleshooting with Data Center Analytics

Many companies with global operations have invested in sophisticated storage infrastructure that provides them a competitive edge. Even the smallest downtime in any of these critical applications has a cascading effect and results in logistical challenges. Therefore, as a Storage Administrator of your company, you must monitor these applications closely and continuously to proactively identify and stop potential problems.

Hitachi Infrastructure Analytics Advisor analyzes configuration and performance data from storage systems, hypervisors, and operating systems. It defines resource SLO thresholds based on the service agreement, and monitors the service level through customers' threshold alerts. To maximize storage performance and ensure performance is at peak efficiency, Infrastructure Analytics Advisor taps into a scalable data repository and advanced diagnostic engine to rapidly diagnose and troubleshoot storage performance bottlenecks. The most common problem is slow response time of applications.

The problem could be in any storage component such as the front-end ports, controllers, or disk drives. Infrastructure Analytics Advisor automatically sends a notification to you when a monitored metric of a storage component exceeds the defined threshold. The notification contains details of the component that exceeded the threshold to enable you to quickly identify the problem and troubleshoot it.
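The threshold check behind such a notification can be pictured with a short Python sketch. This is a conceptual illustration under assumed names, not the product's alerting engine: the `Sample` type, the metric names, and the threshold values are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Sample(NamedTuple):
    component: str  # hypothetical label, for example "controller 0"
    metric: str     # hypothetical metric name, for example "busy_pct"
    value: float


def find_violations(
    samples: List[Sample], thresholds: Dict[Tuple[str, str], float]
) -> List[str]:
    """Return one message per sample whose value exceeds its threshold.

    Samples without a configured threshold are ignored.
    """
    messages = []
    for s in samples:
        limit = thresholds.get((s.component, s.metric))
        if limit is not None and s.value > limit:
            messages.append(
                f"{s.component}: {s.metric}={s.value} exceeds threshold {limit}"
            )
    return messages


samples = [
    Sample("controller 0", "busy_pct", 92.0),
    Sample("controller 1", "busy_pct", 41.0),
    Sample("port 1A", "transfer_mbps", 300.0),  # no threshold configured
]
thresholds = {
    ("controller 0", "busy_pct"): 80.0,
    ("controller 1", "busy_pct"): 80.0,
}
alerts = find_violations(samples, thresholds)
print(alerts)
```

Each message names the component and metric, which mirrors the idea that a notification should carry enough detail to start troubleshooting.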

In the example, you navigate from the tree view of Data Center Analytics, which shows a hierarchical representation of the various storage system objects, to the highlighted storage system, and then select an object to analyze. In this example, controller 0 exceeds the defined threshold of a monitored metric.


You quickly view the built-in component reports for historical configuration and performance metrics, and notice some unusual or unexpected behavior in the report for Transfer Rate. You see an unusual peak in activity close to midnight.

To take a closer look, you zoom in on the report.


You confirm there was a spike in activity for Write Transfers close to midnight, and you must determine whether this is a regular pattern or a one-off situation. By selecting another period (day) to compare with the current values, you confirm that a similar peak occurred the day before.
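The day-over-day comparison described above can be sketched in Python for data exported from a report. The sketch is an assumption-laden illustration rather than a product feature: it takes 24 hourly values per day, treats a peak as a spike when it is at least twice the daily median, and treats two peaks as recurring when they fall within one hour of each other on a 24-hour clock. The function names and both parameters are hypothetical.

```python
from statistics import median
from typing import Sequence, Tuple


def peak_hour(hourly: Sequence[float]) -> int:
    """Index (hour of day) of the largest value in an hourly series."""
    return max(range(len(hourly)), key=lambda h: hourly[h])


def is_recurring_peak(
    today: Sequence[float],
    yesterday: Sequence[float],
    tolerance_hours: int = 1,
    spike_factor: float = 2.0,
) -> bool:
    """True if both days spike at roughly the same hour of day."""

    def spike(series: Sequence[float]) -> Tuple[int, bool]:
        hour = peak_hour(series)
        value = series[hour]
        return hour, value > 0 and value >= spike_factor * median(series)

    hour_a, spiked_a = spike(today)
    hour_b, spiked_b = spike(yesterday)
    # Circular distance, so that 23:00 and 00:00 count as one hour apart.
    gap = abs(hour_a - hour_b) % 24
    gap = min(gap, 24 - gap)
    return spiked_a and spiked_b and gap <= tolerance_hours


# Hypothetical write transfer rates (MBps), one value per hour.
today = [10.0] * 23 + [80.0]      # peak at 23:00
yesterday = [70.0] + [12.0] * 23  # peak at 00:00
print(is_recurring_peak(today, yesterday))
```

A recurring peak close to midnight often points to scheduled work, which is the kind of conclusion the side-by-side report comparison is meant to support.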

You choose to review similar Configuration and Performance reports for other components, such as DP Pools, RAID Groups, and other storage array components, to analyze the effect on performance at the application level.

By focusing on the overall application instead of individual volumes or similar components in detail, you can view the performance metrics at an application level. This summary of individual resource metrics gives you a consolidated view of the overall performance. This helps you to identify and solve the problem faster than viewing individual volumes or similar component metrics.


The ability to compare related metrics enables you to quickly compare the data transfer rate and throughput generated by the application, ports, and storage system, and to view the effect the application has on the storage system and ports. If the application is using a lot of bandwidth, you can decide to provide additional bandwidth for the application or promote applications to a higher storage tier.


7 Strategic planning using trend analysis in Data Center Analytics

Strategic planning with trend analysis provides a repository and analytic reporting engine that enables you to identify and analyze historical performance trends necessary to optimize storage system performance and plan future capacity growth. As an IT Manager of your company, one of your primary responsibilities is to plan and set aside budget for CAPEX costs required for future growth of IT infrastructure, specifically hardware and management software, hypervisors, switches, and other network equipment. You require an easy way to predict and scale up to satisfy future needs and growth of the organization.

The Data Center Analytics management server collects and reports performance and configuration data over time. Using historical data, you can evaluate the current data usage and predict future requirements. The following report displays the storage capacity usage trend for a selected storage system over a specific time period.

The example report shows an increase in the subscription for 53086_Capacity from April 9, indicated by a blue line. This increase suggests that the pool requires additional capacity to meet the subscription commitment. As in the example report, if the consumption increases suddenly, the pool is at a greater risk of running out of disk space. Therefore, you must add more storage capacity.

Because the report covers a short time window, the change in capacity is minimal, but over a longer period of time it will be more visible. These reports are useful for doing additional capacity planning closer to the time of actual requirement.

Trend analysis is an analytical tool to validate the effectiveness of your storage provisioning strategy over a time period. If the measure of capacity required to fulfill the subscription commitment compared with total available free capacity of a storage pool is consistently high, this indicates that your actual capacity is inadequate to meet your subscription commitment. If the measure of capacity is low, this suggests that the pool will not completely utilize the provisioned capacity beyond the current levels and you can safely move the unused capacity to another pool.
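As an illustration of using historical data to predict future requirements, the following Python sketch fits a straight line to daily used-capacity samples and estimates how many days remain before a pool reaches its capacity. It is a simplified assumption, not the method Data Center Analytics uses: growth is treated as linear, and the function name, units, and sample values are hypothetical.

```python
from typing import Optional, Sequence


def days_until_full(used_tb: Sequence[float], capacity_tb: float) -> Optional[float]:
    """Estimate days until used capacity reaches capacity_tb.

    used_tb holds one sample per day, oldest first. Returns None when the
    least-squares trend is flat or decreasing.
    """
    n = len(used_tb)
    if n < 2:
        raise ValueError("need at least two daily samples")
    mean_x = (n - 1) / 2
    mean_y = sum(used_tb) / n
    # Least-squares slope in TB per day, using the day index as x.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_tb))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    slope = numerator / denominator
    if slope <= 0:
        return None
    remaining = max(capacity_tb - used_tb[-1], 0.0)
    return remaining / slope


# Hypothetical pool: grows by 1 TB per day, 20 TB total capacity.
history = [10.0, 11.0, 12.0, 13.0]
estimate = days_until_full(history, 20.0)
print(estimate)
```

A short history gives an unstable slope, which matches the observation that trends become more visible over a longer reporting period.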



Hitachi Data Systems

Corporate Headquarters
2845 Lafayette Street
Santa Clara, California 95050-2639
U.S.A.
www.hds.com

