monitoring & managing the datacenter

Upload: stlucky3

Post on 04-Apr-2018


  • 7/29/2019 Monitoring & Managing the DataCenter

    1/66

    Copyright 2006 EMC Corporation. Do not Copy - All Rights Reserved.

    Monitoring and Managing the Data Center - 1

    2006 EMC Corporation. All rights reserved.

    Section 5 - Monitoring and Managing the Data Center

    Introduction

    Welcome to Section 5 of Storage Technology Foundations: Monitoring and Managing the Data Center.

    Copyright 2006 EMC Corporation. All rights reserved.

    These materials may not be copied without EMC's written consent.

    EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

    THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

    Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

    EMC2, EMC, Navisphere, CLARiiON, and Symmetrix are registered trademarks and EMC Enterprise Storage, The Enterprise Storage Company, The EMC Effect, Connectrix, EDM, SDMS, SRDF, Timefinder, PowerPath, InfoMover, FarPoint, EMC Enterprise Storage Network, EMC Enterprise Storage Specialist, EMC Storage Logix, Universal Data Tone, E-Infostructure, Access Logix, Celerra, SnapView, and MirrorView are trademarks of EMC Corporation.

    All other trademarks used herein are the property of their respective owners.

    Section Objectives

    Upon completion of this section, you will be able to:

    • Describe areas of the data center to monitor
    • Discuss considerations for monitoring the data center
    • Describe techniques for managing the data center

    The objectives for this section are shown here. Please take a moment to read them.

    In This Section

    This section contains the following modules:

    • Monitoring in the Data Center
    • Managing in the Data Center

    This section contains two modules: Monitoring in the Data Center and Managing in the Data Center.

    Monitoring in the Data Center

    After completing this module, you will be able to:

    • Discuss data center areas to monitor
    • List metrics to monitor for different data center components
    • Describe the benefits of continuous monitoring
    • Describe the challenges in implementing a unified and centralized monitoring solution in heterogeneous environments
    • Describe industry standards for data center monitoring

    In this module, you will learn about different aspects of monitoring data center components, including the benefits of proactive monitoring and the challenges of managing a heterogeneous environment (hardware and software from multiple vendors).

    Monitoring Data Center Components

    [Figure: clients, IP network, clustered hosts/servers with applications (HBAs, cluster keep-alive), SAN, and storage arrays (ports), each monitored for health, capacity, performance, and security.]

    The Business Continuity Overview module discussed the importance of resolving all single

    points of failure when designing data centers. Having designed a resilient data center, the next

    step is to ensure that all components that make up the data center are functioning properly and

    are available on a 24x7 basis. The way to achieve this is by monitoring the data center on a

    continual basis.

    System monitoring is essential to ensure that the IT infrastructure underlying business-critical applications is operational and optimized. The main objective is to ensure that the various hosts, network systems, and storage are running smoothly, and to know how loaded each system and component is and how effectively it is being utilized.

    The major components within the data center that should be monitored include:

    • Servers, databases, and applications
    • Networks: SAN and IP networks (switches, routers, bridges)
    • Storage arrays

    Each of these components should be monitored for health, capacity, performance, and security.

    Why Monitor Data Centers

    • Availability
        - Continuous monitoring ensures availability
        - Warnings and errors are fixed proactively
    • Scalability
        - Monitoring allows for capacity planning/trend analysis, which in turn helps to scale the data center as the business grows
    • Alerting
        - Administrators can be informed of failures and potential failures
        - Corrective action can be taken to ensure availability and scalability

    Continuous monitoring of health, capacity, performance and security of all data center

    components is critical to ensure data availability and scalability. For example, information about

    component failures can be sent to appropriate personnel for corrective actions.

    Ongoing trends show that the data storage environment continues to grow at a rapid pace. According to the International Data Corporation (IDC), external storage-system capacity will grow at a compound annual growth rate (CAGR) of approximately 50% through 2007. This represents a doubling of capacity roughly every two years. As a result, automated monitoring and alerting solutions are becoming increasingly important.
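    The doubling claim can be checked with a short calculation. The following is a minimal sketch, not part of the original course material; the function names and the 100 TB starting figure are illustrative assumptions.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed for capacity to double at a constant compound annual growth rate."""
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + annual_growth_rate)


def projected_capacity(current_tb: float, annual_growth_rate: float, years: float) -> float:
    """Capacity after `years` of compound growth (hypothetical helper)."""
    return current_tb * (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    # At a 50% CAGR, capacity doubles in a little under two years.
    print(round(doubling_time_years(0.50), 2))
    # A hypothetical 100 TB environment after two years of 50% growth.
    print(projected_capacity(100.0, 0.50, 2))
```

    At exactly 50% growth the doubling time is somewhat shorter than two years, which is consistent with the "every 2 years or so" approximation.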

    Monitoring the data center closely and effectively optimizes data center operations and avoids

    downtime.

    Monitoring Health

    • Why monitor the health of different components
        - Failure of any hardware/software component can lead to an outage of a number of different components
        - Example: A failed HBA could cause degraded access to a number of data devices in a multi-path environment, or loss of data access in a single-path environment
    • Monitoring health is fundamental and is easily understood and interpreted
        - At the very least, health metrics should be monitored
        - Health issues typically need to be addressed at high priority

    Health deals with the status/availability of a particular hardware component or software process (e.g., status of a SAN device or port, database instance up/down, HBA status, disk/drive failure).

    If a component has failed, it could lead to downtime unless redundancy exists.

    Monitoring the health of data center components is very important and is easy to understand and interpret (i.e., a component is either available or it has failed). Monitoring for capacity, performance, and security depends on the health and availability of different components.

    Monitoring Capacity

    • Why monitor capacity
        - Lack of proper capacity planning can lead to data unavailability and an inability to scale
        - Trend reports can be created from all the capacity data
        - The enterprise is well informed of how IT resources are utilized
    • Capacity monitoring prevents outages before they can occur
        - More preventive and predictive in nature than health metrics
        - Based on reports, one knows that 90% of a file system is full and that the file system is filling up at a particular rate
        - If 95% of all the ports in a particular SAN fabric have been utilized, a new switch should be added if more arrays/servers are to be added to the same fabric

    From a monitoring perspective, capacity deals with the amount of resources available.

    Examples:

    • Available free/used space on a file system or a database table space
    • Amount of space left in a RAID Group
    • Amount of disk space available on storage arrays
    • Amount of file system or mailbox quota allocated to users
    • Number of available ports in a switch (e.g., 52 out of 64 ports in use, leaving 12 free ports for expansion)
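    The port-count example can be expressed as a small calculation. This is an illustrative sketch only; the function name and returned keys are assumptions, and the 95% expansion threshold is borrowed from the slide's fabric example.

```python
def port_capacity(total_ports: int, ports_in_use: int, expand_at_pct: float = 95.0) -> dict:
    """Summarize switch port capacity (hypothetical helper)."""
    if total_ports <= 0 or not 0 <= ports_in_use <= total_ports:
        raise ValueError("invalid port counts")
    utilization_pct = ports_in_use / total_ports * 100
    return {
        "free_ports": total_ports - ports_in_use,
        "utilization_pct": utilization_pct,
        # Flag the switch or fabric for expansion once the threshold is reached.
        "needs_expansion": utilization_pct >= expand_at_pct,
    }


if __name__ == "__main__":
    # The example above: 52 of 64 ports in use.
    print(port_capacity(64, 52))
```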

    Monitoring Performance

    • Why monitor performance metrics
        - Want all data center components to work efficiently/optimally
        - See if components are pushing performance limits or if they are being underutilized
        - Can be used to identify performance bottlenecks
    • Performance monitoring/analysis can be extremely complicated
        - Dozens of inter-related metrics, depending on the component in question
        - Most complicated of the various aspects of monitoring

    Performance monitoring measures the efficiency of operation of different data center

    components.

    Examples:

    • Number of I/Os through a front-end port of a storage array
    • Number of I/Os to disks in a storage array
    • Response time of an application
    • Bandwidth utilization
    • Server CPU utilization

    Monitoring Security

    • Why monitor security
        - Prevent and track unauthorized access
        - Accidental or malicious
    • Enforcing security and monitoring for security breaches is a top priority for all businesses

    Security prevents and tracks unauthorized access.

    Examples of security monitoring are:

    • Login failures
    • Unauthorized storage array configuration/re-configuration
    • Monitoring physical access (via badge readers, biometric scans, video cameras, etc.)
    • Unauthorized zoning and LUN masking in SAN environments, or changes to existing zones

    Monitoring Servers

    • Health
        - Hardware components: HBA, NIC, graphics card, internal disk
        - Status of various processes/applications
    • Capacity
        - File system utilization
        - Database: table space/log space utilization
        - User quota

    Any failure of a hardware component, such as an HBA or NIC, should be immediately detected and rectified. As seen earlier, component redundancy can prevent total outage. Mission-critical applications running on the servers should also be monitored continuously. A database might spawn a number of processes that are required to ensure operations. Failure of any of these processes can make the database unavailable. Databases and applications usually have mechanisms to detect such errors and report them.

    Capacity monitoring on a server involves monitoring file system space utilization. By continuously monitoring file system free space, you can estimate the growth rate of the file system and predict when it will become 100% full. Corrective action, such as extending the file system, can then be taken well ahead of time to avoid a file-system-full condition.
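    One way to turn periodic free-space samples into a prediction is a straight-line fit. The sketch below is an assumption-laden illustration (linear growth, samples given as (day, used GB) pairs, hypothetical function names), not a method prescribed by the course.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (time in days, used space in GB)


def growth_rate_per_day(samples: Sequence[Sample]) -> float:
    """Least-squares slope of used space against time, in GB per day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        raise ValueError("samples must span more than one point in time")
    return sum((t - mean_t) * (u - mean_u) for t, u in samples) / denom


def days_until_full(samples: Sequence[Sample], capacity_gb: float) -> Optional[float]:
    """Estimated days until the file system is 100% full, or None if usage is not growing."""
    rate = growth_rate_per_day(samples)
    if rate <= 0:
        return None
    latest_used = samples[-1][1]
    return max(0.0, (capacity_gb - latest_used) / rate)


if __name__ == "__main__":
    history = [(0, 50.0), (1, 52.0), (2, 54.0), (3, 56.0)]  # made-up daily samples
    print(days_until_full(history, capacity_gb=100.0))
```

    Real growth is rarely perfectly linear, so an estimate like this is best treated as an early warning rather than a precise date.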

    In many environments, system administrators enforce space utilization quota on users. For

    example, a user cannot exceed 10 GB of space or a particular file cannot be greater than 100

    MB.

    Monitoring Servers

    • Performance
        - CPU utilization
        - Memory utilization
        - Transaction response times
    • Security
        - Login
        - Authorization
        - Physical security: data center access

    Two key server performance metrics are CPU and memory utilization. A continuously high CPU utilization (above 80%) indicates that the server is running out of processing power. During periods of high CPU utilization, applications running on the server, and consequently their end users, experience slower response times. Corrective action could include upgrading processors, adding more processors, shifting some applications to different servers, or restricting the number of simultaneous client connections. Databases, applications, and file systems use server physical memory (RAM) to stage data for manipulation. When sufficient memory is not available, data has to be paged in and out of disk, which also results in slower response times.
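    "Continuously high" can be distinguished from a short spike by requiring several consecutive samples above the threshold. The following sketch assumes utilization samples taken at a fixed interval; the function name and the `min_consecutive` parameter are illustrative assumptions, while the 80% threshold comes from the text above.

```python
from typing import Iterable


def sustained_high_utilization(
    samples_pct: Iterable[float],
    threshold_pct: float = 80.0,
    min_consecutive: int = 5,
) -> bool:
    """Return True if utilization stays above the threshold for min_consecutive samples in a row."""
    run = 0
    for value in samples_pct:
        # Extend the current run, or reset it when a sample drops back below the threshold.
        run = run + 1 if value > threshold_pct else 0
        if run >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    spike = [40, 95, 42, 38, 91, 45]            # brief peaks only
    saturated = [70, 85, 88, 92, 90, 86, 60]    # five samples in a row above 80%
    print(sustained_high_utilization(spike), sustained_high_utilization(saturated))
```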

    Login failures and attempts by unauthorized users to execute code or launch applications should

    be closely monitored to ensure secure operations.

    Monitoring the SAN

    • Health
        - Fabrics: fabric errors, zoning errors
        - Ports: failed GBIC, status/attribute change
        - Devices: status/attribute change
        - Hardware components: processor cards, fans, power supplies
    • Capacity
        - ISL utilization
        - Aggregate switch utilization
        - Port utilization

    Uninterrupted access to data over the SAN depends on the health of its physical and logical components. The physical components include the GBICs, power supplies, and fans in switches, as well as the cables. Any failure in these must be reported immediately. Constructs such as zones and fabrics are the logical components. Errors in zoning, such as specifying the wrong WWN for a port, will result in failure to access that port. These also have to be monitored, reported, and rectified.

    For capacity, monitor the number of used and free ports on each switch. This aids in planning expansion, whether adding more servers or connecting to more storage array ports. Utilization metrics at the switch level and port level, along with utilization of Inter-Switch Links (ISLs), are also part of SAN capacity measurements. These can be viewed as performance metrics as well.

    Monitoring the SAN

    • Performance
        - Connectivity ports
            - Link failures
            - Loss of signal
            - Loss of synchronization
            - Link utilization
            - Bandwidth (MB/s or frames/s)
        - Connectivity devices
            - Statistics are usually a cumulative value of all the port statistics

    A number of SAN performance/statistical metrics can be used to determine or predict hardware failure (health). For example, an increasing number of link failures may indicate that a port is about to fail. The following metrics describe these failures:

    • Link Failures - the number of link failures occurring on a connectivity device port. A high number of failures could indicate a hardware problem (bad port, bad cable).
    • Loss of Signal - the number of loss-of-signal events occurring on a connectivity device port. A high number indicates a possible hardware failure.
    • Loss of Synchronization - the number of loss-of-synchronization events occurring on a connectivity device port. High counts may indicate hardware failure.

    Connectivity device port performance can be measured with the Receive or Transmit Link Utilization metrics. These calculated values give a good indication of how busy the switch port is, based on the assumed maximum throughput. Heavily used ports can cause queuing delays on the host.
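    As a rough sketch of how such calculated values might be derived, the code below computes link utilization from a byte count over an interval and an assumed maximum throughput, and converts cumulative error counters into per-interval deltas so that a rising trend stands out. The function names, units (1 MB = 10^6 bytes), and sample numbers are assumptions, not taken from any particular switch vendor's tooling.

```python
from typing import List, Sequence


def link_utilization_pct(bytes_transferred: int, interval_s: float, max_mb_per_s: float) -> float:
    """Utilization of a port relative to its assumed maximum throughput (MB = 10**6 bytes)."""
    if interval_s <= 0 or max_mb_per_s <= 0:
        raise ValueError("interval and maximum throughput must be positive")
    observed_mb_per_s = bytes_transferred / interval_s / 1e6
    return observed_mb_per_s / max_mb_per_s * 100


def counter_deltas(cumulative_counts: Sequence[int]) -> List[int]:
    """Per-interval increases of a cumulative counter such as link failures."""
    return [later - earlier for earlier, later in zip(cumulative_counts, cumulative_counts[1:])]


if __name__ == "__main__":
    # 6 GB moved in 60 s on a port assumed to sustain 200 MB/s.
    print(link_utilization_pct(6_000_000_000, 60, 200))
    # Link-failure counter sampled four times; growing deltas suggest a failing port or cable.
    print(counter_deltas([3, 3, 5, 9]))
```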

    Monitoring the SAN

    • Security
        - Zoning
            - Ensure communication between dedicated sets of ports (HBA and storage ports)
        - LUN masking
            - Ensure that only certain hosts have access to certain storage array volumes
        - Administrative tasks
            - Restrict administrative tasks to a select set of users
            - Enforce strict passwords
        - Physical security
            - Access to the data center should be monitored

    SAN Security includes monitoring the fabrics for any zoning changes. Any errors in the zone set

    information can lead to data inaccessibility. Unauthorized zones can compromise data security.

    User login/authentication to switches should be monitored to audit administrative changes.

    Ensure that only authorized users are allowed to perform LUN masking tasks. Any such tasks

    performed should be audited for proper authorization.

    Monitoring Storage Arrays

    • Health
        - All hardware components
            - Front end
            - Back end
            - Memory
            - Disks
            - Power supplies
        - Array operating environment
            - RAID processes
            - Environmental sensors
            - Replication processes

    Storage arrays typically have redundant components so that they continue to function when individual components fail. Performance of the array might be affected during such failures. Failed components should be replaced quickly to restore optimal performance. Some arrays include the capability to send a message to the vendor's support center in the event of hardware failures. This feature is typically known as call-home.

    It is equally important to monitor the various processes of the storage array operating

    environment. For example, failure of replication tasks will compromise disaster recovery

    capabilities.

    Monitoring Storage Arrays

    • Capacity
        - Configured/unconfigured capacity
        - Allocated/unallocated storage
        - Fan-in/fan-out ratios
    • Performance
        - Front end utilization/throughput
        - Back end utilization/throughput
        - I/O profile
        - Response time
        - Cache metrics

    Physical disks in a storage array are partitioned into LUNs for use by hosts.

    • Configured capacity is the amount of space that has been partitioned into LUNs
    • Unconfigured capacity is the remaining space on the physical disks

    Allocated storage refers to LUNs that have been masked for use by specific hosts/servers. Unallocated storage refers to LUNs that have been configured, but not yet masked for host use.

    Monitoring storage array capacity enables you to predict and react to storage needs as they

    occur.
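    The four capacity terms above can be related in a small data model. This is a simplified sketch under stated assumptions: the `Lun` class and field names are hypothetical, sizes are in GB, and RAID or other protection overhead is ignored.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Lun:
    name: str
    size_gb: float
    masked_to_host: bool  # True once the LUN has been masked for use by a host


def capacity_summary(physical_capacity_gb: float, luns: List[Lun]) -> Dict[str, float]:
    """Split array capacity into configured/unconfigured and allocated/unallocated."""
    configured = sum(lun.size_gb for lun in luns)
    allocated = sum(lun.size_gb for lun in luns if lun.masked_to_host)
    return {
        "configured_gb": configured,                           # partitioned into LUNs
        "unconfigured_gb": physical_capacity_gb - configured,  # remaining disk space
        "allocated_gb": allocated,                             # LUNs masked to hosts
        "unallocated_gb": configured - allocated,              # configured, not yet masked
    }


if __name__ == "__main__":
    luns = [Lun("lun0", 100, True), Lun("lun1", 200, True), Lun("lun2", 150, False)]
    print(capacity_summary(1000, luns))
```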

    Fan-in/fan-out ratios and the availability of unused front end ports (ports to which no host has yet been connected) are useful when new hosts/servers have to be given access to the storage array.

    Performance: Numerous performance/statistical metrics can be monitored for storage arrays.

    Some of the key metrics to monitor are the utilization rates of the various components that make

    up the storage arrays. Extremely high utilization rates can lead to performance degradation.

    Monitoring Storage Arrays

    • Security
        - LUN access
            - Ensure that only certain hosts have access to certain storage array volumes
            - Disallow WWN spoofing
        - Administrative tasks
            - Most arrays allow the restriction of various array configuration tasks
                - Device configuration
                - LUN masking
                - Replication operations
                - Port configuration
        - Physical security
            - Monitor access to the data center

    World Wide Name (WWN) spoofing is a security concern. For example, an unauthorized host can be configured with an HBA that has the same WWN as an authorized host. If this host is then connected to the storage array via the same SAN, zoning and LUN masking restrictions will be bypassed. Storage arrays have mechanisms in place that can prevent such security breaches.

    Auditing of array device configuration tasks, as well as replication operations, is important to ensure that only authorized personnel are performing them.

    Monitoring IP Networks

    • Health
        - Hardware components: processor cards, fans, power supplies, ...
        - Cables
    • Performance
        - Bandwidth
        - Latency
        - Packet loss
        - Errors
        - Collisions
    • Security
        - IP

    Network performance is vital in a storage environment. Monitor network latency, packet loss, availability, traffic, and bandwidth utilization, including:

    • I/O (bandwidth usage)
    • Errors
    • Collisions

    Monitoring the Data Center as a Whole

    • Monitor data center environment
        - Temperature, humidity, airflow, hazards (water, smoke, etc.)
        - Voltage power supply
    • Physical security
        - Facility access (monitoring cameras, access cards, etc.)

    Monitoring the environment of a data center is just as crucial as monitoring the different components. Most electrical/electronic equipment is extremely sensitive to heat, humidity, voltage fluctuations, etc. Data center layout and design have to account for correct levels of ventilation, accurate control of temperature/humidity, uninterrupted power supplies, and corrections to voltage fluctuations. Any changes to the environment should be monitored and reported immediately. Physical security, such as monitoring facility access with cameras and access cards, is comparatively easy to understand.

    End-to-End Monitoring

    [Figure: clients, IP network, clustered hosts/servers with applications (HBAs, cluster keep-alive), SAN, and storage arrays (ports). Callouts: Single Failure, Multiple Symptoms, Root Cause Analysis, Business Impact.]

    A good end-to-end monitoring system should be able to quickly analyze the impact that a single failure can cause. The monitoring system should be able to deduce that a set of seemingly unrelated symptoms is the result of a single root cause. It should also be able to alert on the business impact arising from different component failures.
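    One simple way to relate many symptoms to a single cause is to look for components that every failing I/O path shares and that no healthy path uses. The sketch below illustrates that idea only; the path names, component names, and function are hypothetical, and real root cause analysis products use considerably richer models.

```python
from typing import Dict, List, Set


def root_cause_candidates(paths: Dict[str, List[str]], failed_paths: Set[str]) -> Set[str]:
    """Components common to all failed paths and absent from every healthy path."""
    if not failed_paths:
        return set()
    # Components that appear on every failed path.
    common = set.intersection(*(set(paths[name]) for name in failed_paths))
    # Components that are demonstrably working because a healthy path uses them.
    healthy: Set[str] = set()
    for name, components in paths.items():
        if name not in failed_paths:
            healthy.update(components)
    return common - healthy


if __name__ == "__main__":
    # Hypothetical topology: each path lists the components it traverses.
    paths = {
        "H1-A": ["H1.hba0", "SW1", "array.port0"],
        "H1-B": ["H1.hba1", "SW2", "array.port1"],
        "H2-A": ["H2.hba0", "SW1", "array.port0"],
        "H2-B": ["H2.hba1", "SW2", "array.port1"],
        "H4-A": ["H4.hba0", "SW1", "array.port2"],
    }
    # Two hosts report path failures; H4 still works through SW1, so SW1 is ruled out.
    print(root_cause_candidates(paths, {"H1-A", "H2-A"}))
```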

    Monitoring Health: Array Port Failure

    [Figure: hosts/servers with applications H1, H2, and H3 (HBAs) connected through SAN switches SW1 and SW2 to storage array ports. H1, H2, and H3 are each marked Degraded.]

    Here is an example of the importance of end-to-end monitoring. In this example, three servers (H1, H2, and H3) have two HBAs each and are connected to the storage array via two switches (SW1 and SW2). The three servers share the same storage ports on the storage array.

    If one of the storage array ports fails, it will have the following effects on the whole data center:

    • Since all servers share the ports, all the storage volumes that were accessed via SW1 will be unavailable over that path.
    • The servers will experience path failures. Redundancy enables them to continue operations via SW2.
    • The applications will experience reduced performance (degraded), because the number of available paths to the storage devices has been cut in half.
    • If the applications belong to different business units, all of them are affected even though only a single port has failed.

    This example illustrates the importance of monitoring the health of storage arrays. By constantly monitoring the array, you can detect the fault as soon as it happens and fix it right away, minimizing the time that applications have to run in degraded mode.
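    The effect of a component failure on each server can be sketched by counting its surviving paths. The topology, component names, and state labels below are illustrative assumptions modeled on the H1/H2/H3 example, with each host having one path through SW1 and one through SW2.

```python
from typing import Dict, List, Set

# Hypothetical topology: every host reaches the array over two independent paths.
HOST_PATHS: Dict[str, List[List[str]]] = {
    host: [
        [f"{host}.hba0", "SW1", "array.port0"],
        [f"{host}.hba1", "SW2", "array.port1"],
    ]
    for host in ("H1", "H2", "H3")
}


def host_states(host_paths: Dict[str, List[List[str]]], failed: Set[str]) -> Dict[str, str]:
    """Classify each host as ok, degraded (some paths lost), or down (no paths left)."""
    states: Dict[str, str] = {}
    for host, paths in host_paths.items():
        # A path survives only if none of its components has failed.
        alive = [path for path in paths if not failed.intersection(path)]
        if len(alive) == len(paths):
            states[host] = "ok"
        elif alive:
            states[host] = "degraded"
        else:
            states[host] = "down"
    return states


if __name__ == "__main__":
    print(host_states(HOST_PATHS, {"array.port0"}))  # shared array port fails
    print(host_states(HOST_PATHS, {"H1.hba0"}))      # a single HBA fails
```

    With this model, a shared array port or switch failure degrades every host, while a single HBA failure degrades only the host that owns it.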

    Monitoring Health: HBA failure

    [Figure: hosts/servers with applications H1, H2, and H3 (HBAs) connected through SAN switches SW1 and SW2 to storage array ports. One host is marked Degraded.]

    The scenario presented here is the same as the previous one: three servers (H1, H2, and H3) have two HBAs each and are connected to the storage array via two switches (SW1 and SW2), and the three servers share the same storage ports on the storage array. In this example, if there is a single HBA failure, the server with the failed HBA will experience path failures to the storage devices that it had access to. Application performance on this server will be affected.

    Monitoring Health: Switch Failure

    [Figure: hosts/servers with applications connected through SAN switches SW1 and SW2 to storage array ports. All hosts are marked Degraded.]

    In this example, a number of servers (with two HBAs each) are connected to the storage array via two switches (SW1 and SW2). Each server has independent paths to the storage array, one via switch SW1 and one via switch SW2.

    What happens if switch SW1 fails completely?

    All the hosts that were accessing storage volumes via switch SW1 will experience path failures. All applications on the servers will run in degraded mode. Notice that the failure of a single component (a switch in this case) has a ripple effect on many data center components.

    Monitoring Capacity: Array

    [Figure: hosts/servers with applications and a new server connected through SAN switches SW1 and SW2 to storage array ports. Callout: Can the array provide the required storage to the new server?]

    This example illustrates the importance of monitoring the capacity of arrays.

    A number of servers (with two HBAs each) are connected to the storage array via two switches (SW1 and SW2). Each server has independent paths to the storage array via switch SW1 and switch SW2, and each has been allocated storage on the storage array.

    A new server has to be deployed, and an application on it has to be given access to storage devices from the array via switches SW1 and SW2. Monitoring the amount of configured and unconfigured space on the array is critical for deciding whether this is possible. Proactive monitoring helps from the initial planning stages through final deployment.

    Monitoring Capacity: Servers File System Space

    [Figure: two file systems side by side. Left, "No Monitoring": the file system fills up. Right, "FS Monitoring": Warning: FS is 66% full; Critical: FS is 80% full; Extend FS.]

    This example illustrates the importance of monitoring capacity on servers.

    • On the left is an application server which is writing to a file system without monitoring the file system capacity. Once the file system is full, the application will no longer be able to function.
    • On the right is a similar setup. An application server is writing to a file system. In this case, the file system is monitored. A warning is issued at 66%, then a critical message at 80%. We can take action and extend the file system before the file-system-full condition is reached.

    Proactively monitoring the file system can prevent application outages caused by lack of file system space.
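    As a minimal sketch, the two thresholds from this example can be expressed as a small classification function. The function name and the returned labels are assumptions for illustration; only the 66% and 80% levels come from the slide.

```python
WARNING_THRESHOLD = 0.66   # warning level used in the example
CRITICAL_THRESHOLD = 0.80  # critical level used in the example


def classify_usage(used_bytes: int, total_bytes: int) -> str:
    """Map file system usage to 'ok', 'warning' or 'critical'."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = used_bytes / total_bytes
    if fraction >= CRITICAL_THRESHOLD:
        return "critical"
    if fraction >= WARNING_THRESHOLD:
        return "warning"
    return "ok"


for used in (50, 70, 85):
    print(used, classify_usage(used, 100))
```

    On a real server the two inputs could be taken from the standard library call `shutil.disk_usage(path)`, which reports total and used bytes for the file system holding `path`.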

    Monitoring Performance: Array Port Utilization

    [Figure: hosts H1, H2 and H3 (two HBAs each) connect through SAN switches SW1 and SW2 to shared ports on the storage array; a new server, H4, is to be added. A line graph plots the port utilization (%) of H1 + H2 + H3 against a 100% ceiling.]

    This example illustrates the importance of monitoring performance metrics on storage arrays.

    Three servers (H1, H2 and H3) have two HBAs each and are connected to the storage array via two switches (SW1 and SW2). The three servers share the same storage ports on the storage array. A new server, H4, has to be deployed and must share the same storage ports as H1, H2 and H3.

    To ensure that the new server does not adversely affect the performance of the others, it is

    important to monitor the array port utilization. In this example, the utilization for the shared

    ports is shown using the green and red lines in the line graph. If the actual utilization prior to

    deploying the new server is the green line, then there is room to add the new server. Otherwise,

    the deployment of the new server will impact performance of all servers.
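    The headroom decision described here can be sketched as follows. The function names, the sample utilization figures and the estimate of the new server's load are all assumptions for illustration; in practice the history would come from the array's performance statistics.

```python
from typing import Sequence


def projected_peak(current_percent: Sequence[float], new_server_percent: float) -> float:
    """Peak utilization of the shared port once the new server's load is added."""
    return max(current_percent) + new_server_percent


def has_headroom(current_percent: Sequence[float], new_server_percent: float,
                 ceiling_percent: float = 100.0) -> bool:
    """True if the projected peak stays at or below the ceiling."""
    return projected_peak(current_percent, new_server_percent) <= ceiling_percent


# Made-up samples of combined H1 + H2 + H3 utilization on the shared port.
light_history = [40.0, 55.0, 50.0]   # comparable to the lower line in the graph
heavy_history = [70.0, 90.0, 85.0]   # comparable to the upper line in the graph

print(has_headroom(light_history, 30.0))  # peak of 85% leaves room for H4
print(has_headroom(heavy_history, 30.0))  # peak of 120% would saturate the port
```

    Using the observed peak rather than the average is a deliberately conservative choice for this sketch; a lower ceiling than 100% could be passed in to keep a safety margin.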

    Monitoring Performance: Servers

    [Figure: Windows Task Manager performance display with the alert "Critical: CPU usage above 90% for the last 90 minutes"]

    Most servers have tools that allow you to interactively monitor CPU usage. For example, Windows Task Manager displays the CPU and memory usage (as shown above). Interactive tools are fine if only a few servers are being managed. In a data center, with potentially hundreds of servers, the tool must be capable of monitoring many servers simultaneously. The tool should send a warning to the System Administrator whenever the CPU utilization exceeds a specified threshold.
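    The alert shown on the slide fires only when CPU usage stays above 90% for a sustained period. A minimal sketch of such a check is given below, assuming one sample per minute; the class name and its parameters are hypothetical.

```python
from collections import deque


class SustainedCpuMonitor:
    """Flags a server whose CPU stays above a threshold for a whole window."""

    def __init__(self, threshold_percent: float = 90.0, window_samples: int = 90):
        self.threshold_percent = threshold_percent
        # deque with maxlen keeps only the most recent window_samples values
        self.samples = deque(maxlen=window_samples)

    def record(self, cpu_percent: float) -> bool:
        """Store one sample; return True when every sample in a full window is high."""
        self.samples.append(cpu_percent)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(s > self.threshold_percent for s in self.samples)


# Short window of 3 samples so the behaviour is easy to follow.
monitor = SustainedCpuMonitor(threshold_percent=90.0, window_samples=3)
for sample in (95, 96, 97, 50, 99, 99, 99):
    print(sample, monitor.record(sample))
```

    In a data center, one monitor instance per server could be kept in a dictionary keyed by host name, so that many servers are watched by the same process.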

    Monitoring Security: Servers

    [Figure: three login attempts (Login 1, Login 2, Login 3) raise the alert "Critical: Three successive login failures for username Bandit on server H4, possible security threat"]

    Login failures could be accidental (mistyping) or could be the result of a deliberate attempt to

    break into a system. Most servers will usually allow two successive login failures and will not

    allow any more attempts after a third successive login failure. In most environments, this

    information may simply be logged in a system log file. Ideally, you should monitor for such

    security events. In a monitored environment when there are three successive login failures, a

    message could be sent to the System Administrator to warn them of a possible security threat.
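    A minimal sketch of this kind of security monitoring follows. It assumes login events arrive as (server, username, success) values, for example parsed from a system log; the class and the event format are hypothetical, and the alert text mirrors the slide.

```python
from collections import defaultdict

MAX_SUCCESSIVE_FAILURES = 3  # limit used in the example


class LoginMonitor:
    """Counts successive login failures per (server, username) pair."""

    def __init__(self):
        self.failures = defaultdict(int)

    def record(self, server: str, username: str, success: bool):
        """Return an alert string on the third successive failure, else None."""
        key = (server, username)
        if success:
            self.failures[key] = 0  # a successful login resets the count
            return None
        self.failures[key] += 1
        if self.failures[key] == MAX_SUCCESSIVE_FAILURES:
            return (f"Critical: {MAX_SUCCESSIVE_FAILURES} successive login failures "
                    f"for username {username} on server {server}, "
                    f"possible security threat")
        return None


monitor = LoginMonitor()
for attempt in range(3):
    alert = monitor.record("H4", "Bandit", success=False)
print(alert)
```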

    Monitoring Security: Array Local Replication

    [Figure: two workgroups, WG1 and WG2, share a storage array through SAN switches SW1 and SW2. A replication command issued by a WG1 user against WG2 devices is answered with "Access denied" and raises the alert "Warning: Attempted replication of WG2 devices by WG1 user".]

    This example illustrates the importance of monitoring security breaches in a storage array. A storage array is a shared resource. In this example, the array is being shared between two workgroups. The data of WG1 should not be accessible by WG2. Likewise, the data of WG2 should not be accessible by WG1.

    A user from WG1 may try to make a local replica of the data that belongs to WG2. Typically, mechanisms will be in place to prevent such an action. If this action is not monitored or recorded in some fashion, you will be unaware that someone is trying to violate security protocols. But if this action is monitored, a warning message can be sent to the Storage Administrator.
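    The difference between merely denying the request and also monitoring it can be sketched as below. The device names, the ownership table and the function are hypothetical; a real array enforces this through its own access control features.

```python
# Hypothetical map of array devices to the workgroup that owns them.
DEVICE_OWNER = {"dev01": "WG1", "dev02": "WG2"}


def request_replication(user_workgroup: str, device: str, alerts: list) -> bool:
    """Allow replication only for the owning workgroup; record denied attempts."""
    owner = DEVICE_OWNER.get(device)
    if owner is None:
        alerts.append(f"Warning: replication requested for unknown device {device}")
        return False
    if owner == user_workgroup:
        return True
    # The request is denied AND recorded, so the administrator learns about it.
    alerts.append(
        f"Warning: Attempted replication of {owner} devices by {user_workgroup} user"
    )
    return False


alerts = []
print(request_replication("WG1", "dev01", alerts))  # own device
print(request_replication("WG1", "dev02", alerts))  # WG2 device, denied
print(alerts)
```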

    Monitoring: Alerting of Events

    • Warnings require administrative attention
        - File systems becoming full
        - Soft media errors
    • Errors require immediate administrative attention
        - Power failures
        - Disk failures
        - Memory failures
        - Switch failures

    Monitoring systems allow administrators to assign different severity levels for different

    conditions in the data center. Health related alerts will usually be classified as being critical or

    fatal meaning that a failure in a component has immediate adverse consequences. Other alerts

    can be arranged in a spectrum from Information to Fatal.

    Generically:

    • Information: useful information requiring no administrator intervention, e.g. an authorized user has logged in
    • Warning: administrative attention is required, but the situation is not critical. An example may be that a file system has reached the 75% full mark. The administrator has time to decide what action should be taken
    • Fatal: immediate attention is required, because the condition will affect system performance or availability. If a disk fails, for example, the administrator must ensure that it is replaced quickly.

    The sources of monitoring messages may include hardware components, such as servers and

    storage systems, and software components, such as applications.

    Continuous monitoring, in combination with automated alerting, enables administrators to:

    • Reactively respond to failures quickly
    • Proactively avert failures by looking at trends in utilization and performance
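    The severity spectrum from Information to Fatal can be modeled as an ordered enumeration, with the response chosen by level. This is a sketch only: the three response strings are assumptions for illustration, since the actual routing is a site policy decision.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Alert levels in increasing order of urgency."""
    INFORMATION = 1
    WARNING = 2
    FATAL = 3


def route_alert(severity: Severity) -> str:
    """Pick a (hypothetical) response for an alert of the given severity."""
    if severity >= Severity.FATAL:
        return "page administrator now"
    if severity >= Severity.WARNING:
        return "add to administrator work queue"
    return "write to log only"


events = [
    (Severity.INFORMATION, "authorized user logged in"),
    (Severity.WARNING, "file system 75% full"),
    (Severity.FATAL, "disk failure"),
]
for severity, message in events:
    print(f"{severity.name}: {message} -> {route_alert(severity)}")
```

    Because the levels are ordered integers, alerts can also be sorted or filtered by severity, for example to display the most urgent first.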

    Monitoring: Challenges

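    [Figure: the layers of the data center. Applications and databases; servers (UNIX, Windows, mainframe); network (SAN and IP); storage arrays (NAS, SAN, DAS, CAS, TLU). Vendor labels shown include Oracle, Informix, MS SQL, Brocade, McData, Cisco, SUN, IBM, HP, NetApp, EMC and Hitachi.]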

    The core elements of the data center are the storage arrays, networks, servers, databases, and

    applications.

    • Storage arrays could be NAS, CAS, DAS, SAN attached or Tape/Disk Library Units
    • The network consists of the SAN and the IP network
    • Servers could be Open Systems (UNIX or Windows) or Mainframe

    There are numerous vendors who supply these data center components.

    The challenge is to monitor and manage each of these components. Typically, each vendor will provide monitoring/management tools for each of the components. As a consequence, in order to successfully monitor and manage a data center, administrators must learn multiple tools and terminologies. In an environment where multiple tools are in use, it is almost impossible to get a complete picture of what is going on from a single page.

    Most data center components are inter-related (e.g. a SUN host is connected to an EMC storage array via a Cisco SAN). In an ideal world, the monitoring tool should be able to correlate the information from all objects in one place, so that you can make an informed decision on any of the metrics that are monitored.

    Monitoring: Ideal Solution

    [Figure: a single Monitoring/Management Engine with one UI gathers information from storage arrays (NAS, SAN, DAS, CAS, TLU), the network (SAN and IP), and servers (UNIX, Windows, mainframe) with their databases and applications.]

    The ideal solution for monitoring all data center objects from all vendors would be a monitoring/management engine that gathers information on all the objects and manages them through a single user interface.

    The engine should also be able to perform root cause analysis and indicate how individual component failures affect various business units. In summary, the engine should provide:

    • A single interface to monitor all objects in the data center
    • Root cause analysis: multiple symptoms may be triggered by a single root cause
    • An indication of how individual component failures affect various business units
    • A mechanism to inform administrators of events via e-mail, page, SNMP traps, etc.
    • The ability to generate reports
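    To illustrate how multiple symptoms can be traced to a single root cause, here is a deliberately simple sketch. It assumes the engine knows which component depends on which (the `DEPENDS_ON` map and its names are made up) and treats an alerting component as a root cause only if none of its direct dependencies is alerting. Real products use far richer correlation logic.

```python
# Hypothetical topology: component -> components it directly depends on.
DEPENDS_ON = {
    "app1": ["server1"],
    "app2": ["server2"],
    "server1": ["switch1"],
    "server2": ["switch1"],
    "switch1": [],
}


def root_causes(alerting: set, depends_on: dict) -> set:
    """Return alerting components whose direct dependencies are all healthy."""
    return {
        component
        for component in alerting
        if not any(dep in alerting for dep in depends_on.get(component, []))
    }


# Five alerts arrive, but four of them are symptoms of the switch failure.
symptoms = {"app1", "app2", "server1", "server2", "switch1"}
print(root_causes(symptoms, DEPENDS_ON))
```

    Note that this sketch only looks one level down the dependency chain; if an intermediate component fails to report, its dependents would be listed as separate root causes.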

    Without Standards

    • No common access layer between managed objects and applications (vendor specific)
    • No common data model
    • No interconnect independence
    • Multi-layer management difficulty
    • Legacy systems cannot be accommodated
    • No multi-vendor automated discovery
    • Policy-based management is not possible across entire classes of devices

    [Figure: separate network, application, host, storage and database management tools, with interoperability between them highlighted as the problem]

    SAN Administrators have often been faced with the dilemma of integrating multi-vendor

    hardware and software under a single management umbrella. It is relatively easy for

    administrators to monitor individual switches. But, monitoring a set of switches together and

    correlating data is a more complex challenge.

    Users and administrators want the flexibility to select the most suitable products for a particular

    application or set of applications and then easily integrate those products into their computing

    environments. Traditionally this has not been possible for the reasons listed above.

    Without standards, policy-based management is not possible across entire classes of devices.

    This poses a big dilemma for diverse environments.

    Simple Network Management Protocol (SNMP)

    • SNMP
        - Meant for network management
        - Inadequate for complete SAN management
    • Limitations of SNMP
        - No common object model
        - Security: only newer SAN devices support v3
        - Positive response mechanism
        - Inflexible: no auto-discovery functions
        - No ACID (Atomicity, Consistency, Isolation, and Durability) properties
        - Richness of canonical intrinsic methods
        - Weak modeling constructs

    Until recently, Simple Network Management Protocol (SNMP) has been a protocol of choice

    that has been used quite effectively to manage multi-vendor SAN environments. However,

    SNMP, being primarily a network management protocol, is inadequate when it comes to

    providing a detailed treatment on the fine grain elements in a SAN. Some of the limitations of

    SNMP are shown here. While SNMP still retains a predominant role in SAN management,

    newer and emerging standards may change this.

    Storage Management Initiative (SMI)

    • Created by the Storage Networking Industry Association (SNIA)
    • Integration of diverse multi-vendor storage networks
    • Development of more powerful management applications
    • Common interface for vendors to develop products that incorporate the management interface technology
    • Key components
        - Interoperability testing
        - Education and collaboration
        - Industry and customer promotion
        - Promotions and demonstrations
        - Technology center
        - SMI specification
        - Storage industry architects and developers

    [Figure: SMI-S architecture. A management application (integration infrastructure, object model mapping, vendor-unique features) sits above the SMI-S interface, which is built on CIM/WBEM technology (platform independent, distributed, automated discovery, security, locking, object oriented). Below the interface, each device type (tape library, switch, array and many others) is described by a MOF, combining a standard object model per device with vendor-unique functions.]

    The Storage Networking Industry Association (SNIA) has been engaged in an initiative to develop a common, open storage and SAN management interface based on the Distributed Management Task Force's (DMTF) Common Information Model. This initiative is known as the Storage Management Initiative (SMI).

    One of the core objectives of this initiative is to create a standard that will be adopted by all

    Storage and SAN vendors, hardware and software alike, that will bring about true

    interoperability and allow administrators to manage multi-vendor and diverse storage networks

    using a single console or interface.

    The Storage Management Initiative Specification (SMI-S) offers substantial benefits to users and vendors. With SMI-S, developers have one complete, unified and rigidly specified object model, and can turn to one document to understand how to manage the breadth of SAN components. Management application vendors are relieved of the tedious task of integrating incompatible management interfaces, letting them focus on building management engines that reduce cost and extend functionality. And device vendors are empowered to build new features and functions into subsystems.

    SMI-S-compliant products will lead to easier, faster deployment and accelerated adoption of

    policy-based storage management frameworks. A test suite developed by the SNIA will certify

    compliance of hardware components and management applications with the specification.

    Certified components also will be subjected to rigorous interoperability testing in an SMI

    laboratory.

    Storage Management Initiative Specification (SMI-S)

    • Based on:
        - Web Based Enterprise Management (WBEM) architecture
        - Common Information Model (CIM)
    • Features:
        - A common interoperable and extensible management transport
        - A complete, unified and rigidly specified object model that provides for the control of a SAN
        - An automated discovery system
        - New approaches to the application of the CIM/WBEM technology

    [Figure: SMI-S layering. At the top are users, graphical interfaces and management tools: storage resource management (performance, capacity planning, removable media, other), container management (volume management, media management, other) and data management (file system, backup and HSM, database manager). In the middle is the Storage Management Interface Specification. At the bottom are the managed objects: physical components (removable media, tape drive, disk drive, robot, enclosure, host bus adapter, switch) and logical components (zone, volume, clone, snapshot, media set, other).]

    SMI-S forms a layer that resides between managed objects and management applications. The following features of SMI-S provide the key to simplifying SAN management:

    • Common data model: SMI-S is based on Web Based Enterprise Management (WBEM) technology and the Common Information Model (CIM). SMI-S agents interrogate a device, such as a switch, host or storage array, extract the relevant management data from CIM-enabled devices, and provide it to the requester.
    • Interconnect independence: SMI-S eliminates the need to redesign the management transport and lets components be managed using in-band or out-of-band communications, or a mix of the two. SMI-S offers further advantages by specifying the CIM-XML over HTTP protocol stack and utilizing the lower layers of the TCP/IP stack, both of which are ubiquitous in today's networking world.
    • Multilayer management: SMI-S has been developed to work with server-based volume managers, RAID systems and network storage appliances, a combination that most storage environments currently employ.
    • Legacy system accommodation: SMI-S has been developed to incorporate the management mechanisms in legacy devices with existing proprietary interfaces through the use of a proxy agent. Other devices and subsystems also can be integrated into an SMI-S network using embedded software or a CIM object manager.
    • Automated discovery: SMI-S-compliant products announce their presence and capabilities to other constituents. Combined with the automated discovery systems in WBEM to support object model extension, this will simplify management and give network managers the freedom to add components to their SAN more easily.
    • Policy-based management: SMI-S includes object models applicable across entire classes of devices, which lets SAN managers implement policy-based management for entire storage networks.

    Common Information Model (CIM)

    • Describes the management of data
    • Details requirements within a domain
    • Information model with required syntax

    The Common Information Model (CIM) is the language and methodology for describing

    management data.

    Information used to perform tasks is organized or structured to allow disparate groups of people

    to use it. This can be accomplished by developing a model or representation of the details

    required by people working within a particular domain. Such an approach can be referred to as

    an information model.

    An information model requires a set of legal statement types or syntax to capture the

    representation, and a collection of actual expressions necessary to manage common aspects of

    the domain.

    A CIM schema includes models for systems, applications, Networks (LAN), and devices. The

    CIM schema will enable applications from different developers on different platforms to

    describe management data in a standard format so that it can be shared among a variety of

    management applications.

    Web Based Enterprise Management (WBEM)

    Web Based Enterprise Management (WBEM) is a set of management and Internet standard architectures developed by the Distributed Management Task Force (DMTF) to unify the management of enterprise computing environments, which have traditionally been administered through management stacks like SNMP and CMIP.

    WBEM provides the ability for the industry to deliver a well-integrated set of standards-based management tools leveraging emerging web technologies.

    The DMTF has developed a core set of standards that make up WBEM: a data model, the CIM standard; an encoding specification, the xmlCIM Encoding Specification; and a transport mechanism, CIM Operations over HTTP.
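    To make the three pieces more concrete, the sketch below builds the XML body of a single CIM operation that asks a CIM object manager to enumerate the instances of one class. It is an illustration under stated assumptions: the helper function is hypothetical, the element names follow the DMTF CIM-XML representation as best understood here and should be checked against the specification, and the namespace `root/cimv2` and class `CIM_StorageVolume` are example values. Nothing is sent over the network.

```python
import xml.etree.ElementTree as ET


def build_enumerate_instances(namespace: str, class_name: str,
                              message_id: str = "1") -> bytes:
    """Build a CIM-XML request body for the EnumerateInstances operation."""
    cim = ET.Element("CIM", CIMVERSION="2.0", DTDVERSION="2.0")
    message = ET.SubElement(cim, "MESSAGE", ID=message_id, PROTOCOLVERSION="1.0")
    request = ET.SubElement(message, "SIMPLEREQ")
    call = ET.SubElement(request, "IMETHODCALL", NAME="EnumerateInstances")
    # The namespace path is encoded as one NAMESPACE element per path segment.
    path = ET.SubElement(call, "LOCALNAMESPACEPATH")
    for part in namespace.split("/"):
        ET.SubElement(path, "NAMESPACE", NAME=part)
    # The class to enumerate is passed as the ClassName parameter.
    param = ET.SubElement(call, "IPARAMVALUE", NAME="ClassName")
    ET.SubElement(param, "CLASSNAME", NAME=class_name)
    return ET.tostring(cim, encoding="utf-8")


payload = build_enumerate_instances("root/cimv2", "CIM_StorageVolume")
print(payload.decode("utf-8"))
```

    A real WBEM client would POST this body over HTTP with the CIM-specific headers and authentication that the transport specification requires; those details are omitted here.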

    Enterprise Management Platforms (EMPs)

    • Graphical applications
    • Monitoring of many (if not all) data center components
    • Alerting of errors reported by those components
    • Management of many (if not all) data center components
    • Can often launch proprietary management applications
    • May include other functionality
        - Automatic provisioning
        - Scheduling of maintenance activities
    • Proprietary architecture

    Enterprise Management Platforms (EMPs) are complex applications, or suites of applications,

    that simplify the tasks of managing and monitoring data center environments.

    They monitor data center components such as network switches, SAN switches and hosts, and alert the user to any problems with those components. At a minimum, the icon associated with the component in the GUI will change color to indicate its condition. Other forms of alerting, such as email or paging, may also be used.

    In addition to the monitoring functionality, management functionality is usually included as

    well. This may take the form of native management by code embedded into the EMP, or may

    involve launching the proprietary management utility supplied by the manufacturer of the

    component.

    Other included functionality often allows easy scheduling of operations that must be performed

    on a regular basis, as well as provisioning of resources such as disk capacity.

    Module Summary

    Key points covered in this module:

    • It is important to continuously monitor data center components to support the availability and scalability initiatives of any business
        - Components include the server, SAN, network, and storage arrays
    • The four areas of monitoring:
        - Health
        - Capacity
        - Performance
        - Security
    • There are attempts to define a common monitoring and management model

    These are the key points covered in the module. Please take a moment to review them.

    Managing in the Data Center

    After completing this module, you will be able to:

    • Describe individual component tasks that would have to be performed in order to achieve overall data center management objectives
    • Explain the concept of Information Lifecycle Management

    The objectives for this module are shown here. Please take a moment to review them.

    Managing Key Data Center Components

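    [Figure: a client reaches clustered hosts/servers with applications over the IP network (the cluster nodes are linked by a keep-alive connection); the servers' HBAs connect through the SAN to the ports of the storage arrays. Management areas shown: Availability, Capacity, Performance, Security and Reporting.]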

    In the module on Monitoring, we learned about the importance of monitoring the various data

    center components for Health, Capacity, Performance, and Security. In this section, we will

    focus on the various management tasks that need to be performed in order to ensure that

    Capacity, Availability, Performance, and Security requirements are met.

    The major components within the data center to be managed are:

    • IP networks
    • Servers and all applications and databases running on the servers
    • Storage Area Network (SAN)
    • Storage arrays

    Data Center Management can be broadly categorized as Capacity Management, Availability Management, Security Management, Performance Management and Reporting. Specific management tasks could address one or more of the categories. For example, a LUN Masking task addresses Capacity (storage capacity is provided to a specific host), Availability (if a device is masked via more than one path, then a single point of failure is eliminated), Security (masking prevents other hosts from accessing a given device) and Performance (if a device is accessible via multiple paths, then host-based multipathing software can improve performance by load balancing).
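    The way one LUN Masking task touches several management categories can be seen in a small model. This is a sketch under assumptions: the masking table, the HBA, port and LUN names, and the helper functions are all hypothetical and do not represent any array's actual masking interface.

```python
# Hypothetical masking table:
# (host HBA, array front-end port) -> devices visible on that path.
MASKING = {
    ("hba_a1", "port_0"): {"lun_5"},
    ("hba_a2", "port_1"): {"lun_5"},
    ("hba_b1", "port_0"): {"lun_9"},
}


def paths_to(device: str, host_hbas: set) -> list:
    """All (HBA, port) paths over which the host can reach the device."""
    return sorted(
        (hba, port)
        for (hba, port), luns in MASKING.items()
        if hba in host_hbas and device in luns
    )


def can_access(device: str, host_hbas: set) -> bool:
    """Security and capacity: is the device masked to this host at all?"""
    return len(paths_to(device, host_hbas)) > 0


def has_single_path(device: str, host_hbas: set) -> bool:
    """Availability: fewer than two paths means a single point of failure."""
    return len(paths_to(device, host_hbas)) < 2


host_a = {"hba_a1", "hba_a2"}
host_b = {"hba_b1"}
print(paths_to("lun_5", host_a))         # two paths, usable for load balancing
print(can_access("lun_9", host_a))       # host A is not masked to lun_9
print(has_single_path("lun_9", host_b))  # host B reaches lun_9 over one path only
```

    Counting paths is a simplification: two paths that share the same HBA or the same front-end port would still leave a single point of failure.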

    Data Center Management

    • Capacity Management
        - Allocation of adequate resources
    • Availability Management
        - Business continuity
        - Eliminate single points of failure
        - Backup & restore
        - Local & remote replication

    Capacity Management ensures that there is adequate allocation of resources for all applications at all times. Capacity Management involves tasks that need to be performed on all data center components in order to achieve this goal. Let us take the example of allocating storage from an intelligent storage array to a new application that will be deployed on a new server (we will explore this specific example in much more detail later in this module). To achieve this objective, the following tasks would have to be performed on the storage array, the SAN and the server:

    • Storage Array: Device configuration, LUN Masking
    • SAN: Unused ports, zoning
    • Server: HBA configuration, host reconfiguration, file system management, application/database management

    Availability Management ensures business continuity by eliminating single points of failure in the environment and ensuring data availability through the use of backups, local replication and remote replication. Backup, local replication and remote replication have been discussed in Section 4, Business Continuity. Availability management applies to all data center components.

    In this example of a new application/server, availability is achieved as follows:

    • Server: At least two HBAs, multi-pathing software with path failover capability, cluster, backup.
    • SAN: The server is connected to the storage array via two independent SAN fabrics; the SAN switches themselves have built-in redundancy of various components.
    • Storage Array: Devices have some RAID protection, array devices are made available to the host via at least two front-end ports (via independent SAN fabrics), the array has built-in redundancy for various components, local and remote replication, backup.

    Data Center Management, continued

    • Security Management
        - Prevent unauthorized activities or access
    • Performance Management
        - Configure/design for optimal operational efficiency
        - Performance analysis
        - Identify bottlenecks
        - Recommend changes to improve performance

    Security Management prevents unauthorized access to, and unauthorized configuration of, the data center components. Unauthorized access to data is prevented as well. In the new application/server deployment example, security management is addressed as follows:

    • Server: Creation of user logins, application/database logins and user privileges. Volume/application/database management can only be performed by authorized users.
    • SAN: Zoning (restricts access to front-end ports to specific HBAs). Administrative/configuration operations can only be performed by authorized users.
    • Storage Array: LUN Masking (restricts access to specific devices to specific HBAs). Administrative/configuration operations can only be performed by authorized users. Replication operations are restricted to authorized users as well.

    Performance Management ensures optimal operational efficiency of all data center components. Performance analysis of the metrics collected is an important part of performance management and can be complicated because data center components are all inter-related: the performance of one component will have an impact on other components. In the new application/server deployment example, performance management will involve:

    • Server: Volume management, database/application layout, writing efficient applications, multiple HBAs and multi-pathing software with intelligent load balancing.
    • SAN: Designing sufficient ISLs into a multi-switch fabric; choice of fabric design (core-edge, full mesh or partial mesh).
    • Storage Array: Choice of RAID type and layout of the devices (LUNs) on the back end of the array, choice of front-end ports (are the front-end ports being shared by multiple servers, are the ports maxed out), LUN Masking devices on multiple ports for multi-pathing.

    Data Center Management, continued

    • Reporting
        - Encompasses all data center components and is used to provide information for Capacity, Availability, Security and Performance Management
        - Examples:
            - Capacity planning: storage utilization, file system/database tablespace utilization, port usage
            - Configuration/asset management: device allocation, local/remote replicas, fabric configuration (zones and zonesets), equipment on lease/rotation/refresh
            - Chargeback: based on allocation or utilization
            - Performance reports

    Reports can be generated for all data center components. Data center reports can be used for

    trend analysis, capacity planning, chargeback, basic configuration information, etc.


    Scenario 1: Storage Allocation to a New Server

    [Figure: Storage Allocation Tasks. Array: Config New Volumes, Assign Volumes to Ports. SAN: SAN Zoning, Allocate Volumes to Hosts. Host: Volume Mgmt, File System Mgmt, File/Database Mgmt. Storage states shown: Unconfigured, Configured, Mapped, Reserved, Host Allocated, Volume Group Allocated, File System/Database Used, Host Used.]

    Let us explore the various management tasks with the help of an example. Assume that a new server has to be deployed in an existing SAN environment and has to be allocated storage from a storage array. The allocated storage is to be used by an application that uses a relational database. The database uses file systems. The picture breaks down the individual allocation tasks, which we will explore in the next few slides.

    Storage Array Management

    • Configure new volumes on the array for use by the new server

    • Assign new volumes to the array front-end ports

    SAN Management

    • Perform SAN Zoning: zone the new server's HBAs via redundant fabrics to the front-end ports of the storage array

    • Perform LUN Masking on the storage array: give the new server access to the new volumes via the array front-end ports

    Host Storage Management

    • Configure HBAs on the new server

    • Configure the server to see the new devices after zoning and LUN Masking are done

    • Volume Management (LVM tasks)

    • File System Management

    • Database/Application Management


    Array Management: Allocation Tasks

    • Configure new volumes (LUNs)
        Choose RAID type, size, and number of volumes
        Physical disks must have the required space available

    • Assign volumes to array front-end ports
        This is automatic on some arrays, while on others this step must be explicitly performed

    [Figure: Intelligent Storage System with Host Connectivity, Front End, Cache, Back End, and Physical Disks. Labels shown: LUN 0, LUN 1, RAID 0, RAID 1, RAID 5.]

    As we learned previously, the physical disks at the back end of the storage array are not directly presented as LUNs to a host. Typically, a RAID group or RAID set is created, and LUNs are then created within the RAID set. These LUNs are eventually presented to a host, where they appear as physical disks. The space on the array physical disks that has not been configured for use as a host LUN is considered unconfigured space and can be used to create more LUNs.

    Based on the storage requirements, configure enough LUNs of the required size and RAID type. On many arrays, a LUN is automatically assigned to the front-end ports of the array when it is created. On some arrays, the LUNs have to be explicitly assigned to array front-end ports; this operation is called Mapping.


    Server Management: HBA Configuration

    • Server must have HBA hardware installed and configured
        Install the HBA hardware and the software (device driver) and configure

    • Optionally install multi-pathing software
        Path failover and load balancing

    [Figure: New Server with HBAs, HBA Driver, and Multi-path software.]

    The HBA hardware and software (device driver) must be installed, and the HBA configured, before the server can be connected to the SAN. Multi-pathing software can optionally be installed. Most enterprises opt to use multi-pathing because of availability requirements. Multi-pathing software can also perform load balancing, which helps performance.


    SAN Management: Allocation Tasks

    • Perform Zoning
        Zone the HBAs of the new server to the designated array front-end ports via redundant fabrics
        Are there enough free ports on the switch?
        Did you check the array port utilization?

    • Perform LUN Masking
        Grant the HBAs on the new server access to the LUNs on the array

    [Figure: New Server HBAs connected through switches SW1 and SW2 to ports on the Storage Array.]

    Zoning and LUN Masking operations have been discussed in detail in the section on FC SAN. Zoning tasks are performed on the SAN fabric. LUN Masking operations are typically performed on the storage array.

    The switches should have free ports available for the new server. Check the array port utilization if the port is shared between many servers.
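
    The interplay of the two access controls can be sketched in a few lines of Python. This is a simplified model, not any vendor's interface: the HBA names, port names, and the `can_access` and `paths_to_lun` helpers are hypothetical. It assumes a host reaches a LUN only when its HBA is zoned to a front-end port, the LUN is mapped to that port, and the array's masking table grants that HBA access to the LUN.

```python
# Simplified model of zoning (fabric) plus LUN masking (array).
# All identifiers below are hypothetical illustration values.

# Zoning: pairs of (HBA, array front-end port) allowed to communicate.
zones = {
    ("hba_a", "array_port_0"),  # path through fabric SW1
    ("hba_b", "array_port_1"),  # path through fabric SW2
}

# Mapping: which LUNs are presented on which array front-end port.
port_luns = {
    "array_port_0": {0, 1},
    "array_port_1": {0, 1},
}

# LUN masking: which LUNs each HBA is allowed to see.
masking = {
    "hba_a": {0, 1},
    "hba_b": {0, 1},
}


def can_access(hba, port, lun):
    """Return True only if zoning, mapping, and masking all permit access."""
    zoned = (hba, port) in zones
    mapped = lun in port_luns.get(port, set())
    masked_in = lun in masking.get(hba, set())
    return zoned and mapped and masked_in


def paths_to_lun(hba_list, lun):
    """List every (HBA, port) path over which the host can reach the LUN."""
    return sorted(
        (hba, port)
        for hba in hba_list
        for port in port_luns
        if can_access(hba, port, lun)
    )


if __name__ == "__main__":
    print(paths_to_lun(["hba_a", "hba_b"], 0))
```

    With one zone per fabric, the new server ends up with two independent paths to each LUN, which is what multi-pathing software needs for failover.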


    Server Management: Allocation

    • Reconfigure Server to see new devices

    • Perform Volume Management tasks

    • Perform Database/Application tasks

    [Figure: server with HBAs and the host storage stack: VG, LV, FS, DB, App.]

    Reconfigure Server to see new devices

    • Bus rescan or a reboot

    Perform Volume Management tasks

    • Create Volume Groups/Logical Volumes/File Systems
        The number of Logical Volumes/File Systems depends on how the database/application is to be laid out

    Database/Application tasks

    • Install the database/application on the Logical Volumes/File Systems that were created

    • Start up the database/application


    Scenario 2: Running out of File System Space

    Solutions

    • Offload non-critical data
        Delete non-essential data
        Move older/seldom-used data to other media
            ILM/HSM strategy
            Easy retrieval if needed

    • Extend File System
        Operating System and Logical Volume Manager dependent
        Management tasks seen in Scenario 1 will apply here as well

    [Figure: File System fill level. Warning: FS is 66% Full. Critical: FS is 80% Full.]

    In this scenario, we will explore the data center management tasks that you may have to execute to prevent a file system from becoming 100% full.

    When a file system is running out of space, either:

    • Actively perform tasks that offload data from the existing file system (keeping the file system the same size)
        Delete unwanted files
        Offload files that have not been accessed for a long time to tape or to some other media from which they can be easily retrieved if necessary

    • Extend the file system to make it bigger
        Considerations for extending file systems:
        Dynamic extension of file systems depends on the specific operating system or logical volume manager (LVM) in use
        The possible tasks to extend file systems are discussed in more detail in the next slide

    In reality, a good data center administrator should constantly monitor file systems, offload non-critical data, and be ready to extend the file system if necessary.
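
    The constant monitoring described here can be sketched as a small threshold check. The 66% and 80% levels are the warning and critical values shown on the slide. The function names and the use of Python's `shutil.disk_usage` are assumptions made for illustration; a real data center would normally rely on its monitoring tool's own alerting.

```python
import shutil

WARNING_PCT = 66.0   # "Warning: FS is 66% Full" on the slide
CRITICAL_PCT = 80.0  # "Critical: FS is 80% Full" on the slide


def percent_full(used_bytes, total_bytes):
    """Return the used fraction of a file system as a percentage."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def classify(pct_full):
    """Map a fill percentage to an alert level."""
    if pct_full >= CRITICAL_PCT:
        return "CRITICAL"
    if pct_full >= WARNING_PCT:
        return "WARNING"
    return "OK"


def check_mount(path):
    """Classify the file system that contains the given path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify(percent_full(usage.used, usage.total))


if __name__ == "__main__":
    # "/" is only an example mount point.
    print(check_mount("/"))
```

    A warning leaves time to offload non-critical data; a critical alert is the point at which extending the file system should already be planned.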


    Scenario 2: Running out of File System Space, continued

    [Flow chart]

    1. Correlate File System with Volume Group or Disk Group.
    2. Is there free space available in the VG?
        Yes: Execute command to extend File System.
        No: continue with 3.
    3. Does the server have additional devices available?
        Yes: Execute command to extend VG.
        No: continue with 4.
    4. Does the array have configured LUNs that can be allocated?
        Yes: Allocate LUNs to server.
        No: continue with 5.
    5. Does the array have unconfigured capacity?
        Yes: Configure new LUNs.
        No: Identify/procure another array.
    6. Is the File System being replicated?
        Yes: Perform tasks to ensure that the larger File System and Volume Group are replicated correctly.
        No: Done.

    The steps and considerations prior to the extension of a file system are illustrated in the flow chart. The goal is to increase the size of the file system to avoid an application outage. Other considerations revolve around the local/remote replication/protection employed for the application. For instance, if the application is protected via remote/local replication and a new device is added to the Volume Group, ensure that this new device is replicated as well.

    The steps include:

    • Correlate the file system to the logical volume and volume group if an LVM is in use

    • If there is enough space in the volume group, extend the file system

    • If the volume group does not have enough space, check whether the server has access to other devices that can be used to extend the volume group; if so, extend the volume group, then extend the file system

    • If the server does not have access to additional devices, allocate additional devices to the server. Many or all of the steps discussed in Scenario 1 will have to be used to do this (configure new LUNs on the array, LUN mask, reconfigure the server to recognize the new devices, extend the volume group, extend the file system)
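
    The decision sequence above can be expressed as a small planning function. This is a sketch under stated assumptions: the `Environment` fields and the step strings are hypothetical, and the function only lists the tasks in order; it does not run any operating system, LVM, or array command.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Environment:
    """Hypothetical snapshot of the facts the flow chart asks about."""
    vg_free_gb: float             # free space in the volume group
    server_spare_devices: int     # devices visible to the server, not yet in the VG
    array_configured_luns: int    # configured LUNs not yet allocated to a host
    array_unconfigured_gb: float  # array capacity not yet configured as LUNs
    replicated: bool              # is the file system being replicated?


def plan_extension(env: Environment, needed_gb: float) -> List[str]:
    """Return the ordered tasks needed to grow the file system by needed_gb."""
    steps: List[str] = []
    if env.vg_free_gb < needed_gb:
        if env.server_spare_devices == 0:
            if env.array_configured_luns == 0:
                if env.array_unconfigured_gb < needed_gb:
                    return ["identify or procure another array"]
                steps.append("configure new LUNs on the array")
            steps.append("allocate LUNs to the server (zoning, LUN masking)")
            steps.append("reconfigure the server to see the new devices")
        steps.append("extend the volume group")
    steps.append("extend the file system")
    if env.replicated:
        steps.append("ensure the larger file system and volume group are replicated")
    return steps


if __name__ == "__main__":
    env = Environment(vg_free_gb=0, server_spare_devices=0,
                      array_configured_luns=2, array_unconfigured_gb=0,
                      replicated=True)
    for step in plan_extension(env, needed_gb=20):
        print(step)
```

    Each outer condition corresponds to one decision box in the flow chart, so the deeper the function has to go, the more of the Scenario 1 tasks reappear in the plan.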


    Scenario 3: Chargeback Report

    [Figure: Hosts/Servers with Applications (DB/App, FS, LV, VG) connected through switches SW1 and SW2 to ports on the Storage Arrays. Production devices are shown in green, Local Replica devices in blue, and Remote Replica devices in red.]

    Scenario 3: In this scenario, we will explore the various data center tasks that are necessary to create a specific report.

    A number of servers (50; only 3 are shown in the picture), each with 2 HBAs, are connected to a Storage Array via two switches, SW1 and SW2. Each server has independent paths (2 HBAs) to the storage array via switch SW1 and switch SW2. Applications are running on each of the servers, and array replication technology is used to create local and remote replicas. The production devices are represented by the green devices, the local replicas by the blue devices, and the remote replicas by the red devices.

    A report documenting the exact amount of storage used by each application (including that used for local and remote replication) has to be created. The amount of raw storage used must be reported as well. The cost of the raw storage consumed by each application must be billed to the application owners. A sample report is shown in the picture. The report shows the information for two applications. Application Payroll_1 has been allocated 100 GB of storage. Production volumes are RAID 1 volumes; hence the raw space used by the production volumes is 200 GB. Local replicas are on unprotected (no fault tolerance) volumes; hence the raw space used by local replicas is 100 GB. The remote replicas are on RAID 5 (5-disk group) volumes; hence the raw space used for remote replicas is 125 GB.
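
    The raw-space figures for Payroll_1 follow from the capacity overhead of each protection scheme. As a worked sketch, let $U$ be the usable (allocated) capacity, $R$ the raw capacity, and $n$ the number of disks in the RAID 5 group (symbols introduced here for illustration), and assume the 5-disk RAID 5 group holds four disks' worth of data plus one disk's worth of parity:

```latex
\begin{align*}
R_{\text{RAID 1}}      &= 2U = 2 \times 100\ \text{GB} = 200\ \text{GB} \\
R_{\text{unprotected}} &= U = 100\ \text{GB} \\
R_{\text{RAID 5}}      &= \frac{n}{n-1}\,U = \frac{5}{4} \times 100\ \text{GB} = 125\ \text{GB}
\end{align*}
```

    Under these assumptions the application consumes 200 + 100 + 125 = 425 GB of raw storage for 100 GB of allocated production capacity.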

    What are the various data center management steps to perform in order to create such a report?


    Scenario 3: Chargeback Report Tasks

    Correlate Application → File Systems → Logical Volumes → Volume Groups → Host Physical Devices → Array Devices (Production)

    Determine Array Devices used for Local Replication

    Determine Array Devices used for Remote Replication

    Determine storage allocated to application based on the size of the array devices

    [Figure: Example. DB/App, FS, LV, and VG on the host map to Source Vol 1 and Source Vol 2 on Array 1, with Local Replica Vol 1 and Vol 2 on Array 1 and Remote Replica Vol 1 and Vol 2 on the Remote Array.]

    The first step in determining the chargeback costs associated with an application is to correlate the application with the array devices that are in use. As indicated in the picture, trace the application to the file systems, logical volumes, volume groups, and eventually to the array devices. Since the applications are being replicated, determine the array devices used for local replication and the array devices used for remote replication. In the example shown, the application is using Source Vol 1 & 2 (in Array 1). The replication devices are Local Replica Vol 1 & 2 (in Array 1) and Remote Replica Vol 1 & 2 (in the Remote Array).

    Keep in mind that this can change over time. As the application grows, more file systems and devices may be used. Thus, before a new report is generated, the correlation of the application to the array devices should be repeated to ensure that the most current information is used.

    After the array devices are identified, the amount of storage allocated to the application can be easily computed. In this case, Source Vol 1 & 2 are each 10 GB in size. Thus the storage allocated to the application is 20 GB (10 + 10). The allocated storage for replication would be 20 GB for local and 20 GB for remote. The allocated storage is the usable storage; it does not represent the raw storage consumed by the application. To determine the raw space, determine the RAID protection that is used for the various array devices.
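
    The correlation described above is essentially a chain of lookups. The sketch below walks that chain for the example on the slide. The dictionaries, the application name `payroll`, and the file system, LV, and VG names are hypothetical stand-ins for data that would come from the host LVM and the array configuration; the host physical device layer is folded into the volume-group-to-array-device mapping to keep the example short.

```python
# Hypothetical inventory for one application; every name is illustrative.
app_to_fs = {"payroll": ["/payroll/data"]}
fs_to_lv = {"/payroll/data": "lv_data"}
lv_to_vg = {"lv_data": "vg_payroll"}
vg_to_array_devices = {"vg_payroll": ["Source Vol 1", "Source Vol 2"]}

local_replica_of = {
    "Source Vol 1": "Local Replica Vol 1",
    "Source Vol 2": "Local Replica Vol 2",
}
remote_replica_of = {
    "Source Vol 1": "Remote Replica Vol 1",
    "Source Vol 2": "Remote Replica Vol 2",
}

# Every device in the slide example is 10 GB.
device_size_gb = {
    "Source Vol 1": 10, "Source Vol 2": 10,
    "Local Replica Vol 1": 10, "Local Replica Vol 2": 10,
    "Remote Replica Vol 1": 10, "Remote Replica Vol 2": 10,
}


def correlate(app):
    """Trace an application down to its production and replica array devices."""
    sources = []
    for fs in app_to_fs[app]:
        vg = lv_to_vg[fs_to_lv[fs]]
        for dev in vg_to_array_devices[vg]:
            if dev not in sources:  # several file systems may share one VG
                sources.append(dev)
    return {
        "production": sources,
        "local_replica": [local_replica_of[d] for d in sources],
        "remote_replica": [remote_replica_of[d] for d in sources],
    }


def allocated_gb(devices):
    """Sum the usable (allocated) size of the given array devices."""
    return sum(device_size_gb[d] for d in devices)


if __name__ == "__main__":
    for role, devices in correlate("payroll").items():
        print(role, devices, allocated_gb(devices), "GB")
```

    Because the mappings change as the application grows, the lookup should be rerun against current data before each report rather than cached.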


    Scenario 3: Chargeback Report Tasks, continued

    Determine RAID type for Production/Local Replica/Remote Replica devices

    Determine the total raw space allocated to application for production/local replication/remote replication

    Compute the chargeback amount based on price/raw GB of storage

    Repeat steps for each application and create report

    Repeat the steps each time the report is to be created (weekly/monthly)

    Example:

    2 Source Vols = 2 * 10 GB RAID 1 = 2 * 20 GB raw = 40 GB
    2 Local Replica Vols = 2 * 10 GB = 2 * 10 GB raw = 20 GB
    2 Remote Replica Vols = 2 * 10 GB RAID 5 = 2 * 12.5 GB raw = 25 GB
    Total raw storage = 40 + 20 + 25 = 85 GB
    Chargeback cost = 85 * 0.25/GB = 21.25

    To determine the raw space, review the steps displayed on the slide using the example listed. First, determine the RAID type for the Production/Local Replica/Remote Replica devices. In the example shown, production devices are 10 GB RAID 1, local replica devices are 10 GB with no protection, and remote replica devices are 10 GB RAID 5 (5-disk group) devices. Next, determine the total raw space allocated to the application for production, local replication, and remote replication. Based on the values from the first step, the total raw space used by the application is 85 GB (total raw storage = 40 + 20 + 25 = 85 GB). Then compute the chargeback amount based on the price per raw GB of storage. With the example cost of 0.25 per GB, the chargeback cost is 85 * 0.25 = 21.25. Repeat these steps for each application and create a report. Repeat the steps each time the report is to be created (weekly/monthly).

    This exercise would have to be repeated for every single application in the enterprise in order to generate the required report. These tasks can be done manually. Manual creation of the report may be acceptable if only one or two applications exist, but the process becomes extremely tedious if many applications exist. The best way to create this report is to automate these tasks.
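
    As an illustration of what such automation might look like, the sketch below reproduces the arithmetic of the slide example. The RAID labels, the `RAW_FACTOR` table, and the function names are assumptions made for this example; the overhead factors assume two-way mirroring for RAID 1 and a 4 data + 1 parity layout for the 5-disk RAID 5 group.

```python
# Raw GB consumed per usable GB, by protection scheme (assumed layouts).
RAW_FACTOR = {
    "RAID1": 2.0,          # mirrored: two raw GB per usable GB
    "NONE": 1.0,           # unprotected: no overhead
    "RAID5_5DISK": 5 / 4,  # 4 data disks + 1 parity disk
}

PRICE_PER_RAW_GB = 0.25  # example price from the slide


def raw_gb(usable_gb, raid):
    """Convert usable capacity to raw capacity for the given protection."""
    return usable_gb * RAW_FACTOR[raid]


def chargeback(volumes, price=PRICE_PER_RAW_GB):
    """Return (total raw GB, cost) for a list of (usable_gb, raid) volumes."""
    total_raw = sum(raw_gb(size, raid) for size, raid in volumes)
    return total_raw, total_raw * price


# Slide example: two 10 GB volumes each for production, local, and remote.
example = [
    (10, "RAID1"), (10, "RAID1"),              # Source Vol 1 & 2
    (10, "NONE"), (10, "NONE"),                # Local Replica Vol 1 & 2
    (10, "RAID5_5DISK"), (10, "RAID5_5DISK"),  # Remote Replica Vol 1 & 2
]

if __name__ == "__main__":
    total, cost = chargeback(example)
    print(f"raw GB: {total}, chargeback: {cost}")  # raw GB: 85.0, chargeback: 21.25
```

    Running the same function over every application's device list, weekly or monthly, yields the full report without repeating the manual calculation.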


    Information Lifecycle Management

    • Information Management Challenges

    • Information Lifecycle

    • Information Lifecycle Management
        Definition
        Process
        Benefits
        Implementation

    Information Lifecycle Management (ILM) is a key approach for assuring availability, capacity, and performance. Let's look at some of the aspects of ILM.


    Key Challenges of Information Management

    Information growth is relentless
    Information is more strategic than ever
    Information changes in value over time

    Five challenges:
        Scaling infrastructure within budget constraints
        Scaling resources to manage complexity
        Access, availability, and protection of critical information assets at optimal cost
        Reducing risk of non-compliance
        Ability to prioritize information management based on data value

    Companies face three key challenges related to information management:

    Strong growth of information:

    • Post-dot-com rate of growth is around 50%, driven by digitization, increased use of e-mail, etc.

    • Just planning for growth can take up to 50% of storage resources

    • Meeting growth needs has increased the complexity of customer environments

    Information is playing a more important role in determining business success:

    • New business applications provide more ways to extract a competitive advantage in the marketplace. Examples are companies like Dell, WalMart, and Amazon, whose business models have the strategic use of information at their heart.

    Finally, information changes in value, and often not in a linear fashion:

    • For example, customers become inactive, reducing the need for account information; pending litigation makes certain information more valuable, etc.

    • Understanding the value of information should be at the heart of managing information in general


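    The Information Lifecycle

    Sales Order Application Example

    [Figure: information value plotted against time for a sales order, with events New Order Record, Order Processing, Orders Fulfilled, Warranty Claim, and Warranty Voided. Lifecycle activities shown: Create, Protect, Access, Migrate, Archive, Dispose.]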

    Information that is stored on a computer has a different value to a company depending on how long it has been stored on the network. In the above example, the value of the sales order to the company changes from the time that it is created to the time that the warranty is eventually voided.

    In a typical sales example such as this one, the value of information is highest when a new order is created and processed. After order fulfillment, there is potentially less need for real-time access to customer/order data, unless a warranty claim or other event triggers that need. Similarly, after the product has reached end of life (EOL), or after the account is closed, there is little value in the information and it can be disposed of.


    Information Lifecycle Management Definition

    Information Lifecycle Management is a strategy, not a product or service in itself. This strategy is proactive and dynamic in helping plan for IT growth as it relates to business needs, and it reflects the value of information in a company.

    A successful information lifecycle management strategy must be:

    • Business-centric, tying closely to the key processes, applications, and initiatives of the business

    • Centrally managed, providing an integrated view into all information assets of the business, both structured and unstructured

    • Policy-based, anchored in enterprise-wide information management policies that span all processes, applications, and resources

    • Heterogeneous, encompassing all types of platforms and operating systems

    • Aligned with the value of data, matching storage resources to the value of the data to the business at any given point in time
