achieving rapid data recovery for ibm aix environments - executive overview of double-take...

Upload: gryglewskia

Post on 04-Apr-2018


  • 7/29/2019 Achieving Rapid Data Recovery for IBM AIX Environments - Executive Overview of Double-Take Availability for AIX

    1/10

    Achieving Rapid Data Recovery for IBM AIX Environments

    An Executive Overview of Double-Take Availability for AIX


    WHITE PAPER

    visionsolutions.com

    Introduction

    Planning for recovery is a requirement in businesses of all sizes. In implementing an

    operational plan that ensures that both data and applications can be recovered, IT

    personnel are generally confronted with several challenges:

    How can I ensure my applications and data are recoverable without impacting

    business operations?

    Do I have data protection approaches available to me that meet my recovery point

    and recovery time objectives?

    Can I afford to implement a comprehensive plan that covers both my local and

    remote (disaster) recovery requirements?

    Are there cost-effective alternatives that meet my requirements?

    Business requirements are not the only mandates that may be driving the evolution of your recovery plan. Various industry-specific regulatory mandates, including Sarbanes-Oxley, HIPAA and SEC, specify requirements for data retention and recoverability. In meeting these requirements, businesses have to deal with a variety of risks to data: inadvertently deleted files or records (operator error), viruses or hackers that can cause data corruption or deletion, and natural disasters that may put much more than just your data at risk. Distributed or branch offices may also have ease of use requirements that may not apply to larger, more centralized businesses.

    Do you have a plan that meets your recovery requirements to your satisfaction across

    these areas?


    Issues with Legacy Recovery Technologies

    If you're like most businesses, you're using some form of data protection today, probably tape-based backup. Periodically, someone shuts applications down to perform a backup to tape. Depending on the volume of data that is being copied, this may take several hours and requires manual intervention to set up the backup job, run it, confirm that it occurred, and then return the application to operation. The backup copy may be kept locally in case data needs to be recovered in the near term, and eventually (after several weeks) it may be moved to an offsite location for archival storage purposes. The reason to make and keep copies of your data is so that, in the event of some sort of incident or catastrophe that deletes or destroys data, you have a clean copy safely tucked away to use for recovery purposes.

    Tape is used for backup and archive because it is very inexpensive, but it is an old technology that has been available almost since the dawn of computing. There are several issues with tape-based backup:

    Tape-based backup is a time-intensive process that is potentially disruptive to your applications; this issue is commonly referred to as the "backup window" problem.

    Because of its impact on applications and resources, tape-based backups are usually not taken more than once a day, and often only once every several days, meaning that there are very few tape-based recovery points available for use over the course of a week. This is problematic because your data is changing very frequently (on the order of seconds or minutes), and the fewer points in time you have a copy of for recovery purposes, the more data loss occurs on average for a given recovery; this issue is commonly referred to as the recovery point objective (RPO) problem.

    Once it is clear that a recovery needs to occur, it takes time to perform the recovery (e.g. finding the right tape, transporting it if it's offsite, restoring it to disk, restarting the application on top of the data, etc.); this issue is commonly referred to as the recovery time objective (RTO) problem.

    As a storage medium for backup, tape is not entirely reliable; in fact, leading analyst groups such as the Gartner Group, the Enterprise Strategy Group and the Taneja Group state that as many as 1 in 4 backup tapes suffer from some sort of problem that precludes performing a recovery.

    Transporting tapes to offsite facilities for archival purposes also has inherent risk. Widely publicized tape losses during physical transport (by truck) have hit large companies like Bank of America, Citigroup Inc., ChoicePoint Inc. and LexisNexis and resulted in the theft of hundreds of thousands of company records. Replication of data across secure IP-based networks is a much faster, easier and safer way to transport data to offsite locations for archival storage purposes.

    If you are driven by either business or regulatory requirements to deploy a disaster recovery solution, a pure tape-based data protection strategy can subject you to undue risk.
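    The recovery point issue described in this section can be put into rough numbers. The sketch below is illustrative only and is not taken from the paper: it assumes that a failure is equally likely at any moment between two recovery points, in which case the average amount of lost work is half the spacing between recovery points and the worst case is the full spacing. The function name `data_loss_exposure` is hypothetical.

```python
def data_loss_exposure(interval_hours: float) -> tuple[float, float]:
    """Return (average, worst-case) hours of changes lost on recovery.

    Assumption (not from the paper): a failure is equally likely at any
    moment between two consecutive recovery points.
    """
    if interval_hours < 0:
        raise ValueError("interval_hours must be non-negative")
    return interval_hours / 2, interval_hours


# Compare a few recovery-point spacings.
for label, interval in [
    ("backup every 3 days", 72.0),
    ("daily backup", 24.0),
    ("hourly recovery point", 1.0),
]:
    average, worst = data_loss_exposure(interval)
    print(f"{label}: average {average:g} h lost, worst case {worst:g} h lost")
```

    Under this assumption, shrinking the spacing between recovery points reduces both the average and the worst-case data loss proportionally.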


    The Proven Solution

    Double-Take Availability for AIX, from Vision Solutions, is designed to resolve the following common problems:

    Backup window: Data is continuously and transparently copied from designated servers throughout the day as changes occur, so you never again have to concern yourself with backup windows.

    Recovery Point Objective: Using a technology called continuous data protection (CDP), Double-Take Availability will allow you to retroactively pick any previous point and generate a readable, writable snapshot of what the data looked like at the selected point; this effectively presents you with all possible recovery points to minimize data loss on recovery in a way that tape, with its limited number of recovery points, never can.

    Recovery Time Objective: Double-Take Availability restores directly from disk, providing you with fast, reliable restores in a way that tape cannot. And your ability to pick the optimal recovery point to minimize data loss means that you will spend less time restoring the entire application environment; this effectively shortens the downtime associated with data recovery and hence the impact and cost of an outage.

    Redundant Application Server: The backup server provides a manual failover target that will allow a critical application to be rapidly restarted with access to current data (to allow processing to continue if the primary server for some reason cannot be restarted).

    Remote replication: Double-Take Availability includes the ability to replicate data across IP networks so you can migrate your aged data to a remote facility without exposing it to the risks associated with the physical transport of tape-based media.

    Double-Take Availability is already in use with referenceable customers across different vertical markets, including financial services and healthcare. It runs on IBM Power Systems servers with AIX 5.3 and above and is applicable to any application running on AIX. Applications for which Double-Take Availability is a good fit meet the following profile:

    A 7 x 24 application environment with a small to non-existent backup window.

    Critical applications (from a business point of view) that have high rates of data change (where fewer recovery points translate to significant amounts of lost data on recovery).

    Applications with stringent recovery time requirements that are not currently being met with existing data protection technologies.

    "By 2011, some form of CDP will be deployed in 80 percent of the Fortune 2000." (Source: Gartner)


    How Does the Double-Take Availability Solution Work?

    A dedicated backup System p server is established, and can protect one or more applications running on other IBM AIX production servers that are connected to it locally via an IP-based network. Double-Take Availability mirrors production server writes to designated Protected Storage (the storage where the application you want to protect resides) to the backup server over IP. These writes are stored in the Recovery Storage that is directly attached to the backup server. Double-Take Availability runs continuously in the background and does not noticeably impact the performance of the Protected Application.

    At any time, an administrator can go into the management interface of the backup server, running on a separate Windows-based PC, and generate a historical view (a snapshot of the data at any arbitrarily selected previous point in time). These historical views can then be presented to any other server on the network for the purposes of recovery or to perform any type of off-host processing. These historical views are fully read/write capable, which means that they can support off-host processing tasks like data analysis, testing, development or backup, all without imposing any impact on the Protected Application. A historical view can be presented back to the production server as well, but note that there is another option with respect to the production server, called Production Restore, which uses differencing technology to modify the Protected Storage to look like the historical view selected on the backup server.
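    The general idea of building a point-in-time view from a continuous stream of mirrored writes can be sketched as follows. This is a conceptual illustration of a write journal, not Double-Take Availability's actual design or API, which the paper does not describe. The names `Write`, `WriteJournal`, `record` and `historical_view` are hypothetical, and storage is modelled as a simple mapping from block number to contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    timestamp: float  # seconds since protection was enabled (assumed unit)
    block: int        # block number on the protected storage
    data: bytes       # new contents of that block


class WriteJournal:
    """Toy stand-in for recovery storage: a baseline copy plus every later write."""

    def __init__(self, baseline: dict[int, bytes]):
        self._baseline = dict(baseline)  # state after initial synchronization
        self._writes: list[Write] = []   # mirrored writes, in time order

    def record(self, write: Write) -> None:
        """Append a mirrored production write to the journal."""
        if self._writes and write.timestamp < self._writes[-1].timestamp:
            raise ValueError("writes must be recorded in time order")
        self._writes.append(write)

    def historical_view(self, as_of: float) -> dict[int, bytes]:
        """Return an independent, writable copy of the data as of a chosen time."""
        view = dict(self._baseline)
        for write in self._writes:
            if write.timestamp > as_of:
                break
            view[write.block] = write.data
        return view


journal = WriteJournal({0: b"orders-v1", 1: b"index-v1"})
journal.record(Write(10.0, 0, b"orders-v2"))
journal.record(Write(20.0, 1, b"corrupted"))

# Pick a point just before the bad write at t=20.
before_corruption = journal.historical_view(as_of=19.9)
```

    Because each view is a separate copy, it can be modified for testing or analysis without touching the journal, which mirrors the read/write property of historical views described above.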


    Figure 1. Double-Take RecoverNow mirrors data to a local backup server, which can then retroactively present snapshots for recovery, analysis, testing or development purposes with no impact on the production server(s).

    [Diagram labels: Data Tap; Protected Server; Ethernet (IP) network; LAN; IBM Server; Snapshot presentation; Manual failover target provides server-level redundancy; Protected Storage; Recovery Storage; Production restore; AIX 5.]


    Double-Take Availability also supports asynchronous replication. This allows you to replicate the continuous data stream or selected historical views to a remote facility, as long as it is connected to the primary facility by an IP-based network. Replication of the continuous data stream provides full any-point-in-time recovery capabilities at the remote site. This configuration is optimal for disaster recovery capability, since historical views created at the remote site can only be presented to servers at the remote site that are on the same LAN as the remote backup server. Replication represents a much faster, much more secure way to get your data to an offsite storage facility. To use this feature, you will need to purchase another System p server running AIX and the backup server software license at the remote site.


    [Diagram: a local site and a remote site linked by IP-based replication over a WAN. Labels: Protected Server; IBM Server; Backup Server; Recovery Server. Snapshots created by the IBM Server can be presented to a Backup Server for any type of off-host processing, like backup. Snapshots created by the IBM Server can be presented to a Recovery Server to recover a production application. Tapes can then be transported off site for remote storage.]

    [Chart: Data Reference Patterns. Horizontal axis: days since creation (1, 3, 7, 15, 30, 60, 90). Vertical axis: 0% to 100%. Curves: retrieval activity and amount of data. Storage tiers: online (ms), nearline (sec), archival/deletion (sec/mins). Annotations: references decline as data ages; data is being kept for longer periods of time; the percent of total data deleted is declining; TCO encourages migration of less active data. Source: Harison Information Strategies]


    The Correlation Between Data Age and Possibility of Re-Use/Restore

    It has been proven over time that most data recovery requests are for relatively recent data, and that there is a direct correlation between the age of data and the possibility that it would be required for restore purposes. Most restore requests are driven by issues such as an inadvertently deleted file or data corruption that is introduced by a virus or a hacker. Typically these problems are discovered within several hours or at most a few days from when they first occur, resulting in restore requests for more recent data.

    In general, the only time you may need to restore data that has already been archived would be in the event of a disaster that physically destroys computer equipment and facilities, such as an earthquake or a tornado. While it pays to be prepared against these occurrences, they are very rare. The slope of the red line in Figure 3 varies by company type, but it reflects the general relationship in all industries between the age of data and the chance that it would need to be restored.

    Another key factor to note is that as data ages, it becomes less important to support the ability to restore to any point in time. Note the inflection point in the red line in Figure 3 that occurs around Day 3. Restore requests for data drop off significantly after that point. This might suggest that you would want to manage roughly 3 days' worth of your most recent data with Double-Take Availability, migrating it to less flexible but less expensive media locally thereafter for several weeks, and then eventually storing it in an off-site facility after about 30 days. This 3 day window is referred to as the "optimized recovery window."
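    The age-based placement suggested above can be written down as a small policy function. This is only a sketch of the rule of thumb in the text: the function name `storage_tier`, the tier labels and the treatment of the boundary days are assumptions, and the 3-day and 30-day thresholds are the paper's approximate figures, which it notes vary by company.

```python
def storage_tier(
    age_days: float,
    cdp_window_days: float = 3,      # "optimized recovery window" from the text
    offsite_after_days: float = 30,  # approximate point of off-site archiving
) -> str:
    """Suggest where data of a given age would live under the rule of thumb."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days <= cdp_window_days:
        return "continuous data protection"
    if age_days <= offsite_after_days:
        return "local lower-cost media"
    return "off-site archive"


for age in (1, 7, 45):
    print(f"data aged {age} days -> {storage_tier(age)}")
```

    A site whose restore requests tail off later (the first use case below uses seven days) would simply pass a larger `cdp_window_days`.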

    Two Sample Use Cases

    Using Double-Take Availability to Provide Zero Impact Data Protection and Rapid Local Recovery

    In this scenario, we assume the customer wants to solve the rapid recovery problem at the local level. They have chosen, however, not to replicate and will continue to migrate data to tape for physical shipment to an offsite location.

    The customer is running an Oracle database as an order entry system on an IBM Power Systems server with AIX and 600 GB of internal storage. This server will become the production server. 10% of the data changes on a monthly basis, and the overall rate of data growth is forecast at 30% per year. Based on past experience, the customer knows that restore requests tend to drop off significantly after seven days. The customer currently does daily incremental backups and weekly full backups using a 100 Mbit Ethernet LAN. Incremental backups take roughly 90-120 minutes per day, while the full backup takes between ten and fifteen hours using a small tape cartridge autoloader.
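    As a rough sanity check on the full backup duration, the time needed just to move the data across the LAN can be estimated. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 Mbit = 10^6 bits) and, by default, a fully saturated link with no protocol or tape-drive overhead; neither assumption comes from the paper, and the helper name `transfer_hours` is hypothetical.

```python
def transfer_hours(data_gb: float, link_mbit_per_s: float, efficiency: float = 1.0) -> float:
    """Hours needed to move data_gb over a link at the given usable fraction of line rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_gb * 1e9 * 8                                # decimal gigabytes to bits
    seconds = bits / (link_mbit_per_s * 1e6 * efficiency)   # usable bits per second
    return seconds / 3600


full_backup_hours = transfer_hours(600, 100)
print(f"600 GB over 100 Mbit Ethernet: about {full_backup_hours:.1f} hours at line rate")
```

    Even at line rate the transfer alone takes more than half a day, which lands inside the range the scenario reports for the weekly full backup and suggests the network is a major contributor to the backup window.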

    To install the Double-Take Availability solution, the customer purchases a second IBM Power Systems server on AIX to act as the backup server. Based on the rate of data change and forecast database growth, 1.5 TB of Recovery Storage is housed internally to the backup server. This backup server is attached to the same LAN as the production server. Double-Take Availability is installed on the production server, while the relevant storage which underlies the Oracle application is designated as the Protected Storage. An optimized recovery window of seven days is configured on the backup server. An initial synchronization between the production server and the backup server is performed while the production server continues to run (it is run as a background process) so that database access is not impacted. Once the initial synchronization is complete, continuous data protection is enabled.
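    The paper does not say how the 1.5 TB Recovery Storage figure was derived. The sketch below only projects the two inputs it names, data growth and rate of change, so the headroom can be judged. It assumes compound annual growth and changes spread evenly over a 30-day month; the helper names `projected_size_gb` and `window_change_gb` are hypothetical, and this is not a vendor sizing formula.

```python
def projected_size_gb(initial_gb: float, annual_growth: float, years: int) -> float:
    """Database size after compound growth (assumed model)."""
    return initial_gb * (1 + annual_growth) ** years


def window_change_gb(size_gb: float, monthly_change: float, window_days: float,
                     days_per_month: float = 30.0) -> float:
    """Approximate volume of changed data inside the recovery window,
    assuming changes are spread evenly across the month."""
    return size_gb * monthly_change * window_days / days_per_month


# Scenario inputs from the text: 600 GB, 30% yearly growth, 10% monthly change, 7-day window.
for year in range(5):
    size = projected_size_gb(600, 0.30, year)
    change = window_change_gb(size, 0.10, 7)
    print(f"year {year}: database ~{size:,.0f} GB, 7-day change ~{change:,.0f} GB")
```

    Under these assumptions the database itself would pass 1.5 TB (decimal) between the third and fourth year, which is one way to read the allowance for forecast growth.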

    To take advantage of the capabilities of their newly implemented Double-Take Availability solution, the customer makes some changes to their data protection processes. With seven days of data included in the optimized recovery window, the customer no longer needs to perform daily incrementals. Any restore requirement during that seven day period is performed instantaneously from disk and without the need to build up a restore image from multiple incremental backups, thus cutting recovery time to minutes.

    A weekly tape backup is still desirable to prepare for the eventual archiving of data offsite, but the Oracle application no longer needs to be shut down to perform backups. Once a week, a historical view is created by the backup server, which then uses it to perform a tape backup. The customer continues to use its existing tape backup software to perform this backup. Double-Take Availability is compatible with all backup software packages for the purposes of historical view presentation for off-host backup. These tapes are kept onsite for two weeks, and then sent to an offsite facility for archival storage.

    Implemented in this way, Double-Take Availability for AIX provides the following benefits:

    Backups to tape are now completely decoupled from the production application so they can now be scheduled to occur when it is convenient for the administrator, without concern for impact on business processes.

    Backups are only taken once a week now (instead of daily), taking less administrative time.

    Restores within the optimized recovery window occur rapidly and reliably from disk, completely resolving tape media integrity issues for near term restores.

    Data loss on recovery is minimized because the administrator now has access to the optimal recovery point to minimize data loss for every conceivable failure scenario (this is the RPO issue).

    Recovery time is shortened in several ways:

        no restore from tape to disk is required (the application can just be started right up on the selected historical view).

        a recovery point never needs to be built up from incrementals, so there is less administrative overhead associated with recovery (the selected point is just immediately presented from disk).

        there is less time spent preparing the application for production use again after the recovery because the best recovery point to resolve the problem can be selected (e.g. if the problem is a file deletion or data corruption problem, the point right before that event occurred can be chosen).

    Recovery time is considerably shortened in the event of a problem with the production server: the Protected Application is simply started on the backup server, using the latest, current copy of the production data (the latest historical view). It can continue to run there until such time as the production server can be repaired and restarted.

    In addition to these benefits, there is another advantage that did not exist with the previous tape-based approach. Patched and upgraded applications can be tested against current production data in a manner completely decoupled from the production environment. A historical view of the current data state is created and presented to a staging server (also on the LAN) where the patched or upgraded application can be tested. Once the administrator is satisfied with the stability of the new environment, it can be deployed in production. Double-Take Availability makes it easy to create these historical views for testing purposes, ensuring more reliable patch and upgrade processes against production environments.

    Archiving to Tape with a Multi-Site Double-Take Availability Configuration

    In this scenario, we assume the customer wants to solve three problems (backup window, RPO and RTO) but they also want to migrate their archival data to a remote facility with minimal risk. For the purposes of this example, we'll assume they are running an IBM DB2 UDB database on an IBM Power Systems server with AIX.

    Adding to their production server, the customer purchases a local backup server with an

    appropriate amount of storage, and the Double-Take Availability software licenses. Then, to

    enable the remote replication capability, the customer purchases another IBM Power Systems

    server, to be located at the remote site, running the same operating system.

    The customer wants to take the weekly full tape backup from disk at the remote site for archival storage. Both the local and remote backup servers are connected via an IP network. With this configuration, the only change to their former backup processes is that they now keep no tape at all at the local (production) site, only at the remote site.

    Once a week, a historical view that represents the full backup is created on the remote-site backup server. The remote backup server then backs up the data to tape. Data that is already archived can be restored from tape to disk on the remote backup server, and then replicated back to the local (production site) backup server. At that point, the view can be manipulated for any recovery or off-host processing purposes in the same manner as any locally created view.

    This solution provides the following benefits:

    All of the benefits of the local configuration example accrue here, including removal of the backup window, minimized data loss and much more rapid, reliable recoveries (due to rapid restores direct from disk and to the availability of the backup server as a manual failover platform).

    The additional advantages that accrue with the remote configuration include a fast, easy and secure way to migrate data from a local site to a remote site without incurring any of the risk associated with physical transport, and a fast, easy and secure way to get that data back to a local site on those rare occasions when a recovery from older data is required.

    Tape-only backups are no longer a feasible data protection strategy in today's business environment.


    15300 Barranca Parkway
    Irvine, CA 92618
    800-957-4511
    801-799-0300
    visionsolutions.com

    Copyright 2010, Vision Solutions, Inc. All rights reserved. IBM and Power Systems are trademarks of International Business Machines Corporation. Windows is a registered trademark of Microsoft Corporation. Linux is a registered trademark of Linus Torvalds.

    Recovery Time Comparisons

    When downtime costs you money, a rapid recovery capability presents a quantifiable return on investment opportunity. Because Double-Take Availability offers a much faster and easier way to perform data recovery than tape does, savings accrue not only in the area of downtime but also in terms of administrative time and expense. As shown in Figure 4 below, Double-Take Availability can shorten recovery times by hours and even days in some cases.

    Summary

    Any business that is experiencing rapid growth or consolidation is very likely using a suboptimal data recovery solution built around tape-based backup. This type of legacy solution potentially interrupts business processes, due to the requirement for a backup window, subjects the business to potentially significant data loss when recoveries are required, and is time consuming and labor intensive for both data protection operations and recoveries.

    Double-Take Availability for AIX is a proven solution to the data recovery problem that is in use at a variety of referenceable accounts today. Double-Take Availability leverages CDP technology to support instantaneous recoveries from disk, resulting in minimal data loss (due to its ability to present all possible recovery points) and rapid, reliable recovery (due to its ability to restore immediately from disk), all while not imposing any downtime on production applications (zero impact data protection).

    Because Double-Take Availability ensures that data on the backup server is always current, it can be relied upon as a manual failover platform that allows application processing to be rapidly restarted in the event of a catastrophic production server failure. In addition, Double-Take Availability supports asynchronous replication that will allow businesses to establish cost-effective and secure multi-site disaster recovery strategies that support rapid recovery, even from archived data. Double-Take Availability runs on IBM Power Systems servers with AIX and is applicable to any AIX application, but is applied most often for use with business- or mission-critical applications such as enterprise databases or file systems.

    [Figure 4 chart: Recovery Time for 1 TB of Data. Vertical axis: hours (0 to 20). Categories: Double-Take RecoverNow Recovery; Recovery from Local Copy; Recovery from Offsite Copy; Recovery from Tape. Step labels: Review & Roll Back from Historical View; Apply Roll Back to Production Server; Rebuild Volumes; Resynchronize Volumes; Restore Data Files from Tape; Apply Archive Logs. Values shown: 3 Hrs, 20 Min, 9 Hrs, 17 Hrs.]