E-Guide

Best bets for backup: How to optimize your storage and choose a dedupe method

Believe it or not, there are inexpensive ways to optimize your enterprise backup system that have a dramatic and positive impact on your infrastructure as a whole. Check out this expert E-Guide to learn backup best practices, when it makes sense to make a major upgrade, and how to figure out what may be slowing down your backup systems and how to fix it. Also learn tips for choosing the best dedupe technology for your company.

Sponsored By: EMC

Table of Contents

Backup best practices: Easy fixes for your enterprise backup system

Deduplication best practices and choosing the best dedupe technology

Resources from EMC Corporation

Backup best practices: Easy fixes for your enterprise backup system

By W. Curtis Preston

There are easy, cheap things you can do to optimize your enterprise backup system and have a dramatic impact. This column will discuss how to figure out what may be slowing down your backup system and how to fix it, and will give you some hints as to when to throw in the towel. You will learn backup best practices, and when the best thing to do is to make a major upgrade.

The first and most important thing that you must do to be able to improve your enterprise backup system is to be armed with information. You need to know solid answers to the following questions:

How much data are you backing up in a full? (You need this number for each client you are backing up, as well as the aggregate number.)

How much data are you backing up in an incremental? (You also need these for each client.)

The answers to these questions are best found by querying your backup system. It may take some time if you've never used that part of your data backup system, so you may need to get some support (from your backup software vendor, or perhaps a user-based support community). But if this critical information can't be obtained from your backup software, it's time to move on to a different backup software package. Symantec NetBackup's bpimagelist and EMC NetWorker's mminfo commands are examples of how to obtain this information.
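
As an illustration of pulling these numbers together, here is a minimal Python sketch that summarizes per-client full and incremental sizes. It assumes a hypothetical CSV report exported from your backup software with client, level and size_gb columns; the file name and column layout are invented for the example, not taken from any particular product.

    # Summarize per-client and aggregate backup sizes from a
    # hypothetical CSV export (columns: client,level,size_gb).
    import csv
    from collections import defaultdict

    per_client = defaultdict(float)      # (client, level) -> GB
    aggregate = defaultdict(float)       # level -> GB across all clients

    with open("backup_report.csv", newline="") as f:  # assumed file name
        for row in csv.DictReader(f):
            level = row["level"].strip().lower()      # "full" or "incremental"
            size_gb = float(row["size_gb"])
            per_client[(row["client"], level)] += size_gb
            aggregate[level] += size_gb

    for (client, level), gb in sorted(per_client.items()):
        print(f"{client:20s} {level:12s} {gb:10.1f} GB")
    for level, gb in sorted(aggregate.items()):
        print(f"{'TOTAL':20s} {level:12s} {gb:10.1f} GB")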

The next question you need an answer to is:

What is the network interface for your backup system and how is it configured (e.g., TOE, jumbo frames)?

This question is best answered by either the system or network administrator. The answer you want to hear is "10 GbE offload NIC with jumbo frames," but it's probably not the answer you're going to hear. 10 GbE is the most recent Ethernet speed and offers incredible benefits to your backup system. The most obvious benefit is that you'll have a network that is faster than your tape drive. Without a network that is faster than your target, it is impossible to properly design your data backup system. Therefore, you will need to change your target or network. If you are unable to upgrade your network, the next most obvious step would be to move to some type of disk-based backup system, as disk is much more forgiving of slow networks than tape is. But if it's possible to upgrade your backup server's NIC, that's a whole lot easier and cheaper than completely rearchitecting your backup system.

An offload NIC offloads some or all of the TCP processing from your host CPU. If it offloads all of the processing, it is called a TOE card, for TCP Offload Engine. If it offloads some functions but not others, it is just called an offload card. Manufacturers of both types of cards would be glad to explain why their approach is better than the other, but suffice it to say that either is better than having neither.

Finally, there are jumbo frames. The standard Maximum Transmission Unit (MTU) of Ethernet is 1,500 bytes, and this value was decided decades ago. The argument for jumbo frames is that today's network speeds are so fast that making a frame every 1,500 bytes requires too many CPU interrupts; a "jumbo" frame size of 9,000 bytes is more appropriate. If your NIC and network support jumbo frames, your CPU can be interrupted one-sixth as often, increasing the effective speed of your interface.
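
The factor of six is simple arithmetic. A quick sketch, ignoring Ethernet framing overhead for simplicity:

    # Frames per second needed at a 10 Gb/s line rate for standard
    # vs. jumbo frames; fewer frames means fewer CPU interrupts.
    line_rate_bps = 10 * 10**9                 # 10 GbE
    for mtu_bytes in (1500, 9000):
        frames_per_sec = line_rate_bps / 8 / mtu_bytes
        print(f"MTU {mtu_bytes}: ~{frames_per_sec:,.0f} frames/sec")
    # MTU 1500: ~833,333 frames/sec
    # MTU 9000: ~138,889 frames/sec (one-sixth as many)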

If you're stuck on 100 Mb Ethernet, then you really have to upgrade both your network infrastructure and NIC to have any kind of decent performance. There's no reason to buy a GbE NIC, though, even if your network doesn't support 10 GbE yet. Buy a 10 GbE NIC and have it step down to 1 GbE. Then you'll be the first one to take advantage of 10 GbE once they upgrade the network. If you're using 1 GbE, and your network can support 10 GbE, upgrading your backup server's NIC to 10 GbE should be the first thing on your priority list.

You should also ask yourself:

What are the I/O throughput capabilities of your backup server?

This question is a little difficult to answer using specifications; it is best answered via testing, where you take other things out of the picture. The details of how to do this are well beyond the scope of this column, but a basic suggestion would be to use tools like iostat in Linux and Unix, and Iometer in Windows. How fast can you move I/O through this server? Are you limited by the backplane or your I/O bus? If you are, there's not much you can do other than to upgrade the server on which you're running your backups.
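
If you want a quick first approximation before reaching for those tools, a crude sequential-write test is easy to script. This is a minimal sketch, not a substitute for a proper benchmark; the path and test size are placeholders, and the fsync call is there so the result reflects the disk rather than the filesystem cache.

    # Crude sequential-write test: write 1 GB in 1 MB blocks and
    # report MB/s. Path and sizes are placeholders for the example.
    import os
    import time

    path = "/tmp/throughput_test.dat"
    block = b"\0" * (1024 * 1024)        # 1 MB block
    num_blocks = 1024                    # 1 GB total

    start = time.time()
    with open(path, "wb") as f:
        for _ in range(num_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())             # push data to disk, not just cache
    elapsed = time.time() - start
    os.remove(path)

    print(f"~{num_blocks / elapsed:.0f} MB/s sequential write")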

Next, find out:

What is the native transfer rate of your tape drive?

What compression ratio are you getting on your data?

These questions are about finding out how fast your tape drive should be running. So look at the tapes marked full by your backup software and see how much data you typically fit on a tape before it says it's full. If you consistently fit 600 GB on a 400 GB tape, then you're getting 1.5:1 compression. Multiply that number by the vendor's rated throughput for your drive, and you've got your target throughput number for your tape drive.
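
Worked through with the numbers above and an assumed drive rated at 140 MBps native (the rate is an example figure, not taken from any particular spec sheet):

    # Target throughput = observed compression ratio x native rate.
    data_written_gb = 600                # what you fit on a "full" tape
    native_capacity_gb = 400             # the tape's native capacity
    native_rate_mbps = 140               # assumed vendor-rated throughput

    ratio = data_written_gb / native_capacity_gb       # 1.5:1
    target_mbps = ratio * native_rate_mbps             # 210 MBps
    print(f"compression {ratio:.1f}:1 -> target {target_mbps:.0f} MBps")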

Then you need to do whatever you can to supply a rate of data to that tape drive that is close to its target throughput rate. If your target rate is 150 MBps and you only have a 1 GbE NIC, you can see why we spent so much time talking about the network. Techniques such as multiplexing (good for backups, can hurt restores) and disk staging (good for backups, doesn't help restores) are the things you should explore to get your backups to go fast enough to keep that tape drive happy.

Incremental backups almost always go too slowly to keep a tape drive happy, so they will most certainly need to be staged to disk. Full backups can either be staged to disk or multiplexed (if your backup software supports it). Once you've made one tape drive happy, see if you can do the same for two or three or more. Don't back up to more tape drives than you can stream. Even if you've upgraded to 10 GbE and you're pumping 800 MBps of backups into your backup server, that's only enough to keep four LTO-5 tape drives streaming at full speed. If you can't keep one tape drive happy, though, adding more tape drives to the mix will only make things worse.
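
A quick sanity check of drive count against ingest rate, using the figures above:

    # How many drives can a given aggregate ingest rate keep streaming?
    ingest_mbps = 800                    # aggregate backup rate into the server
    per_drive_mbps = 200                 # compressed throughput per LTO-5 drive
    drives = ingest_mbps // per_drive_mbps
    print(f"can stream {drives} drive(s) at full speed")   # -> 4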

What is the fastest speed at which you will need to restore data?

Finally, you need to investigate whether your backup system is capable of performing the fastest restores that it is required to make. Look at the largest and fastest restore that you'll ever need to do and test to see if your system is capable of performing that restore. For example, make sure that any multiplexing that you're using hasn't had a negative impact on restore speed.

Although getting the answers to these questions may take some work, you'll be well on your way to learning some valuable backup best practices, and solving some common enterprise backup problems.

About the author: W. Curtis Preston (a.k.a. "Mr. Backup"), executive editor and independent backup expert, has been singularly focused on data backup and recovery for more than 15 years. From starting as a backup admin at a $35 billion credit card company to being one of the most sought-after consultants, writers and speakers in this space, it's hard to find someone more focused on recovering lost data. He is the webmaster of BackupCentral.com, the author of hundreds of articles, and the books "Backup and Recovery" and "Using SANs and NAS."

Deduplication best practices and choosing the best dedupe technology

By Todd Erickson, Features Writer

Data deduplication is a technique to reduce storage needs by eliminating redundant data in your backup environment. Only one copy of the data is retained on storage media, and redundant data is replaced with a pointer to the unique data copy. Dedupe technology typically divides data sets into smaller chunks and uses algorithms to assign each data chunk a hash identifier, which it compares to previously stored identifiers to determine if the data chunk has already been stored. Some vendors use delta differencing technology, which compares current backups to previous data at the byte level to remove redundant data.
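
To make the chunk-and-hash idea concrete, here is a toy sketch using fixed-size chunks and SHA-256 as the hash identifier. Real products differ in chunking method, hash choice and storage layout; this just illustrates the pointer-to-unique-chunk concept.

    # Toy chunk-level dedupe: split data into fixed-size chunks, hash
    # each chunk, and store only chunks whose hash hasn't been seen.
    # A backup becomes a list of hashes (pointers to unique chunks).
    import hashlib

    CHUNK_SIZE = 4096
    store = {}                           # hash -> unique chunk bytes

    def dedupe(data: bytes) -> list[str]:
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            h = hashlib.sha256(chunk).hexdigest()
            if h not in store:           # new data: keep one copy
                store[h] = chunk
            recipe.append(h)             # redundant data: pointer only
        return recipe

    def restore(recipe: list[str]) -> bytes:
        return b"".join(store[h] for h in recipe)

    backup = b"A" * 8192 + b"B" * 4096 + b"A" * 4096   # redundant blocks
    recipe = dedupe(backup)
    print(len(backup), "bytes in,",
          sum(len(c) for c in store.values()), "bytes stored")
    assert restore(recipe) == backup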

Dedupe technology offers storage and backup administrators a number of benefits, including lower storage space requirements, more efficient disk space use, and less data sent across a WAN for remote backups, replication, and disaster recovery. Jeff Byrne, senior analyst for the Taneja Group, said deduplication technology can have a rapid return on investment (ROI). "In environments where you can achieve 70% to 90% reduction in needed capacity for your backups, you can pay back your investment in these dedupe solutions fairly quickly."

While the overall data deduplication concept is relatively easy to understand, there are a number of different techniques used to accomplish the task of eliminating redundant backup data, and it's possible that certain techniques may be better suited for your environment. So when you are ready to invest in dedupe technology, consider the following technology differences and data deduplication best practices to ensure that you implement the best solution for your needs.

In this guide on deduplication best practices, learn what you need to know to choose the right dedupe technology for your data backup and recovery needs. Learn about source vs. target deduplication, inline vs. post-processing deduplication, and the pros and cons of global deduplication.

Source Deduplication vs. Target Deduplication

Deduping can be performed by software running on a server (the source) or in an appliance where backup data is stored (the target). If the data is deduped at the source, redundancies are removed before transmission to the backup target. "If you're deduping right at the source, you get the benefit of a smaller image, a smaller set of data going across the wire to the target," Byrne said. Source deduplication uses client software to compare new data blocks on the primary storage device with previously backed up data blocks. Previously stored data blocks are not transmitted. Source-based deduplication uses less bandwidth for data transmission, but it increases server workload and could increase the amount of time it takes to complete backups.
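
A toy sketch of the source-side version of that idea: the client hashes chunks locally, checks which hashes the target already holds, and sends only the missing chunks. The in-process set stands in for whatever exchange a real client and server would perform over the network.

    # Source-side dedupe sketch: only chunks the target has never seen
    # travel across the "wire"; repeat backups cost almost nothing.
    import hashlib

    CHUNK = 4096
    target_hashes = set()                # hashes already held by the target

    def source_backup(data: bytes) -> int:
        """Return the number of bytes actually transmitted."""
        sent = 0
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in target_hashes:    # unseen: must transmit
                target_hashes.add(digest)
                sent += len(chunk)
        return sent

    data = b"".join(bytes([i]) * CHUNK for i in range(10))  # 10 distinct chunks
    print("first full: ", source_backup(data), "bytes sent")   # 40960
    print("repeat full:", source_backup(data), "bytes sent")   # 0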

Lauren Whitehouse, a senior analyst with the Enterprise Strategy Group, said source deduplication is well suited for backing up smaller and remote sites because the increased CPU usage doesn't have as big of an impact on the backup process. Whitehouse said virtualized environments are also "excellent use cases" for source deduplication because of the immense amounts of redundant data in virtual machine disk (VMDK) files. However, if you have multiple virtual machines (VMs) sharing one physical host, running multiple hash calculations at the same time may overburden the host's I/O resources.

Most well-known data backup applications now include source dedupe, including Symantec Corp.'s Backup Exec and NetBackup, EMC Corp.'s Avamar, CA Inc.'s ArcServe Backup, and IBM Corp.'s Tivoli Storage Manager (TSM) with ProtecTier.

Target deduplication removes redundant data in the backup appliance -- typically a NAS device or virtual tape library (VTL). Target dedupe reduces the storage capacity required for backup data, but does not reduce the amount of data sent across a LAN or WAN during backup. "A target deduplication solution is a purpose built appliance, so the hardware and software stack are tuned to deliver optimal performance," Whitehouse said. "So when you have large backup sets or a small backup window, you don't want to degrade the performance of your backup operation. For certain workloads, a target-based solution might be better suited."

Target deduplication may also fit your environment better if you use multiple backup applications and some do not have built-in dedupe capabilities. Target-based deduplication systems include Quantum Corp.'s DXi series, IBM's TSM, NEC Corp.'s Hydrastor series, FalconStor Software Inc.'s File-interface Deduplication System (FDS), and EMC's Data Domain series.

Inline Deduplication vs. Post-Process Deduplication

Another option to consider is when the data is deduplicated. Inline deduplication removes redundancies in real time as the data is written to the storage target. Software-only products tend to use inline processing because the backup data doesn't land on a disk before it's deduped. Like source deduplication, inline increases CPU overhead in the production environment but limits the total amount of data ultimately transferred to backup storage. Asigra Inc.'s Cloud Backup and CommVault Systems Inc.'s Simpana are software products that use inline deduplication.

Post-process deduplication writes the backup data into a disk cache before it starts the dedupe process. It doesn't necessarily write the full backup to disk before starting; once the data starts to hit the disk, the dedupe process begins. The deduping process is separate from the backup process, so you can dedupe the data outside the backup window without degrading your backup performance. Post-process deduplication also allows quicker access to your last backup. "So on a recovery that might make a difference," Whitehouse said.
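
A toy sketch of that separation: phase one lands the raw backup in a staging directory at full speed, and a later pass chunks and hashes it into the dedupe store. The directory name and chunking scheme are invented for the example.

    # Post-process sketch: the backup window only pays for a raw write;
    # deduplication runs as a separate, later pass over the staging area.
    import hashlib
    import os
    import shutil

    STAGING = "staging"
    CHUNK = 4096
    store = {}                           # hash -> unique chunk

    def backup_to_staging(name: str, data: bytes):
        os.makedirs(STAGING, exist_ok=True)
        with open(os.path.join(STAGING, name), "wb") as f:
            f.write(data)                # fast path: no dedupe work here

    def dedupe_pass():
        for name in os.listdir(STAGING):
            with open(os.path.join(STAGING, name), "rb") as f:
                data = f.read()
            for i in range(0, len(data), CHUNK):
                chunk = data[i:i + CHUNK]
                store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
        shutil.rmtree(STAGING)           # reclaim the staging cache

    backup_to_staging("client1.bak", b"same" * 4096)   # 16 KB, all repeats
    dedupe_pass()
    print(len(store), "unique chunk(s) stored")        # -> 1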

However, the full backup data set is transmitted across the wire to the deduplication disk staging area or to the storage target before the redundancies are eliminated, so you have to have the bandwidth for the data transfer and the capacity to accommodate the full backup data set and the deduplication process. Hewlett-Packard Co.'s StorageWorks StoreOnce technology uses post-process deduplication, while Quantum Corp.'s DXi series backup systems use both inline and post-process technologies.

Content-aware or application-aware deduplication products that use delta-differencing technology can compare the current backup data set with previous data sets. "They understand the content of that backup stream, and they know the format that the data is in when the backup application sends it to that target device," Whitehouse said. "They can compare the workload of the current backup to the previous backup to understand what the differences are at a block or at a byte level." Whitehouse said delta-differencing-based products are efficient, but they may have to reverse engineer the backup stream to know what it looks like and how to do the delta differencing. Sepaton Inc.'s DeltaStor system and ExaGrid Systems Inc.'s DeltaZone architecture are examples of products that use delta differencing technology.
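
A toy sketch of the delta-differencing idea at the block level: compare the current backup with the previous one and keep only the blocks that changed, tagged with their offsets. Real products work on the parsed backup stream format; this example just compares raw bytes.

    # Toy delta differencing: store only (offset, block) pairs that
    # differ from the previous backup; replay them to reconstruct.
    BLOCK = 4096

    def delta(previous: bytes, current: bytes) -> list[tuple[int, bytes]]:
        changes = []
        for i in range(0, len(current), BLOCK):
            block = current[i:i + BLOCK]
            if previous[i:i + BLOCK] != block:   # changed (or appended) block
                changes.append((i, block))
        return changes

    def apply_delta(previous: bytes, changes) -> bytes:
        out = bytearray(previous)
        for offset, block in changes:
            out[offset:offset + len(block)] = block
        return bytes(out)

    old = b"A" * 16384
    new = b"A" * 8192 + b"B" * 4096 + b"A" * 4096    # one block changed
    d = delta(old, new)
    print(len(d), "changed block(s),", sum(len(b) for _, b in d), "bytes kept")
    assert apply_delta(old, d) == new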

Global Deduplication

Global deduplication removes backup data redundancies across multiple devices (with target-based appliances) or across multiple clients (with source-based products). It allows you to add nodes that talk to each other across multiple locations to scale performance and capacity. Without global deduplication capabilities, each device dedupes just the data it receives. Some global systems can be configured in two-node clusters, such as FalconStor Software's FDS High Availability Cluster. Other systems use grid architectures to scale to dozens of nodes, such as ExaGrid Systems' DeltaZone and NEC's Hydrastor.

The more backup data you have, the more global deduplication can increase your dedupe ratios and reduce your storage capacity needs. Global deduplication also introduces load balancing and high availability to your backup strategy, and allows you to efficiently manage your entire backup data storage environment. Users with large amounts of backup data or multiple locations will gain the most benefit from the technology. Most of the backup software providers offer products with global dedupe, including Symantec NetBackup and EMC Avamar, and data deduplication appliances such as IBM's ProtecTier and Sepaton's DeltaStor also offer global deduplication.

As with all data backup and storage products, the technologies used are only one factor you should consider when evaluating potential deduplication systems. In fact, according to Whitehouse, the type of dedupe technology vendors use is not the first attribute many administrators look at when investigating deduplication solutions. Price, performance, and ease of use and integration top deduplication shoppers' lists, Whitehouse said. Both Whitehouse and Byrne recommend first finding out if your current backup product has deduplication capabilities. If not, analyze your long-term needs and study the vendors' architectures to determine if they match your workload and scaling requirements.

Resources from EMC Corporation

EMC Defenders of the Virtual World

EMC Backup to the Future

EMC Backup and Recovery Solutions

About EMC Corporation

EMC Corporation is the world leader in products, services and solutions for information storage and management that help organizations extract the maximum value from their information, at the lowest total cost, across every point in the information lifecycle. We are the information storage standard for every major computing platform and, through our solutions, serve as caretaker for more than two-thirds of the world's most essential information. We help enterprises of all sizes manage their growing volumes of information -- from creation to disposal -- according to its changing value to the business through information lifecycle management (ILM) strategies. EMC information infrastructure solutions are at the heart of this mission, helping organizations manage, use, protect, and share their information assets more efficiently and cost-effectively. Our world-class solutions integrate networked storage technologies, storage systems, software, and services.