
EMC Proven Professional Knowledge Sharing 2010

Are You Ready for the ‘AVA’ Generation?

Randeep Singh

Randeep Singh, Solution Architect, HCL Comnet


Table of Contents

Figures
Summary
Abstract
Rise of De-Dupe Concept
Terms and Concepts in Avamar
Introduction of RAIN Technology
Active/Passive Replication
Protection of Data in VMware Environment
Tuning Client Cache to Optimize Backup Performance
Integration with Traditional Backup Civilization
Protection of Remote Offices
Architecting Avamar to Ensure Maximum System Availability
Pro-active Steps to Manage Health and Capacity
Daily Server Maintenance
Tape Integration
What’s New
Why EMC Avamar

Disclaimer: The views, processes, or methodologies published in this article are those of the

author. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.


LIST OF FIGURES

Source-based Data De-Duplication
Global De-Dupe Concept
Traditional versus Daily Full Backups
Avamar Client Plug-ins
RAIN Architecture
Active/Passive Replication
Avamar Server Deployed as a Virtual Appliance
Backup Solutions for VMware
Processes Used to Filter Out Previously Backed Up Files
Effectiveness of File Cache for Backing up File Servers versus Databases
Avamar and NetWorker Integration
Avamar and NetWorker Integration (NetWorker Console)
Protection of Remote Offices
Avamar Data Transport Components


SUMMARY

Once, over a cup of coffee with my students, I realized that every person on this earth, whether man, woman, or child, produces data. In fact, by 2010 most database sizes had crossed into double digits in terms of petabytes. But data growth is not our most significant challenge; the main challenge arises when this data must be backed up while remaining accessible “on demand,” 24x7. And, of course, we cannot forget the IT investment involved.

My students ask, “Is there any way to minimize the backup window?” To this I reply, “Well, now it’s time to step into a new AVA generation.” Traditional backup solutions store the same data repeatedly, so total storage under management expands by 5-10 times. Consider also government regulations: the risk of shipping tapes is one of the greatest security concerns in today’s IT infrastructure. In addition, traditional backup solutions require a rotating schedule of full and incremental backups. Because of this unnecessary data movement, enterprises often face backup windows that roll into production hours, network constraints, and too much storage to manage. Tape drives, media, and libraries are prone to mechanical failure, and the transport of tapes can be unreliable. While disk storage has been used to meet these challenges, it addresses only a fraction of the data protection challenges that organizations face. Remote offices, where the risk of data loss or exposure is extremely high, are a case in point.

The answer to the above challenges is Avamar®.

ABSTRACT

EMC Avamar enables fast and reliable backup and recovery across the enterprise, including LAN/SAN servers in the data center, VMware environments, and remote offices. To achieve this, Avamar uses patented global data deduplication technology to identify redundant sub-file data segments at the source, before data is transferred across the network and stored to disk. Only new, unique sub-file variable-length data segments are moved, reducing daily network bandwidth and storage by up to 500x. For security, data can be encrypted both at rest and in flight. For remote offices, Avamar provides centralized management that can protect hundreds of remote offices easily and efficiently. Total back-end storage is also reduced by nearly 50x because Avamar stores only a single instance of each sub-file data segment. Avamar also uses patented RAIN technology to provide fault tolerance across nodes and eliminate single points of failure, and its grid architecture provides scalability. It can also work with existing traditional tape solutions such as NetWorker®.


RISE OF DE-DUPE CONCEPT

Enterprise data today is highly redundant, with many identical files and data segments stored within and across systems. Traditional backup software stores this redundant data again and again. This is where Avamar's approach to deduplication comes into play: global, source-based data deduplication that eliminates redundancy at both the file and the sub-file data segment level. Redundancy is removed at the source, before backup data is transferred over the network, whether LAN or WAN. To ensure that each unique data segment is backed up only once across the enterprise, Avamar agents are deployed on the systems to be protected; these agents identify and filter out data segments that are repeated across the enterprise and over time. As a result, only a small amount of incremental backup data is generated, whether it is file system or database data.

EMC Avamar uses an intelligent variable-length method for determining segment size that examines the data itself to find logical boundary points. This contrasts with the fixed-block or fixed-length segment method that traditional backups use, in which even a small change in a data set can change all of its fixed-length segments. The Avamar algorithm analyzes the binary structure of a data set (the 0s and 1s that make it up) to determine segment boundaries, so that Avamar client agents identify exactly the same segments for any given data set. Variable-length segments average about 24 KB and are then compressed to an average of 12 KB. For each 24 KB segment, Avamar generates a unique 20-byte ID using the SHA-1 hash algorithm. This ID determines whether or not a data segment has been previously stored.
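The idea of content-defined segment boundaries plus a 20-byte SHA-1 ID can be illustrated with a minimal Python sketch. This is not Avamar's actual algorithm, which the article does not describe in detail; the window size, boundary mask, and size limits below are assumptions chosen to keep the example small (Avamar's segments average about 24 KB), and the function names are hypothetical.

```python
import hashlib

# All four parameters are illustrative assumptions, not Avamar values.
WINDOW = 16           # bytes examined at each position
MASK = (1 << 6) - 1   # boundary when the low 6 fingerprint bits are zero
MIN_CHUNK = 16        # never cut a segment shorter than this
MAX_CHUNK = 256       # always cut once a segment reaches this length


def chunk(data: bytes) -> list[bytes]:
    """Split data into variable-length segments at content-defined boundaries."""
    chunks, start = [], 0
    for i in range(len(data)):
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        # Fingerprint of the last WINDOW bytes; it depends only on content,
        # not on the absolute offset, so boundaries survive insertions.
        window = data[i - WINDOW + 1 : i + 1]
        fingerprint = int.from_bytes(hashlib.sha1(window).digest()[:4], "big")
        if (fingerprint & MASK) == 0 or length >= MAX_CHUNK:
            chunks.append(data[start : i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial segment
    return chunks


def backup(data: bytes, store: dict[bytes, bytes]) -> int:
    """Store only segments whose ID is new; return the number of bytes sent."""
    sent = 0
    for segment in chunk(data):
        segment_id = hashlib.sha1(segment).digest()  # 20-byte segment ID
        if segment_id not in store:
            store[segment_id] = segment
            sent += len(segment)
    return sent
```

Backing up the same data twice sends nothing the second time, and inserting a byte near the start of the data typically changes only the first segment or two, because later boundaries are found from the content itself. A production implementation would use a rolling hash rather than rehashing every window, which this sketch does only for simplicity.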

FIGURE 1 SOURCE-BASED DATA DEDUPLICATION


Figure 1 shows the deduplication concept. Figure 2 shows how it occurs in practice: deduplication takes place before data is transferred across the network. Avamar identifies sub-file variable-length data segments at the source, breaks the data into atoms, and then sends each atom only once, reducing the data to be backed up by up to 500 times.

FIGURE 2 GLOBAL DE-DUPE CONCEPT

Assume there is 5 TB of data to be backed up. For a mixture of file system and database data, the data reduction factor is 100-150x, so only 35-50 GB remains to be backed up at the source. For Windows and UNIX file data, the reduction is nearly 300x, leaving approximately 20 GB to be backed up at the source. In addition, total client CPU utilization is reduced by 85%.
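The arithmetic behind these figures can be checked with a short Python sketch. It assumes decimal units (1 TB = 1000 GB); the function name is hypothetical. The computed values are close to, but not identical with, the rounded figures quoted above.

```python
def remaining_gb(source_tb: float, reduction_factor: float) -> float:
    """Data left to move after dividing by the reduction factor (1 TB = 1000 GB)."""
    return source_tb * 1000 / reduction_factor


for label, factor in [("mixed data, 100x", 100),
                      ("mixed data, 150x", 150),
                      ("file data, 300x", 300)]:
    print(f"{label}: {remaining_gb(5, factor):.1f} GB")
# prints:
# mixed data, 100x: 50.0 GB
# mixed data, 150x: 33.3 GB
# file data, 300x: 16.7 GB
```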

Figure 3 shows the comparison of traditional backup versus Avamar daily full backups.

FIGURE 3 TRADITIONAL VS AVAMAR DAILY FULL BACKUPS


Below are some of the factors that impact data deduplication ratios:

Type of data

• Duplication in user-generated data is greater than from natural sources

• Encrypted and compressed data are not ideal candidates for deduplication

• More user-created content = higher deduplication ratio

Data change rate

• Small data change rates = more duplicate data in subsequent backups

• Less change = higher deduplication ratio

Retention policy

• Longer retention increases chances data will be repeatedly backed up

• Longer retention policy = higher deduplication ratio

Ratio of full backups to incremental backups

• More full backups increase the amount of data being repeatedly backed up

• More full backups = higher deduplication ratio

TERMS AND CONCEPTS OF AVAMAR

Let us begin by learning some terms:

Avamar Servers – Logical grouping of one or more nodes used to store and manage client backups.

Node – Self-contained, rack-mountable network-addressable computer that runs Avamar server

software on the Linux operating system. It is the primary building block in any Avamar server.

Nodes can be any of the following types:

Utility Node – Dedicated to scheduling and managing background Avamar server jobs. In scalable multi-node Avamar servers, a single utility node provides essential internal services for the server, such as the Management Console Server (MCS), cron jobs, external authentication, Network Time Protocol (NTP), and Web access.

Storage Nodes – store the actual backup data. Multiple storage nodes are configured with

multi-node Avamar servers based upon performance and capacity requirements. Storage nodes

can be added to an Avamar server over time to expand performance with no downtime required.

Avamar clients connect directly with Avamar storage nodes; client connections and data are

load balanced across storage nodes.


NDMP Accelerator node – specialized node that utilizes NDMP in order to provide data

protection for NAS filers (for example, EMC Celerra®, Network Appliance) and Novell NetWare

servers.

Access Nodes – used to run the Avamar File System.

Single Node Servers – combine all the features and functions of utility and storage nodes in a single node.

Hard Disk Storage – Disks used for storage.

Stripes – Unit of disk drive space managed by Avamar to ensure fault tolerance.

Object – Single instance of deduplicated data. Each Avamar object inherently has a unique ID.

Objects are stored and managed within stripes on the Avamar server.

Avamar Administrator – graphical management console software application that is used to

remotely administer an Avamar system from a supported Windows client computer.

Avamar Enterprise Manager – Web-based management interface that provides the ability to

monitor and manage a distributed Avamar deployment.

Avamar Clients – filter out redundant data before sending data over networks, making it

possible to protect systems even over congested LANs or WANs.

FIGURE 4 AVAMAR CLIENT PLUG-INS

Agents – Platform-specific software processes running on the client that communicate with the Management Console Server (MCS) and with any plugins installed on that client. Two types of plugins are supported.

File system Plugins – used to browse, back up, and restore files or directories on a specific client file system, which can be HP-UX, IBM AIX, Linux, Mac OS X, Microsoft Windows (2000, 2003, 2008, XP, and Vista), Sun Solaris, Novell NetWare, and VMware.

Application Plugins – support backup and restore of databases or other special applications. Supported application plugins include DB2, NDMP for NAS filers, Microsoft Exchange, Microsoft SQL Server, and Oracle databases.


Avamar Replicator – enables efficient, encrypted, and asynchronous replication of data stored in an Avamar server to another Avamar server deployed at a remote location, without the need to ship tapes.

INTRODUCTION OF RAIN TECHNOLOGY

Assume that your traditional backup solution fails. What will you do? Avamar uses RAIN (Redundant Array of Independent Nodes) technology, which provides failover and fault tolerance across the nodes in an Avamar server grid. If a server node fails or becomes unavailable, the data stored on that node can be reconstructed from the other nodes, providing reliable data protection and access. In addition to providing failsafe redundancy, RAIN is used to rebalance capacity across the nodes after the Avamar server has been expanded by adding nodes. This makes it possible to manage the capacity of the system as the amount of data added to it continues to grow. Figure 5 shows the RAIN architecture.

FIGURE 5 RAIN ARCHITECTURE

Operational best practices for using RAIN technology in Avamar

• Always enable RAIN for any configuration other than single-node servers and 1x2s (two active data nodes). The minimum RAIN configuration is a 1x4 (four active data nodes plus a utility node and a spare node). Double-disk failures on a node or a complete RAID controller failure can occur, either of which can corrupt the data on a node. Without RAIN, the only recourse is to reinitialize the entire system and replicate the data back from the replication target.

• When deploying non-RAIN servers, we must replicate the data on these servers to

ensure the data is protected. Non-RAIN servers have no data redundancy and so, any

loss of data can have significant and widespread impacts to the backups stored on that

Avamar server.

• When deploying a RAIN configuration, always include an active spare node in the

module. The best way to minimize system downtime is to have a spare node readily

available.

• For performance reasons, Avamar recommends limiting the number of nodes to 16

active data nodes (1x16 configuration). Limit starting configurations to 12 to 14 active

data nodes so that nodes can be added later if needed to recover from high capacity

utilization situations.
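The sizing rules in the list above can be encoded as a small planning check. The following Python sketch is only an illustration of those rules; the function and parameter names are hypothetical and do not correspond to any Avamar tool or API.

```python
def check_rain_layout(active_data_nodes: int, rain_enabled: bool,
                      has_spare_node: bool,
                      initial_deployment: bool = False) -> list[str]:
    """Return advisory findings for a proposed layout (empty list = none)."""
    findings = []
    if rain_enabled:
        if active_data_nodes < 4:
            findings.append("minimum RAIN configuration is 1x4")
        if not has_spare_node:
            findings.append("include an active spare node")
    else:
        if active_data_nodes > 2:
            findings.append("enable RAIN for anything larger than 1x2")
        # Non-RAIN servers have no data redundancy of their own.
        findings.append("non-RAIN server: replicate to another Avamar server")
    if active_data_nodes > 16:
        findings.append("limit the grid to 16 active data nodes")
    elif initial_deployment and active_data_nodes > 14:
        findings.append("start with 12 to 14 nodes to leave room for expansion")
    return findings
```

For example, `check_rain_layout(4, True, True)` returns an empty list, while a single-node server without RAIN returns only the reminder to replicate its data.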

ACTIVE/PASSIVE REPLICATION PHASE

Replication transfers data from a source Avamar server to a destination Avamar server. The

data replicated to the destination server can be directly restored back to primary storage without

having to be staged through the source Avamar server. Replication is accomplished by way of

highly efficient, asynchronous Internet Protocol (IP) data transfers, which can be scheduled

during off-peak hours to optimize use of network bandwidth. Replication also uses data

deduplication technology that finds and eliminates redundant sequences of data before it is sent

to the destination server, thereby reducing network traffic and promoting efficient use of hard

disk storage.

Using replication, an enterprise can centrally protect and manage multiple remote branch offices

that are using individual single-node servers for local backup and restore. The centralized multi-

node server can then be used for disaster recovery in the event of catastrophic data loss at any

remote branch office. Replication can also be used to replicate data stored in a multi-node

server to any other multi-node server in an enterprise, enabling multi-node servers to provide

peer-to-peer disaster recovery for each other.

If the primary server and the replication target are configured as an active-passive pair, then with one change to a DNS entry and a few commands, all clients can continue their daily backups to the replication target, as shown in Figure 6. In this failover example, clients still back up to the host name BACKUP_SRVR, but in the DNS server BACKUP_SRVR is now associated with IP address 192.168.200.1, the utility node of REPL_DST.
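The failover idea can be modeled in a few lines of Python: clients always resolve the same host name, and only the address behind it changes. This is a toy model of a DNS table, not an Avamar or DNS server interface. The primary address 192.168.100.1 is a made-up placeholder, the function names are hypothetical, and the "few commands" mentioned above are not modeled.

```python
REPL_DST_UTILITY_IP = "192.168.200.1"   # from the failover example above

# Hypothetical starting state: BACKUP_SRVR points at the active server.
dns_records = {"BACKUP_SRVR": "192.168.100.1"}  # placeholder primary address


def fail_over(records: dict[str, str], host: str, new_ip: str) -> dict[str, str]:
    """Return a copy of the DNS table with one host name re-pointed."""
    updated = dict(records)
    updated[host] = new_ip
    return updated


def backup_target(records: dict[str, str], host: str = "BACKUP_SRVR") -> str:
    """Clients never change configuration; they just resolve the host name."""
    return records[host]


after_failover = fail_over(dns_records, "BACKUP_SRVR", REPL_DST_UTILITY_IP)
```

After the change, `backup_target(after_failover)` resolves to the utility node of REPL_DST, while client-side settings stay untouched.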


Replication is useful for more than recovering from the traditional disaster associated with a site

failure. Replication is the most reliable form of redundancy that the system can offer because it

creates a logical copy of the data from the replication source to the destination. It does not

create a physical copy of the blocks of data, and therefore, any corruptions, whether due to

hardware or software, are far less likely to be propagated from one Avamar server to another.

Also during replication, multiple checks of the data occur to ensure that only uncorrupted data is

replicated to the replication target.

FIGURE 6 ACTIVE/PASSIVE REPLICATION

There are two kinds of Replication:

• Normal Replication

• Full root-to-root replication.

In normal replication, user data on the source Avamar server is replicated to a destination Avamar server. A special “REPLICATE” domain is created on the destination server during the first replication operation. This domain contains a mirrored representation of the entire source server client tree. All data within the REPLICATE domain is read-only. The only operations allowed on these backups are:

• Redirected restores to other clients not within the REPLICATE domain.

• Changing a backup expiration date.

• Validating backups on other clients not within the REPLICATE domain.

• Viewing backup statistics.

• Deleting a backup.

Full “root-to-root” replication creates a complete logical copy of an entire source server on the destination server. Replicated data is not copied to the REPLICATE domain; it is added directly to the root domain, just as if the source clients had registered with the destination server. The replicated source server data is fully modifiable on the destination server.

To fully replicate an Avamar server, all of the following data must be copied from the source

server to the destination server during each replication operation:

• Client backups

• Domains, clients, and users

• Groups, datasets, schedules, and retention policies

• State of the server, such as the contents of the activity monitor and server monitor databases, at the time of the last MCS backup or flush.

LIMITATIONS of REPLICATION:

• Only static data is replicated, i.e. data that is quiescent on the source server. Any operation that writes data to the source server and is still running or incomplete is not replicated to the destination server during the current replication operation. That data is replicated during the next replication operation.

• Avamar Administrator can only manage one server at a time. In an environment with

more than one server, Avamar Enterprise Manager multi-system management console

is used.

BEST PRACTICES FOR REPLICATION:

• Protect the data on the Avamar server by replicating the data to another Avamar server.

• Set up active-passive replication pairs to take advantage of the failover capability: one Avamar server (active) is dedicated to backing up clients, and the other Avamar server (passive) is the dedicated replication target for the active server. If the active server fails, then by changing just one DNS entry, all clients begin backing up to the passive server, as shown in Figure 6.

• Ensure that available bandwidth is sufficient to replicate all of the daily changed data within a 4-hour window, so that the system can accommodate peaks of up to 8 hours per day.

• If using daily replication, avoid the --include option. Every time a new client is added to the active Avamar server, its data is not replicated until the repl_cron.cfg file is edited to add a new --include option for that client. Also, a system that specifies the --include option might not include the MC_BACKUPS data, which means that none of the policies are replicated.

• For best results, ensure that the target server is running either the same or later version

of Avamar software as the source Avamar server.

• Use a large time-out setting for initial replication. Recent backups might not get

replicated as replication processes can time out before all backups are successfully

replicated. This is because replication always replicates data backups alphabetically by

client name and earliest backups before later backups.

• Normal source server maintenance tasks such as hfs check and garbage collection

should not be running during the replication session.

• Schedule replication during periods of low backup activity, as this ensures that the greatest number of client backups are replicated during each replication session.
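The 4-hour bandwidth guideline in the list above translates into simple arithmetic. The following Python sketch assumes decimal units (1 GB = 10^9 bytes) and ignores protocol overhead and link sharing; the function names and the 36 GB example are illustrative, not figures from Avamar documentation.

```python
def required_mbps(daily_changed_gb: float, window_hours: float = 4.0) -> float:
    """Sustained link speed (Mbit/s) needed to replicate the data in the window."""
    bits = daily_changed_gb * 8 * 1000**3
    return bits / (window_hours * 3600) / 1e6


def hours_needed(changed_gb: float, link_mbps: float) -> float:
    """Time to replicate a given amount of changed data over a given link."""
    seconds = changed_gb * 8 * 1000 / link_mbps  # GB -> Mbit, then / (Mbit/s)
    return seconds / 3600


print(required_mbps(36))     # 20.0 Mbit/s for 36 GB in a 4-hour window
print(hours_needed(72, 20))  # 8.0 hours when the changed data doubles
```

Sizing the link for a 4-hour window therefore leaves room for a day with twice the usual change rate to finish within 8 hours.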

PROTECTION OF DATA IN VMWARE ENVIRONMENT

VMware View offers IT organizations many benefits, but as the amount of data stored on virtual machines increases, traditional backup solutions find it difficult to back up such environments. EMC Avamar delivers fast and efficient protection for these environments by deduplicating data at the source, sending only unique, sub-file variable-length data segments during daily full backups. Traditional backup solutions move approximately 200% of primary data on a weekly basis. In comparison, Avamar, using deduplication, moves approximately 2%. The results are:

• Up to 95% reduction in data moved

• Up to 90% reduction in backup times

• Up to 50% reduction in disk impact

• Up to 95% reduction in NIC usage

• Up to 80% reduction in CPU usage

• Up to 50% reduction in memory usage


EMC Avamar Virtual Edition for VMware is the industry’s first deduplication virtual appliance for backup, recovery, and disaster recovery. Avamar Virtual Edition enables users to deploy Avamar’s deduplication technology easily, effectively, and in a repeatable fashion on VMware ESX Server hosts. Each virtual appliance supports up to 2 TB of deduplicated backup capacity, whereas a comparable traditional backup schedule would require approximately 70 TB of tape or disk storage. The appliance can also leverage the existing VMware shared server and storage infrastructure to lower costs and simplify IT management. Avamar Virtual Edition supports VMotion for deployment flexibility, and up to two Avamar Virtual Edition virtual appliances can run per ESX server to provide scalability. Replication between Avamar virtual appliances, or from Avamar virtual appliances to physical Avamar servers, eliminates reliance on offsite tape shipments and the risk of losing unencrypted data. Figure 7 shows the Avamar Server software deployed as a virtual appliance.

FIGURE 7 Avamar Server Deployed As a Virtual Appliance

Key components that need to be protected are:

• Microsoft Active Directory, which provides authentication and user configuration to the VMware View solution.

• View Manager that interacts with vCenter to manage virtual desktop instances.

• Virtual desktop templates.

• vCenter, the centralized management application for the virtual infrastructure.

• ESX Server that provides the virtualization of the desktop instances.

• User data disk is the location of unique user data that is mapped to the virtual desktop

instance upon user login and where users store their individual data.


There can be two data protection strategies for a VMware environment.

The first strategy uses a combination of hardware and software to make a simultaneous copy of the LUNs where the virtual machine desktops are stored, as well as the View Manager application and configuration information. It allows a total recovery of the entire environment. This strategy requires making a duplicate copy of the VMware View environment for presentation to the backup solution.

The second strategy involves protecting the key components of the VMware View infrastructure independently using Avamar client software agents. This avoids the infrastructure cost of the first strategy.

Figure 8 shows the Avamar client backup solutions for VMware.

Figure 8 Backup Solutions for VMware

In the first solution, at the guest level, the Avamar agent resides inside each virtual machine and deduplicates data within the virtual machines as if they were physical servers. It moves a minimum of data, which reduces resource contention, accelerates backup, and provides file-level restore for Windows, Linux, and Solaris. Advantages and considerations for this method:

Advantages

• Highest level of data deduplication, resulting in maximum storage efficiency

• Guest-level backup easily fits into most existing backup schemes; day-to-day

backup procedures do not change

• Support for fast partial restores of individual directories (folders) or files

• Optional application-level support for DB2, Exchange, Oracle, and SQL Server databases

• No advanced scripting or VMware knowledge required


Considerations

The only significant consideration with this approach is that, although file and directory restores are a simple one-step process, full system recovery is a two-step procedure: first load a known-good operating system image inside the virtual machine, then restore the unique data from the guest-level backups stored on the Avamar server.

In the second solution, VCB (VMware Consolidated Backup), the Avamar agent resides on the proxy (ESX) server, deduplicates within and across the VMDK files, and supports both VCB file-level (Windows only) and image-level backup. Avamar replication provides disaster recovery for backed-up VMDK files. Advantages and considerations for this method:

Advantages

• Backup resources are offloaded to the VCB proxy server

• Full-image backups of running virtual machines

• File-level backups for Windows virtual machines

• Image virtual disks compressed

• Consolidation ratios optimized for VMware

Considerations

• Restores are a multi-step process

• VCB does not support iSCSI or NAS/NFS

• VCB can only back up a virtual machine with a disk image stored on a device that the

proxy can access

• VCB cannot back up virtual disks that are RDM in physical compatibility mode

• VCB can only back up a virtual machine that has an IP address or domain name server

(DNS) name associated with it

• VCB can only perform a file-level backup of a virtual machine running Microsoft

Windows NT 4.0, Windows 2000, Windows XP, Windows XP Professional, or

Windows 2003

• All backups and restores must be initiated using the Avamar Administrator graphical management console; it is not possible to initiate backups and restores directly from the virtual machine interface

In the third solution, at the console level, the Avamar agent resides on the ESX console operating system, deduplicates within and across the VMDK files, and provides VMDK image-level (no file-level) backup without VCB. No proxy server or SAN is required for this configuration.


Advantages and considerations for this method:

Advantages

• Virtual machine image-level backups without VCB

• No proxy or SAN required

Considerations

• No file-level backup

• Two-step restore

• Advanced configuration (without proper scheduling and setup, backup jobs can

impact system resources and running virtual machines)

• If a virtual machine is backed up directly by suspending or powering it down, it must be powered down or suspended before the backup begins.

• Virtual disk images are not compressed (empty space is processed).

TUNING CLIENT CACHE TO OPTIMIZE BACKUP PERFORMANCE

Whenever a backup begins, the avtar process loads two cache files into memory from the var directory under the Avamar installation path. The first is the file cache (f_cache.dat). It stores a 20-byte SHA-1 hash of the file attributes and is used to quickly identify which files have previously been backed up to the Avamar server. When backing up file servers, the file cache screens out approximately 98% of the files. When backing up databases, however, the file cache is not effective, since all the files in a database appear to be modified every day.

The second cache is the hash cache (p_cache.dat). It stores the hashes of the chunks and

composites that have been sent to the Avamar server. The hash cache is used to quickly

identify which chunks or composites have previously been backed up to the Avamar server. The

hash cache is very important when backing up databases.

These client caches are used to:

• Reduce the amount of time required to perform a backup

• Reduce the load on the Avamar client

• Reduce the load on the Avamar server

Figure 9 shows the process that avtar uses to filter out previously backed up files and chunks, along with the values that are incremented and reported by the avtar advanced statistics option. For each file, avtar computes the SHA-1 hash of the file attributes and checks whether that hash exists in the file cache. If it does, the file is skipped. If it does not, avtar breaks the file into chunks and checks each chunk hash against the hash cache. If the chunk hash exists in the hash cache, avtar processes the next chunk. If it does not, a request is sent to the Avamar server, which checks whether the chunk hash exists there. If it does, avtar processes the next chunk; otherwise, it sends the chunk and chunk hash to the Avamar server.
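The filtering flow described for Figure 9 can be sketched in a few lines of Python. This is only an illustrative model, not avtar's implementation: the caches and the server are plain in-memory sets, the file arrives already split into chunks, composites are ignored, and the counter names and the `backup_file` function are hypothetical. Recording newly seen hashes in the caches after each lookup is also an assumption of the sketch.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    """Return the 20-byte SHA-1 digest of data."""
    return hashlib.sha1(data).digest()


def new_stats() -> dict:
    # Hypothetical counter names; the real avtar statistics may differ.
    return {"file_cache_hits": 0, "hash_cache_hits": 0,
            "server_hits": 0, "chunks_sent": 0}


def backup_file(attrs: bytes, chunks, file_cache: set, hash_cache: set,
                server: set, stats: dict) -> None:
    """Filter one file through the file cache, the hash cache, then the server."""
    attr_hash = sha1(attrs)
    if attr_hash in file_cache:
        # File attributes unchanged since a previous backup: skip the file.
        stats["file_cache_hits"] += 1
        return
    for chunk in chunks:
        chunk_hash = sha1(chunk)
        if chunk_hash in hash_cache:
            # Chunk already known locally: move on to the next chunk.
            stats["hash_cache_hits"] += 1
            continue
        if chunk_hash in server:
            # Server already stores this chunk: nothing to send.
            stats["server_hits"] += 1
        else:
            # New chunk: send the chunk and its hash to the server.
            server.add(chunk_hash)
            stats["chunks_sent"] += 1
        # Assumption: remember the hash locally for later lookups.
        hash_cache.add(chunk_hash)
    # Assumption: remember the file so the next backup can skip it.
    file_cache.add(attr_hash)
```

Backing up the same unchanged file twice with this model sends its chunks once and then registers a single file cache hit, which is the behavior the file cache is meant to provide on file servers.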

FIGURE 9 Processes Used to Filter Out Previously Backed Up Files

Figure 10 shows the effectiveness of the file cache when backing up file servers in contrast to databases. The example considers 1 TB of file system data and 1 TB of database data.

Figure 10 Effectiveness of file cache, backing up file servers vs. databases

Figure 10 shows that for file servers the file cache provides a 98% reduction in client load: approximately 980 GB of data is screened out during backup, and only around 3 GB of data is sent to the Avamar server. For databases, the file cache is 0% effective in reducing client load, and around 30 GB of data is sent to the Avamar server.

By default, the file cache consumes up to 1/8th of the physical RAM on the Avamar client. For example, if the client has 4 GB of RAM, the file cache is limited to 4 GB/8, or 512 MB maximum. The file cache doubles in size each time it needs to grow. The current file cache sizes are 5.5 MB, 11 MB, 22 MB, 44 MB, 88 MB, 176 MB, 352 MB, 704 MB, and 1,408 MB.

The file cache includes two 20-byte SHA-1 hash entries:

• the hash of the file attributes, which include file name, file path, modification time, file

size, owner, group, and permissions.

• the hash of the actual file content, independent of the file attributes.

File cache rule: If the client has N million files, the file cache must be at least N million files x 40 MB/million files. Example: If a client has four million files, the file cache must be at least 160 MB (4 million files x 40 MB/million files). In other words, the file cache must be allowed to grow to 176 MB, the next available size.
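As a quick check of the file cache rule, the following Python sketch computes the minimum size from the file count and then picks the next available cache size from the list given above. The function and constant names are hypothetical; only the 40 MB per million files figure and the size steps come from the text.

```python
# File cache sizes listed in the text, in MB.
FILE_CACHE_STEPS_MB = [5.5, 11, 22, 44, 88, 176, 352, 704, 1408]

# Two 20-byte entries per file gives 40 MB per million files.
MB_PER_MILLION_FILES = 40


def required_file_cache_mb(million_files: float) -> float:
    """Minimum file cache size, in MB, for the given number of files."""
    return million_files * MB_PER_MILLION_FILES


def next_file_cache_step_mb(required_mb: float):
    """Smallest listed cache size that holds required_mb, or None if none does."""
    for step in FILE_CACHE_STEPS_MB:
        if step >= required_mb:
            return step
    return None
```

For four million files this gives a 160 MB minimum and a 176 MB cache size, matching the example in the text.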

The hash cache consumes up to 1/16th of the physical RAM on the Avamar client. Example: For clients with 4 GB of RAM, the hash cache is limited to 4 GB/16, or 256 MB maximum. The hash cache also doubles in size each time it needs to grow. The current hash cache sizes are 24 MB, 48 MB, 96 MB, 192 MB, 384 MB, 768 MB, and so forth.

So, in this example where the client has 4 GB of RAM, the maximum size of the hash cache will be 192 MB, the largest size that fits under the 256 MB limit. The hash cache includes only one SHA-1 hash per chunk, which is the hash of the contents of the chunk.

Hash cache rule: If the client has Y GB of database data, the hash cache must be at least (Y GB/average chunk size) x 20 MB/million chunks. Use 16 kB as the average chunk size for streaming Exchange and Oracle database backups. Use 24 kB as the average chunk size for all other backups, including backups of database dump files.

Example: If an Exchange database client has 250 GB of database data, the hash cache must accommodate approximately 16 million chunks (250 GB/16 kB). At 20 MB per million chunks, this means that the hash cache must be at least 320 MB.
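The hash cache rule can be worked through the same way. The sketch below is an illustration only: it uses decimal units (1 GB = 1,000,000 kB), rounds the chunk count up to whole millions as the example in the text does, and assumes the cache sizes keep doubling from 24 MB. All function names are hypothetical.

```python
import math

MB_PER_MILLION_CHUNKS = 20   # one 20-byte hash per chunk
FIRST_HASH_CACHE_STEP_MB = 24


def million_chunks(data_gb: float, avg_chunk_kb: float) -> int:
    """Chunk count in millions, rounded up (decimal units assumed)."""
    return math.ceil(data_gb / avg_chunk_kb)


def required_hash_cache_mb(data_gb: float, avg_chunk_kb: float) -> int:
    """Minimum hash cache size, in MB, for one backup of the data."""
    return million_chunks(data_gb, avg_chunk_kb) * MB_PER_MILLION_CHUNKS


def next_hash_cache_step_mb(required_mb: float) -> int:
    """Smallest doubling step (24, 48, 96, ...) that holds required_mb."""
    size = FIRST_HASH_CACHE_STEP_MB
    while size < required_mb:
        size *= 2
    return size
```

With 250 GB of Exchange data at 16 kB per chunk, the sketch yields 16 million chunks and a 320 MB minimum, so the cache would have to be allowed to grow to the 384 MB step.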

The avtar process itself generally consumes 20 to 30 MB of memory. This amount depends on the operating system the client is running, and also fluctuates during the backup depending on the structure of the files that avtar is backing up. Since the file cache and hash cache can grow to maximum sizes of 1/8th and 1/16th of the total RAM in the system, respectively, for a client that has more than about 1/2 GB of RAM the file and hash caches contribute more to the overall memory use than the rest of the avtar process. This is because both caches are read completely into memory at the start of the avtar backup. By default, the overall memory that the client caches use is therefore limited to approximately 3/16th of the physical RAM on the Avamar client (1/8 + 1/16 = 3/16).

Changing Client Cache Size

We can override the default limits on the size of the file and hash caches by using the following two flags:

--filecachemax=VALUE
--hashcachemax=VALUE

where VALUE is an amount in MB or a fraction (negative value = fraction of RAM).

As described previously, the default values are:

--filecachemax=-8
--hashcachemax=-16

For example, the following option limits the file cache to 100 MB. Because the file cache doubles in size every time it needs to grow, the file cache will actually grow only to 88 MB:

--filecachemax=100
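To make the effect of these flag values concrete, the following Python sketch models how a limit translates into an actual cache size. It is a model of the behavior described in the text, not avtar code: a negative value is read as a fraction of RAM, a positive value as MB, and the cache is assumed to stop at the largest doubling step that does not exceed the limit. Function names are hypothetical.

```python
def cache_limit_mb(flag_value: int, ram_mb: float) -> float:
    """Translate a --filecachemax/--hashcachemax value into a limit in MB.

    Negative values mean a fraction of RAM (-8 is 1/8th of RAM);
    positive values are taken as MB.
    """
    if flag_value < 0:
        return ram_mb / -flag_value
    return float(flag_value)


def largest_step_within(limit_mb: float, first_step_mb: float) -> float:
    """Largest doubling step (first, 2x, 4x, ...) not exceeding limit_mb."""
    if first_step_mb > limit_mb:
        return 0.0
    size = first_step_mb
    while size * 2 <= limit_mb:
        size *= 2
    return size
```

Under these assumptions, `largest_step_within(100, 5.5)` returns 88, matching the --filecachemax=100 example, and the default --hashcachemax=-16 on a client with 4,096 MB of RAM gives a 256 MB limit and a 192 MB cache.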

To optimize performance, we sometimes need to increase the cache sizes from the default values. These situations arise when the client has either millions of small files or large databases.

Millions of Small Files

If the client has millions of small files, we might need to increase the file cache from the default size. The general rule is that the client requires 512 MB of physical RAM for every one million files on the Avamar client. If a client has one million files, a minimum of 40 MB is required just to store all the file hashes for a single backup, since there are two 20-byte entries per file. Since the file hashes must be stored for several backups, more than this is required. As discussed previously, the file cache doubles in size each time it needs to grow. The current file cache sizes are: 5.5 MB, 11 MB, 22 MB, 44 MB, 88 MB, 176 MB, 352 MB, 704 MB, and 1,408 MB. This means the cache grows to about 44 MB. With 512 MB of physical RAM, the default of 1/8th allows the cache to grow to a limit of 64 MB, which means that the default value of 1/8th of RAM for the file cache is adequate.

Large Databases

If the client has a few large files, the default of 1/16th of RAM for the hash cache is usually insufficient. For example, a 240 GB database requires a cache of up to 10 million hashes. Since each hash is 20 bytes, a hash cache of at least 200 MB is required, and that can store the hashes for only a single backup. The hash cache also doubles in size each time it needs to grow. The current hash cache sizes are: 24 MB, 48 MB, 96 MB, 192 MB, 384 MB, 768 MB, and 1,536 MB. The next increment available above 200 MB is 384 MB. Therefore, if this client has 4 GB of RAM, the hash cache must be allowed to grow to 1/8th of the RAM. If the default of 1/16th of the RAM is used, the hash cache is limited to 192 MB, resulting in an undersized hash cache. In the case of databases, since very few files are backed up, the file cache will be considerably smaller, so the net memory use is still about 1/8th to 3/16th of the RAM.

Operational Best Practice: Never allow the total combined cache sizes to exceed 1/4 of the total available physical RAM.

INTEGRATION WITH TRADITIONAL BACKUP CIVILIZATION

A growing number of organizations are considering the benefits of data deduplication to reduce

the cost of disk-based backup and recovery. A recent ESG Research Survey indicates that 11%

of respondents are currently using data deduplication and 32% plan on doing so. Until recently,

the only way to introduce data deduplication into an existing backup and recovery environment

was to swap tape or disk-based recovery hardware for an appliance that supports data

deduplication. While these hardware-based solutions can deduplicate backup data stored on

disk, they do nothing to reduce the amount of data flowing over the network and can create

islands of deduplication. ESG Lab has confirmed that EMC NetWorker can be easily deployed,

leveraging Avamar deduplication technology to create a global pool of deduplicated data that

drastically reduces disk and network requirements while preserving the familiar look and feel of

NetWorker.

The latest release of Avamar has been integrated with the NetWorker backup product. As a

result, NetWorker users can now take advantage of Avamar data deduplication using familiar

NetWorker interfaces, workflows, and backup policies. Also, it will provide centralized control of

traditional and next-generation backup.

NetWorker clients can be configured to take advantage of Avamar data deduplication to

dramatically decrease the amount of time, network bandwidth, and disk capacity required to

perform backup jobs. This is accomplished by installing a version 7.4 or later NetWorker agent

with built-in Avamar support on an application server. The agent performs source-side global data deduplication before forwarding backup data to a NetWorker server and an Avamar Data Store. Figure 11 illustrates this arrangement.

Figure 11 Avamar and NetWorker Integration

Configuring NetWorker to deduplicate data can be performed in three simple steps:

• Select Avamar as a NetWorker managed application

• Create a deduplication node

• Discover Avamar deduplication node

Select the deduplication backup checkbox as shown in Figure 12.

Figure 12 Avamar and NetWorker Integration (NetWorker Console)

ESG Lab used NetWorker with Avamar-enabled deduplication to back up a Windows virtual server. Here are the results of the test.

• Backing up 15 GB of data took 5 minutes 31 seconds using an existing NetWorker backup policy.

• Restore of a 10 MB log file took two seconds.

• The NetWorker deduplication backup summary report showed that the latest full backup of the Windows virtual server protected 15,261 MB of data, yet only 127 MB of data was transferred.
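The reduction implied by the last result can be worked out directly from the two reported figures. The short Python sketch below does the arithmetic; the function names are hypothetical, and the numbers are the ones quoted from the ESG Lab test.

```python
def reduction_percent(protected_mb: float, transferred_mb: float) -> float:
    """Percentage of protected data that did not have to be transferred."""
    return 100 * (1 - transferred_mb / protected_mb)


def dedup_ratio(protected_mb: float, transferred_mb: float) -> float:
    """Protected data divided by transferred data."""
    return protected_mb / transferred_mb


# Figures reported for the Windows virtual server backup.
print(round(reduction_percent(15261, 127), 1))  # about 99.2 (percent)
print(round(dedup_ratio(15261, 127), 1))        # about 120.2 (to 1)
```

In other words, for this backup roughly 99% of the protected data never crossed the network.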

PROTECTION OF REMOTE OFFICES

Information critical to the success and efficiency of an organization is not just found in corporate

data centers; it also resides at remote and branch offices. Remote and branch offices (ROBOs)

contain a large amount of enterprise data, yet most still rely upon traditional tape-based backup

methods manned by non-IT staff. Tape backup, snapshots, mirroring, staging, and archiving all

significantly increase the amount of storage under management. Non-IT staff handling backups

and tapes increases the cost of data protection and the risk of data loss. In a distributed

environment, it is impractical to centralize backup operations with traditional backup

architectures and existing network capacity. Moving even daily incremental backup data sets

requires so much network bandwidth and time that even this simple process can become

prohibitively expensive and inefficient.

Avamar addresses this issue by improving backup processes in several ways. Replicating data

over a WAN instead of shipping tapes can be cost-prohibitive because of the significant amount

of data that needs to be transferred across the network. Avamar employs capacity reduction

technology and global data deduplication across sites and servers with a lightweight agent

running on the client systems it protects.

Avamar requires a one-time-only full backup where redundant data is eliminated and after that,

only new, unique sub-file data segments are backed up daily across the WAN to the data center

or DR site. The software identifies and filters repeated variable length data segments stored in

files within a single system and across multiple systems over time, capturing only net new

segments. In addition, data compression algorithms are applied to eliminate space and

redundant file patterns, further shrinking data volumes.

Avamar source-based global data deduplication significantly reduces the amount of data sent

over the WAN, enabling fast, daily full backups for remote office data. Backups are centrally

managed from a corporate data center using the intuitive Avamar Management console and

data can be encrypted for security.

Figure 13 illustrates how remote and branch offices using Avamar can reduce the amount of

data being transferred during backup (labeled “Bandwidth Efficient”) by sending only unique

blocks of data to a centralized data center. At smaller remote offices, only Avamar software

agents are deployed on the systems, while larger remote offices and data centers typically

deploy a local Avamar server to improve recovery performance.

Figure 13 Protection of Remote Offices

Avamar also supports disaster recovery and offsite archival by replicating backup data over a

wide area network to a remote data center. Data deduplication and replication capabilities

ensure that only changes since the last backup, comprised of unique and compressed sub-file

variable length data segments, are replicated. This is extremely valuable, reducing the cost of

network bandwidth and improving the performance of backups and replication over the WAN.

There are two ways to back up clients in remote offices:

• Back up remote office clients to a small Avamar server located in the remote office

(remote Avamar backup server) and replicate data to a large centralized Avamar server

(centralized replication destination).

• Back up those clients directly to a large centralized Avamar server (centralized Avamar

backup server) and replicate data to another large centralized Avamar server

(centralized replication destination).

Which method is best depends upon the following factors:

• RTO (Recovery Time Objective): The first method is preferable because restores can be done directly from the remote office server across the local area network to the client.

• Server Administration: The amount of administration and support required is roughly proportional to the number of Avamar servers deployed in an environment. Accordingly, the second method is preferable.

The tradeoff then becomes RTO versus the additional cost of deploying, managing, and

supporting multiple Avamar server instances.

As an operational best practice, unless restore time objectives cannot be met, architect the system so that clients back up directly to a large, active centralized Avamar server and then replicate data to another large, passive centralized Avamar server.

Recommended best practice for seeding the first backup of a remote client:

• Copy the data to a portable storage device (such as a USB hard drive)

• Install the Avamar software on a client local to the Avamar server

• Attach the portable storage device to the local client

• Perform a one-time backup of all the data from the portable storage device

• Start to back up the remote client

This procedure ensures that when the initial remote client backup takes place, it will only need

to transfer data which has changed or has been added since the seeding process occurred.

ARCHITECTING AVAMAR TO ENSURE MAXIMUM SYSTEM AVAILABILITY

In this section, we will consider the factors involved in architecting Avamar to ensure maximum

system availability. These factors include RAIN, RAID, replication, checkpoints, and backup of

remote offices. Of these factors, we have already discussed RAIN, replication, and how to

protect remote offices to ensure maximum system availability. We will discuss RAID and

checkpoints in this section.

RAID

An individual drive has a certain life expectancy before it fails, as measured by MTBF (Mean Time Between Failures). Since there are potentially hundreds or even thousands of drives in a disk array, the probability of a drive failure somewhere in the array increases significantly. As an example, if the MTBF of a drive is 750,000 hours, and there are 100 drives in the array, then the MTBF of the array becomes 750,000/100, or 7,500 hours. RAID (Redundant Array of Independent Disks) was introduced to mitigate this problem.
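The MTBF arithmetic above generalizes to any drive count. The Python sketch below assumes, as the example implicitly does, that all drives are identical and fail independently at a constant rate; the function name is hypothetical.

```python
def array_mtbf_hours(drive_mtbf_hours: float, drive_count: int) -> float:
    """Approximate MTBF of an array of identical, independent drives."""
    return drive_mtbf_hours / drive_count


hours = array_mtbf_hours(750_000, 100)
print(hours)        # 7500.0 hours
print(hours / 24)   # 312.5 days
```

At 7,500 hours, an array of 100 such drives can be expected to see a drive failure roughly every 312 days, which is why protection at the array level matters.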

RAID combines two or more disk drives in an array into a RAID set or a RAID group. The RAID

set appears to the host as a single disk drive. Properly implemented RAID sets provide:

• Higher data availability

• Improved I/O performance

• Streamlined management of storage devices

All standard Avamar server node configurations use RAID to protect the system from disk

failures. The Avamar server nodes are typically configured with SCSI hard drives in a RAID 5

configuration or SATA hard drives in a RAID 1 configuration.

Failed drives impact I/O performance and affect Avamar server performance and reliability.

Further, RAID rebuilds can significantly reduce the I/O performance, adversely impacting

performance and reliability of the Avamar server.

Operational best practices for implementing RAID:

• Protect the data on the Avamar server by using RAID to protect against individual hard

drive failures.

• Set the RAID rebuild to a relatively low priority.

• Always set up log scanning to monitor and report hardware issues; the email home feature can be configured for this.

• For hardware purchased separately, the customer must regularly monitor hardware issues and address them promptly.

CHECKPOINTS

A checkpoint is a snapshot of the Avamar server taken for the express purpose of facilitating server rollbacks. If there are backups running, the server will suspend the backups, make the system read-only, take the checkpoint, make the system read-write, and then resume the backups. Checkpoints are taken to assist with disaster recovery and provide redundancy across time. They are consistent snapshots of the entire Avamar system that can be verified for integrity. If an integrity check fails due to an inconsistency, and the inconsistency cannot be fixed, the system can be quickly rolled back to a prior checkpoint. Checkpoints allow recovery from operational issues and certain kinds of corruption by rolling back to the last validated checkpoint. Although checkpoints are an effective way to revert the system to an earlier point in time, they are like all other forms of redundancy and therefore require disk space. The more checkpoints we need to retain, the larger the checkpoint overhead. Operational best practice is to leave the checkpoint retention policy at the default values, typically set to retain the last two checkpoints and the last validated checkpoint.

PROACTIVE STEPS TO MANAGE HEALTH AND CAPACITY

Capacity is one of the most important aspects of administering an Avamar server. There are two

main components that define Avamar Server capacities:

• Storage Subsystem (GSAN) Capacity

• Operating System Capacity

GSAN capacity is the total amount of commonality factored data and RAIN parity data (net after garbage collect) on each data partition of the server node. This is measured and reported by the GSAN process.

Operating System Capacity is the total amount of data in each data partition, as measured by the operating system, which not only includes the size of all the stripes in the current working directory, but also the checkpoint overhead and the amount of data stored on the data partition by other activities.

Now we will describe how Avamar server will behave as it crosses various consumed storage

thresholds.

100% — The “Server Read-Only Limit.” When server utilization reaches 100% of total storage

capacity, it automatically becomes read-only. This is done to protect the integrity of the data

already stored on the server.

95% — The “Health Check Limit.” This is the amount of storage capacity that can be

consumed and still have a “healthy” server.

Although an Avamar server could be allowed to consume 100% of available storage capacity, it

is not a good practice to actually let that occur. Consuming all available storage can prevent

certain server maintenance activities from running, which might otherwise free additional

storage capacity for backups. For this reason, a second limit is established called the “health

check limit.” This health check limit is derived by subtracting some percentage of server storage

capacity from the server read-only limit. The default health check limit is 95%. We can

customize this setting according to a specific site but setting this limit higher than 95% is not

recommended. When server utilization reaches the health check limit, existing backups are

allowed to complete, but all new backup activity is suspended. Notifications are also sent in the

form of a pop-up alert each time we log into Avamar Administrator and a system event will

require acknowledgment before any future backup activity can resume.

80% — Capacity Warning Issued. When server utilization reaches 80%, a pop-up notification

informs the administrator that the server has consumed 80% of its available storage capacity. Avamar Enterprise Manager capacity state icons turn yellow.

Icons that communicate capacity levels
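The three thresholds described above can be summarized as a small classification function. This Python sketch is only a restatement of the text for monitoring scripts; the state labels and the function name are hypothetical and are not Avamar identifiers. The health check limit is a parameter because the text notes it can be customized per site.

```python
def capacity_state(utilization_percent: float,
                   health_check_limit: float = 95.0) -> str:
    """Classify server utilization against the thresholds in the text."""
    if utilization_percent >= 100.0:
        # Server Read-Only Limit: the server becomes read-only.
        return "read-only"
    if utilization_percent >= health_check_limit:
        # Health Check Limit: new backup activity is suspended.
        return "health-check-limit"
    if utilization_percent >= 80.0:
        # Capacity warning: pop-up notification, yellow state icons.
        return "warning"
    return "normal"
```

For example, `capacity_state(82)` returns "warning", the point at which the proactive steps described in this section should begin.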

The most significant daily operational impact of capacity is on backup performance and hfscheck.

Operational best practice recommendations:

• Understand how to monitor and manage the storage capacity of the Avamar server on a daily basis.

• Limit storage capacity usage to 80% of the available "server utilization" capacity.

Proactive steps to manage capacity:

When the GSAN capacity exceeds 80% of the final read-only threshold, perform the following operational best practices:

• Stop adding new clients to the system.

• Adjust retention policies to decrease the retention period of backups if possible and thus,

reduce capacity use.

• Check whether backups are preventing a garbage collect operation from starting. If so, one of the following messages is logged in the /usr/local/avamar/var/cron/gc.log file: MSG_ERR_BACKUPSINPROGRESS, or garbage collection skipped because backup is in progress.

• If possible, decrease the GSAN capacity by deleting or expiring backups and running

garbage collection. Note that deleting backups does not free space until garbage collect

has run several times. Garbage collect finds and deletes the unique data associated with

these backups.

• Replicate the data to another server temporarily, and then replicate the data back after

reinitializing the server. Because replication creates a logical copy of the data, this

compacts all the data onto fewer stripes.

• For multi-node Avamar servers utilizing RAIN, add nodes and rebalance the capacity.
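The gc.log check in the list above lends itself to a small script. The Python sketch below simply searches log lines for the two messages quoted in the text; the log path is the one given above, while the function names and the case-insensitive matching are assumptions of this sketch, and the real log format may contain additional fields.

```python
GC_LOG_PATH = "/usr/local/avamar/var/cron/gc.log"  # path quoted in the text

# Messages the text says are logged when backups block garbage collection.
MARKERS = (
    "MSG_ERR_BACKUPSINPROGRESS",
    "garbage collection skipped because backup is in progress",
)


def skipped_gc_lines(lines):
    """Return the log lines that report a skipped garbage collection."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if any(marker.lower() in lowered for marker in MARKERS):
            hits.append(line.rstrip("\n"))
    return hits


def check_gc_log(path=GC_LOG_PATH):
    """Read the garbage collection log and return any matching lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return skipped_gc_lines(handle)
```

Calling `check_gc_log()` on the Avamar server and getting a non-empty list suggests that the backup schedule overlaps the maintenance window and should be adjusted.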

DAILY SERVER MAINTENANCE:

Daily Avamar server maintenance comprises three essential activities:

• Checkpoint (discussed above)

• Garbage collection

• HFS check

Garbage Collection

Garbage collection is the process of recovering unused space from backups that have expired. The garbage collection process requires a quiet system in order to run; garbage collection will not initiate if any backups are running.

HFS Check

A Hash File System (HFS) check is an internal operation that validates the integrity of a specific checkpoint. Once a checkpoint has passed an HFS check, it can be considered reliable enough to be used for a system rollback. In order for an HFS check to successfully start, there must be two or fewer currently running backups per storage node.

Avamar servers define two daily “maintenance windows,” during which some or all of the

essential server maintenance activities are performed. The default start times for these

maintenance windows are:

• 6:00 A.M. (morning)

• 6:00 P.M. (evening)

The following table lists the essential server maintenance activities (checkpoint, garbage collection, and HFS check) and summarizes each activity’s impact on normal system operation.

                                 CHECKPOINT              GARBAGE COLLECTION   HFS CHECK
Runs in morning?                 Yes                     Yes                  Yes
Runs in evening?                 No                      No                   No
Typical run time                 5 minutes - 1 hour      2 hours max          Several hours
Starts if backups are running?   Yes                     No                   Only if two or fewer backups are running on each node
Effect on running backups        Temporarily suspended   Not applicable       Unaffected
Effect on new backups            Can start               Cannot start         Can start
System state during activity     Read-only               Read-only            Read-write
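The "starts if backups are running" row of the table can be expressed as a simple rule, which is useful when planning backup schedules around the maintenance window. The Python sketch below only encodes what the table states; the activity labels and the function name are hypothetical.

```python
def can_start(activity: str, max_backups_per_node: int) -> bool:
    """Whether a maintenance activity can start with backups running.

    max_backups_per_node is the highest number of backups currently
    running on any single storage node.
    """
    if activity == "checkpoint":
        # Running backups are temporarily suspended, so it always starts.
        return True
    if activity == "garbage_collection":
        # Requires a quiet system: no backups at all.
        return max_backups_per_node == 0
    if activity == "hfs_check":
        # Starts only with two or fewer backups per node.
        return max_backups_per_node <= 2
    raise ValueError(f"unknown activity: {activity}")
```

For instance, with three backups running on one node, a checkpoint can still start, but neither garbage collection nor an HFS check will.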

TAPE INTEGRATION

The Avamar Tape Output option provides the ability to archive backup data stored within an Avamar server to tape via a disk cache. This option uses a disk staging area (for example, a media server with a disk pool or a file server) to temporarily stage Avamar backup data in un-deduplicated format; any standard third-party tape-based backup software solution then retrieves the data and writes it to tape. The Avamar Tape Output option can be configured to output either all or a subset of the backup data stored within the Avamar server, and can also be tuned based on the size of the disk staging area. Restores are performed directly from tape to the appropriate client(s). The Avamar Tape Output option provides the following advantages:

• A restore can be performed at the granularity of a single file.

• A restore can be performed directly from the tape software to a client without having to

restore via the Avamar server.

• ACLs and alternate data streams are maintained on tape if the operating system of the disk staging area system matches the operating system of the original client that was backed up.

Avamar has introduced Avamar Data Transport, an Avamar server that runs as a virtual machine in a VMware ESX Server (3.x) environment. Avamar Data Transport is a hardware and software system that transports deduplicated data from an Avamar server to tape. Transport nodes are sized at 1.0 TB each. A transport node functions as a target of replication from a primary Avamar server; after replication is complete, the node is moved to tape. Transport nodes can also be recovered from tape, allowing backed up data to be accessed by administrators or users. Avamar Data Transport 1.0 supports a maximum of four transport nodes. The following terms are useful for understanding how it works:

Avamar Data Transport Framework: A service-oriented architecture that provides applications with common services such as communication, security, and event logging.

Control Node: A 64-bit Linux host where the Avamar Data Transport application and the Avamar Data Transport Framework run. The Avamar Data Transport user interface is accessed from a web server installed on this host. In addition, the database of all transported files is often located on the control node. There is only one control node in an Avamar Data Transport system.

Tape Backup Server: Avamar Data Transport supports the NetWorker and Symantec NetBackup tape backup products. Both products have a server component that drives the tape archiving process.

Transport Node: Transport nodes are Avamar Virtual Edition servers that have been modified specifically for use in the Avamar Data Transport system. They are the targets of replication from Avamar servers, and are used as the backup and restore target by the tape backup server.

The system consists of the following physical components:

• Physical Avamar server

• ESX server for hosting control and transport nodes

• Tape backup server configured with NetWorker or Symantec NetBackup

• Tape library, tape drives, and tapes

FIGURE 14 Avamar Data Transport Components

Before installing Avamar Data Transport, the components mentioned above must be running in

your network environment. After the pre-installation checks have been made, install and

configure the Avamar Data Transport components (as shown in Figure 14) on the appropriate

hosts in the system:

• Avamar Data Transport System Service (on the Avamar servers)

• Control node (on the ESX server)

• Transport nodes (on ESX server)

• Transport node services (on transport nodes)

• Avamar Data Transport groups or policies on the tape backup server

Installation and configuration of these components occurs in the following order, as shown in

Figure 14:

1. Control node

2. Avamar Data Transport System Service on the Avamar server (or servers)

3. Transport nodes

4. Tape backup server and tape backup clients

5. Transport node Services

After these components are installed, Avamar Data Transport is configured to transport deduplicated data to tape. In this way, the Avamar Data Transport nodes provide the tape-out mechanism, moving backed-up data to tape.


WHAT’S NEW

Avamar Desktop/Laptop introduction in Avamar 5.0

EMC recently introduced Avamar 5.0, which adds Avamar Desktop/Laptop for backing up desktops and laptops. The Avamar Desktop/Laptop features enhance the functionality of the Avamar client for Windows and Mac desktops and laptops. Avamar Desktop/Laptop is based on the Avamar client software and adds enhanced features for an enterprise’s desktop and laptop computers.

These features include:

• Web browser UI that provides manual backup, restore by search, restore by browse, and

backup history.

• UI available in ten languages.

• Domain-oriented data security. Users are authenticated through your enterprise’s Active Directory or OpenLDAP-compliant directory service. Users can also be authenticated using Avamar authentication.

• User-initiated backups. Users can manually initiate a backup from the client.

• Users can restore folders and files directly to the original location.

• Creation of a restore set by search or directory-tree browse. Users can search or browse a backup directory tree to create a set of folders and files to restore.

• Backup history. Users can view the folders and files backed up in their last 10 successful

backups.

• Deployment of Avamar clients using common systems management tools. Avamar Desktop/Laptop can be push-installed on Windows and Mac desktop and laptop computers using systems management tools such as Microsoft Systems Management Server 2003 (SMS).

WHY AVAMAR?

• Solves the challenges associated with traditional backup by using patented global,

source data deduplication technology to identify redundant data segments at the source

before transfer across the network.

• Enables fast, reliable backup and recovery for VMware environments, remote offices,

and LAN/NAS servers across existing networks and infrastructure.

• Encrypts data in flight and at rest for added security.

• RAID protection from disk failures.

• Supports SAN or internal disk storage.

• Scalable grid architecture.


• Fault tolerance across nodes, thereby eliminating single points of failure and providing

high availability using RAIN technology.

• Eliminates manual recovery drills by running daily checks that verify data integrity.

• New design and features for the desktop and laptop solution.

• Centralized web-based management for ease of administration.

• Eliminates LAN backup bandwidth bottlenecks; reduces daily network bandwidth by up

to 500x using deduplication.

• Replaces archaic IT processes, such as shipping tapes for disaster recovery, with automated, encrypted remote copy over existing WANs, mitigating the risks involved in shipping tapes.

• Integration with existing backup infrastructure and software.

• Simple one-step recovery.

• Reduces total back-end storage by up to 50x for cost-effective, long-term, disk-based recovery.
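To put the "up to 500x" bandwidth and "up to 50x" storage figures in concrete terms, the arithmetic can be sketched as follows. The input sizes are hypothetical round numbers chosen for illustration, not measurements, and because both factors are stated as upper bounds the results are best-case values.

```python
def after_reduction(size_gb, factor):
    """Return the size remaining after a best-case reduction by `factor`."""
    if factor <= 0:
        raise ValueError("reduction factor must be positive")
    return size_gb / factor


# Hypothetical inputs, chosen only to illustrate the arithmetic.
daily_backup_gb = 1000.0      # data protected per day before deduplication
retained_backup_gb = 50000.0  # back-end storage without deduplication

# Best case at the quoted upper bounds of 500x and 50x.
print(after_reduction(daily_backup_gb, 500))    # 2.0 GB sent over the network
print(after_reduction(retained_backup_gb, 50))  # 1000.0 GB stored on disk
```

Actual reduction depends on how much of the data is redundant, so real deployments would typically land below these upper bounds.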