Transcript

© Copyright 2010 EMC Corporation. All rights reserved.

Next-Generation Data Protection

Deduplication: a key element of next-generation backup

Piotr Nogaś

BRS EMEA EE TC Manager


F1000 Storage Professionals' Pain Points

What are your top storage-related pain points?

[Bar chart, horizontal axis 0% to 80%, one bar per survey period (Q4 '07, Q3 '08, Q2 '09, Q4 '09, Q2 '10). Categories, in the order listed: Other; Managing Storage Equipment; Power Management; Vendor Management; Application Recoveries and/or Backup Retention; Regulatory Compliance; Data Mobility; Storage Provisioning; Archiving and Archive Management; Dealing With Performance Problems; Lack of Integrated Tools; Managing Complexity; Backup Administration and Management; Managing Costs; Proper Capacity Forecasting and Storage Reporting; Managing Storage Growth.]

(8/11/10): F1000 Sample. Q4 '07, n=151; Q3 '08, n=140; Q2 '09, n=155; Q4 '09, n=185; Q2 '10, n=166. *Note that due to multiple responses per interview, total exceeds 100%.


Data Center Waves: past and future

• storage consolidation
• server consolidation
• server virtualization
  virtual environments protection:
  – VMware (API)
  – Hyper-V
  – Xen (host based)
  – virtual partitions via FC/VTL
  need to emulate hundreds of tape drives with high concurrency
• storage virtualization
  – deduplication appliances for backup
  – deduplication on tier 1 storage (primary)
• virtualization everywhere
  – VDI
  – cloud


Optimization Technologies Center Stage

• Storage optimization: Deduplication
• Server & primary storage optimization: Server/Storage Consolidation/Virtualization
• Network optimization: WAN Optimization


Backup Environments Transformation: root causes

Unabated data growth

• Backup = 4 to 30 times production capacity

• Full backups kept for months or years

• New requirements to keep more data for longer periods

[Chart: data volume in zettabytes, 2010 to 2020. Source: IDC Digital Universe Study, sponsored by EMC, May 2010; chart does not include data that does not need protection.]

[Chart: Digital Information Created and Replicated Worldwide: five times growth in four years. Data volume in exabytes (axis 0 to 2,500), 2008 to 2012. Source: IDC Digital Universe white paper, sponsored by EMC, May 2009.]

Legend: Needing Protection; Protected.
Callout: Unprotected in 2010 = size of entire digital universe in 2018.


Major Trends Driving the Transformation of Backup Environments: server virtualization

• Increased complexity

• Virtual machine sprawl

• High utilization, little bandwidth for backup

Old paradigm: 20% resource utilization
New paradigm: 80% resource utilization (shared physical resources)

[Charts: CPU utilization, 0% to 100%, for the old and new paradigms. Diagram labels: VMware ESX Server, Hardware.]


“The process of detecting and identifying

the unique data segments within a given

set of information, enabling the elimination

of redundancy when stored or moved.”

Before: total segments = 39
After: unique segments = 6

[Diagram: Data Set 1, Data Set 2 and Data Set 3 pass through deduplication.]

What is Data Deduplication?
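The definition above can be illustrated with a minimal sketch. It assumes fixed, pre-cut segments and SHA-1 fingerprints; the three data sets are made up so that 39 segments reduce to 6 unique ones, matching the counts on the slide. It is an illustration of the idea, not any product's implementation.

```python
import hashlib

def deduplicate(segments):
    """Return (store, recipe): each unique segment stored once, plus the
    ordered list of fingerprints needed to rebuild the original stream."""
    store = {}    # fingerprint -> segment bytes
    recipe = []   # one fingerprint per incoming segment, in arrival order
    for segment in segments:
        fingerprint = hashlib.sha1(segment).hexdigest()
        store.setdefault(fingerprint, segment)  # keep only the first copy
        recipe.append(fingerprint)
    return store, recipe

def restore(store, recipe):
    """Rebuild the original segment stream from the recipe."""
    return [store[fingerprint] for fingerprint in recipe]

# Hypothetical input: 3 data sets of 13 segments each, drawn from 6 distinct segments.
distinct = [bytes([65 + i]) * 8 for i in range(6)]
data_sets = [[distinct[(i + shift) % 6] for i in range(13)] for shift in range(3)]
stream = [segment for data_set in data_sets for segment in data_set]

store, recipe = deduplicate(stream)
print(f"total segments = {len(stream)}, unique segments = {len(store)}")
```

The recipe is what keeps every logical copy restorable even though each segment is physically stored only once.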


Replicate smarter.

Move only deduplicated data over existing networks

with up to 99% bandwidth efficiency for cost-effective

disaster recovery.

Plenty of duplicated data by design.
A standard backup schedule with 91 days of retention (full + 6 diff/incr) can contain the same data 15+ times. Keep logical copies rather than physical ones. Deduplicate for capacity and SLA.

Recover reliably.
Continuous fault-detection and self-healing ensure data recoverability to meet SLAs.

Why Deduplicated Backup?

WAN
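As a rough check of the "15+ times" figure, the sketch below counts how often one file is held inside a retention window. It assumes a weekly full backup on days 0, 7, 14, ... followed by six file-level incrementals, and that an incremental re-copies the whole file on any day it was modified; the function name and the modification days are hypothetical.

```python
def copies_retained(retention_days, full_interval_days, modified_on_days=()):
    """Count the copies of a single file kept within the retention window."""
    # Every full backup in the window holds the file, changed or not.
    fulls = retention_days // full_interval_days
    # A file-level incremental copies the file again on each day it changed,
    # unless that day is already a full-backup day.
    incrementals = sum(
        1 for day in modified_on_days
        if 0 <= day < retention_days and day % full_interval_days != 0
    )
    return fulls + incrementals

never_changed = copies_retained(91, 7)
edited_twice = copies_retained(91, 7, modified_on_days=(3, 45))
print(never_changed, edited_twice)
```

Under these assumptions an untouched file is already stored once per weekly full, and each small edit adds another complete copy.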


"Almost" Makes a Difference: deduplication principles

• Fixed vs dynamic block – capacity requirements
• Number of streams per appliance
• Robustness/security: MD5 vs. SHA-1
  http://en.wikipedia.org/wiki/MD5#Collision_vulnerabilities
  http://en.wikipedia.org/wiki/SHA-1#Comparison_of_SHA_functions
• Time for SHA-1:
  • HW support
  • Multicore CPUs
• Smart not hard – rainbow tables in deduplication algorithms
  http://pl.wikipedia.org/wiki/T%C4%99czowe_tablice
  http://kestas.kuliukas.com/RainbowTables/
  – Reducing CPU and disk cycles
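The fixed versus dynamic block point can be sketched as follows. This is a simplified illustration, not a vendor algorithm: the dynamic chunker recomputes a SHA-1 over a sliding window at every byte, whereas production systems typically use a cheap rolling hash, and all parameters (`window`, `mask`, `min_size`, `max_size`) are assumptions chosen for the demo. It shows how a single inserted byte affects each approach.

```python
import hashlib
import random

def fixed_chunks(data, size=64):
    """Cut at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data, window=16, mask=0x3F, min_size=16, max_size=256):
    """Content-defined chunking: cut where a hash of the last `window` bytes
    matches a bit pattern, so boundaries follow content rather than offsets."""
    chunks, start = [], 0
    for i in range(len(data)):
        length = i - start + 1
        if length < min_size:
            continue
        digest = hashlib.sha1(data[i - window + 1:i + 1]).digest()
        value = int.from_bytes(digest[:4], "big")
        if (value & mask) == 0 or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks

def fingerprints(chunks):
    return {hashlib.sha1(chunk).hexdigest() for chunk in chunks}

rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
shifted = b"X" + original  # one byte inserted at the front

fixed_shared = len(fingerprints(fixed_chunks(original)) & fingerprints(fixed_chunks(shifted)))
variable_shared = len(fingerprints(variable_chunks(original)) & fingerprints(variable_chunks(shifted)))
print("chunks still shared, fixed:", fixed_shared, "variable:", variable_shared)
```

With fixed blocks the insertion shifts every boundary, so no chunk matches again; with content-defined boundaries the chunker resynchronises and most fingerprints are reused, which is why the choice affects capacity requirements.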


Architecture Advantage: Variable vs. Fixed

Variable segment deduplication significantly reduces: Power, Cooling, Management, Complexity

[Diagram: footprint of 100 TB of data under different reduction methods. Method labels: Whitespace Reduction, File Level, Fixed Block, Variable Block. Footprint labels: 100TB lives on 100TB; 100TB lives on 50TB; 100TB lives on 33TB; 100TB lives on 25TB (4:1 is 25TB); 100TB lives on 5TB.]
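The footprints on this slide translate into reduction ratios with simple arithmetic. The short sketch below only restates the slide's figures; the function names are illustrative.

```python
def reduction_ratio(logical_tb, physical_tb):
    """Logical data divided by the physical space it occupies (4.0 means 4:1)."""
    return logical_tb / physical_tb

def space_saved(logical_tb, physical_tb):
    """Fraction of the original footprint that is no longer needed."""
    return 1 - physical_tb / logical_tb

for physical_tb in (100, 50, 33, 25, 5):
    ratio = reduction_ratio(100, physical_tb)
    saved = space_saved(100, physical_tb)
    print(f"100TB on {physical_tb}TB: {ratio:.1f}:1, {saved:.0%} saved")
```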


Factors Impacting Deduplication Ratios: small variations can have big impact

• Type of data: more user-created, unstructured content* = higher deduplication ratio
• Data change rate: less change = higher deduplication ratio
• Retention policy: longer retention policy = higher deduplication ratio
• Full to incremental backup ratio: more full backups = higher deduplication ratio

*Encrypted and compressed data are not ideal deduplication candidates
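A toy model makes the retention and change-rate effects concrete. It assumes only full backups of constant size, where each new full differs from the previous one by a fixed fraction; real ratios depend on data type and product, so this is a sketch of the trend rather than a sizing formula.

```python
def toy_dedup_ratio(retained_fulls, change_rate):
    """Logical size / physical size for N retained fulls.

    Logical:  N * S
    Physical: S + (N - 1) * change_rate * S   (first full, then only changes)
    The backup size S cancels out.
    """
    return retained_fulls / (1 + (retained_fulls - 1) * change_rate)

for fulls in (4, 13, 52):
    for change in (0.01, 0.05):
        print(f"{fulls} fulls, {change:.0%} change: {toy_dedup_ratio(fulls, change):.1f}:1")
```

In this model the ratio rises with the number of retained fulls and falls as the change rate grows, which matches the direction of each factor listed above.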


Real World Results: Avamar daily full backups vs. traditional daily full backups

Data Type | Amount of Primary Data Backed Up | Amount of Data Moved Daily
Windows file systems | 3,573 GB | 6.1 GB
Mix of Windows, Linux, and UNIX file systems | 5,097 GB | 11.7 GB
Engineering files on NAS (NDMP backups) | 3,265 GB | 24.2 GB
Mix of 20% databases, 80% file systems (Windows and UNIX) | 9,583 GB | 80.0 GB
Mix of Linux file systems and databases | 7,831 GB | 104.2 GB

Source: EMC

While results will vary by data type and mix, Avamar can

dramatically improve backup performance and efficiency
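To put the figures above in proportion, this short sketch computes the share of primary data moved each day for every row of the slide's table. The numbers are copied from the slide; nothing else is assumed.

```python
# (data type, primary data backed up in GB, data moved daily in GB)
results = [
    ("Windows file systems", 3573, 6.1),
    ("Mix of Windows, Linux, and UNIX file systems", 5097, 11.7),
    ("Engineering files on NAS (NDMP backups)", 3265, 24.2),
    ("Mix of 20% databases, 80% file systems", 9583, 80.0),
    ("Mix of Linux file systems and databases", 7831, 104.2),
]

def daily_fraction(primary_gb, moved_gb):
    """Share of the protected data that actually crosses the network per day."""
    return moved_gb / primary_gb

percentages = {name: 100 * daily_fraction(primary, moved)
               for name, primary, moved in results}

for name, percent in percentages.items():
    print(f"{name}: {percent:.2f}% moved daily")
```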


VMware Guest Backup: Smart not Hard - Avamar backup versus traditional backup

[Charts comparing Traditional and Avamar backup: CPU Usage, Network Usage and Disk Usage, each plotted from 1:20 p.m. to 1:40 p.m.]


VMware vStorage API: Smart not Hard - Avamar

Key Features:

• Integrated with vStorage API
• Single-step file & image-level backups & restores
• Option to leverage the changed-block feature, which greatly reduces backup processing
• Capability to restore to the original virtual machine, to a new one, or to configure a new virtual machine
• Round-robin VM backup capability across multiple proxies

vStorage API virtual proxy server with Avamar agent
Avamar client software runs on the proxy server

[Diagram: VMware Image Backup. Labels: Resource Pool; VMware Virtualization Layer; x86 Architecture; Physical server; Virtual Machines; SAN storage; Avamar server; Mount; marker = Avamar Software Agent.]
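The round-robin distribution of VM backups across proxies mentioned in the feature list can be pictured with a few lines. This is only an illustration of the scheduling idea; the function, VM and proxy names are hypothetical and do not correspond to Avamar or vStorage API calls.

```python
from itertools import cycle

def assign_round_robin(vms, proxies):
    """Spread VM image backups evenly across proxy servers, in order."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    plan = {proxy: [] for proxy in proxies}
    for vm, proxy in zip(vms, cycle(proxies)):
        plan[proxy].append(vm)
    return plan

plan = assign_round_robin(["vm1", "vm2", "vm3", "vm4", "vm5"], ["proxy-a", "proxy-b"])
for proxy, assigned in plan.items():
    print(proxy, assigned)
```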


Data Domain Boost Integration

• Deduplication distributed to backup servers and Microsoft application clients
  – Increases backup speed
  – Reduces network traffic
• Clone-controlled replication
  – Schedules replication
  – Catalog awareness of replicated copies
• Ease of use
  – Automated configuration
  – Monitoring and reporting

[Diagram: NetWorker with DD Boost connected to Data Domain with DD Boost.]

NETWORKER AND DATA DOMAIN
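Why distributing deduplication to the sending side reduces network traffic can be sketched generically. The example below is not the DD Boost protocol: the `DedupServer` class, the `backup` function, fixed 4096-byte segments and SHA-1 fingerprints are all assumptions made for illustration. The sender fingerprints its data, asks which fingerprints are unknown, and ships only those segments.

```python
import hashlib

class DedupServer:
    """Stand-in for a deduplicating target that stores segments by fingerprint."""
    def __init__(self):
        self.segments = {}

    def missing(self, fingerprints):
        return [fp for fp in fingerprints if fp not in self.segments]

    def put(self, fingerprint, segment):
        self.segments[fingerprint] = segment

def backup(server, data, segment_size=4096):
    """Send only segments the server lacks; return payload bytes sent."""
    segments = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]
    by_fingerprint = {hashlib.sha1(s).hexdigest(): s for s in segments}
    sent = 0
    for fingerprint in server.missing(by_fingerprint):
        server.put(fingerprint, by_fingerprint[fingerprint])
        sent += len(by_fingerprint[fingerprint])
    return sent

server = DedupServer()
data = b"".join(bytes([i]) * 4096 for i in range(8))       # 8 distinct segments
first = backup(server, data)                               # everything is new
second = backup(server, data)                              # nothing changed
changed = data[:4096] + b"\xff" * 4096 + data[8192:]       # one segment modified
third = backup(server, changed)
print(first, second, third)
```

Only the fingerprint exchange and the genuinely new segments cross the network on repeat backups, which is the effect the bullets above describe.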


Capacity management: a single SIS makes a difference

• Restrictions of multiple SIS
  – No storage node/media server load balancing between SIS
  – Management overhead (multiple appliance instances and configurations, e.g. replication)
  – Efficiency (more reserved storage required)
• Performance considerations
  – Wise use of SAN/LAN infrastructure with client-side deduplication and deduplicated replication
  – Leverage 10GbE with OST/DD Boost
• SLA improvement
  – Backup windows and recovery time objective
  – Reduce backup job load on production systems


Industry's Most Scalable Inline Deduplication Systems

Model | Speed (DD Boost) | Speed (other) | Logical capacity | Raw capacity | Usable capacity
DD140 | 490 GB/hr | 450 GB/hr | 9–43 TB | 1.5 TB | 0.86 TB
DD610 | 1.3 TB/hr | 675 GB/hr | 40–195 TB | Up to 6 TB | Up to 3.98 TB
DD630 | 2.1 TB/hr | 1.1 TB/hr | 84–420 TB | Up to 12 TB | Up to 8.4 TB
DD670 | 5.4 TB/hr | 3.6 TB/hr | 0.6–2.7 PB | Up to 76 TB | Up to 55.9 TB
DD860 | 9.8 TB/hr | 5.1 TB/hr | 1.4–7.1 PB | Up to 192 TB | Up to 142 TB
DD890 | 14.7 TB/hr | 8.1 TB/hr | 2.9–14.2 PB | Up to 384 TB | Up to 285 TB
Global Deduplication Array | 26.3 TB/hr | 10.7 TB/hr | 5.7–28.5 PB | Up to 768 TB | Up to 570 TB
DD Archiver | 9.8 TB/hr | 4.3 TB/hr | 5.7–28.5 PB | Up to 768 TB | Up to 570 TB

Software options: DD Boost, DD Virtual Tape Library, DD Replicator, DD Retention Lock, and DD Encryption

Product families: DD140 Remote Office Appliance; DD600 Appliance Series; DD800 Appliance Series; Global Deduplication Array; DD Archiver
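The logical capacity ranges on this slide imply an assumed deduplication ratio when divided by usable capacity. The sketch below does that division for a few models, using values from the slide and assuming decimal units (1 PB = 1000 TB); the function name is illustrative.

```python
TB_PER_PB = 1000  # assumption: decimal units

# (model, usable TB, logical capacity low TB, logical capacity high TB)
systems = [
    ("DD140", 0.86, 9, 43),
    ("DD630", 8.4, 84, 420),
    ("DD890", 285, 2.9 * TB_PER_PB, 14.2 * TB_PER_PB),
    ("Global Deduplication Array", 570, 5.7 * TB_PER_PB, 28.5 * TB_PER_PB),
]

def implied_ratios(usable_tb, logical_low_tb, logical_high_tb):
    """Deduplication ratios implied by the quoted logical capacity range."""
    return logical_low_tb / usable_tb, logical_high_tb / usable_tb

ratios = {model: implied_ratios(usable, low, high)
          for model, usable, low, high in systems}

for model, (low_ratio, high_ratio) in ratios.items():
    print(f"{model}: about {low_ratio:.0f}x to {high_ratio:.0f}x")
```

The quoted logical capacities therefore appear to assume reduction of roughly 10x to 50x, so actual logical capacity depends on the factors that drive the deduplication ratio.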


Replicate Smarter with Existing Networks: 99% bandwidth efficiency

• Move data offsite over existing networks for fastest time-to-DR readiness

• Map method to application recovery requirements and DR policies

[Diagram: deduplicated backup with flexible replication over the WAN, from the on-premise site (Home, DB) to the off-premise site (Home, DB).]


Recover Reliably from Disk: highest levels of data integrity

• Backups are the data store of last resort

• Improve your recovery SLAs with the advantages of disk-based data protection

Verification: all data is read and verified after it is written
Self-healing: continuous on-the-fly error detection and correction

[Diagram labels: Home, DB, VM.]
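The two ideas on this slide, read-after-write verification and later error detection, can be sketched with ordinary file I/O. This is a generic illustration, not the Data Domain mechanism: the function names are hypothetical, SHA-256 is an arbitrary choice, and a real system would need to bypass caches to be sure the read comes from the media.

```python
import hashlib
import os
import tempfile

def write_verified(path, data):
    """Write data, read it back, and fail if the stored bytes differ."""
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the write to stable storage first
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if actual != expected:
        raise OSError(f"verification failed for {path}")
    return expected

def scrub(path, expected_digest):
    """Later integrity check: does the stored data still match its digest?"""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == expected_digest

workdir = tempfile.mkdtemp()
segment_path = os.path.join(workdir, "segment.bin")
digest = write_verified(segment_path, b"backup segment payload")
ok_before = scrub(segment_path, digest)

with open(segment_path, "wb") as f:  # simulate silent corruption on disk
    f.write(b"backup segment pay1oad")
ok_after = scrub(segment_path, digest)
print(ok_before, ok_after)
```

Detection is only half of self-healing; correcting the damaged segment would additionally require redundancy such as parity or a replica, which this sketch leaves out.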


NetWorker with Data Domain

• Use with existing disk-based or virtual tape library capabilities
• Use with DD Boost
  – Improved performance
  – Clone-controlled replication
  – Automated configuration
  – Monitoring and reporting

[Diagram: file systems and applications protected by NetWorker write to Data Domain in the primary data center; replication over the WAN to Data Domain at the remote site.]

NETWORKER AND DATA DOMAIN


Backup/archiving as a service:

• Application/databases native tools via:

– NFS

– CIFS

– VTL

• RMAN "AS COPY" clause use case for cloning production to Dev & QA

• archiving and backing up

– SourceOne

– etc.

Backup and archives share same data.


Oracle RMAN to Disk: national supermarket chain testimonial

“EMC Data Domain is just disk to me. Changing RMAN scripts to go straight to Data Domain disk was simple.” DBA Manager

“We used to have to go through our backup team for recovery requests and 90% of our actual restore time was spent waiting on tape and administration. With Data Domain, I don't have to wait for someone else to satisfy a restore request or a tape recall.” DBA Manager


RMAN> ALLOCATE CHANNEL CH1 DEVICE TYPE DISK FORMAT '/dd/backup/ora.weekly/%U';
RMAN> ALLOCATE CHANNEL CH2 DEVICE TYPE DISK FORMAT '/dd/backup/ora.weekly/%U';
RMAN> BACKUP AS COPY TAG 'MAY9' DATABASE INCLUDE CURRENT CONTROLFILE;
RMAN> BACKUP ARCHIVELOGS TAG 'MAY9' ALL NOT BACKED UP DELETE ALL INPUT;

Weekly Full Backup – With Deduplication

Weekly: full image backups
Target DB: 1 TB
After, with deduplication: deduplication applied to fulls, requiring much less disk

[Diagram labels: Full, 500 GB, 500 GB.]


Data Domain Archiver: cost-optimized long-term retention

• Data Domain system for backup and archive

– Active tier: short-term data protection; less than 90 days

– Archive tier: scalable long-term retention; multiple years

• High-throughput deduplication storage

– Up to 9.8 TB/hr

• Cost optimized for long-term retention

– Up to 570 TB usable, 28.5 PB logical capacity

– Low cost per gigabyte while maintaining high throughput

– Fault isolation of archive units for long-term recoverability

• Easily integrates with all leading backup and archive applications

• Leverage existing Data Domain system advantages

– Supports DD Replicator and DD Retention Lock software options

– Data Domain Data Invulnerability Architecture to ensure data integrity


“Deduplication has become a must-have feature for vendors in the

backup/recovery market. The value of data reduction technologies,

such as deduplication, cannot be understated.

In May 2007, Gartner called deduplication a transformational

technology with the potential for significant cost savings and

expanded QoS capabilities (see "Data Deduplication Is Poised to

Transform Backup and Recovery"). We reiterate this assessment,

and we frequently advise clients to investigate deduplication

technologies for use in addressing current and anticipated storage

challenges.”

New Storage Solutions Can Modernize Data Life Cycle Management

Sheila Childs and Dave Russell, Gartner

February 24, 2010

Data Deduplication a Must-have Feature


The Move to Deduplication Is On!

Storage Networking Wave 13 Study
“Heat Index” Rank: 1

Wave | In use now | In pilot/evaluation | In near-term plan | In long-term plan | Not in plan
Wave 13 | 40% | 4% | 14% | 22% | 20%
Wave 12 | 27% | 8% | 15% | 25% | 26%
***Wave 11 | 24% | 12% | 16% | 28% | 21%
**Wave 10 | 15% | 15% | 14% | 31% | 25%
**Wave 9 | 9% | 9% | 12% | 25% | 45%
*Wave 8 | 22% | 7% | 15% | 16% | 41%

“Deduplication is now in use by 40% of F1000, with use having accelerated rapidly over the last year.”

Source: TheInfoPro Wave 13 Storage Study (Q4 2009), January 2010. F1000 Sample: Wave 8, n=148; Wave 9, n=150; Wave 10, n=151; Wave 11, n=127; Wave 12, n=147; Wave 13, n=183
*Technology was previously categorized as deduplication
**Technology was previously categorized as deduplication/capacity optimized storage/single backup instance store
***Technology was previously categorized as single backup instance store software


Storage Networking Technology In Use Expansion Index

Lead in Use Vendors – F1000

Methodology

The TIP In Use Expansion Index is designed to illustrate levels of spending change for technologies with a minimum of 10% in use. It takes into account the size of an organization’s total storage budget and provides a weight for current spending patterns. The weights range from -1.0 for the > 50% Less response to 1.0 for > 50% More. Technologies with 0% (No Change) receive no weight. The final score is normalized on a scale from 0 to 100, with the top score going to those technologies that have the greatest current spending within the TIP research network of users. A “!” vendor has at least twice the number of responses as the closest competitor.

(Gauges Changes in Spending on Already-adopted Technology)

Q4 '09 Rank | Q2 '10 Rank | Technology | Wave 13 Lead in Use Vendor | Wave 14 Lead in Use Vendor | Wave 13 2nd in Use Vendor | Wave 14 2nd in Use Vendor
2 | 1 | Solid-state Disk Drives (SSD) | EMC! | EMC! | HDS/IBM | Oracle
1 | 2 | 8Gbps Fibre Channel | Brocade | Brocade | QLogic | Emulex
N/A | 3 | Multiprotocol Storage Systems (FC/NAS/IP/FCoE) | N/A | NetApp | N/A | EMC
9 | 4 | Backup Data Reduction/Deduplication | EMC! | EMC! | NetApp | NetApp
3 | 5 | Virtual Server Image Storage | VMware | VMware | EMC | EMC
9 | 6 | Serial-attached SCSI Drives (SAS) | HP | HP | EMC | EMC
N/A | 7 | TCP/IP Offload Engine (TOE) | N/A | Intel | N/A | HP
6 | 8 | 10Gbps Ethernet for Storage | Cisco! | Cisco | NetApp | EMC
13 | 9 | File Replication (Sync) | NetApp | NetApp | EMC | EMC
7 | 10 | NPIV – Virtualized I/O | Brocade/IBM | Cisco | HP | IBM
18 | 11 | Block Replication (Sync) | EMC! | EMC! | IBM | IBM
11 | 12 | Remote Block Mirroring and/or Wide-area Replication (Async) | EMC! | EMC! | NetApp | NetApp
12 | 13 | Fixed Content and/or Content-addressed Storage (CAS) Arrays | EMC! | EMC! | IBM | IBM/HP
15 | 14 | Virtual Tape Libraries (VTL) for Open Systems | EMC! | EMC! | IBM | EMC!
13 | 15 | Remote File Mirroring and/or Wide-area Replication (Async) | NetApp | NetApp/EMC! | EMC | IBM
N/A | 16 | IP SAN/iSCSI Storage Arrays | N/A | EMC | N/A | NetApp
4 | 17 | Online Data Reduction/Deduplication | NetApp | NetApp! | EMC | EMC
4 | N/A | Fabric-based Intelligence | Cisco | N/A | EMC | N/A
8 | N/A | NAS Gateways | EMC | N/A | NetApp | N/A
16 | N/A | Wide-area File Services (WAFS) | Cisco | N/A | Riverbed | N/A
17 | N/A | IP SAN Storage Arrays | EMC | N/A | NetApp | N/A
19 | N/A | 4Gbps Fibre Channel | QLogic | N/A | Brocade | N/A


Top 8 EMC BRS Deduplication Use Cases

Use Case / EMC BRS Solution / Challenge Areas Impacted

• Avamar; Resource Contention, VM Sprawl: Reduce backup windows by 10X. Image level backup/restore. 98% less data moved across the network. Free client agents.
• Avamar; Performance, File Recovery: Avamar NDMP accelerator node deduplicates native NDMP backup stream. No client agents needed.
• Data Domain; Dump & Chase, Frequent Log Backups: Native database backup tools create database and transaction log dumps direct to Data Domain deduplication file system. Efficient replication. No client agents required. One-step backup and recovery by DBAs.
• Avamar; Bandwidth Limitations: 98% less data moved across the network. Perfect for low speed links. Free client agents.
• Avamar; Field Teams, Data Loss: 98% less data moved across the network. Perfect for low speed links. IT or user directed backups/restores designed for end user systems.
• Data Domain; Tape Vaulting, Backward Compatibility: Auto tier active backups to fault isolated archive tier, up to 550TB of deduplicated data. Eliminate tape for long term retention.
• Data Domain; High Tape Utilization, Disaster Recovery: Native iSeries BRMS backup facility writes to emulated IBM TS3500 tape library over Fibre Channel. Efficient replication. Fast backup and recovery.
• Data Domain; Batch Operations & Backups: Native FICON connection to zSeries mainframe with BusTech writing to Data Domain deduplication file system.


Deduplication Benefits Summary

• Shrink storage requirements

• Increase retention periods

• Shorten backup/recovery windows

• Improve bandwidth efficiency

• Simplify data management

• Lower costs

– Less storage, power, bandwidth

– Reduce/eliminate use of tape


Deduplication Significantly Improves Business Efficiencies

• Control data protection costs
  – Storage/data center efficiency
  – Reduced effort required on backup
• Simplify data management
  – Improved data recovery SLAs
  – Automated data replication for ensured disaster recovery readiness
• Improve risk management
  – Pass disaster recovery audits
  – Reduce data loss
  – Future-proofing


Before Data Domain…

18 Cabinets of IBM Tape


After Data Domain…

1 DD690 and 2 Expansion Shelves


THANK YOU

