EMC Data Domain: Data Protection and Deduplication

© Copyright 2010 EMC Corporation. All rights reserved.

Upload: phillip-hines

Post on 24-Dec-2015



Why backup?

Goals
– Backups are done for restores
– Operational recovery requires onsite backup
– Disaster recovery requires offsite backup
– Need both onsite and offsite copies on disk
– Need quick restores: no time for moving physical assets
– Protection of personal data & intellectual property


Why So Much Interest in Data Deduplication?

• Backup & archive processes have been overwhelmed by information growth

• Primary storage efficiency has become a necessity to cope with massive growth

• ROI drives the compelling appeal of dedupe
  – Reduced storage capacities
  – Lower infrastructure costs
  – Improved SLAs
  – Efficient replication for business continuance/DR

Survey results (Source: TheInfoPro Wave 11 Storage Study, 2008):
– Deduplication is one of the top 10 technology considerations: 59% "Very important"
– Deploying deduplication: 24% in use; 55% evaluating or in near- to long-term plan; 21% not in plan


Why Do Enterprises Still Use Tape?

• Low upfront cost

• Tape can store the massive amount of redundant data created by backups

• Transportable for offsite DR

[Figure: primary storage on disk; backup storage on tape, at 5x–10x the size of primary storage]
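Where does a 5x–10x backup-to-primary multiple come from? A back-of-envelope sketch, assuming a hypothetical retention policy of weekly fulls plus four daily incrementals, each about 10% of primary (the same 1 TB full and 100 GB incremental proportions used in the deduplication example in this deck), and no compression. The function name and policy parameters are illustrative, not taken from the slides:

```python
def backup_to_primary_ratio(weeks_retained: int,
                            incremental_fraction: float,
                            incrementals_per_week: int = 4) -> float:
    """Backup capacity as a multiple of primary capacity, with no deduplication.

    Hypothetical policy: every retained week keeps one full copy of primary
    plus `incrementals_per_week` incrementals, each `incremental_fraction`
    of primary in size.
    """
    per_week = 1.0 + incrementals_per_week * incremental_fraction
    return weeks_retained * per_week


# 1 TB full + four 100 GB incrementals per week (incremental_fraction = 0.10)
for weeks in (4, 6, 7):
    ratio = backup_to_primary_ratio(weeks, 0.10)
    print(f"{weeks} weeks retained -> {ratio:.1f}x primary")
```

Keeping four to seven weeks of such backups lands between 5.6x and 9.8x primary, and nearly all of that is redundant copies of the same data.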


EMC Data Domain: Leadership and Innovation

• Deduplication storage systems
  – More than 12,000 systems installed
  – More than 4,300 customers
  – More than 2,600 PB under Data Domain protection worldwide

• A history of industry firsts (timeline, 2003–2010)
  – First Deduplication NAS
  – First Deduplication Volume Replication
  – Largest Deduplication Array
  – First Deduplication Directory Replication
  – First Deduplication Virtual Tape Library
  – First Deduplication Nearline Storage
  – Fastest Backup Controller
  – Cascaded Replication
  – First Deduplication Encryption
  – First Distributed Processing


Data Domain – works with what you have

[Figure: Database, Archive, Backup, VMware]


De-duplication principles

Unique segments (4 KB–12 KB): segment size varies “on-the-fly”
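The slide gives only the segment-size range. A minimal sketch of variable-size, content-defined segmentation with fingerprint-based deduplication, assuming a "gear" rolling hash (a common content-defined chunking technique, swapped in here for illustration; the slide does not say which hash Data Domain uses) and SHA-1 fingerprints. The boundary mask, gear table, and function names are hypothetical. Requires Python 3.9+.

```python
import hashlib
import random

MIN_SEG = 4 * 1024    # 4 KB lower bound, from the slide
MAX_SEG = 12 * 1024   # 12 KB upper bound, from the slide
MASK64 = (1 << 64) - 1
# Assumption: cut where the top 11 bits of the rolling hash are zero (~1 in 2048 bytes).
BOUNDARY_MASK = ((1 << 11) - 1) << 53
# Assumption: gear table of 256 pseudo-random 64-bit values, seeded for repeatability.
_rng = random.Random(2010)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def segment(data: bytes) -> list[bytes]:
    """Split data into content-defined segments of MIN_SEG..MAX_SEG bytes (last may be shorter)."""
    segments: list[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64  # gear hash: older bytes shift out after 64 steps
        length = i - start + 1
        if length < MIN_SEG:
            continue  # never cut before the minimum size
        if (h & BOUNDARY_MASK) == 0 or length >= MAX_SEG:
            segments.append(data[start:i + 1])  # content-defined cut, or forced cut at MAX_SEG
            start = i + 1
            h = 0
    if start < len(data):
        segments.append(data[start:])
    return segments


def dedupe(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Keep one copy of each segment in `store`; return the data's recipe of fingerprints."""
    recipe: list[str] = []
    for seg in segment(data):
        fp = hashlib.sha1(seg).hexdigest()  # fingerprint identifies the segment
        store.setdefault(fp, seg)            # only new fingerprints consume space
        recipe.append(fp)
    return recipe


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original bytes from a recipe."""
    return b"".join(store[fp] for fp in recipe)
```

Because cut points depend on the bytes themselves rather than on fixed offsets, inserting a few bytes near the start of a file typically changes only the first segment or two; the rest keep their fingerprints and deduplicate against what is already stored.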



Data Deduplication: Technology Overview
Store more backups in a smaller footprint

Segments in each backup (letters stand for data segments):
Friday Full Backup          A B C D A E F G
Mon Incremental             A B H
Tues Incremental            C B I
Weds Incremental            E G J
Thurs Incremental           A C K
Second Friday Full Backup   B C D E F L G H
Unique segments stored      A B C D E F G H I J K L

Backup                   Logical   Estimated Data Reduction   Physical
Friday Full              1 TB      2–4x                       250 GB
Monday Incremental       100 GB    7–10x                      10 GB
Tuesday Incremental      100 GB    7–10x                      10 GB
Wednesday Incremental    100 GB    7–10x                      10 GB
Thursday Incremental     100 GB    7–10x                      10 GB
Second Friday Full       1 TB      50–60x                     18 GB
TOTAL                    2.4 TB    7.8x                       308 GB
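The totals in the table can be checked directly, and the letter diagram shows why the second full costs so little: almost every segment in it is already stored. A small sketch that replays the diagram's segment letters (as read from the figure) and the table's capacities, assuming decimal units (1 TB = 1000 GB); the variable and function names are illustrative:

```python
# Segment letters read from the diagram; each letter stands for one data segment.
BACKUPS = [
    ("Friday full", "A B C D A E F G"),
    ("Mon incremental", "A B H"),
    ("Tues incremental", "C B I"),
    ("Weds incremental", "E G J"),
    ("Thurs incremental", "A C K"),
    ("Second Friday full", "B C D E F L G H"),
]

# (backup, logical GB, physical GB) from the table; assumes 1 TB = 1000 GB.
TABLE = [
    ("Friday full", 1000, 250),
    ("Monday incremental", 100, 10),
    ("Tuesday incremental", 100, 10),
    ("Wednesday incremental", 100, 10),
    ("Thursday incremental", 100, 10),
    ("Second Friday full", 1000, 18),
]


def unique_segments(backups: list[tuple[str, str]]) -> tuple[list[str], int]:
    """Return (segments stored once, in first-seen order; total segments written logically)."""
    seen: set[str] = set()
    stored: list[str] = []
    logical = 0
    for _name, segments in backups:
        for seg in segments.split():
            logical += 1
            if seg not in seen:  # duplicates, even within one backup, are stored once
                seen.add(seg)
                stored.append(seg)
    return stored, logical


stored, logical = unique_segments(BACKUPS)
print("".join(stored), f"{logical} logical -> {len(stored)} stored")

total_logical = sum(row[1] for row in TABLE)
total_physical = sum(row[2] for row in TABLE)
print(f"{total_logical} GB -> {total_physical} GB, {total_logical / total_physical:.1f}x")
for name, logical_gb, physical_gb in TABLE:
    print(f"{name}: {logical_gb / physical_gb:.1f}x")
```

The replay stores 12 unique segments for 28 segments written, and the table's rows sum to 2,400 GB logical and 308 GB physical (about 7.8x). Per-row reductions come out at 4.0x for the first full, 10.0x for each incremental, and about 55.6x for the second full, each inside the estimated range listed for that row.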


Deduplication Dramatically Reduces Storage Capacity Requirements

Deduplication: 10–30 times less data stored versus fulls + incrementals with typical retention policies

[Figure: data stored vs. weeks in use (1–20 weeks), comparing traditional storage with deduplication storage]
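The growing gap between the two curves can be sketched by extending the previous slide's example week after week. This assumes, hypothetically, that every week repeats that example exactly (one 1 TB full plus four 100 GB incrementals, all retained; 250 GB physical for the first full, 18 GB for each later full, 10 GB per incremental). Real ratios depend on change rate and retention:

```python
def data_stored_tb(weeks: int) -> tuple[float, float]:
    """Return (traditional, deduplicated) TB stored after `weeks` of weekly backup cycles.

    Assumption: every week repeats the previous slide's example and all
    backups are retained.
    """
    if weeks < 1:
        raise ValueError("weeks must be at least 1")
    traditional_gb = weeks * (1000 + 4 * 100)               # every full and incremental kept as-is
    dedup_gb = (250 + 4 * 10) + (weeks - 1) * (18 + 4 * 10)  # first week, then later weeks
    return traditional_gb / 1000, dedup_gb / 1000


for weeks in (1, 5, 10, 15, 20):
    trad, dedup = data_stored_tb(weeks)
    print(f"week {weeks:2}: traditional {trad:5.1f} TB, "
          f"dedup {dedup:5.2f} TB, {trad / dedup:4.1f}x less")
```

Under these assumptions the gap passes 10x in week 3 and reaches about 20x at week 20 (28 TB versus about 1.39 TB), inside the slide's 10–30x range.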


Data Domain SISL™ Scalable Architecture: CPU-Centric

[Figure: throughput (GB/sec.) vs. addressable capacity in TB post-RAID (physical), from the DD200 (2004) through the DD880 (7/09, industry's fastest backup storage controller) to 2011 (est.); annotated with distributed processing for single-controller systems and multi-controller systems with global deduplication]

Data Domain Scale – 6-year improvement
– Throughput: ~90x
– Capacity: ~225x
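The slide calls the architecture "CPU-centric" but does not show how fingerprint lookups avoid waiting on disk. One widely used technique for this kind of system is an in-memory Bloom filter placed in front of the on-disk fingerprint index: a "no" answer is definitive, so most new segments never touch the disk index. The sketch below is a generic Bloom filter under that assumption, not Data Domain's implementation; the class names, filter sizes, and the `disk_lookups` counter are illustrative. Requires Python 3.9+.

```python
import hashlib


class BloomFilter:
    """In-memory set-membership filter: no false negatives, tunable false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class FingerprintIndex:
    """Hypothetical index: Bloom filter in RAM in front of a simulated on-disk table."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 7) -> None:
        self.summary = BloomFilter(num_bits, num_hashes)
        self.on_disk: set[bytes] = set()  # stands in for the on-disk fingerprint index
        self.disk_lookups = 0

    def is_duplicate(self, fp: bytes) -> bool:
        if not self.summary.might_contain(fp):
            return False           # definitely new: no disk access needed
        self.disk_lookups += 1     # maybe present: confirm against the disk index
        return fp in self.on_disk

    def insert(self, fp: bytes) -> None:
        self.summary.add(fp)
        self.on_disk.add(fp)
```

With 65,536 bits and 7 hash functions holding 1,000 fingerprints, false positives are rare, so almost every new segment is classified without a disk read; sizing the filter to the expected number of stored segments is the main design choice.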


Inline vs. Post-Process Deduplication: Provisioning & Admin

Inline: deduplication before storing
– Other activities unimpeded
– Predictable
– Simpler

Post-process: deduplication after storing
– Process contention increases with # processes
  – Copy to tape: too slow to stream tape
  – Recovery: SLA predictability
  – Replication: poor time-to-DR
  – Deduplication itself, if interleaved with backup or restore
– More admin needed to fight these issues
– At least 3x disk accesses to shared store

[Figure: data paths for store, dedupe, replicate, and restore under each approach]
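The "at least 3x disk accesses" figure can be reproduced with a simplified, hypothetical traffic model. For a backup of L GB with reduction ratio r, inline deduplication writes only the unique data (L/r), while post-process writes the raw backup, reads it back, and then writes the unique data (2L + L/r). The ratio is (2L + L/r) / (L/r) = 2r + 1, which is 3 when nothing deduplicates and grows with r. This model ignores restores, tape copies, replication reads, and space reclamation:

```python
def shared_store_io_gb(logical_gb: float, reduction: float) -> dict[str, float]:
    """Disk traffic (GB) one backup puts on the shared store, in a simplified model.

    Hypothetical model: inline dedupe writes only unique data; post-process
    writes the raw backup, reads it back, then writes the unique data.
    """
    if reduction < 1:
        raise ValueError("reduction must be >= 1")
    unique_gb = logical_gb / reduction
    inline = unique_gb                                  # dedupe in memory, write unique only
    post_process = logical_gb + logical_gb + unique_gb  # write raw, read raw, write unique
    return {"inline": inline, "post_process": post_process, "ratio": post_process / inline}


print(shared_store_io_gb(100, 1))   # no duplicates: post-process does 3x the I/O
print(shared_store_io_gb(100, 10))  # a 100 GB incremental at 10x: 10 GB vs. 210 GB
```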


Data Integrity: Data Invulnerability Architecture
Trust but verify: “hope” is not a strategy

• Data verification: checksum → deduplication, write to disk → verify

• Self-healing file system: cleaning, expired data, defrag, verify

• Other: RAID 6, NVRAM, snapshots

[Figure: File System, Global Compression, Local Compression, and RAID layers, with "Generate Checksum" on the write path and "Verify Data" on the read path]
– Verify the file system metadata integrity
– Verify user data integrity
– Verify stripe integrity
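A minimal sketch of the "generate checksum, write, verify" flow for a single segment, assuming SHA-256 as the checksum and `zlib` as a stand-in for local compression. The dict used as a disk and the function names are hypothetical; the architecture on the slide also verifies file system metadata and RAID stripes, which this sketch does not model:

```python
import hashlib
import zlib


def verify_segment(disk: dict[int, bytes], addr: int, checksum: bytes) -> bool:
    """Read back, decompress, and compare with the checksum taken before writing."""
    try:
        data = zlib.decompress(disk[addr])
    except (KeyError, zlib.error):
        return False  # a missing or undecodable block counts as a failed verification
    return hashlib.sha256(data).digest() == checksum


def write_segment(disk: dict[int, bytes], addr: int, segment: bytes) -> bytes:
    """Checksum, compress, write, then verify by reading back before acknowledging."""
    checksum = hashlib.sha256(segment).digest()  # generated on the original bytes
    disk[addr] = zlib.compress(segment)          # stand-in for local compression + disk write
    if not verify_segment(disk, addr, checksum):
        raise OSError(f"verification failed at address {addr}")
    return checksum
```

A silently corrupted block (for example, one overwritten with different but well-formed data) still decompresses cleanly, yet fails the checksum comparison, which is why the check is taken before any transformation and re-run after the data lands on disk.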


Network-Efficient Replication for True Disaster Recovery
Lowers WAN costs; improves service level agreements

• Source: remote sites (Data Domain systems holding backup data and archive data)

• Destination: data center hub (Data Domain DDX Array with DD880s); supports hundreds of remote sites

• Only 1–5% of the data crosses the WAN from each site: 95–99% cross-site bandwidth reduction

• Flexible replication: one-to-many, many-to-one, bi-directional, system-to-system, cascaded

[Figure: remote-site Data Domain systems replicating over the WAN to a data center hub]
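A minimal sketch of why only a few percent of the data needs to cross the WAN: if the destination already holds most segments, the source can send fingerprints first and then only the segments the destination lacks. The three-step exchange, the function names, and the 8 KB segments in the example are assumptions for illustration, not Data Domain's replication protocol. Requires Python 3.9+.

```python
import hashlib


def fingerprint(segment: bytes) -> bytes:
    return hashlib.sha1(segment).digest()  # 20-byte fingerprint


def replicate(segments: list[bytes], dest_store: dict[bytes, bytes]) -> tuple[int, int]:
    """Replicate segments to a destination store; return (WAN bytes sent, logical bytes).

    Hypothetical protocol: (1) source sends the fingerprint list, (2) destination
    answers with the fingerprints it lacks, (3) source sends only those segments.
    The size of the destination's reply is ignored.
    """
    fps = [fingerprint(s) for s in segments]
    sent = sum(len(fp) for fp in fps)                     # step 1: fingerprints only
    missing = {fp for fp in fps if fp not in dest_store}  # step 2: computed at the destination
    for fp, seg in zip(fps, segments):
        if fp in missing:
            dest_store[fp] = seg
            missing.discard(fp)                           # send each new segment once
            sent += len(seg)                              # step 3: new segment data
    return sent, sum(len(s) for s in segments)
```

With 100 distinct 8 KB segments of which the destination already holds 97, the source sends 2,000 bytes of fingerprints plus 3 segments, about 3.2% of the logical data (a 96.8% reduction); replicating the same data again sends only the fingerprints.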


Industry’s Most Scalable Inline Deduplication Systems

Product line: DD140 remote office appliance; DD600 appliance series (DD610, DD630, DD660, DD690); DD880; Global Deduplication Array (new); DDX Array series (up to 16 controllers)
Software options: DD Boost, DD Virtual Tape Library, DD Replicator, Retention Lock, and DD Encryption

Model                       Speed (Other)  Speed (DD Boost)  Logical capacity  Raw capacity   Usable capacity
DD140                       450 GB/hr      490 GB/hr         17–43 TB          1.5 TB         0.86 TB
DD610                       675 GB/hr      1.3 TB/hr         75–195 TB         Up to 6 TB     Up to 3.98 TB
DD630                       1.1 TB/hr      2.1 TB/hr         165–420 TB        Up to 12 TB    Up to 8.4 TB
DD660                       2.0 TB/hr      2.7 TB/hr         0.52–1.31 PB      Up to 36 TB    Up to 26.1 TB
DD690                       2.7 TB/hr      3.9 TB/hr         0.71–1.7 PB       Up to 48 TB    Up to 35.3 TB
DD880                       5.4 TB/hr      8.8 TB/hr         2.8–7.1 PB        Up to 192 TB   Up to 142.5 TB
Global Deduplication Array  –              12.8 TB/hr        5.7–14.2 PB       Up to 384 TB   Up to 285 TB
DDX Array                   86.4 TB/hr     140 TB/hr         45.6–114 PB       Up to 3.07 PB  Up to 2.28 PB
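The table does not state what reduction ratio its logical-capacity figures assume. Dividing logical capacity by usable capacity suggests roughly 20x at the low end and 50x at the high end for every model; this is an inference from the table, not a published spec. A small sketch that computes those implied ratios for a few rows and converts the hourly throughput figures to GB/s, assuming decimal units (1 PB = 1000 TB, 1 TB = 1000 GB):

```python
# Values copied from the table (decimal units assumed).
# name: (usable TB, logical low TB, logical high TB, speed "Other" in TB/hr)
MODELS = {
    "DD140": (0.86, 17, 43, 0.45),
    "DD630": (8.4, 165, 420, 1.1),
    "DD880": (142.5, 2800, 7100, 5.4),
    "DDX Array": (2280, 45600, 114000, 86.4),
}


def tb_per_hr_to_gb_per_s(tb_per_hr: float) -> float:
    return tb_per_hr * 1000 / 3600  # TB/hr -> GB/hr -> GB/s


def implied_reduction(usable_tb: float, logical_low_tb: float,
                      logical_high_tb: float) -> tuple[float, float]:
    """Logical-to-usable capacity ratios: the data reduction the table appears to assume."""
    return logical_low_tb / usable_tb, logical_high_tb / usable_tb


for name, (usable, low, high, speed) in MODELS.items():
    lo_x, hi_x = implied_reduction(usable, low, high)
    print(f"{name}: {lo_x:.0f}x-{hi_x:.0f}x implied reduction, "
          f"{tb_per_hr_to_gb_per_s(speed):.3f} GB/s")
```

The DDX row is consistent with 16 DD880 controllers: 16 × 5.4 TB/hr = 86.4 TB/hr and 16 × 142.5 TB = 2,280 TB usable.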


Why Data Domain?

• Less disk to resource, less to manage
  – CPU-centric deduplication
  – Inline
  – Green

• Simple, mature, and flexible
  – Simple, mature appliance
  – Nearline tier: any fabric, any software, backup or nearline applications

• Resilience and disaster recovery
  – Storage of last resort
  – Cross-site global compression: data center or remote office