Venkatesh Iyer - Next Generation Backup and Recovery with Data Deduplication - Interop Mumbai 2009
DESCRIPTION
Protecting critical data is a challenge for organizations of all sizes. According to the Enterprise Strategy Group (ESG), the amount of data requiring protection continues to grow at approximately 60 percent per year. Traditional backup solutions store data repeatedly, expanding total storage under management by five to 10 times. Customers need solutions to help manage the information explosion. In addition, government regulations and requests for legal discovery strain the resources and capabilities of traditional data protection solutions. Failure to comply or provide information in a timely fashion can result in significant costs and penalties. Furthermore, recent legislation has exposed the risk of shipping tapes, whether encrypted or unencrypted, as one of the greatest security concerns in today’s IT infrastructure. This session will elaborate on how de-duplication technologies help in achieving lower costs, improving efficiencies, improving protection and simplifying management.
TRANSCRIPT
1© Copyright 2009 EMC Corporation. All rights reserved.
Next Generation Backup and Recovery
With Data Deduplication
Venkatesh K. Iyer
Head – India & SAARC
Backup, Recovery & Archival Solutions
Driving Down the Cost and Risk
Agenda
• Current Market Scenario
• Backup and Recovery Challenges
• Why is Data De-duplication so hot?
• What is Data Deduplication?
• Next Gen Backup and Recovery Architecture
2009 – Market Conditions
Challenging times with global economic recession in 2009
Economic environment is a leading indicator of tech spending
• 71% of CIOs anticipate flat or declining IT spending budgets
• IT budgets in developed countries set to decline by 5%
• TOTAL IT SPENDING IS THE LOWEST IN THE HISTORY OF THE SURVEY
• “The current environment has moved virtualisation toward the top of the priority list for CIOs”
• “TCO reductions will be a key driver of the acceleration in server virtualisation deployments as CIOs are forced to cut capital spending and rein in management, administrative, and power/cooling cost”
Source: Goldman Sachs IT Spending Survey, Nov 2008 & Merrill Lynch CIO Survey, Oct 2008
Industry Challenges & CIO’s Concerns
• Current Economic Crisis
• Cut Operating Costs
• Continued Information Growth (10X growth over next 5 years, EMC/IDC white paper)
Key Backup and Recovery Themes:
• ROI and TCO are #1 on CIO minds
• The data protection market continues to evolve:
– Operational Savings through Automation and Integration
– Improvements in Service Levels (RPOs / RTOs) and IT Compliance
– Decreasing Reliance on Tape through B2D and Data Deduplication
• Traditional data protection methodologies don’t map well to virtualized servers
A perfect storm is brewing for a fundamental re-architecture of data protection environments in organizations
Today’s Backup and Recovery Challenges
Massive Data Growth
Shift to Virtual
Compliance
Costs
Complexity
Hierarchy of Data Reduction Types for Backup
• Data: Regular Storage Array, 1:1
• Whitespace Reduction: LZ Compression, ~ 2:1
• File Level: Single Instance Storage, ~ 3:1
• Fixed Blocks, Snapshots: Fixed Block, ~ 3:1
• Variable Segments: ‘Dedupe’, ~ 20:1 to 500:1
Data Deduplication Significantly Reduces
- Power
- Heat
- Cooling
- Management
- Bandwidth
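One reason variable segments can outperform fixed blocks in the hierarchy above is that a small insertion shifts every fixed block boundary, while content-defined boundaries move with the data. The sketch below illustrates this under stated assumptions: the boundary rule (cut where the sum of the previous 16 bytes is a multiple of 64), the sizes, and the function names are all hypothetical and chosen for readability, not taken from any product.

```python
import hashlib
import random

WINDOW = 16    # bytes examined when deciding on a segment boundary (assumed)
DIVISOR = 64   # yields variable segments averaging roughly 64 bytes (assumed)
BLOCK = 64     # fixed block size used for comparison (assumed)


def fixed_blocks(data: bytes) -> list:
    """Split data into fixed-size blocks, regardless of content."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def variable_segments(data: bytes) -> list:
    """Cut wherever the sum of the previous WINDOW bytes is a multiple of DIVISOR."""
    segments, start = [], 0
    for end in range(WINDOW, len(data)):
        if sum(data[end - WINDOW:end]) % DIVISOR == 0:
            segments.append(data[start:end])
            start = end
    segments.append(data[start:])
    return segments


def shared(old: list, new: list) -> int:
    """Count segments of `new` whose fingerprint already exists in `old`."""
    known = {hashlib.sha1(s).digest() for s in old}
    return sum(1 for s in new if hashlib.sha1(s).digest() in known)


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20_000))
edited = b"!" + original  # a single byte inserted at the front

fixed_shared = shared(fixed_blocks(original), fixed_blocks(edited))
variable_shared = shared(variable_segments(original), variable_segments(edited))
print("fixed blocks reused:", fixed_shared)
print("variable segments reused:", variable_shared)
```

With fixed blocks, the one-byte shift changes the content of every block, so almost nothing from the earlier backup is reusable. With content-defined cuts, only the segment containing the inserted byte is new.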
Gartner Dedupe Prediction: The Market is HUGE
• By 2012, deduplication will be applied to 75% of backups
• Key Findings:
– Production deployments of deduplication for backups have progressed at an unusually high rate for such a recent technology; however, Gartner estimates that less than 5% of backups today use deduplication techniques.
• Market Implications:
– Gartner views this technology as transformational because it radically decreases the economics of disk-based backup and recovery ... too compelling to ignore.
• Recommendations:
– There are several different implementations of deduplication. Some vendors have only recently released this technology and have a few dozen customers; others have been shipping it for several years and have more than 1,000 customers.
– ... ensure that your organization is comfortable with the robustness and maturity of the vendor's approach.
• Analysis by: Dave Russell
Why So Much Interest in Data Deduplication?
• Backup & Archive processes have been overwhelmed by information growth
• Primary storage efficiency has become a necessity to cope with massive growth
• ROI drives the compelling appeal of dedupe
– Reduced Storage Capacities
– Lower Infrastructure Costs
– Improved SLAs
– Efficient Replication for DR
• Deduplication is one of the top 10 technology considerations: 59% rate it very important
• Deploying deduplication: 24% in use, 55% evaluating or in near- to long-term plan, 21% not in plan
Source: TheInfoPro Wave 11 Storage Study, 2008
Why so much Interest in Data De-Duplication?
• Data De-duplication – One of the hottest emerging segments within the storage and data protection market – Why?
– Network Bandwidth utilisation – Efficiently Move Data
– Massive reduction in Storage requirements – Efficiently Store Data
– Security – Data protection in transit
– Improving efficiencies in virtualised environments
• Market is under duress: backup and restore have not kept pace with enterprise growth
• Companies looking to protect more data – increasing desktop volumes, mobile employees, remote offices, data growth circa 50% +
• Data retained at the back-end for longer periods of time for internal reasons or external regulations – Need to archive
• Tape is not ideal for backup and restore, the industry is moving towards backup-to-disk
• De-duplication market opportunity $1B by 2009
• It’s here to stay; it’s based on compression, which has been around for 20+ years
Deduplication 101
• Dedupe - storing only unique ‘chunks’ of data (blocks, objects, files)
– Uses identification & comparison algorithms, content addressing, indexing or cataloging
– Unique “chunks” are reconstituted to original format from the de-duplicated state
• Compression
– Minimizes empty space within files, but does not eliminate redundant data
– Compression is employed in conjunction with other dedupe processes
[Figure: Data Sets 1, 2 and 3 before and after de-duplication]
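The difference between compression and dedupe in the points above can be made concrete with a small sketch. It assumes three identical data sets, treats each whole data set as one "chunk", and uses zlib and SHA-1 purely as stand-ins; none of these choices come from the slides.

```python
import hashlib
import zlib

# Three identical data sets, e.g. the same file backed up three times (assumed input)
data_set = b"quarterly sales report, region north, " * 200
data_sets = [data_set, data_set, data_set]

raw_total = sum(len(d) for d in data_sets)

# Compression alone: each copy shrinks, but all three copies are still stored
compressed_total = sum(len(zlib.compress(d)) for d in data_sets)

# Dedupe alone: identical chunks share one fingerprint, so one copy is stored
unique = {hashlib.sha1(d).digest(): d for d in data_sets}
dedupe_total = sum(len(d) for d in unique.values())

# Dedupe followed by compression of the unique chunks
dedupe_compressed_total = sum(len(zlib.compress(d)) for d in unique.values())

print(raw_total, compressed_total, dedupe_total, dedupe_compressed_total)
```

Compression reduces the size of each copy without noticing that the copies are redundant. Dedupe removes the redundancy, and compressing the remaining unique chunk reduces the stored size further.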
Target- and Source-based Data De-duplication
De-duplication at Target (EMC Disk Library)
• Moves ~ 200 percent of primary data weekly
• Up to 50 times reduction in backup storage
• Backups are typically restored from full and incremental images
• De-dupe device viewed as file system and/or virtual tape library target for traditional backup software
De-duplication at Source (EMC Avamar)
• Moves ~ 2 percent of primary data weekly
• Up to 50 times reduction in backup storage
• Up to 500 times less daily network impact
• Up to 10 times faster daily full backups
• Fast, daily full backups, single-step recovery
• Next-generation backup and recovery
There are strong use cases for both technologies, but only source-based de-duplication reduces daily network bandwidth requirements and decreases client resource utilization during backups.
Data De-Duplication: How it Works
Only unique data segments are backed up. Unique data is stored on disk, available for immediate recovery.
• First Instance (segments A B C D): segments A, B, C and D are backed up
• Duplicate Instance (segments A B C D): data already backed up, so only a unique ID pointer is stored (20 bytes)
• Modified Instance (segments B C D E): new data segment E identified and backed up
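The first, duplicate and modified instances above can be sketched as a toy content-addressed store. This is a minimal illustration under stated assumptions: segments are a fixed 4 bytes, and SHA-1 is used only because its digest is 20 bytes, matching the pointer size quoted on the slide. The slide does not name the hash, and `DedupeStore`, `backup` and `restore` are hypothetical names.

```python
import hashlib

SEGMENT_SIZE = 4  # bytes per segment; tiny on purpose so the example is readable


class DedupeStore:
    """Toy content-addressed store: each unique segment is kept once."""

    def __init__(self):
        self.segments = {}  # segment ID (20-byte SHA-1 digest) -> segment bytes

    def backup(self, data: bytes) -> list:
        """Store any new segments and return the list of segment IDs."""
        recipe = []
        for i in range(0, len(data), SEGMENT_SIZE):
            segment = data[i:i + SEGMENT_SIZE]
            seg_id = hashlib.sha1(segment).digest()
            if seg_id not in self.segments:      # unseen segment: store the data
                self.segments[seg_id] = segment
            recipe.append(seg_id)                # seen or not, keep the pointer
        return recipe

    def restore(self, recipe: list) -> bytes:
        """Rebuild the original data from its segment IDs."""
        return b"".join(self.segments[seg_id] for seg_id in recipe)


store = DedupeStore()
first = store.backup(b"AAAABBBBCCCCDDDD")      # first instance: A B C D
after_first = len(store.segments)
duplicate = store.backup(b"AAAABBBBCCCCDDDD")  # duplicate instance: pointers only
after_duplicate = len(store.segments)
modified = store.backup(b"BBBBCCCCDDDDEEEE")   # modified instance: only E is new
after_modified = len(store.segments)
print(after_first, after_duplicate, after_modified)
```

Each backup is represented by its list of segment IDs, so every instance can be restored in a single step even though only five segments are stored in total.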
Potential Impact of Data De-duplication on a Backup
Data de-duplication reduces Backup to Disk capacity requirements
File 1 = 1 MB: segments A B C D
File 2 = 1 MB: segments A B C D
File 3 = 1 MB: segments B C D E
Backup scheme                     RAW Data   Total Capacity Stored Over 12 Weeks
Daily incremental, weekly full    3 MB       12 Fulls = 36 MB
Daily full                        3 MB       84 Fulls = 252 MB
De-duplicated backup              3 MB       84 Fulls = 1.25 MB
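The totals in this slide can be reproduced with a few lines of arithmetic. The sketch assumes each of the four segments in a 1 MB file is 0.25 MB, that the data does not change during the 12 weeks, and that incrementals and pointer overhead are ignored, as the slide's own totals appear to do.

```python
SEGMENT_MB = 0.25  # each 1 MB file is made of four segments (assumption)
files = {"File 1": "ABCD", "File 2": "ABCD", "File 3": "BCDE"}

# Raw data: every segment of every file is counted
raw_mb = sum(len(segs) for segs in files.values()) * SEGMENT_MB

weeks = 12
weekly_fulls = weeks        # one full backup per week
daily_fulls = weeks * 7     # one full backup per day

weekly_full_total_mb = weekly_fulls * raw_mb
daily_full_total_mb = daily_fulls * raw_mb

# De-duplicated: only the unique segments A, B, C, D and E are ever stored
unique_segments = set("".join(files.values()))
dedupe_total_mb = len(unique_segments) * SEGMENT_MB

print(raw_mb, weekly_full_total_mb, daily_full_total_mb, dedupe_total_mb)
```

The de-duplicated total stays at the size of the unique segments no matter how many daily fulls are taken, which is why 84 fulls occupy less space than a single raw copy.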
Backup and Recovery Use Cases
Next Generation Backup and Archive
Source: Avamar
Target: Data Domain + DL4000
• Virtualized Environments: relieves backup bottlenecks, enables greater server consolidation ratios
• Remote / Branch Offices: protects ROBOs with highest WAN efficiency and with consistent DC policies
• Edge Devices: protects enterprise desktops / laptops with low device overhead
• 3rd Party Backup: heterogeneous target for existing backup applications
• Datacenter NAS / SAN: enterprise infrastructure support
• High Transaction Apps: high-change rate, large data sets
Next Generation Backup, Recovery and Archive - Take the Steps
Better Protection and Compliance. Less Cost.
Best-Practice Business Benefits
Archive
• Reduce the size of backup
• Free valuable primary storage capacity
• Assure compliance, remove exposure
• Reduce eDiscovery expenses
Backup
• Reduce time, bandwidth and infrastructure
• Streamline D/R operations, infrastructure
• Expedite disaster recovery
• Eliminate remote office backup infrastructure
Manage
• Expedite application recovery
• Reduce backup management overhead
• Streamline problem detection, resolution
• Lower backup management costs
Assess
• Avoid time and expense of developing expertise
• Identify maximum investment/benefit strategies
Alignment Attributes
Backup Service Tiering / Catalogue: Tier 1, Tier 2, Tier 3, Tape, Archive
Specification: Tier 1: RecoverPoint; Tier 2: EDL; Tier 3: Avamar; Tape: LTO4 Tape; Archive: Centera
Proposed technology: Tier 1: CDP to Disk; Tier 2: Backup to disk; Tier 3: Source Dedup to disk; Tape: Tape; Archive: Disk
Deduplication: Tier 1: None; Tier 2: Optional at Target; Tier 3: Integrated at Source
Best use case: Tier 1: DB; Tier 2: DB/Emails; Tier 3: Files/NAS/VMware/RO; Archive: Long term Retention
Cost per TB: K / TB for each tier
Operational Backup & Recovery
Backup Time: Tier 1: Instant; Tier 2: Normal; Tier 3: Fastest
LAN/CPU/DISK Impact: Tier 1: None; Tier 2: High; Tier 3: Minimal - None
Operational Recovery Pt Obj (amount of data loss): Tier 1: Last Transaction; Tier 2: Last Backup; Tier 3: Last Backup; Tape: Last Backup
Recoverability (25% data restore): Tier 1: < 2 hours; Tier 2: < 24 hours; Tier 3: < 24 hours; Tape: < 48 hours
Ability to recover data: Tier 1: 100%; Tier 2: 100%; Tier 3: 100%; Tape: 95%; Archive: 100%
Retention & Disposition
Retention on Disk (Typical): Tier 1: < 2 Days; Tier 2: 4 Weeks; Tier 3: Weeks / Months
Length of time data is retained on disk: Tier 1: 2 days; Tier 2: 3 Weeks; Tier 3: 12 Weeks
Data shredding compliance: Tier 1: No; Tier 2: Yes; Tier 3: Yes
Data Integrity Checks (verify quality of backup data automatically): Tier 1: None; Tier 2: Protocol; Tier 3: Daily
Encryption (ability to encrypt backup data): Tier 1: None; Tier 2: None; Tier 3: Integrated
Disaster Recovery
Replication for DR (how long for a replicated copy): Tier 1: Real Time; Tier 2: 24-48 hours; Tier 3: 30-90 Minutes
Disaster Recovery Pt Obj (RPO, amount of data loss): Tier 1: Last Transaction; Tier 2: >= 24 hours; Tier 3: < 24 hours
Next Generation Data Protection Architecture
• CDP
• Archive
• Source De-Duplication
• Mail archive & retrieval
• Database Archive
• 70-80% data reduction through shortcutting
• Centralized Data Protection Management
• Source De-Duplicated Backup: typically 80% of data
• SAN Backup: typically 20% of data
• Professional Services
Next Generation Data Protection with EMC
• EmailXtender / DiskXtender + Centera
• Source De-Duplication
• Mail archive & retrieval
• Database Backup
• 70-80% data reduction through shortcutting
• EMC DPA
• EMC | Avamar: typically 80% of data
• EMC | Networker: typically 20% of data
• EMC | Recoverpoint
• Avamar, Centera, Tape, EDL, Data Domain, RecoverPoint
• EMC Professional Services