Data De-Duplication
TRANSCRIPT
Digital Information Growth
• IDC predicts that the volume of digital information will grow tenfold between 2006 and 2011
• Protecting critical data is a challenge for every organization
Traditional Backup
• Backs up the same files repeatedly (full and incremental backups)
• Inflates storage requirements by 5 to 30 times
• Risk of shipping physical tapes
• Bandwidth, storage, and time costs increase significantly
Challenges for Backup
• Identify and protect the unique data in each backup
• Save only new or unique data from the data set
• Reconstitute all content in its original form on demand
Data De-Duplication Is the Solution
• Discovers and removes redundant data from the data set
• Reconstitutes all content in its original form with 100% reliability at disk speeds
• Economizes storage and disaster-recovery (DR) requirements
• Creates even more recovery points for "rolling back" to earlier versions of files and system configurations
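The idea above can be sketched as a tiny content-addressed store: identical chunks are stored once, and a "recipe" of chunk fingerprints is enough to reconstitute the original. The class name, the 4 KB chunk size, and the choice of SHA-256 are illustrative assumptions, not details from the slides:

```python
import hashlib


class DedupStore:
    """Minimal content-addressed store: identical chunks are kept once."""

    def __init__(self):
        self.chunks = {}  # SHA-256 digest -> chunk bytes

    def put(self, data, chunk_size=4096):
        """Split data into fixed-size chunks; return the recipe of digests."""
        recipe = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only if new
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Reconstitute the original content from its chunk recipe."""
        return b"".join(self.chunks[d] for d in recipe)
```

Backing up 12 KB containing a repeated 4 KB block stores only two unique chunks, yet `get` returns the full original, which is the "reconstitute with 100% reliability" property.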
Traditional Backup vs. De-duplication
Misconceptions Among the Three
Generalized De-duplication
Where Is Data De-Duplication Done?
• Source side (client side)
• Target side (Server side)
(Diagram: the client, on the source side, connects over the LAN to the server, on the target side)
Source-Based De-duplication
• Eliminates redundant data at the source
• De-duplication is performed at the start of the backup process
• Requires de-duplication-aware backup software on both client and target
• Reduces storage requirements and network bandwidth
Target-Based De-duplication
• Happens at the backup storage device
• Initially saves all backup images to the backup appliance
• No need to change the client's backup software
• Does not reduce network bandwidth
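The bandwidth difference between the two approaches can be illustrated with a toy source-side exchange: the client hashes each chunk, checks whether the server already knows the digest, and ships only unknown chunks. The function name, chunk size, and the digest-lookup protocol are hypothetical simplifications:

```python
import hashlib


def source_side_backup(data, server_known, chunk_size=4096):
    """Ship only chunks the server has never seen; return bytes sent.

    `server_known` stands in for the server's digest index: a real client
    would query it over the network instead of sharing a dict.
    """
    sent = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in server_known:  # server: "I don't have this one"
            server_known[digest] = chunk  # ship the new chunk
            sent += len(chunk)
    return sent
```

For 12 KB of three identical 4 KB chunks, the client transmits only 4 KB, and a repeat backup transmits nothing; a target-based appliance would receive all 12 KB before de-duplicating, which is why it saves storage but not bandwidth.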
Types of data de-duplication
• File-level de-duplication
• Block-level (sub-file) de-duplication
  – Fixed block length
  – Variable block length
File level de-duplication
• Each file is treated as a single chunk
• No detection of duplicate data at sub-file level
• A small change to a file forces two full, nearly identical copies to be stored
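A minimal sketch of whole-file hashing, assuming SHA-256 as the fingerprint (the slides do not name one), shows both behaviors, with hypothetical file names:

```python
import hashlib


def file_level_store(files):
    """Treat each file as a single chunk, keyed by its whole-file hash.

    Returns the unique file contents actually stored, keyed by digest.
    """
    store = {}
    for content in files.values():
        store.setdefault(hashlib.sha256(content).hexdigest(), content)
    return store
```

Two byte-identical files collapse to one stored copy, but appending a single byte to one of them changes the whole-file digest, so both versions are stored in full.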
Block level – Fixed length
– Splits files into chunks of an arbitrary fixed length and searches for duplicates among them
– Can miss redundant sub-file data
Ex.: Adding a person's name to the title of a document shifts the entire content, so the de-duplication tool fails to detect the equivalent data
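The shift problem can be demonstrated with a toy fixed-length chunker; the chunk size and sample text are arbitrary choices for illustration:

```python
import hashlib


def fixed_chunks(data, chunk_size=8):
    """Digest set of the fixed-length chunks of `data`."""
    return {hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)}


doc = b"The quick brown fox jumps over the lazy dog " * 4
edited = b"X" + doc  # one byte inserted at the front

old = fixed_chunks(doc)
new = fixed_chunks(edited)
# Every chunk boundary shifted by one byte, so almost no chunks match
# even though the documents are nearly identical.
```

One inserted byte misaligns every subsequent chunk boundary, so the digest sets barely overlap and nearly the whole edited document is stored again.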
Block level – Variable length
– Not locked to arbitrary fixed-length segments
– Catches all duplicate segments in the document, no matter where changes occur
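A toy content-defined chunker makes this concrete. Real systems cut where a rolling hash (e.g. a Rabin fingerprint) hits a pattern; cutting after a delimiter byte is an illustrative simplification that shows the same resynchronization effect:

```python
import hashlib


def content_chunks(data, delimiter=b" "):
    """Toy content-defined chunking: cut after every delimiter byte,
    so boundaries move with the content instead of fixed offsets.
    (Real systems use a rolling hash rather than a delimiter.)"""
    chunks, start = [], 0
    for i, b in enumerate(data):
        if b == delimiter[0]:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return [hashlib.sha256(c).hexdigest() for c in chunks]


doc = b"The quick brown fox jumps over the lazy dog " * 4
edited = b"NEW TITLE " + doc  # insertion at the front

old, new = set(content_chunks(doc)), set(content_chunks(edited))
# Boundaries resynchronize after the insertion: every original chunk
# still matches, and only the inserted words are new.
```

Unlike the fixed-length case, the insertion at the front of the document leaves every downstream boundary in place, so only the two inserted words are stored as new chunks.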
De-duplication Makes Replication Affordable
Continued…
• The reduced backup image after de-duplication directly reduces the amount of storage needed at the secondary site
• Allows disaster recovery to be achieved more economically
• Backup de-duplication lowers infrastructure, time, operational-overhead, and bandwidth costs
Benefits
• Reduces Storage requirements
• Reduces the amount of energy needed to power and cool the storage array
• Reduces network bandwidth
• Reduces the time needed to replicate backups
• Enables longer retention periods on disk
• Achieves disaster recovery economically