data deduplication

• Data Deduplication https://store.theartofservice.com/the-data-deduplication-toolkit.html

Upload: hilary-chase

Post on 30-Dec-2015


Category:

Documents



TRANSCRIPT

• Data Deduplication

https://store.theartofservice.com/the-data-deduplication-toolkit.html

Pinterest Usage

Users should also keep in mind that Pinterest stores actual copies (not just thumbnails and links) of the images being pinned. This has caused controversy with regard to copyright issues for photographers. The technical underpinnings of Pinterest are not unique: Pinterest uses Amazon S3 cloud storage (running at large datacenters) and data deduplication.

DragonFly BSD - HAMMER file system

HAMMER supports configurable file system history, snapshots, checksumming, data deduplication and other features typical for file systems of its kind. HAMMER is recognised as an interesting perspective and option.

Btrfs - Cloning

While hard links can be taken as different names for the same underlying group of disk blocks (known as a file), cloning in Btrfs provides independent files that share their disk blocks, as a form of data deduplication at the disk block level.

Btrfs - Cloning

Cloning can be especially effective for storing disk images of virtual machines or their snapshots. These are large files that differ only in small portions, so cloning provides both faster (instantaneous) copying and minimal consumption of storage space due to data deduplication.

Backup - Storage media

Some disk-based backup systems, such as Virtual Tape Libraries, support data deduplication, which can dramatically reduce the amount of disk storage capacity consumed by daily and weekly backup data.

Problem analysis - Computer Science and Algorithmics

In computer science and in the part of artificial intelligence that deals with algorithms (algorithmics), problem solving encompasses a number of techniques known as algorithms, heuristics, root cause analysis, etc. In these disciplines, problem solving is part of a larger process that encompasses problem determination, de-duplication, analysis, diagnosis, repair, etc.

Data deduplication

Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency depends on the chunk size), the amount of data that must be stored or transferred can be greatly reduced. [http://www.druva.com/blog/2009/01/09/understanding-data-deduplication/ Understanding Data Deduplication] Druva, 2009

Data deduplication

With data deduplication, only one instance of the attachment is actually stored; subsequent instances are referenced back to the saved copy, for a deduplication ratio of roughly 100 to 1.
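As a rough illustration of where a ratio like this can come from, the sketch below assumes a hypothetical mail store holding 100 identical copies of a 1 MB attachment. The counts, sizes, and the function name `dedup_ratio` are assumptions for illustration, not figures from the text, and the small cost of the references themselves is ignored.

```python
def dedup_ratio(instance_count: int, instance_size: int) -> float:
    """Ratio of logical bytes written to physical bytes stored when
    every instance is identical and only one copy is kept."""
    logical = instance_count * instance_size   # what the users "see"
    physical = instance_size                   # single stored copy; reference overhead ignored
    return logical / physical


# Hypothetical numbers: 100 copies of a 1 MB attachment.
ratio = dedup_ratio(instance_count=100, instance_size=1_000_000)
print(f"{ratio:.0f}:1")
```

In practice the ratio is somewhat lower, since each reference and the index that tracks it also consume space.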

Data deduplication - Benefits

Storage-based data deduplication reduces the amount of storage needed for a given set of files.

Data deduplication - Benefits

Network data deduplication is used to reduce the number of bytes that must be transferred between endpoints, which can reduce the amount of bandwidth required. See WAN optimization for more information.
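One way to picture network deduplication is a sender that first announces chunk digests and then transfers only the chunks the receiver does not already hold. The sketch below is a simplified model under stated assumptions: fixed 4096-byte chunks, SHA-256 digests, and hypothetical names (`Receiver`, `send`). It counts only chunk payload bytes and ignores the digests exchanged on the wire.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size


def split(data: bytes, size: int = CHUNK_SIZE):
    """Yield consecutive fixed-size chunks of data."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


class Receiver:
    """Hypothetical endpoint that remembers every chunk it has seen."""

    def __init__(self):
        self.store = {}  # digest -> chunk bytes

    def missing(self, digests):
        """Return the digests this endpoint has no chunk for yet."""
        return {d for d in digests if d not in self.store}

    def receive(self, new_chunks, digests):
        """Store newly sent chunks and rebuild the data from the digest list."""
        self.store.update(new_chunks)
        return b"".join(self.store[d] for d in digests)


def send(data: bytes, receiver: Receiver):
    """Return (data as rebuilt by the receiver, chunk bytes transferred)."""
    parts = list(split(data))
    digests = [hashlib.sha256(p).digest() for p in parts]
    needed = receiver.missing(digests)
    payload = {d: p for d, p in zip(digests, parts) if d in needed}
    transferred = sum(len(p) for p in payload.values())
    return receiver.receive(payload, digests), transferred
```

Sending the same data a second time transfers no chunk bytes in this model, because every digest is already known to the receiver.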

Data deduplication - Source versus target deduplication

Another way to think about data deduplication is by where it occurs. When the deduplication occurs close to where data is created, it is often referred to as source deduplication, whereas when it occurs near where the data is stored, it is commonly called target deduplication.

Data deduplication - Deduplication methods

One of the most common forms of data deduplication implementations works by comparing chunks of data to detect duplicates.
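A minimal sketch of chunk comparison follows. It assumes fixed-size chunks and uses SHA-256 digests as a stand-in for comparing chunk contents directly; the class name `ChunkStore` and its methods are hypothetical, and real systems differ in chunking strategy and index design.

```python
import hashlib


class ChunkStore:
    """Toy store that keeps one copy of each distinct chunk."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes

    def write(self, data: bytes) -> list:
        """Store data and return its recipe: the ordered list of chunk digests."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk whose digest is already indexed is treated as a duplicate
            # and is not stored again.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes held, excluding index and recipe overhead."""
        return sum(len(c) for c in self.chunks.values())
```

Writing `b"abcdabcdabcdefgh"` with `chunk_size=4` yields a four-entry recipe but stores only two distinct chunks.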

Data deduplication - Deduplication methods

First, data deduplication requires overhead to discover and remove the duplicate data.

Data deduplication - Deduplication methods

Data deduplication has been deployed successfully with primary storage in some cases where the system design does not require significant overhead or impact performance.

Data deduplication - Drawbacks and concerns

By definition, data deduplication systems store data differently from how it was written.

Data deduplication - Drawbacks and concerns

The computational resource intensity of the process can be a drawback of data deduplication.

Data deduplication - Major players and technologies

Datastor holds US Patent 7,860,843 and AU Patent 2007234696 for the firm’s core data deduplication technology, known as Adaptive Content Factoring™.

Data deduplication - Major players and technologies

The ExaGrid architecture provides grid scalability with data deduplication.

Data deduplication - Major players and technologies

Data deduplication was added to Oracle's Sun Storage 7000 Unified Storage in July 2010.

Data deduplication - Major players and technologies

QUADStor's open source storage virtualization software has inline data deduplication for primary storage SAN and NAS.

Data deduplication - Major players and technologies

Quantum holds a patent for variable-length block data deduplication.

CTERA Networks

Local network computers are automatically backed up to the CTERA appliances on the LAN, which then perform incremental backups to an off-site deduplicated cloud storage service, compressing and encrypting the data as it is transmitted.

StorSimple - History

StorSimple marketed a computer appliance called Cloud-integrated Storage (CiS). Their approach claimed to integrate primary storage data deduplication, automated tiered storage of data (across local and cloud storage), data compression, and encryption, and to deliver significantly faster data backup and disaster recovery times.

Deduplication

Data deduplication, in computer storage, refers to the elimination of redundant data.

Dell, Inc. - Partnership with EMC

On December 9, 2008, Dell and EMC announced the multi-year extension, through 2013, of their strategic partnership that began in 2001. In addition, Dell plans to expand its product line-up by adding the EMC Celerra NX4 storage system to the portfolio of Dell/EMC family of networked storage systems, as well as partnering on a new line of de-duplication products as part of its TierDisk family of data-storage devices.

File hosting service - Data encryption

Since secret key encryption results in unique files, it makes data deduplication impossible and therefore uses more storage space. [http://www.ssrc.ucsc.edu/Papers/storer-storagess08.pdf Secure Data Deduplication] Mark W. Storer, Kevin Greenan, Darrell D. E. Long, Ethan L. Miller
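The effect can be seen in a small experiment. The sketch below uses a toy XOR stream cipher built from SHA-256, which is an assumption made purely so the example runs with the standard library; it is not secure and is not the scheme any provider uses. It encrypts the same document under two independent random keys, then under a key derived from the content itself (often called convergent encryption; the text does not name a scheme, so that part is also an assumption).

```python
import hashlib
import os


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy XOR stream cipher for illustration only; NOT secure.
    Applying it twice with the same key returns the plaintext."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def fingerprint(blob: bytes) -> str:
    """Digest a deduplicating store might use to spot identical blobs."""
    return hashlib.sha256(blob).hexdigest()


document = b"the same file uploaded by two users " * 10

# Each user encrypts with a private random key: the ciphertexts differ,
# so the store sees two unrelated blobs and cannot deduplicate them.
alice_blob = toy_encrypt(os.urandom(32), document)
bob_blob = toy_encrypt(os.urandom(32), document)
print(fingerprint(alice_blob) == fingerprint(bob_blob))

# Key derived from the content: identical files give identical ciphertexts.
content_key = hashlib.sha256(document).digest()
print(fingerprint(toy_encrypt(content_key, document))
      == fingerprint(toy_encrypt(content_key, document)))
```

The first comparison is expected to be false and the second true, which is why per-user secret keys defeat deduplication while content-derived keys do not.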

File hosting service - Data encryption

This enables the cloud storage provider to de-duplicate data blocks, meaning only one instance of a unique file (such as a document, photo, music or movie file) is actually stored on the cloud servers but made accessible to all uploaders.

Data backup

These include optimizations for dealing with open files and live data sources as well as compression, encryption, and de-duplication, among others.

Data backup - Storage media

Some disk-based backup systems, such as Virtual Tape Libraries, support data deduplication, which can dramatically reduce the amount of disk storage capacity consumed by daily and weekly backup data.

Data backup - Manipulation of data and dataset optimization

Deduplication: When multiple similar systems are backed up to the same destination storage device, there is potential for much redundancy within the backed-up data.

Filesystem in Userspace - Example uses

[http://www.lessfs.com/ Lessfs]: an inline data de-duplicating filesystem for Linux that includes support for LZO or QuickLZ compression and encryption.

FalconStor - Products

FalconStor's software includes Virtual Tape Library (VTL) with data deduplication, Continuous Data Protector (CDP), File-interface Deduplication System (FDS) and Network Storage Server (NSS), each enabled with WAN-optimized replication for disaster recovery and remote office protection.

Cofio Software - AIMstor

It performs data deduplication [http://www.networkcomputing.com/deduplication/cofio---a-holistic-approach-to-deduplication.php Cofio's Unique Approach To Deduplication - Network Solutions] and has an indexing engine that allows fast search of the repository content.

Data Domain (corporation)

Data Domain Corporation was an information technology company from 2001 to 2009 specializing in target-based deduplication solutions for disk-based backup. [http://www.datadomain.com/company/ Data Domain, an EMC company. Data Domain.]

Data Domain (corporation) - History

Originally categorized as capacity optimization by industry analysts, it later became more widely known as inline data deduplication. Also, unlike most non-archival computer storage products, it went to extreme technical lengths to ensure data longevity (vs

Quantum Corporation - Disk Backup and Recovery Products 2002–present

At the end of 2006, shortly after its acquisition of ADIC, Quantum announced the first of its DXi-Series products incorporating data deduplication technology, which ADIC had acquired from a small Australian company called Rocksoft earlier that year. [http://www.itsecurity.com/press-releases/press-release-quantum-back-up-recovery-121206/ Quantum press release] Since then, Quantum has expanded and enhanced this product line and now offers DXi solutions for SMB, midrange and enterprise customers.

Quantum Corporation - Disk Backup and Recovery Products 2002–present

DXi-Series products incorporate Quantum’s patented data deduplication technology, providing typical data reduction ratios of 15:1, or 93%. [http://salestools.quantum.com/getDocPRetriever.cfm?ext=.pdftype_mime=application/pdffilename=782735.pdf#search=%22WP00163A%22 IDC Whitepaper: Demonstrating the Business Value of Deduplication for Data Protection] The company offers both target and source-based deduplication as well as integrated path-to-tape capability.
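The two figures quoted above express the same quantity: a reduction ratio of r:1 means that 1/r of the original data remains, so the space saved is 1 - 1/r. A small sketch of the conversion follows; the function name `reduction_percent` is hypothetical.

```python
def reduction_percent(ratio: float) -> float:
    """Convert a data reduction ratio of ratio:1 into percent of space saved."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1 - 1 / ratio) * 100


print(f"{reduction_percent(15):.1f}%")  # 93.3%
```

Because of this relationship, large increases in ratio yield small gains in percentage: moving from 15:1 to 30:1 raises the saving only from about 93% to about 97%.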

ZFS - Deduplication

Data deduplication capabilities were added to the ZFS source repository at the end of October 2009, and relevant OpenSolaris ZFS development packages have been available since December 3, 2009 (build 128).

Virtual Tape Library - History

DLm has been developed by EMC Corporation, while Luminex has gained popularity and wide acceptance by teaming with Data Domain to provide the benefits of data deduplication behind its Channel Gateway platform.

Imation

The security news follows five acquisitions the company made in 2011 within scalable storage and data security: Louisville, Colo.-based ENCRYPTX; Montreal-based MXI Security from Memory Experts International; the assets of Boulder, Colo.-based ProStor, including the InfiniVault tiered storage system; IronKey's secure data storage hardware business; and intellectual property and other assets, including key data deduplication technology from Middleboro, Mass.-based Nine Technologies.

PureDisk

Symantec PureDisk is a data deduplication product, initially sold as a software installation and now as an appliance.

ReFS - Features

ReFS does not itself offer data deduplication.

NetBackup - Main features

Intelligent Data Deduplication and [http://www.symantec.com/docs/TECH211103 Auto Image Replication] (AIR)

NetBackup - Main features

Client or server-side deduplication via a data deduplication engine that can see into the backup streams

BackupPC

Data deduplication reduces the disk space needed to store the backups in the disk pool.

Rainstor - History

Originally named Clearpace, RainStor was founded in 2002 by engineering specialists in the United Kingdom. The company was originally created to exploit technology that was developed by the United Kingdom's Ministry of Defence to store big data. The company released its NParchive software, which deduplicated and archived rarely used data, in 2008.

Storage efficiency - Technologies

Data deduplication technology can be used to efficiently track and remove duplicate blocks of data inside a storage unit.

Information integration

Information integration (II) (also called deduplication and referential integrity) is the merging of information from heterogeneous sources with differing conceptual, contextual and typographical representations.

Fingerprint (computing)

This fingerprint may be used for data deduplication purposes.

Duplication (disambiguation) - Computing

Data redundancy, either wanted or unwanted (in which case one resorts to data deduplication)

Distributed parallel fault-tolerant file systems - Disk file systems

DDFS – Data Domain File System, the data deduplication file system that ships in the Data Domain Deduplication Storage Systems, which are an alternative to tape for storing backups and archives.

AIMstor - Major Features of AIMstor

Data deduplication

Back-up

These include optimizations for dealing with open files and live data sources as well as compression, encryption, and de-duplication, among others.

Network Appliance, Inc. - Filers

In 2007 NetApp introduced its own deduplication technology: NetApp Dedupe, available for all current models of NetApp filer.

ZPAQ

It compresses using deduplication and several algorithms (LZ77, BWT, and context mixing) depending on the data type and the selected compression level.