Data Deduplication Fundamentals - Dell EMC


Post on 09-Apr-2018


  • 1 Copyright 2010 EMC Corporation. All rights reserved.

    Next-Generation Data protection

    Deduplication: a key element of next-generation backup

    Piotr Noga

    BRS EMEA EE TC Manager

  • 2

    F1000 Storage Professionals Pain Points

    What are your top storage-related pain points?

    [Bar chart, 0% to 80%, comparing survey waves Q4 '07, Q3 '08, Q2 '09, Q4 '09, and Q2 '10. Categories as listed in the chart: Other; Managing Storage Equipment; Power Management; Vendor Management; Application Recoveries and/or Backup Retention; Regulatory Compliance; Data Mobility; Storage Provisioning; Archiving and Archive Management; Dealing With Performance Problems; Lack of Integrated Tools; Managing Complexity; Backup Administration and Management; Managing Costs; Proper Capacity Forecasting and Storage Reporting; Managing Storage Growth.]

    (8/11/10): F1000 Sample. Q4 07, n=151; Q3 08, n=140; Q2 09, n=155; Q4 09, n=185; Q2 10, n=166. *Note that due to multiple responses per interview, total exceeds 100%.

  • 3

    Data center waves: past and future

    storage consolidation

    server consolidation

    server virtualization

    virtual environments protection: VMware (API), Hyper-V, Xen (host based)

    virtual partitions via FC/VTL

    need to emulate a hundred or more tape drives with high concurrency

    storage virtualization

    deduplication appliances for backup

    deduplication on tier 1 (primary) storage

    virtualization everywhere

    VDI

    cloud

  • 4

    Optimization Technologies Center Stage

    Storage optimization: deduplication

    Server and primary storage optimization: server/storage consolidation and virtualization

    Network optimization: WAN optimization

  • 5

    Backup Environments Transformation: root causes

    Unabated data growth

    Backup = 4 to 30 times production capacity

    Full backups kept for months or years

    New requirements to keep more data for longer periods

    [Chart: data needing protection versus protected data, in zettabytes (0 to 16), 2010 to 2020. Callout: Unprotected in 2010 = size of entire digital universe in 2018. Source: IDC Digital Universe Study, sponsored by EMC, May 2010; chart does not include data that does not need protection.]

    [Chart: Digital Information Created and Replicated Worldwide, in exabytes (0 to 2,500), 2008 to 2012. Five times growth in four years. Source: IDC Digital Universe white paper, sponsored by EMC, May 2009.]

  • 6

    Major Trends Driving the Transformation of Backup Environments: Server virtualization

    Increased complexity

    Virtual machine sprawl

    High utilization, little bandwidth for backup

    [Diagram: Old Paradigm, 20% resource utilization, versus New Paradigm, 80% resource utilization on shared physical resources (VMware ESX Server on hardware). Both panels plot CPU utilization from 0% to 100%.]

  • 7

    What is Data Deduplication?

    The process of detecting and identifying the unique data segments within a given set of information, enabling the elimination of redundancy when stored or moved.

    [Diagram: Data Sets 1, 2, and 3 pass through deduplication. Before: total segments = 39. After: unique segments = 6.]
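The definition on this slide can be illustrated with a minimal Python sketch. It assumes segments are already cut and that a SHA-1 fingerprint identifies each one; the function names `deduplicate` and `restore` and the sample data are hypothetical, chosen only to mirror the slide's 39-to-6 example, and do not describe any EMC product's implementation.

```python
import hashlib

def deduplicate(segments):
    """Store each distinct segment once, keyed by its SHA-1 fingerprint.

    Returns (store, recipe): `store` maps fingerprint -> segment bytes,
    `recipe` lists the fingerprints needed to rebuild the input in order.
    """
    store = {}
    recipe = []
    for segment in segments:
        fingerprint = hashlib.sha1(segment).hexdigest()
        store.setdefault(fingerprint, segment)  # keep only the first copy
        recipe.append(fingerprint)
    return store, recipe

def restore(store, recipe):
    """Rebuild the original segment sequence from the recipe."""
    return [store[fingerprint] for fingerprint in recipe]

# Hypothetical input: 39 segments drawn from only 6 distinct values.
distinct = [b"A", b"B", b"C", b"D", b"E", b"F"]
segments = [distinct[i % 6] for i in range(39)]

store, recipe = deduplicate(segments)
print("total segments: ", len(recipe))
print("unique segments:", len(store))
```

Only the `store` needs physical capacity; the `recipe` is the small amount of metadata that lets every logical copy be restored.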

  • 8

    Why Deduplicated Backup?

    Replicate smarter. Move only deduplicated data over existing networks with up to 99% bandwidth efficiency for cost-effective disaster recovery.

    Plenty of duplicated data by design. A standard backup schedule with 91 days of retention (full + 6 diff/incr) can contain the same data 15+ times.

    Keep logical copies rather than physical ones. Deduplicate for capacity and SLA.

    Recover reliably. Continuous fault detection and self-healing ensure data recoverability to meet SLAs.

    [Diagram label: WAN]

  • 9

    "Almost" makes a difference: deduplication principles

    Fixed vs. dynamic block: capacity requirements

    Number of streams per appliance

    Robustness/security: MD5 vs. SHA-1

    http://en.wikipedia.org/wiki/MD5#Collision_vulnerabilities

    http://en.wikipedia.org/wiki/SHA-1#Comparison_of_SHA_functions

    Time for SHA-1: HW support, multicore CPUs

    Smart not hard: rainbow tables in deduplication algorithms

    http://pl.wikipedia.org/wiki/T%C4%99czowe_tablice

    http://kestas.kuliukas.com/RainbowTables/

    Reducing CPU and disk cycles

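One way to put numbers on the MD5 vs. SHA-1 point is the standard birthday-bound estimate for accidental fingerprint collisions. The sketch below is only that estimate: it says nothing about the deliberate collision attacks described at the linked pages, and the 1 PiB data size and 8 KiB average segment size are assumptions picked for illustration, not figures from the slides.

```python
from math import expm1

def accidental_collision_probability(num_segments, hash_bits):
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    n = num_segments
    exponent = -n * (n - 1) / 2 ** (hash_bits + 1)
    return -expm1(exponent)  # accurate even when the probability is tiny

# Assumed workload: 1 PiB of unique data in 8 KiB segments (2^37 segments).
num_segments = 2 ** 50 // 2 ** 13

p_md5 = accidental_collision_probability(num_segments, 128)   # MD5: 128-bit digest
p_sha1 = accidental_collision_probability(num_segments, 160)  # SHA-1: 160-bit digest

print(f"MD5   (128-bit): {p_md5:.3e}")
print(f"SHA-1 (160-bit): {p_sha1:.3e}")
```

Under these assumptions both probabilities are far below typical hardware error rates; the 32 extra bits of SHA-1 lower the estimate by a factor of about 2^32, at the price of more CPU work per segment, which is where hardware support and multicore CPUs come in.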

  • 10

    Architecture Advantage: Variable vs. Fixed

    Variable segment deduplication significantly reduces: power, cooling, management, complexity

    [Diagram: how much physical capacity 100TB of data occupies under each approach. Approaches shown: File Level, Fixed Block, Variable Block, Whitespace Reduction. Values shown: 100TB lives on 100TB; 100TB lives on 50TB; 100TB lives on 33TB; 100TB lives on 25TB (4:1 is 25TB); 100TB lives on 5TB.]
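Why variable segments tend to beat fixed blocks can be shown with a small experiment: insert one byte at the front of a data stream and count how many chunks are still recognized. The sketch below is an illustration under stated assumptions, not the algorithm of any product named in this deck. The chunk size, window, and mask values are arbitrary, and it recomputes CRC32 at every position where a production system would use a rolling hash.

```python
import hashlib
import random
import zlib

def fixed_chunks(data, size=64):
    """Cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data, window=16, mask=0x3F):
    """Cut wherever a checksum of the last `window` bytes has its low bits zero.

    Boundaries depend on content rather than on byte offsets, so they
    move along with the data when bytes are inserted or removed.
    """
    chunks, start = [], 0
    for end in range(1, len(data) + 1):
        if end - start < window:          # enforce a minimum chunk size
            continue
        if zlib.crc32(data[end - window:end]) & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])       # trailing partial chunk
    return chunks

def shared_fingerprints(chunks_a, chunks_b):
    """Count distinct chunk fingerprints present in both chunk lists."""
    def fingerprints(chunks):
        return {hashlib.sha1(c).digest() for c in chunks}
    return len(fingerprints(chunks_a) & fingerprints(chunks_b))

rng = random.Random(2010)
original = bytes(rng.getrandbits(8) for _ in range(8192))
shifted = b"X" + original                 # one byte inserted at the front

fixed_shared = shared_fingerprints(fixed_chunks(original), fixed_chunks(shifted))
variable_shared = shared_fingerprints(variable_chunks(original), variable_chunks(shifted))
print("fixed-size chunks still shared:     ", fixed_shared)
print("content-defined chunks still shared:", variable_shared)
```

With fixed blocks, the inserted byte shifts every later block, so almost nothing matches the previous backup. With content-defined boundaries, the chunking resynchronizes shortly after the insertion point and most chunks deduplicate again.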

  • 11

    Factors Impacting Deduplication Ratios: small variations can have a big impact

    Type of data: more user-created, unstructured content* = higher deduplication ratio

    *Encrypted and compressed data are not ideal deduplication candidates

    Data change rate: less change = higher deduplication ratio

    Retention policy: longer retention policy = higher deduplication ratio

    Full to incremental backup ratio: more full backups = higher deduplication ratio
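The direction of these effects can be reproduced with a deliberately simple toy model. It is an assumption made for illustration, not a formula from the slides: every retained full backup has the same logical size, each one differs from its predecessor by a fixed fraction of new segments, and compression and incrementals are ignored. The function name `dedup_ratio` is hypothetical.

```python
def dedup_ratio(num_fulls, change_rate):
    """Toy model: logical size / physical size for `num_fulls` retained fulls.

    The first full is stored entirely; each later full adds only the
    fraction `change_rate` of segments that changed since the previous one.
    """
    if num_fulls < 1 or not 0.0 <= change_rate <= 1.0:
        raise ValueError("need num_fulls >= 1 and 0 <= change_rate <= 1")
    logical = num_fulls                           # in units of one full backup
    physical = 1 + (num_fulls - 1) * change_rate
    return logical / physical

# Longer retention (more fulls kept) raises the ratio.
for fulls in (4, 13, 52):
    print(f"{fulls:>2} fulls at 2% change: {dedup_ratio(fulls, 0.02):.1f}:1")

# A higher change rate lowers it.
for rate in (0.01, 0.02, 0.10):
    print(f"13 fulls at {rate:.0%} change: {dedup_ratio(13, rate):.1f}:1")
```

In this model, data that changes completely between backups (a change rate of 1.0, which is roughly how encrypted or compressed data behaves) yields a ratio of 1:1 no matter how long it is retained.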

  • 12

    Real World Results: Avamar daily full backups vs. traditional daily full backups

    | Data Type | Amount of Primary Data Backed Up | Amount of Data Moved Daily |
    |---|---|---|
    | Windows file systems | 3,573 GB | 6.1 GB |
    | Mix of Windows, Linux, and UNIX file systems | 5,097 GB | 11.7 GB |
    | Engineering files on NAS (NDMP backups) | 3,265 GB | 24.2 GB |
    | Mix of 20% databases, 80% file systems (Windows and UNIX) | 9,583 GB | 80.0 GB |
    | Mix of Linux file systems and databases | 7,831 GB | 104.2 GB |

    Source: EMC

    While results will vary by data type and mix, Avamar can dramatically improve backup performance and efficiency
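To make the table easier to compare row by row, the short Python sketch below derives a reduction factor (primary data divided by data moved daily) from the figures on this slide. The derived numbers are plain arithmetic on the table and are not additional results from EMC; the helper name `daily_reduction` is hypothetical.

```python
# (data type, primary data backed up in GB, data moved daily in GB), from the slide
results = [
    ("Windows file systems", 3573, 6.1),
    ("Mix of Windows, Linux, and UNIX file systems", 5097, 11.7),
    ("Engineering files on NAS (NDMP backups)", 3265, 24.2),
    ("Mix of 20% databases, 80% file systems", 9583, 80.0),
    ("Mix of Linux file systems and databases", 7831, 104.2),
]

def daily_reduction(primary_gb, moved_gb):
    """How many times less data is moved than a traditional daily full would move."""
    return primary_gb / moved_gb

for name, primary_gb, moved_gb in results:
    factor = daily_reduction(primary_gb, moved_gb)
    share = 100 * moved_gb / primary_gb
    print(f"{name}: {factor:.0f}x less data moved ({share:.2f}% of primary)")
```

The spread between rows matches the earlier point about data type: the file-system-only workloads move the smallest share of their primary data, and the mixes that include databases move the largest.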

  • 13

    VMware Guest Backup: Smart not Hard - Avamar backup versus traditional backup

    [Charts comparing Traditional and Avamar backup: CPU Usage, Network Usage, and Disk Usage, each plotted from 1:20 p.m. to 1:40 p.m.]

  • 14

    VMware Image Backup

    VMware vStorage API: Smart not Hard - Avamar

    Key Features:

    Integrated with vStorage API

    Single-step file- and image-level backups and restores

    Option to leverage the changed-block feature greatly reduces backup processing

    Restore to the original virtual machine, to a new one, or configure a new virtual machine

    Round-robin VM backup capability across multiple proxies

    vStorage API virtual proxy server with Avamar agent

    Avamar client software runs on the proxy server

    [Diagram labels: Resource Pool; VMware Virtualization Layer; x86 Architecture; Physical server; Virtual Machines; SAN storage; Avamar server; Mount. Legend: Avamar Software Agent.]

  • 15

    Data Domain Boost Integration

    NetWorker and Data Domain

    Deduplication distributed to backup servers and Microsoft application clients: increases backup speed; reduces network traffic

    Clone-controlled replication: schedules replication; catalog awareness of replicated copies

    Ease of use: automated configuration; monitoring and reporting

    [Diagram labels: NetWorker, DD Boost, Data Domain]
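The idea of distributing deduplication to the sending side can be sketched generically: the sender fingerprints its segments, asks the target which fingerprints it lacks, and transmits only those segments. The Python below is a hypothetical illustration of that exchange. The class `DedupTarget`, the function `backup`, and the 1 KiB segments are invented for this sketch and do not represent the actual DD Boost protocol or API; the traffic for the fingerprints themselves is ignored.

```python
import hashlib

class DedupTarget:
    """Hypothetical stand-in for a deduplicating backup target."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes

    def missing(self, fingerprints):
        """Return the fingerprints the target has not stored yet."""
        return [fp for fp in fingerprints if fp not in self.segments]

    def store(self, fingerprint, segment):
        self.segments[fingerprint] = segment

def backup(target, segments):
    """Send only segments the target lacks; return payload bytes sent."""
    by_fingerprint = {hashlib.sha1(s).hexdigest(): s for s in segments}
    sent = 0
    for fp in target.missing(list(by_fingerprint)):
        target.store(fp, by_fingerprint[fp])
        sent += len(by_fingerprint[fp])
    return sent

target = DedupTarget()
night1 = [bytes([i]) * 1024 for i in range(10)]   # ten distinct 1 KiB segments
night2 = night1[:9] + [b"\xff" * 1024]            # one segment changed

first_sent = backup(target, night1)
second_sent = backup(target, night2)
print("first backup sent: ", first_sent, "bytes")
print("second backup sent:", second_sent, "bytes")
```

Because the second run only transmits the changed segment, both the network load and the time spent moving data drop, which is the effect the slide attributes to distributed deduplication.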

  • 16

    Capacity management: a single SIS makes a difference

    Restrictions with multiple SIS: no storage node/media server load balancing between SIS

    Management overhead (multiple appliance instances and configs, e.g. replication)

    Efficiency (more reserved storage required)

    Performance considerations: wise use of SAN/LAN infrastructure with:

    client-side deduplication

    dedup replication

    leverage 10GbE with OST/DD Boost

    SLA improvement: backup windows and recovery time objective

    Reduce backup job load on production systems

  • 17

    Industry's Most Scalable Inline Deduplication Systems

    DD140 DD610 DD630 DD
