technical report- netapp a-sis deduplication - deployment and implementation guide-4th revision



NetApp, Inc.

Technical Report:

NetApp Deduplication for FAS

Deployment and Implementation Guide

Network Appliance, Inc. | Bill May, Data Protection and Retention Technical Marketing | 16 April 2008 | TR-3505

4th Revision

Abstract

This guide introduces the NetApp deduplication for FAS technology and describes in detail how to implement and utilize it.

It should prove useful for customers requiring assistance in understanding and architecting solutions with deduplication for FAS and NetApp storage systems.

TECHNICAL REPORT

NetApp, a pioneer and industry leader in data storage technology, helps organizations understand and meet complex technical challenges with advanced storage solutions and global data management strategies.


Technical Report: 16 April 2008 NetApp Deduplication for FAS TR-3505 Deployment and Implementation Guide 4th Revision


Table of Contents

1 Introduction
  1.1 Intended Audience
  1.2 Purpose
  1.3 Prerequisites and Assumptions
  1.4 Document Conventions

2 Overview
  2.1 NetApp Deduplication Technologies
    2.1.1 SnapVault for NetBackup™
    2.1.2 NetApp Deduplication for FAS
  2.2 Dense Volumes
  2.3 Deduplication Features and Functions
    2.3.1 General Deduplication Operational Considerations

3 Configuration and Operation
  3.1 Requirements Overview
  3.2 Installing and Licensing Deduplication
    3.2.1 Deduplication Licensing in a Clustered Environment
  3.3 Command Summary
  3.4 Deduplication Quick Start Guide
  3.5 Monitoring Deduplication Status
  3.6 End-to-End Deduplication Configuration Example
  3.7 Configuring Deduplication Schedules

4 Operating Characteristics
  4.1 Deduplication Target Environment
  4.2 Deduplication Performance
  4.3 Deduplication Storage Savings
  4.4 Additional Deduplication Considerations
    4.4.1 Number of Deduplication Processes
    4.4.2 Deduplication and Active/Active Configuration
    4.4.3 Deduplication and Space Savings on Existing Data
    4.4.4 Deduplication Best Practices

5 Common Problems and Troubleshooting
  5.1 Licensing
  5.2 Volume Sizes
  5.3 Logs and Error Messages
  5.4 Other Issues
  5.5 Not Seeing Space Savings
  5.6 Undeduplicating a Flexible Volume
  5.7 Additional Reporting with “sis stat -l”
  5.8 Deduplication and Reboots

6 Deduplication and Replication
  6.1 Replicating a Deduplicated Flexible Volume for DR
  6.2 Replicating Primary Data to a Deduplicated Flexible Volume


1 Introduction

1.1 Intended Audience

This technical report is designed for customers who seek education on the NetApp deduplication for FAS capability introduced in Data ONTAP® 7.2L1, with the current minimum requirement of Data ONTAP 7.2.4.

It will be most beneficial to those who are already familiar with NetApp hardware and software.

1.2 Purpose

The purpose of this paper is to present a guide for implementing NetApp deduplication for FAS. It will address step-by-step configuration examples, introduce known caveats and recommendations to assist the reader in designing optimal solutions, and prepare the audience for performing deployments of the technology in customer environments.

Its use is threefold:

Provide detailed information to all interested parties.

Educate prior to performing deployments.

Serve as a reference for resolving issues that could arise.

This document is not:

A sales guide (although some high-level thoughts are covered in the “Solutions Overview” section)

A competitive comparison

A complete product design document

1.3 Prerequisites and Assumptions

For various details and procedures described in this document to be most useful to the reader, the following assumptions are made:

The reader has general knowledge of NetApp platforms and products, particularly in the area of data protection.

The reader has general knowledge of backup protection, data retention, and disaster recovery solutions.

1.4 Document Conventions

While “NetApp deduplication for FAS” is the official solution name, for brevity’s sake in this document it will typically be simply called “deduplication.”

Note that the original name was “Advanced Single Instance Storage (A-SIS),” so when referring to other documents which may still use that name, be aware that it is synonymous with NetApp deduplication for FAS.


2 Overview

This section provides a quick overview of deduplication in general and then introduces what NetApp deduplication for FAS is and how it works at a high level.

2.1 NetApp Deduplication Technologies

Since its beginning NetApp has been an innovator in delivering storage solutions and continues to invent new capacity optimizing technologies that reduce the cost of data storage. The following are some of the basic products/features that deliver the value:

Snapshot™ for disk- and network-efficient recovery copies

SnapVault® for disk- and network-efficient backups

FlexVol® for space-efficient volume provisioning

FlexClone® for space-efficient test and development copies

While all these technologies offer the benefit of reducing the amount of required storage, in the marketplace they are often not considered “deduplication” technologies when compared to solutions offered by other vendors. That sentiment, while not entirely accurate, is understood, and NetApp continues to expand its portfolio with several technologies for further deduplication of data. The following subsections cover two of the solutions that are available as of the writing of this paper; additional deduplication technologies are coming in both the short term and the more distant future.

Before delving into technical solutions, it makes sense to understand the value of deduplication to customers. The primary advantage of data deduplication is that it conserves physical disk space when storing data on disk. The average UNIX® or Windows® disk volume contains thousands of duplicate data strings. Traditionally, when copies of these volumes are created, every duplicate data string is also copied, resulting in an inefficient use of secondary storage. Deduplication helps to remove this inefficiency and yields a more effective cost per gigabyte in the data center.

Figure 1) Reduced storage costs with deduplication.


2.1.1 SnapVault for NetBackup™

The first industry-recognized deduplication technology from NetApp was SnapVault for NetBackup, which provides space savings similar to those provided by SnapVault to traditional NetBackup environments. This solution integrates the NetApp secondary storage system as an optimized backup repository for heterogeneous (not NetApp) primary storage. Its value is based on the assumption that a file in the same data set and path but in different backups is likely to have a lot of blocks in common.

Backups written to a NetApp storage unit utilize less disk space when compared to traditional disk storage units. After an initial client backup is performed, the Network Appliance™ Write Anywhere File Layout (WAFL®) file system saves only changed blocks when subsequent backups are performed for the same client, providing single-instance storage (SIS) deduplication of the additional backup images.

To NetBackup, the backup on the NetApp system looks like a standard NetBackup TAR image backup, allowing most normal NetBackup operations (duplication, synthetics, vaulting, and so on) to be performed. To end users, the backup on the NetApp system looks like a standard WAFL file system, accessible through NFS and CIFS.

SnapVault for NetBackup (SV-NBU) was released as a joint solution as part of Data ONTAP 7.1 and NetBackup 6.0, with a focus on data protection.

2.1.2 NetApp Deduplication for FAS

Unlike SV-NBU, which performs block-level deduplication only for the same client/policy/directory/file, and only for use with NetBackup, NetApp deduplication for FAS deduplicates blocks anywhere in the active file system within the entire flexible volume, regardless of how the data got there.

In its initial release, deduplication primarily had a focus on data retention/archiving of file system data on secondary storage NetApp systems. Substantial storage savings can be achieved with deduplication in some tier 2 primary storage environments as well.

While NetApp deduplication for FAS is really part of a suite of deduplication technologies offered by NetApp, it is the sole focus of the remainder of this paper.

2.2 Dense Volumes

Despite the introduction of less expensive ATA disk drives, one of the biggest challenges for disk-based backup today continues to be the storage cost. There is a desire to reduce storage consumption (and therefore storage cost per megabyte) by eliminating duplicated data through sharing across files.

The core NetApp technology to accomplish this goal is the dense volume, a flexible volume that contains shared data blocks. The NetApp Data ONTAP file system, WAFL, is a file system structure that supports shared blocks in order to optimize storage space consumption. Basically, within one file system tree there is the ability to have multiple references to the same data block, as shown in Figure 2.


Figure 2) Dense volumes.

To keep track of the many indirect blocks (“IND” in Figure 2) that are pointing to it, each data block has a block count reference kept in the volume metadata. As additional indirect blocks point to it or existing ones stop pointing to it, this value is incremented or decremented accordingly. When no indirect blocks point to a data block, it is released.

Deduplication uses dense volume technology to allow duplicate blocks anywhere in the flexible volume to be deleted.
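The reference-count bookkeeping described above can be illustrated with a small Python sketch. This is a toy model only: the class name DenseVolume, its methods, and the block id "B7" are hypothetical and do not correspond to any Data ONTAP interface or on-disk structure.

```python
from collections import defaultdict


class DenseVolume:
    """Toy model of a volume whose data blocks can be shared by several pointers."""

    def __init__(self):
        self.refcount = defaultdict(int)  # block id -> number of indirect-block pointers
        self.free_blocks = []             # block ids released back to the volume

    def add_pointer(self, block_id):
        """An indirect block starts pointing at block_id."""
        self.refcount[block_id] += 1

    def drop_pointer(self, block_id):
        """An indirect block stops pointing at block_id; release the block at zero."""
        if self.refcount.get(block_id, 0) == 0:
            raise ValueError(f"block {block_id} has no references")
        self.refcount[block_id] -= 1
        if self.refcount[block_id] == 0:
            del self.refcount[block_id]
            self.free_blocks.append(block_id)


vol = DenseVolume()
vol.add_pointer("B7")   # first file references block B7
vol.add_pointer("B7")   # a second file shares the same block
vol.drop_pointer("B7")  # one reference goes away; the block is still in use
still_shared = "B7" in vol.refcount
vol.drop_pointer("B7")  # last reference goes away; the block is released
```

After the second drop_pointer call the block appears in free_blocks, which mirrors the statement that a data block is released once no indirect blocks point to it.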

2.3 Deduplication Features and Functions

Deduplication provides block-level deduplication within the entire flexible volume on NetApp NearStore® storage systems. The depiction of how this works, at the highest level, is shown in Figure 3.

Figure 3) How NetApp deduplication for FAS works.


Essentially, deduplication only stores unique blocks in the flexible volume and creates a small amount of additional metadata in the process. Notable features of NetApp deduplication for FAS include:

Works with a high degree of granularity, at the block level.

Operates on the active file system of the flexible volume. Snapshot copies created after running deduplication enjoy the same storage savings benefits.

Is a background process that can be configured to run automatically, scheduled, or run manually through the command-line interface.

Is application transparent and therefore can be used for deduplication of data originating from anywhere in the data center.

Is enabled and managed using a simple command-line interface.

Can be enabled on flexible volumes that already contain data, and can deduplicate that existing data.

The remainder of this document goes into great detail on the operation of deduplication, but in general the following occurs:

Newly saved data on the NearStore is stored in blocks as usual by Data ONTAP. Each block of data has a digital fingerprint, which is compared to all other fingerprints in the flexible volume. If two fingerprints are found to be the same, a byte-for-byte comparison is done of all bytes in the block, and, if there is an exact match between the new block and the existing block on the flexible volume, the duplicate block is discarded and its disk space is reclaimed.
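The fingerprint-then-verify flow described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Data ONTAP implementation: the source does not say how fingerprints are computed, so a truncated SHA-256 digest stands in for the fingerprint, the 4096-byte block size is assumed, and the function name deduplicate is hypothetical. The sketch also processes blocks in one pass, whereas the product runs as a background process.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def deduplicate(blocks):
    """Return (stored, pointers): the unique blocks kept, and for each input
    block the index of the stored block it refers to."""
    stored = []          # unique block contents that remain on disk
    by_fingerprint = {}  # fingerprint -> indexes into stored with that fingerprint
    pointers = []
    for block in blocks:
        # Stand-in fingerprint (assumption): short on purpose, so two different
        # blocks could share it and the byte comparison below is required.
        fp = hashlib.sha256(block).digest()[:4]
        match = None
        for idx in by_fingerprint.get(fp, []):
            if stored[idx] == block:  # byte-for-byte verification
                match = idx
                break
        if match is None:
            stored.append(block)
            match = len(stored) - 1
            by_fingerprint.setdefault(fp, []).append(match)
        pointers.append(match)
    return stored, pointers


a = b"A" * BLOCK_SIZE
b = b"B" * BLOCK_SIZE
stored, pointers = deduplicate([a, b, a, a, b])
saved_blocks = len(pointers) - len(stored)
```

Five blocks are written but only two are stored; the other three become additional references to existing blocks. The fingerprint only narrows the search, and a block is shared only after the full comparison succeeds.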

2.3.1 General Deduplication Operational Considerations

Is enabled on a per flexible volume basis.

Can be enabled on any number of flexible volumes.

Can be run one of three ways:

Scheduled on specific days and at specific times

Manually via the command line

Automatically, when 20% new data has been written to the volume

Only one deduplication process runs on a flexible volume at a time.

Up to eight deduplication processes can run concurrently on the same NetApp storage array.

Deduplication is supported in an active/active clustered failover configuration. For clarifying details, see the “Deduplication and Active/Active Configuration” section.
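The two concurrency rules in this list (one process per flexible volume, at most eight per storage system) can be expressed as a short admission check. The Python sketch below is only a model of those rules: the function admit and the choice to ignore a repeated request for the same volume are assumptions, and the source does not describe how Data ONTAP orders or queues requests internally.

```python
MAX_CONCURRENT = 8  # from the text: up to eight processes per storage system


def admit(requests, active=()):
    """Split requested volumes into those that can start now and those that wait."""
    running = list(active)
    started, pending = [], []
    for vol in requests:
        if vol in running or vol in started or vol in pending:
            continue  # one process per flexible volume; drop the repeated request (assumption)
        if len(running) + len(started) < MAX_CONCURRENT:
            started.append(vol)
        else:
            pending.append(vol)
    return started, pending


requests = [f"/vol/dvol_{i}" for i in range(1, 11)] + ["/vol/dvol_1"]
started, pending = admit(requests)
```

With ten distinct volumes requested, the first eight start and the remaining two wait until a slot frees up.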


3 Configuration and Operation

This section discusses what is required to install deduplication, how to configure it, and various aspects of managing it. Although this section discusses some basic things, in general it assumes both that the NetApp storage system is already installed and running, and that the reader is familiar with basic NetApp Data ONTAP administration.

3.1 Requirements Overview

Table 1 specifies the requirements for deduplication.

Table 1) Deduplication requirements overview.

Requirement                    Specification
Hardware                       NearStore R200
                               FAS2020, FAS2050
                               FAS3020, FAS3040, FAS3050, FAS3070
                               FAS6030, FAS6040, FAS6070, FAS6080
                               IBM: N5200, N5300, N5500, N5600, N7600, N7800
Data ONTAP                     Data ONTAP 7.2.4 or later
Software                       nearstore_option license (for all platforms except R200)
                               a_sis license
Maximum Flexible Volume Size   FAS6070, FAS6080, N7800: 16TB
                               FAS6030, FAS6040, N7600: 10TB
                               FAS3070, N5600: 6TB
                               NearStore R200: 4TB
                               FAS3040, N5300: 3TB
                               FAS3050, N5500: 2TB
                               FAS3020, N5200: 1TB
                               FAS2050: 1TB
                               FAS2020: 0.5TB
Protocols                      All file-based and block-based protocols supported by Data ONTAP
Applications                   Refer to the “Deduplication Target Environment” section
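When planning a deployment, the maximum flexible volume sizes in Table 1 can be encoded as a lookup and checked before a volume is created. The Python sketch below copies the limits from Table 1; the dictionary name, the function name fits_dedup_limit, the short key "R200" for the NearStore R200, and the treatment of the limit as inclusive are all assumptions made for illustration.

```python
# Maximum flexible volume size for deduplication, in TB, as listed in Table 1.
MAX_DEDUP_VOLUME_TB = {
    "FAS6070": 16, "FAS6080": 16, "N7800": 16,
    "FAS6030": 10, "FAS6040": 10, "N7600": 10,
    "FAS3070": 6, "N5600": 6,
    "R200": 4,
    "FAS3040": 3, "N5300": 3,
    "FAS3050": 2, "N5500": 2,
    "FAS3020": 1, "N5200": 1,
    "FAS2050": 1,
    "FAS2020": 0.5,
}


def fits_dedup_limit(platform, volume_size_tb):
    """True if a flexible volume of this size is within the platform's Table 1 limit."""
    try:
        limit = MAX_DEDUP_VOLUME_TB[platform]
    except KeyError:
        raise ValueError(f"{platform} is not listed in Table 1") from None
    return volume_size_tb <= limit  # limit treated as inclusive (assumption)
```

For example, fits_dedup_limit("R200", 0.2) is true for a 200GB volume on a NearStore R200, while fits_dedup_limit("FAS2020", 1) is false because the FAS2020 limit is 0.5TB.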

3.2 Installing and Licensing Deduplication

Deduplication is included in Data ONTAP and just needs to be licensed. Add the deduplication license using the following command:

license add <a_sis>

If you want to run deduplication on any of the FAS platforms you will also need to add the nearstore_option license:

license add <nearstore_option>


3.2.1 Deduplication Licensing in a Clustered Environment

Deduplication is a licensed option behind the NearStore option license. Hence, in a clustered environment, both nodes must have the NearStore option and deduplication licensed.

3.3 Command Summary

Table 2 provides a description of all deduplication (related) commands. Cells that are shaded indicate those commands that are only available via “priv set diag”.

Table 2) Deduplication command summary.

sis on <vol>
    Activates deduplication on the flexible volume specified.

sis start -s <vol>
    Begins deduplication process on the flexible volume specified.
    Using the -s option tells the deduplication operation to scan the flexible volume specified and process existing data.
    Use this option only for the initial configuration and first deduplication of a flexible volume.

sis start <vol>
    Begins deduplication process on the flexible volume specified.

sis status [-l] <vol>
    Returns current status of deduplication for the specified flexible volume.
    The -l option causes a long listing to be displayed.

df -s <vol>
    Returns the value of deduplication space savings in the active file system for the specified flexible volume.

sis config [-s sched] <vol>
    Creates an automated deduplication sched(ule).
    The syntax follows the SnapVault syntax model.
    When deduplication is first enabled on a flexible volume, a default schedule is configured, running it each day of the week at midnight.

sis stop <vol>
    Suspends the deduplication process (if one is running) on the flexible volume specified.

sis off <vol>
    Deactivates deduplication on the flexible volume specified. This means there will be no more change logging or deduplication operations, but the flexible volume will remain a dense volume, and the storage savings will be kept.
    If this command is used, and then deduplication is turned back on for this flexible volume, the flexible volume will need to be rescanned with the “sis start -s” command.

sis check <vol>
    Verifies and updates the fingerprint database for the specified flexible volume and includes purging stale fingerprints.

sis stat <vol>
    Displays the statistics of flexible volumes that have deduplication enabled.

sis undo <vol>
    Converts a deduplication-enabled flexible volume to a normal flexible volume.

3.4 Deduplication Quick Start Guide

This section provides a quick run-through of the steps to configure and manage deduplication.

Table 3) Deduplication quick overview.

Columns: New Flexible Volume | Flexible Volume with Existing Data

Flexible Volume Configuration
    Create flexible volume.

Enable Deduplication on Flexible Volume
    sis on <vol>

Initial Scan
    New flexible volume: Not applicable.
    Flexible volume with existing data: Scan/deduplicate the existing data.
        sis start -s <vol>

Create, Modify, Delete Schedules (if not doing manually)
    Delete or modify the default deduplication schedule that was configured when deduplication was first enabled on the flexible volume or create desired schedule.
        sis config [-s sched] <vol>

Manually Run Deduplication (if not using schedules)
    sis start <vol>

Monitor Status of Deduplication
    sis status <vol>

Monitor Space Savings
    df -s <vol>

3.5 Monitoring Deduplication Status

This section describes the meaning of various status messages about deduplication. The “sis status” command is the primary command used to report on the status of deduplication for a specific flexible volume or all the flexible volumes.


Below, from the sis man page, you see the various State, Status, and Progress messages that can be returned when running sis status. Note that if you don’t provide a flexible volume name, the status for all flexible volumes that have deduplication enabled will be displayed.

toaster> sis status
Path          State     Status   Progress
/vol/dvol_1   Enabled   Idle     Idle for 10:45:23
/vol/dvol_2   Enabled   Pending  Idle for 15:23:41
/vol/dvol_3   Disabled  Idle     Idle for 37:12:34
/vol/dvol_4   Enabled   Active   25 GB Scanned
/vol/dvol_5   Enabled   Active   25 MB Searched
/vol/dvol_6   Enabled   Active   40 MB (20%) Done
/vol/dvol_7   Enabled   Active   30 MB Verified
/vol/dvol_8   Enabled   Active   10% Merged

And following is a textual description of the meaning for each flexible volume:

dvol_1 is Idle. The last deduplication operation on the flexible volume was finished 10:45:23 ago.

dvol_2 is Pending for resource limitation. The deduplication operation on the flexible volume will become Active when the resource is available.

dvol_3 is Idle because the deduplication operation is disabled on the flexible volume.

dvol_4 is Active. The deduplication operation is doing the whole flexible volume scanning (initiated with “sis start -s”). So far, it has scanned 25GB of data.

dvol_5 is Active. The operation is searching for duplicate data, and 25MB of data has already been searched.

dvol_6 is also Active. The operation has saved 40MB of data. This is 20% of the total duplicate data found in the searching stage.

dvol_7 is Active. It is verifying the metadata of processed data blocks. This process will remove unused metadata.

dvol_8 is Active. Verified metadata are being merged. This process will merge together all verified metadata of processed data blocks to an internal format that supports fast sis operation.
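For scripted monitoring, the sis status listing above can be split into its four columns. The Python sketch below parses a copy of the sample output; it assumes that paths begin with /vol/, that State and Status are single words, and that the rest of the line is the Progress text. The function name parse_sis_status is hypothetical, and real output may differ between Data ONTAP releases.

```python
SAMPLE = """\
Path State Status Progress
/vol/dvol_1 Enabled Idle Idle for 10:45:23
/vol/dvol_2 Enabled Pending Idle for 15:23:41
/vol/dvol_3 Disabled Idle Idle for 37:12:34
/vol/dvol_4 Enabled Active 25 GB Scanned
/vol/dvol_5 Enabled Active 25 MB Searched
/vol/dvol_6 Enabled Active 40 MB (20%) Done
/vol/dvol_7 Enabled Active 30 MB Verified
/vol/dvol_8 Enabled Active 10% Merged
"""


def parse_sis_status(text):
    """Turn 'sis status' output into a list of dicts, one per flexible volume."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)  # path, state, status, then the rest of the line
        if len(parts) < 4 or not parts[0].startswith("/vol/"):
            continue  # skip the header, prompts, and blank lines
        path, state, status, progress = parts
        rows.append({"path": path, "state": state,
                     "status": status, "progress": progress})
    return rows


rows = parse_sis_status(SAMPLE)
active = [r["path"] for r in rows if r["status"] == "Active"]
```

Here active lists dvol_4 through dvol_8, the volumes on which a deduplication operation is currently running.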

The general flow of the phases deduplication goes through and the correlating sis status messages when actively running on a flexible volume are shown in Figure 4.


Figure 4) Deduplication status messages and their correlation to deduplication phases.

For additional information, the -l option will display detailed status, as shown below.

toaster> sis status -l /vol/dvol_6
Path:                  /vol/dvol_6
State:                 Enabled
Status:                Active
Progress:              41020 KB (20%) Done
Type:                  Regular
Schedule:              sun-sat@0
Last Operation Begin:  Thu Mar 24 13:30:00 PST 2005
Last Operation End:    Fri Mar 25 00:34:16 PST 2005
Last Operation Size:   4732932 KB
Last Operation Error:  -
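The long listing is a set of "name: value" lines, so it is easy to load into a dictionary and, for example, work out how long the last operation ran. The Python sketch below does this for a copy of the sample output; the function names are hypothetical, and the timestamp handling simply ignores the weekday and time zone tokens, which assumes both timestamps are in the same zone.

```python
from datetime import datetime

SAMPLE = """\
Path: /vol/dvol_6
State: Enabled
Status: Active
Progress: 41020 KB (20%) Done
Type: Regular
Schedule: sun-sat@0
Last Operation Begin: Thu Mar 24 13:30:00 PST 2005
Last Operation End: Fri Mar 25 00:34:16 PST 2005
Last Operation Size: 4732932 KB
Last Operation Error: -
"""


def parse_long_status(text):
    """Map each field name in 'sis status -l' output to its value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_timestamp(value):
    """Parse 'Thu Mar 24 13:30:00 PST 2005', ignoring weekday and time zone."""
    _weekday, month, day, clock, _zone, year = value.split()
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


fields = parse_long_status(SAMPLE)
elapsed = (parse_timestamp(fields["Last Operation End"])
           - parse_timestamp(fields["Last Operation Begin"]))
```

For the sample, elapsed is 11 hours, 4 minutes, and 16 seconds.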

3.6 End-to-End Deduplication Configuration Example

This section steps through the entire typical process of creating a flexible volume and configuring, running, and monitoring deduplication on it. (Note that steps are spelled out in detail, so it appears a lot lengthier than it would be in the real world.)

In this example we want a place to archive a number of large PST files various users have created and are maintaining. The destination NetApp storage system is called r200-rtp01, and it is assumed that deduplication has been licensed on this machine. As NetApp storage arrays are multiprotocol boxes, in this example we’ll actually be using a UNIX server to copy the PST data.


1. Begin by creating a flexible volume (keeping in mind the maximum allowable volume size for the platform, as specified in the requirements table at the beginning of this section).

r200-rtp01*> vol create VolPST aggr0 200g
Creation of volume 'VolPST' with size 200g on containing aggregate 'aggr0' has completed.

2. Now, as a best practice, we’ll disable scheduled Snapshot copies. An alternative to what’s shown below would be to use the command “snap sched VolPST 0 0 0”.

r200-rtp01*> vol status VolPST
Volume  State   Status         Options
VolPST  online  raid_dp, flex
                Containing aggregate: 'aggr0'

r200-rtp01*> vol options VolPST nosnap true

r200-rtp01*> vol status VolPST
Volume  State   Status         Options
VolPST  online  raid_dp, flex  nosnap=on
                Containing aggregate: 'aggr0'

3. Now we’ll enable deduplication on the flexible volume and verify that it’s turned on. The vol status command will show a sis attribute for flexible volumes that have deduplication turned on. (It can be a bit confusing, since sis is also indicated for those flexible volumes that have been written to by SnapVault for NetBackup.) Note that there needs to be space available in the flexible volume for the sis on command to complete successfully. That is, if the sis on command were attempted on a flexible volume that already had data and was completely full, it would fail (since there is no room to create the required metadata). After deduplication is turned on, Data ONTAP points out that existing data can be processed by running sis start -s, which is what you would do for a flexible volume that already contained data before deduplication was enabled; in this example it’s a brand-new flexible volume, so that’s not necessary.

r200-rtp01*> sis on /vol/VolPST
SIS for "/vol/VolPST" is enabled.
Already existing data could be processed by running "sis start -s /vol/VolPST".

r200-rtp01*> vol status VolPST
Volume  State   Status         Options
VolPST  online  raid_dp, flex  nosnap=on
                sis
                Containing aggregate: 'aggr0'


4. Another way to verify that deduplication is enabled on the flexible volume is to just check the output from running sis status on the flexible volume.

r200-rtp01*> sis status /vol/VolPST
Path          State    Status  Progress
/vol/VolPST   Enabled  Idle    Idle for 00:00:20

5. Next we’ll turn off the default deduplication schedule. Since in this example the administrators will be moving large quantities of PST files in as time permits, we’ll want to let them run deduplication manually at opportune times.

r200-rtp01*> sis config /vol/VolPST
Path          Schedule
/vol/VolPST   sun-sat@0

r200-rtp01*> sis config -s - /vol/VolPST

r200-rtp01*> sis config /vol/VolPST
Path          Schedule
/vol/VolPST   -

At this point, in our example, the administrator NFS-mounted the flexible volume to /testPSTs on a Solaris™ host, sunv240-rtp01, and copied lots of PST files from their users’ directories into our new PST archive directory flexible volume. The result from the host perspective is shown below. (Obviously the same sort of thing could be accomplished by mapping a CIFS share to a Windows host.)

root@sunv240-rtp01 # pwd
/testPSTs
root@sunv240-rtp01 # df -k .
Filesystem              kbytes     used      avail      capacity  Mounted on
r200-rtp01:/vol/VolPST  167772160  33388384  134383776  20%       /testPSTs

The example continues with examining the flexible volume, running deduplication, and monitoring the status.

6. Use df -s to examine the storage consumed and the space savings provided. Note that no space savings have been achieved by simply copying data to the flexible volume even though deduplication is turned on. What has happened is that all the blocks that have been written to this flexible volume since deduplication was turned on have had their fingerprints written to the change log file.

r200-rtp01*> df -s /vol/VolPST
Filesystem      used      saved  %saved
/vol/VolPST/    33388384  0      0%

7. Start deduplication running on the flexible volume. This causes the change log to be processed, fingerprints to be sorted and merged, and duplicate blocks to be found.

r200-rtp01*> sis start /vol/VolPST


The SIS operation for "/vol/VolPST" is started.

8. Use sis status to monitor the progress of deduplication.

r200-rtp01*> sis status /vol/VolPST

Path State Status Progress

/vol/VolPST Enabled Active 9211 MB Searched

r200-rtp01*> sis status /vol/VolPST

Path State Status Progress

/vol/VolPST Enabled Active 11 MB (0%) Done

r200-rtp01*> sis status /vol/VolPST

Path State Status Progress

/vol/VolPST Enabled Active 1692 MB (14%) Done

r200-rtp01*> sis status /vol/VolPST

Path State Status Progress

/vol/VolPST Enabled Active 10 GB (90%) Done

r200-rtp01*> sis status /vol/VolPST

Path State Status Progress

/vol/VolPST Enabled Active 11 GB (99%) Done

r200-rtp01*> sis status /vol/VolPST

Path State Status Progress

/vol/VolPST Enabled Idle Idle for 00:00:07

9. Once sis status indicates the flexible volume is once again in the Idle state, deduplication has finished running, and we can now check the space savings it provided in the flexible volume.

r200-rtp01*> df -s /vol/VolPST

Filesystem used saved %saved

/vol/VolPST/ 24072140 9316052 28%

That’s all there is to it.
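As a rough cross-check of the walkthrough above, the sketch below parses lines in the shape of the sis status and df -s output shown in this example and recomputes the %saved column. It is illustrative Python, not part of Data ONTAP; the function names are hypothetical, and the formula saved / (used + saved) is an assumption that happens to reproduce the 28% figure reported above.

```python
def parse_sis_status(line):
    """Split one 'sis status' data line into its four columns."""
    path, state, status, progress = line.split(None, 3)
    return {"path": path, "state": state, "status": status, "progress": progress}


def parse_df_s(line):
    """Split one 'df -s' data line into path, used KB, saved KB, reported %saved."""
    path, used, saved, pct = line.split()
    return path, int(used), int(saved), int(pct.rstrip("%"))


def percent_saved(used_kb, saved_kb):
    """Assumed formula: savings as a share of the undeduplicated footprint."""
    total = used_kb + saved_kb
    return 0 if total == 0 else round(100 * saved_kb / total)


# Sample lines copied from the example output above.
status = parse_sis_status("/vol/VolPST Enabled Idle Idle for 00:00:07")
path, used, saved, reported = parse_df_s("/vol/VolPST/ 24072140 9316052 28%")

# Only trust the savings figure once deduplication has returned to Idle.
if status["status"] == "Idle":
    print(path, "computed %saved:", percent_saved(used, saved), "reported:", reported)
```

The Idle check mirrors step 9: the df -s figures are only meaningful once sis status no longer reports Active.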


3.7 Configuring Deduplication Schedules

This section provides some specifics about configuring schedules with deduplication.

The sis config command is used to configure and view deduplication schedules for flexible volumes. Usage syntax is shown below.

r200-rtp01*> sis help config

sis config [ [ -s schedule ] <path> | <path> ... ]

- Sets up, modifies, and retrieves the schedule of SIS volumes.

Run with no arguments, sis config will return the schedules for all flexible volumes that have deduplication enabled. The example below shows the four different formats the reported schedules can have.

toaster> sis config

Path Schedule

/vol/dvol_1 -

/vol/dvol_2 23@sun-fri

/vol/dvol_3 auto

/vol/dvol_4 sat@6

The meaning of each of these schedule types is as follows.

On flexible volume dvol_1 deduplication is not scheduled to run.

On flexible volume dvol_2 deduplication is scheduled to run every day from Sunday to Friday at 11 p.m.

On flexible volume dvol_3 deduplication is set to auto schedule. This means deduplication will be triggered by the amount of new data written to the flexible volume, specifically when there are 20% new fingerprints in the change log.

On flexible volume dvol_4 deduplication is scheduled to run at 6 a.m. on Saturday.

When the -s option is specified, the command will set up or modify the schedule on the specified flexible volume. The schedule parameter can be specified in one of four ways:

[day_list][@hour_list]

[hour_list][@day_list]

-

auto

The day_list specifies which days of the week deduplication should run. It is a comma-separated list of the first three letters of the day: sun, mon, tue, wed, thu, fri, sat. The names are not case sensitive. Day ranges such as mon-fri can also be given. The default day_list is sun-sat.

The hour_list specifies which hours of the day deduplication should run on each scheduled day. The hour_list is a comma-separated list of the integers from 0 to 23. Hour ranges such as 8-17 are allowed.


Step values can be used in conjunction with ranges. For example, 0-23/2 means "every two hours." The default hour_list is 0 (that is, midnight on the morning of each scheduled day).

If "-" is specified, there won't be a scheduled deduplication operation on the flexible volume.

The “auto” schedule causes deduplication to run on that flexible volume whenever there are 20% new fingerprints in the change log. This check is done in a background process and occurs every minute.

When deduplication is enabled on a flexible volume the first time, an initial schedule is assigned to the flexible volume. This initial schedule is sun-sat@0, which means "once every day at midnight."

To configure the schedules shown earlier in this section, the following commands would be issued:

toaster> sis config -s - /vol/dvol_1

toaster> sis config -s 23@sun-fri /vol/dvol_2

toaster> sis config -s auto /vol/dvol_3

toaster> sis config -s sat@6 /vol/dvol_4
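The schedule grammar described in this section can be sketched as a small parser. The following Python is illustrative only and is not part of Data ONTAP; the names parse_schedule and _expand and the shape of the returned dictionary are hypothetical. It assumes that a list beginning with a letter is a day_list and one beginning with a digit is an hour_list, and that ranges do not wrap around (for example, fri-mon is rejected); this guide states neither.

```python
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]


def _expand(item, to_index, size):
    """Expand one list item such as 'mon-fri', '8-17' or '0-23/2' into indices."""
    step = 1
    if "/" in item:
        item, step_text = item.split("/", 1)
        step = int(step_text)
    if "-" in item:
        lo_text, hi_text = item.split("-", 1)
    else:
        lo_text = hi_text = item
    lo, hi = to_index(lo_text), to_index(hi_text)
    if not (0 <= lo <= hi < size) or step < 1:
        raise ValueError(f"bad schedule item: {item!r}")
    return list(range(lo, hi + 1, step))


def parse_schedule(text):
    """Return None for '-', 'auto' for auto, else a dict of days and hours."""
    text = text.strip().lower()
    if text == "-":
        return None
    if text == "auto":
        return "auto"
    # Defaults stated in this section: day_list sun-sat, hour_list 0.
    days, hours = list(range(7)), [0]
    for part in text.split("@"):
        if not part:
            continue
        items = part.split(",")
        if part[0].isalpha():
            days = sorted({d for i in items for d in _expand(i, DAYS.index, 7)})
        else:
            hours = sorted({h for i in items for h in _expand(i, int, 24)})
    return {"days": [DAYS[d] for d in days], "hours": hours}


for schedule in ["-", "23@sun-fri", "auto", "sat@6", "0-23/2"]:
    print(schedule, "->", parse_schedule(schedule))
```

Because the two sides of the @ are told apart by their first character, the same function accepts both the day-first and hour-first forms listed above.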


4 Operating Characteristics

This section discusses where deduplication makes sense and the behavior that you can expect.

4.1 Deduplication Target Environment

This section discusses where deduplication is a good fit.

Deduplication supports flexible volumes that have data written to them using CIFS or NFS, or as LUNs accessed using FCP/iSCSI. Basically it doesn’t matter how the data got on the NetApp storage system; deduplication will deduplicate it.

Deduplication was initially targeted to data retention/archival environments in its first release (Data ONTAP 7.2L1), focusing on archives of file data: for example, home directories, engineering development, Microsoft® Office, e-mail archive, SharePoint, technical and general publications, and so on.

Substantial benefit can be achieved in some tier 2 primary storage environments as well. Home directory and VMware environments are typically especially well suited.

Deduplication is supported in disaster recovery configurations where SnapMirror® is used; see the “Replication and SnapMirror” section for specific details.

4.2 Deduplication Performance

Deduplication is tightly integrated with Data ONTAP and the WAFL file structure. Because of this, deduplication is performed with extreme efficiency. Complex hashing algorithms and look-up tables are not required. Instead, deduplication is able to leverage the internal characteristics of Data ONTAP to create and compare digital fingerprints, redirect data pointers, and free up redundant data areas, all with a minimal amount of performance impact.

4.3 Deduplication Storage Savings

While deduplication can deduplicate any blocks in a flexible volume of the NetApp storage system, the storage savings achieved can vary based on the data set.

Running deduplication one time on a single data set can yield storage savings anywhere from 10% to 90%, with 30% to 50% being typical.

In cases where customers are backing up or archiving data over and over again, the realized storage savings deduplication can provide get better and better, achieving 20:1 (95%) and higher over time.
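The relationship between a reduction ratio such as 20:1 and the equivalent percentage saved is simple arithmetic, sketched below in Python. The function names are hypothetical and chosen only for this illustration.

```python
def savings_percent(ratio):
    """Convert a reduction ratio (20 means 20:1) to percent of space saved."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 100 * (ratio - 1) / ratio


def reduction_ratio(percent):
    """Convert percent of space saved back to a reduction ratio."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in the range [0, 100)")
    return 100 / (100 - percent)


print(savings_percent(20))   # the 20:1 case quoted above
print(reduction_ratio(50))   # upper end of the typical single-run savings
```

Storing 20 logical blocks in 1 physical block leaves 19 of 20 blocks saved, which is the 95% quoted above; by the same arithmetic, a 50% saving corresponds to only 2:1.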

4.4 Additional Deduplication Considerations

This section provides some discussion on other deduplication-related topics. Some of this information may be covered elsewhere, but it bears reiterating here.

First, refer to the deduplication requirements table (Table 1) in the beginning of section 3 for specific supported hardware and software and necessary licenses.


4.4.1 Number of Deduplication Processes

A maximum of eight deduplication processes can be run at the same time on the same NearStore device.

If another flexible volume is scheduled to have deduplication run while eight deduplication processes are already running, deduplication for this additional flexible volume will be queued. For example, say a user sets a default schedule (sun-sat@0) for 10 deduplication volumes. Eight will run at midnight, and the remaining two will be queued.

As soon as one of the eight current deduplication processes completes, one of the queued ones will start, and when another deduplication process completes, the second queued one will start.

Next time deduplication is scheduled to run on these same 10 flexible volumes, a round-robin paradigm will be used so the same ones aren’t always the first ones run.

For manually triggered deduplication runs, if eight deduplication processes are already running when a command is issued to start another one, the request will fail, and the operation will not be queued.
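The queuing behavior described in this section can be illustrated with a small simulation. This is a toy model in Python, not Data ONTAP code: the volume names and run times are made up, the function names are hypothetical, and the round-robin rotation between scheduled runs is not modeled.

```python
import heapq
from collections import deque

MAX_CONCURRENT = 8  # limit stated in this section


def run_scheduled(durations):
    """Return {volume: start_time} for volumes all scheduled at time 0.

    durations maps a volume name to its run time in arbitrary units.
    """
    queue = deque(durations)   # volumes waiting, in schedule order
    running = []               # min-heap of (finish_time, volume)
    starts = {}
    now = 0
    while queue or running:
        # Start queued volumes while a slot is free.
        while queue and len(running) < MAX_CONCURRENT:
            vol = queue.popleft()
            starts[vol] = now
            heapq.heappush(running, (now + durations[vol], vol))
        # Advance to the next completion, which frees one slot.
        now, _ = heapq.heappop(running)
    return starts


def manual_start_allowed(running_count):
    """A manual start is refused, not queued, when all slots are busy."""
    return running_count < MAX_CONCURRENT


# Ten volumes on the default schedule; made-up run times of 10..19 units.
durations = {f"vol{i}": 10 + i for i in range(10)}
starts = run_scheduled(durations)
print(starts["vol8"], starts["vol9"])
print(manual_start_allowed(8))
```

In this run the first eight volumes start immediately, and the two queued volumes start as the two shortest-running processes finish.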

4.4.2 Deduplication and Active/Active Configuration

Deduplication is supported with NetApp cluster services. Upon failover to the partner node, it behaves as follows.

Writes to the flexible volume will have fingerprints written to the change log.

No sis administration operations or deduplication will function.

Upon failback, normal deduplication operations can continue, and the updated change log is processed.

Deduplication is a licensed option behind the NearStore option license. Our best practice recommendation is to have both nodes in an active/active configuration licensed with the NearStore option and deduplication.

4.4.3 Deduplication and Space Savings on Existing Data

A major benefit of deduplication is that it can be used to deduplicate existing data on previously used flexible volumes (after upgrading to Data ONTAP 7.4). It is completely realistic to assume that Snapshot copies may exist. What happens when you run deduplication in this case?

When you first run deduplication on this flexible volume, the storage savings will probably be rather small or even nonexistent because existing Snapshot copies are not deduplicated.

Previous Snapshot copies will expire, and as they do some small savings will be realized, but these too will probably be low.

During this period of old Snapshot copies expiring, it is fair to assume that new data is being written to the flexible volume and new Snapshot copies are being created.

Thus the storage savings may stay rather flat (that is, very low).

When the last Snapshot copy that was created before deduplication was run is deleted, the storage savings should increase noticeably.
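The sequence above can be illustrated with a deliberately simplified model in which a physical block stays in use as long as the active file system or any Snapshot copy references it. This Python sketch is a toy under stated assumptions, not a description of WAFL internals; the block counts and names are invented for the illustration.

```python
def blocks_in_use(active, snapshots):
    """Count distinct blocks referenced by the active file system or any Snapshot copy."""
    in_use = set(active)
    for snap in snapshots:
        in_use |= snap
    return len(in_use)


# Invented example: 100 blocks on the volume before deduplication.
pre_dedupe = set(range(100))
old_snap = pre_dedupe | set(range(100, 105))  # also holds 5 blocks since deleted
recent_snap = set(pre_dedupe)                 # last Snapshot copy taken before dedupe
active = set(range(50))                       # after dedupe, half the blocks are shared

print(blocks_in_use(active, [old_snap, recent_snap]))  # both old copies present
print(blocks_in_use(active, [recent_snap]))            # oldest copy expired
print(blocks_in_use(active, []))                       # last pre-dedupe copy deleted
```

In this model, expiring the oldest Snapshot copy frees only the few blocks that nothing else references, while the bulk of the savings appears only when the last copy created before deduplication is gone.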


4.4.4 Deduplication Best Practices

This section contains general rules of thumb that might not have been covered elsewhere in this document.

If there is very little new data, run deduplication infrequently, because it doesn't make sense to unnecessarily consume CPU resources. How often you run it will depend on the change rate of the data in the flexible volume.

The best options are:

Use the auto mode so that deduplication only runs when significant additional data has been written to each particular flexible volume (this will tend to naturally spread out when deduplication runs).

Stagger deduplication schedules for the flexible volumes so that deduplication runs on alternate days.

Run deduplication manually.

Run deduplication before creating Snapshot copies, as this will ensure no undeduplicated data gets locked in Snapshot copies. If a Snapshot copy is created on a flexible volume before deduplication has a chance to run/complete on that flexible volume, this could result in lower space savings.

The Snapshot reserve should be greater than 0 if Snapshot copies are to be used. (An exception to this might be in a SAN environment, where often it is set to zero for thin provisioning of LUNs.)

There must be some free space in the flexible volume to allow deduplication to operate and create the metadata it requires. As necessary, flexible volumes can be resized, with no impact to data access, to accommodate this.


5 Common Problems and Troubleshooting

This section covers issues that have been known to come up when configuring and running deduplication.

5.1 Licensing

Make sure deduplication is properly licensed and, if the platform is not an R200, make sure the NearStore option is also properly licensed:

fas3070-rtp01*> license

a_sis <license>

nearstore_option <license>

5.2 Volume Sizes

Adhere to the deduplication volume size limits presented in the “Requirements Overview” section. If you exceed them you will not be able to enable deduplication on that volume. Below is an example of the message displayed if the volume is too large to enable deduplication.

london-fs3> sis on /vol/projects

Volume or maxfiles exceeded max allowed for SIS: /vol/projects

Also note that there needs to be free space available in the flexible volume for the “sis on” command to complete successfully. If a flexible volume is full, deduplication will not run. However, as noted earlier, flexible volumes can be resized with no impact to data access to accommodate this.

5.3 Logs and Error Messages

New error log: /etc/log/sis

New error messages

Registry errors: Check if vol0 is full.

Metafile op errors: Check if the deduplication flexible volume is full.

License errors: Check if license is installed.

Change log full error: Perform a “sis start” operation that will empty the change log metafile when finished.

5.4 Other Issues

Refer to the Data ONTAP 7.4 release notes for complete information.


5.5 Not Seeing Space Savings

If you’ve run deduplication on a flexible volume that you’re confident contains data that should deduplicate well, yet you are not seeing any space savings, there’s a good chance a number of Snapshot copies exist and are locking a lot of data. This especially tends to happen when people run deduplication on existing flexible volumes of data.

Use the “snap list” command to see what Snapshot copies exist and the “snap delete” command to remove them. Alternatively, wait for the Snapshot copies to expire, and the space savings will appear.

5.6 Undeduplicating a Flexible Volume

It is possible, and easy, to “undeduplicate” a flexible volume that has deduplication enabled, by backing out deduplication and turning it back into a “regular” (non-dense) flexible volume. This can be done while the flexible volume is online and is accomplished as described below.

Turn deduplication off on the flexible volume. (Note that this command stops fingerprints from being written to the change log as new data is written to the flexible volume. If this command is used, and then deduplication is turned back on for this flexible volume, the flexible volume will need to be rescanned with the “sis start -s” command.)

sis off <flexvol>

Use the following command (see footnote 1) to recreate the duplicate blocks in the flexible volume.

sis undo <flexvol>

When this command completes, it will delete the fingerprint file and the change log files.

Below is an example of undeduplicating a flexible volume.

r200-rtp01*> df -s /vol/VolReallyBig2

/vol/VolReallyBig2/ 20568276 3768732 15%

r200-rtp01*> sis status /vol/VolReallyBig2

Path State Status Progress

/vol/VolReallyBig2 Enabled Idle Idle for 11:11:13

r200-rtp01*> sis off /vol/VolReallyBig2

SIS for "/vol/VolReallyBig2" is disabled.

r200-rtp01*> sis status /vol/VolReallyBig2

Path State Status Progress

/vol/VolReallyBig2 Disabled Idle Idle for 11:11:34

r200-rtp01*> sis undo /vol/VolReallyBig2

Wed Feb 7 11:13:15 EST [wafl.scan.start:info]: Starting SIS volume scan on volume VolReallyBig2.

r200-rtp01*> sis status /vol/VolReallyBig2

Path State Status Progress

/vol/VolReallyBig2 Disabled Undoing 424 MB Processed

r200-rtp01*> sis status /vol/VolReallyBig2

No status entry found.

r200-rtp01*> df -s /vol/VolReallyBig2

Filesystem used saved %saved

/vol/VolReallyBig2/ 24149560 0 0%

1 Note that the undo option of the sis command is only available in the diag mode, accessed using the command “priv set diag”.

Note that if sis undo starts processing and then there is not enough space to undeduplicate, it will stop, report a message about insufficient space, and leave the flexible volume dense. All data is still accessible, but some block sharing is still occurring. Use “df -s” to understand how much free space you really have and then either grow the flexible volume or delete data or Snapshot copies to provide the needed free space.

5.7 Additional Reporting with “sis stat -l”

For additional status information, you can do “priv set diag” and use the “sis stat -l” command for long detailed listings.

5.8 Deduplication and Reboots

If a NetApp storage system is rebooted while deduplication is running, deduplication will be in the “Idle” state for that flexible volume after the reboot. When the next deduplication processing for that flexible volume starts, it will clean up any remaining intermediate metadata that was created by the previous deduplication operation.


6 Deduplication and Replication

Although deduplication offers substantial benefits, a complete solution will most likely also need to mirror the deduplicated data to another location for disaster recovery purposes.

Replication of the deduplication-enabled flexible volume is supported using NetApp SnapMirror in two ways, as discussed in the next two subsections.

6.1 Replicating a Deduplicated Flexible Volume for DR

A deduplicated flexible volume can be replicated to a secondary storage system (destination) using Volume SnapMirror (VSM) as shown in Figure 5.

Figure 5) VSM of a deduplicated flexible volume for disaster recovery.

Key points in this scenario are:

The nearstore_option must be licensed on both the source and destination.

Deduplication must be licensed at the primary location (source).

Deduplication does not need to be licensed at the destination. However, if there is a situation in which the primary site is down and the secondary location becomes the new primary, deduplication needs to be licensed for continued deduplication to occur. Thus, the best practice is to have deduplication licensed at both locations.

Deduplication is only enabled, run, and managed from the primary location.

The flexible volume at the secondary location will “inherit” all the deduplication attributes and storage savings through SnapMirror.

Only unique blocks are transferred, so deduplication reduces network bandwidth usage too.


6.2 Replicating Primary Data to a Deduplicated Flexible Volume

A production primary flexible volume can be replicated to a deduplication-enabled flexible volume on a secondary storage system using Qtree SnapMirror (QSM), as shown in Figure 6.

Figure 6) QSM of production data to a deduplicated flexible volume.

Key points in this scenario are:

The nearstore_option must be licensed on the destination.

Deduplication is only licensed at the secondary location (destination).

Deduplication is enabled, run, and managed on a flexible volume at the secondary location.

Deduplication doesn’t yield any network bandwidth savings as QSM works at the logical layer.

Storage savings benefit at the QSM destination is achieved by running deduplication on the destination after QSM has finished transferring the data.
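The bandwidth difference between the two scenarios in this section can be put into rough numbers. The Python sketch below is a back-of-the-envelope estimate of an initial transfer only; it ignores Snapshot copies, metadata, and protocol overhead. The function name is hypothetical, and the figures are borrowed from the VolPST df -s example earlier in this guide purely for illustration.

```python
def baseline_transfer_kb(used_kb, saved_kb, mode):
    """Estimate the KB sent for an initial transfer of a deduplicated source volume.

    'vsm' sends only the unique blocks; 'qsm' works at the logical layer
    and sends the full logical data.
    """
    if mode == "vsm":
        return used_kb
    if mode == "qsm":
        return used_kb + saved_kb
    raise ValueError(f"unknown mode: {mode!r}")


used_kb, saved_kb = 24072140, 9316052  # from the VolPST example
print(baseline_transfer_kb(used_kb, saved_kb, "vsm"))
print(baseline_transfer_kb(used_kb, saved_kb, "qsm"))
```

Under these assumptions the VSM transfer is smaller by exactly the amount deduplication saved on the source, while the QSM transfer is unaffected by deduplication.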


© 2008 NetApp, Inc. All rights reserved. Specifications subject to change without notice. NetApp, the NetApp logo, Data ONTAP, FlexClone, FlexVol, NearStore, SnapMirror, SnapVault, and WAFL are registered trademarks and NetApp and Snapshot are trademarks of NetApp, Inc. in the U.S. and other countries. Solaris is a trademark of Sun Microsystems, Inc. Windows and Microsoft are registered trademarks of Microsoft Corporation. UNIX is a registered trademark of The Open Group. NetBackup is a trademark of Symantec Corporation or its affiliates in the U.S. and other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.