
EMC CONFIDENTIAL – INTERNAL AND PARTNER USE ONLY

EMC Solutions for Rainfinity File Management Appliance

Solution Guide

EMC® Celerra® NS Series

EMC NAS Product Validation

Corporate Headquarters
Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com


Copyright © 2008 EMC Corporation. All rights reserved.

Published August, 2008

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

All other trademarks used herein are the property of their respective owners.

Rainfinity File Management Appliance EMC Celerra NS Series

Solution Guide

P/N H5669


Contents

About this Document .................................................................. 9

Chapter 1  Solution Overview ........................................................ 11
    The technology solution overview ............................................... 12
    Terminology .................................................................... 12

Chapter 2  Solution Reference Architecture .......................................... 15
    Overall architecture ........................................................... 16
    Network architecture ........................................................... 17
        Subnets .................................................................... 17
        Switches ................................................................... 18
        EMC NS40 Data Mover ports .................................................. 18
    Dataset ........................................................................ 18
    Celerra server and FileMover architecture ...................................... 19
        Celerra server ............................................................. 19
        FileMover .................................................................. 20
    Centera architecture ........................................................... 20
    File Management Appliance architecture ......................................... 21
    High availability and failover ................................................. 21
        Primary Celerra failure .................................................... 21
        Primary Centera failure .................................................... 22
        File Management Appliance failure .......................................... 22
    Hardware and software resources ................................................ 22
        Hardware resources ......................................................... 22
        Software resources ......................................................... 24

Chapter 3  Solution Best Practices .................................................. 25
    File Management environment .................................................... 26
        Appliances of the File Management product line ............................. 26
        Configuring the File Management Appliance .................................. 26
        Configuring Celerra Data Movers as an archiving source ..................... 27
        Configuring Celerra Virtual Data Movers as an archiving source ............. 34
        Configuring Centera as an archiving destination ............................ 34
    Archiving and recalling data ................................................... 35
        FMA Software Architecture .................................................. 35
        Archiver System Architecture ............................................... 36
        Archiving a Celerra source ................................................. 36
        Overview of the archiving process .......................................... 37
        Overview of the recall process ............................................. 39
        Archiving to Centera ....................................................... 40
        Archiving to Celerra ....................................................... 40
        Multi-protocol interoperability ............................................ 41
    Implementing an archiving strategy ............................................. 41
        Developing archiving policies .............................................. 42
        Creating and monitoring archiving tasks .................................... 47
    Managing archived data ......................................................... 50
        Managing Celerra repositories .............................................. 50
        Managing Centera data ...................................................... 50
        Stub re-creation ........................................................... 51
        Stub scanner ............................................................... 52
        Orphan file management ..................................................... 53
    Disaster Recovery—High availability and failover ............................... 53
        High Availability (HA) ..................................................... 53
        Disaster Recovery (DR) ..................................................... 54
        The role of FMA in HA and DR environments .................................. 54
        Failover best practices .................................................... 54
        Celerra failback ........................................................... 68
        Centera replication ........................................................ 72
    FMA Backup .................................................................... 74
    The FMA database ............................................................... 74


Figures

Figure 1   Rainfinity FMA on EMC Celerra NS Series system architecture ......... 17
Figure 2   EMC NS40 Data Mover ports and traffic types ......................... 18
Figure 3   FMA Configuration ................................................... 31
Figure 4   Create New File Server .............................................. 32
Figure 5   Select Celerra as the File Server ................................... 32
Figure 6   Celerra Properties .................................................. 33
Figure 7   Centera File Server ................................................. 34
Figure 8   Centera Properties .................................................. 35
Figure 9   Create an archiving task ............................................ 48
Figure 10  Task schedule and detailed report ................................... 49
Figure 11  Delay Stubbing ...................................................... 51
Figure 12  Stub scanner ........................................................ 53
Figure 13  Create a Celerra Network Server ..................................... 56
Figure 14  New Data Mover Interconnect ......................................... 57
Figure 15  New Replication ..................................................... 59
Figure 16  New File System Replication ......................................... 60
Figure 17  File System Failover ................................................ 62
Figure 18  Create a VDM ........................................................ 64
Figure 19  Replicate a VDM ..................................................... 65
Figure 20  New VDM Replication ................................................. 66
Figure 21  Create a File System Replication for VDM ............................ 67
Figure 22  Celerra failover status ............................................. 68
Figure 23  Start a replication ................................................. 69
Figure 24  Start a replication ................................................. 70
Figure 25  Reverse a replication ............................................... 71
Figure 26  Replication status .................................................. 72


Tables

Table 1  Dataset description ................................................... 18
Table 2  Hardware specifications ............................................... 22
Table 3  Software specifications ............................................... 24


About this Document

This solution guide provides an overview of the architecture, best practices, and implementation strategies of an EMC Solution for Rainfinity File Management Appliance developed by the EMC NAS Product Validation group.

Purpose

This guide provides an overview of a Rainfinity File Management Appliance solution that uses a Celerra NS server as the production data source, and demonstrates disaster recovery scenarios for component failures and entire production site failures. The information in this document can serve as the basis for a solution build, white paper, best practices document, or training, and can also be used by other EMC organizations.

Audience

This document is intended for EMC personnel, partners, and customers looking for a cost-effective way to implement a tiered-storage environment using the Rainfinity File Management Appliance in a Celerra and Centera environment. As such, readers should already be familiar with the installation and administration of the Celerra operating environment, the Centera operating environment, and the Rainfinity File Management Appliance.

Related documents

The following documents provide additional, relevant information and are located on Powerlink at http://Powerlink.EMC.com. Access to these documents is based on your login credentials. If you do not have access to the content listed below, contact your EMC representative:

♦ Using Celerra Replicator (V2) Technical Module located at the following path on Powerlink:

Home > Support > Technical Documentation and Advisories > Hardware/Platforms Documentation > Celerra Network Server > Installation/Configuration

♦ Rainfinity File Management Appliance Installation and User Guide located at the following path on Powerlink:

Home > Support > Technical Documentation and Advisories > ~ P-R ~ Documentation > Rainfinity File Management Appliance > Maintenance/Administration


♦ Using Celerra FileMover Technical Module located at the following path on Powerlink:

Home > Support > Technical Documentation and Advisories > Hardware/Platforms Documentation > Celerra Network Server > Installation/Configuration

♦ EMC Rainfinity File Management Appliance Version 7.2 Best Practices Planning white paper located at the following path on Powerlink:

Products > Software P – R > Rainfinity Family > Rainfinity File Management Appliance > White Papers

♦ Centera Online Help


Chapter 1 Solution Overview

This chapter presents these topics:

The technology solution overview ............................................... 12
Terminology .................................................................... 12


The technology solution overview

The Rainfinity® File Management Appliance (FMA) is a policy-based file system archiving appliance for the EMC® Celerra® NS Series. File system archiving is an intelligent process for moving inactive or infrequently accessed files from production storage to secondary tiers of storage.
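Archiving policies of this kind typically select candidate files by last-access time. The sketch below is purely illustrative of that idea, not FMA's implementation: a real policy engine such as FMA also weighs file size, type, and modification time, and runs on the appliance rather than on a client.

```python
import os
import time

def find_inactive_files(root, days=180):
    """Return paths under `root` not accessed in the last `days` days.

    Illustrative only: selection by last-access time is the core idea
    behind 'inactive or infrequently accessed' archiving policies.
    """
    cutoff = time.time() - days * 86400
    inactive = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_atime < cutoff:
                inactive.append(path)
    return inactive
```

Note that file systems mounted with `noatime` do not maintain reliable access times, which is one reason production policy engines consider more than a single attribute.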

The Rainfinity FMA on Celerra NS Series solution configuration includes a production site and a disaster recovery site that can take over should a catastrophic event render the primary site unusable. To protect against minor or major component failures at the production site, the solution is designed with high availability built into each of the configuration's components.

The configuration scenario includes Microsoft Windows and Linux client systems that access CIFS or NFS data on the Celerra at the production site, a Rainfinity FMA policy engine that implements archiving policies, and an EMC Centera® configured as the secondary storage repository for archived data. The production site Celerra is configured with Celerra Replicator™, an asynchronous remote replication infrastructure for Celerra that supports all replication types—file systems, Virtual Data Movers, and iSCSI LUNs. Celerra Replicator produces a read-only, point-in-time copy of a source file system, iSCSI LUN, or Virtual Data Mover, and periodically updates this copy to keep it consistent with the source object. In the event of a Celerra failure, the disaster recovery site Celerra takes over as the production Celerra until the failure is resolved and failback has occurred. The iSCSI LUN replication feature is not used in the tested configuration.

The validated solution archives data residing on a Celerra to a Centera as the secondary storage. Source data on a Celerra can alternatively be archived to another Celerra. This guide includes configuration information for archiving to Centera as well as Celerra.

The role of the FMA policy engine is to archive data to and recall data from secondary storage. If the FMA fails, it fails over to an FMA-HA device, from which files can be recalled from the Centera secondary storage; data cannot be archived using the FMA-HA device. Archiving can resume when operation has been restored to the primary FMA. In the event of a total site failure, the FMA at the primary site fails over to the FMA at the disaster recovery site.

EMC Celerra FileMover is a Celerra feature that delivers dynamic file mobility—the ability to automate the archiving of files in a network-attached storage environment across a hierarchy of storage platforms. When Celerra FileMover is used with external policy and archiving software, it automatically archives infrequently used files, or files of a particular type, to slower, less-expensive storage devices.

Centera is also configured for high availability. In the event of a Centera failure, the disaster recovery Centera will become the active secondary storage device.

Terminology

As you use this solution guide, it is important to understand certain key terms related to the components of the solution. The following list provides definitions of the terms used in this document.

BLOB: A binary large object (BLOB) is an unstructured file, typically an image or sound file, that is stored in a relational database table, potentially along with columns of other data types. A BLOB on Centera is also used for storing unstructured data. The BLOBs in Centera are not part of a relational database table and are referenced by their Content Address.

C-Clip: A package containing the user’s data and associated metadata. When a user presents a file to the Centera system, the system calculates a unique Content Address (CA) for the data and then stores the file. The system also creates a separate XML file containing the CA of the user’s file and application-specific metadata. Both the XML file and the user’s data are stored in the C-Clip™.
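The defining property of content addressing is that the address is derived from the data itself, so identical content always maps to the same address. Centera's actual CA algorithm is internal to the product; the sketch below merely illustrates the principle using SHA-256 and a toy in-memory store.

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive an address from content alone (content-addressed storage).

    SHA-256 stands in for Centera's real, internal CA algorithm; the
    point is only that the address is a function of the bytes stored.
    """
    return hashlib.sha256(data).hexdigest()

# Toy stand-in for the object store: address -> blob.
store = {}

def write_clip(data: bytes) -> str:
    """Store data and return its Content Address."""
    addr = content_address(data)
    store[addr] = data  # storing identical content twice is a no-op
    return addr
```

Because the address is deterministic, writing the same file twice consumes storage only once, and any later corruption is detectable by re-hashing.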

DHSM: Distributed Hierarchical Storage Management is the former name for Celerra FileMover.

FLR: File-level retention is an EMC Celerra Network Server software feature for setting file-based permissions on a file system to limit write access for a specified retention period. Using file-level retention, you can archive data to file-level retention (FLR) storage on standard rewritable magnetic disks through NFS or CIFS operations. File-level retention allows you to create a permanent, unalterable set of files and directories and ensure the integrity of data.
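Over NFS or CIFS, WORM-style file systems commonly signal a retention period through ordinary file attributes. One widespread convention, assumed in the sketch below and worth verifying against your specific FLR release, is to set a file's last-access time to the desired retention expiry and then remove write permission; the file system then treats the file as unalterable until that time.

```python
import os
import stat
import time

def commit_to_retention(path: str, retain_days: int) -> None:
    """Mark a file for retention using a common WORM-over-NFS convention.

    Assumption (not verified against any particular FLR release): the
    file system interprets 'atime set in the future, then file made
    read-only' as 'retain until atime'.
    """
    expiry = time.time() + retain_days * 86400
    st = os.stat(path)
    os.utime(path, (expiry, st.st_mtime))  # atime carries the expiry date
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # read-only
```

On an ordinary local file system this only changes metadata; the enforcement of the retention period is done by the FLR-enabled server, not by the client.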

FMA: File Management Appliance

FMA-HA: File Management Appliance High Availability

FQDN: Fully Qualified Domain Name is the full name of a system, containing the domain name of the organization and its highest sub-domain.

NDMP: The Network Data Management Protocol (NDMP) allows you to control the backup and recovery of an NDMP server through a network backup application, without installing third-party software on the server. In a Celerra Network Server, the Data Mover functions as the NDMP server. NDMP-based backups are used for high-capacity backups and in environments where multi-protocol support is required. NDMP separates the control and data transfer components of a backup or restore. The actual backups are handled by the Data Mover, which minimizes network traffic.

PEA: A Centera Pool Entry Authorization (PEA) file is generated while creating or updating an access profile. It is a clear-text, XML-formatted file that system administrators can use to communicate and distribute authentication credentials to application administrators.

QoS: Quality of Service. This feature is inherent in the Celerra Replicator software and allows the Celerra administrator to specify a schedule of times, days, and bandwidth limits on the source to destination IP network interconnects.

RPQ: Request for Product Qualifier.


Chapter 2 Solution Reference Architecture

This chapter presents these topics:

Overall architecture ........................................................... 16
Network architecture ........................................................... 17
Dataset ........................................................................ 18
Celerra server and FileMover architecture ...................................... 19
Centera architecture ........................................................... 20
File Management Appliance architecture ......................................... 21
High availability and failover ................................................. 21
Hardware and software resources ................................................ 22


Overall architecture

This Rainfinity FMA solution configuration, shown in Figure 1, uses a Celerra NS40 as the data source for Windows and Linux clients. The clients access CIFS and NFS data over a Gigabit Ethernet network.

Based on established FMA policies, inactive or infrequently accessed data is archived to the Centera, which is also configured on the Gigabit Ethernet network. Archived files are replaced on the Celerra with stub files, which contain all the metadata required for file recall by users.
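A stub stands in for the archived original and records where the full content now lives, so production storage is reclaimed while the file remains recallable. FileMover's on-disk stub format and recall path are internal to Celerra and FMA; the JSON layout, field names, and functions below are purely hypothetical, to show the shape of the replace-with-metadata mechanism.

```python
import json

def archive_to_stub(path: str, repository: dict) -> None:
    """Replace a file's content with a small stub (hypothetical format).

    `repository` is a toy stand-in for the secondary store; a real
    deployment would write to Centera and obtain a Content Address.
    """
    with open(path, "rb") as f:
        data = f.read()
    object_id = f"obj-{len(repository)}"  # stand-in for a Content Address
    repository[object_id] = data
    stub = {"archived": True, "object_id": object_id, "size": len(data)}
    with open(path, "w") as f:
        json.dump(stub, f)  # the stub is tiny; production space is reclaimed

def recall_from_stub(path: str, repository: dict) -> None:
    """Restore the original content described by the stub."""
    with open(path) as f:
        stub = json.load(f)
    with open(path, "wb") as f:
        f.write(repository[stub["object_id"]])
```

In the real solution the recall is transparent: the Data Mover intercepts a client read of a stubbed file and triggers the recall itself, so clients never see the stub contents.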

The solution is configured for high availability in the event of a component failure and for disaster recovery in the event of an entire site failure.

Component failure

Should the FMA device fail, the FMA-HA device is configured to take over and process recall requests from the clients.

The Celerra is configured with two Data Movers: a primary and a standby. Each Data Mover is a completely autonomous file server with its own operating system image. During normal operations the clients interact directly with the Data Mover. If a hardware or software failure occurs, a primary Data Mover fails over to a standby and the standby Data Mover assumes the identity and functionality of the failed Data Mover.

The Centera is configured with four access nodes. If a single node fails, the remaining nodes continue to provide access to the storage.

Site failure

This solution comprises two sites: a production site and a remote standby site. At any time, only one site serves client connectivity while the other acts as a hot standby. Both sites are set up with identical hardware and software components. This is only a recommendation; it is not an absolute requirement to match every component in the entire solution stack between the production and remote sites.

This solution uses Celerra Replicator (V2) for asynchronously replicating the source data on the production Celerra NS40 to the NS40 at the disaster recovery site. Additionally, the solution uses Centera replication to replicate a copy of the archived data from the source Centera to the disaster recovery Centera.

The File Management Appliance at the disaster recovery site will take over for the FMA at the production site. Clients will have the ability to recall files from the DR FMA. Files can only be archived if a copy of the production FMA database is restored to the FMA at the disaster recovery site.


Figure 1 Rainfinity FMA on EMC Celerra NS Series system architecture

Network architecture

The File Management technology is IP based and does not have any non-standard networking requirements. As with any client that connects to a Celerra or Centera, the FMA and FMA-HA appliances must be able to establish IP connectivity to the systems they will interact with.

This section describes the network architecture of the validated solution.

Subnets

A subnet is any network that is part of a larger IP network as identified by the IP address and the subnet mask. The validated Rainfinity FMA solution components at the production site are configured in the production subnet. In the event of a site failure, the clients will be re-directed to the disaster recovery subnet for access to files on the disaster recovery Celerra NS40.
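Subnet membership is determined entirely by the address and the mask. Python's standard ipaddress module makes the check explicit; the two subnet values below are made-up examples, not the addresses used in the validated configuration.

```python
import ipaddress

# Hypothetical production and disaster recovery subnets, for illustration only.
PRODUCTION = ipaddress.ip_network("10.1.0.0/24")
DISASTER_RECOVERY = ipaddress.ip_network("10.2.0.0/24")

def site_of(host: str) -> str:
    """Classify a host address by the subnet that contains it."""
    addr = ipaddress.ip_address(host)
    if addr in PRODUCTION:
        return "production"
    if addr in DISASTER_RECOVERY:
        return "disaster recovery"
    return "unknown"
```

Redirecting clients after a site failure amounts to resolving their target names to addresses in the disaster recovery subnet instead of the production one.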


Switches

EMC recommends that the IP switches to which the configuration components connect support Gigabit Ethernet (GbE) connections.

The production and disaster recovery subnets are inter-connected by a 24-port Gigabit IP Switch.

EMC NS40 Data Mover ports

This section describes the network ports on the rear of an EMC NS40 Data Mover as shown in Figure 2.

Figure 2 EMC NS40 Data Mover ports and traffic types

Port cge0 handles all I/O required between the clients and the Celerra while port cge3 handles the file system replication load between the production and disaster recovery Celerra NS40. Ports cge1 and cge2 are left open for future growth.

Dataset

For each of the three clients, the production Celerra contained 1 GB of mixed file data including text files, PDF files, and office-type documents such as Microsoft Word and PowerPoint files. The dataset ratio is broken down as shown in Table 1:

Table 1 Dataset description

File size   File type           Overall ratio
10 MB       Text files          20%
100 MB      Office-type files   30%
500 MB      Executable files    40%
1000 MB     Other               10%


Celerra server and FileMover architecture

This section presents an overview of the Celerra and FileMover architectures. FileMover software is required by the File Management Appliance to archive data from Celerra to Centera.

Celerra server

Celerra is a network-attached storage server that enables clients on a network to store and retrieve files over the network. Dedicated file servers, like Celerra, offer high availability and disaster recovery options that are not commonly available with general-purpose servers.

The Celerra is made up of several key components—the Data Movers, Control Station, storage system, and network.

Additionally, Celerra is available in two distinct configurations: integrated and gateway. The Celerra integrated platform comprises one or more Data Movers and a dedicated storage processor enclosure (SPE). The SPE manages the back-end CLARiiON® storage system. The gateway Celerra also comprises one or more Data Movers and connects to an existing SAN storage system.

Data Mover

Data Movers move data between the storage system and the client computer. Administrators typically do not manage Data Movers directly but work with the Control Station that in turn sends commands to the Data Mover. A Data Mover can be active or a standby for other Data Movers. Each Data Mover is independent; there is no communication between Data Movers. If a Data Mover fails, the Control Station manages the failover to the standby Data Mover, assuming a standby has been configured. Each Data Mover is connected to the storage system, Control Station, and network.

Control Station

The Control Station is a dedicated management computer that monitors and sends commands to Data Movers. It connects to each Data Mover and to the storage systems through the Data Movers. The Control Station can be connected to a public or private network for remote administration. After the Data Movers are booted, they do not depend on the Control Station for normal operation. In the unlikely event the Control Station fails, Data Movers continue to serve files to clients.

Storage system

The storage system is an array of physical disk devices used by the Celerra. It contains one or more disk arrays. The storage system stores and retrieves blocks of data for the Data Movers and is made up of Fibre Channel or Serial-ATA disks. The Celerra system offers two options to allocate and manage the storage system:

♦ Automatic Volume Management (AVM)

The AVM feature automatically creates and manages usable file system storage. File systems are created using system-defined or user-defined storage pools. System-defined storage pools are dynamic by default, and AVM adds or removes volumes automatically from the storage pool as needed.


♦ Manual Volume Management (MVM)

MVM provides additional flexibility in creating and aggregating different volume types into usable file-system storage that meets required configuration needs. Creating and managing volumes manually provides greater control over the location of storage allocated to a file system. There are a variety of volume types and configurations from which to choose to optimize a file system’s storage potential. Volumes can be divided, combined, and grouped to meet specific configuration needs.
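As a concrete illustration of the AVM option described above, a file system can be provisioned from a system-defined storage pool with a single command. This is a hedged, dry-run sketch: the file system name, size, and pool name (clar_r5_performance is a commonly seen system-defined pool name on CLARiiON-backed systems) are assumptions, and the command is printed rather than executed.

```shell
# Dry-run sketch: create a file system from a system-defined AVM storage
# pool. 'production_fs', '10G', and 'clar_r5_performance' are illustrative
# placeholders; remove the 'echo' to run on a real Control Station.
cmd="nas_fs -name production_fs -create size=10G pool=clar_r5_performance"
echo "$cmd"
```

With MVM, the administrator would instead build the underlying volumes explicitly before creating the file system on them.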

Network

The network enables clients to communicate with the Celerra. Each Data Mover has its own set of network connections. There are also network connections between the Data Movers and the Control Station. Each Data Mover has one or more network interface cards (NICs).

FileMover

Celerra FileMover is an API that allows the automatic migration of Celerra files to a secondary tier of storage such as Centera. In the Celerra FileMover environment, the Celerra system is the primary data storage platform, while the secondary platform may be the same or another Celerra, or it may be a Centera as is demonstrated in this paper. Celerra FileMover is included with the Celerra NAS operating system. FileMover configuration parameters are included in Chapter 3.

Centera architecture

This section presents a brief description of the Centera architecture and the basic components required to enable Centera as an archiving destination.

Centera is built on a no-single-point-of-failure Redundant Array of Independent Nodes (RAIN) architecture. The base Centera is configured with four nodes and can be expanded in two-node increments configured as storage nodes or access nodes. Each storage node contains processing power and up to 4.0 TB of raw storage capacity, and is interconnected with all other nodes in the cluster through a private local area network. Each node executes its own instance of the CentraStar® operating environment. A minimum of two access nodes is recommended per four-node or eight-node Centera. Whether additional access nodes are needed depends on how heavily the application drives the Centera. Access nodes are configured with Centera Viewer, one at a time.

FMA uses the Centera SDK to interact with Centera storage. When archiving to Centera, the File Management Appliance connects and authenticates with a specific username and password, a PEA file, or anonymously, depending upon the credentials supplied when the Centera is set up in FMA. A minimal amount of configuration is required before an application can start archiving data to the Centera. You must ensure that the access nodes are configured, available on the external network, and accessible by the application servers, and that either an anonymous profile is enabled to allow anonymous access or PEA files are generated so that applications can authenticate to Centera and begin writing data.

When configuring a Centera in File Management appliances, you must supply the following:

♦ A unique name to identify a particular pool of storage on a specific Centera cluster

♦ The IP addresses of all access nodes

♦ PEA file for authentication (optional)


♦ Username and password for authentication (optional)

Note that you can configure the same Centera cluster multiple times in FMA using different names. This is common when FMA is used to archive data into multiple storage pools on a single Centera cluster.
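The shapes a Centera configuration's connection details can take are sketched below. This is a hedged illustration: the addresses and PEA file path are placeholders, and the 'addresses?pea_file' form follows the general Centera SDK connection-string convention.

```shell
# Illustrative Centera connection strings as supplied to FMA. Access node
# IP addresses are comma-separated; a PEA file path, when used, follows a
# '?' separator. All values here are placeholders.
anon_conn="10.0.1.50,10.0.1.51"                      # anonymous or name/password auth
pea_conn="10.0.1.50,10.0.1.51?/etc/centera/app.pea"  # PEA-file authentication
echo "$anon_conn"
echo "$pea_conn"
```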

File Management Appliance architecture

The File Management Appliance provides the ability to archive and recall data, perform orphan file management, and recover stub files. Additionally, it features a robust reporting interface that provides valuable insight into the efficacy of archiving policies.

The File Management High Availability (FMA-HA) appliance complements an existing File Management Appliance by adding high-availability capabilities when recalling archived data to primary storage. FMA-HA appliances can only be used for recalling data and cannot be used for archiving, stub file recovery, or orphan file management.

High availability and failover

The validated solution provides high availability and failover for each of the components in the configuration depicted in Figure 1. This section describes the high-availability and failover scenarios tested in the validated solution. Additional failover information can be found in the Solution Best Practices section of this paper.

Primary Celerra failure

The Celerra can have multiple Data Movers to provide high availability and load balancing. In this solution, primary and standby Data Movers provide seamless failover capabilities for the Celerra storage. The RAID disk configuration on the Celerra back end provides protection against hard disk failure.

Should the entire Celerra system fail, Celerra Replicator software will fail over the production Celerra to the disaster recovery Celerra. In the validated solution, Celerra Replicator is configured to replicate the source file systems to the disaster recovery Celerra every 30 minutes. The clients are then redirected to the disaster recovery Celerra.
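A replication session matching the validated solution's 30-minute interval might be created with Replicator V2's nas_replicate command. This is a hedged, dry-run sketch: the session, file system, and interconnect names are assumed placeholders, and the exact option syntax should be confirmed against the Celerra Replicator V2 documentation for your DART release.

```shell
# Dry-run sketch: a Replicator V2 session with a 30-minute time-out-of-sync
# target, mirroring the 30-minute interval used in the validated solution.
# All names are illustrative; remove the echo to execute on the production
# Control Station.
cmd="nas_replicate -create prod_to_dr -source -fs production_fs \
-destination -fs production_fs_replica \
-interconnect prod_dr_interconnect -max_time_out_of_sync 30"
echo "$cmd"
```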

With Celerra Replicator, users can set granular Recovery Point Objectives (RPOs) for each of the objects being replicated, allowing business compliance service levels to be met especially when scaling a large NAS infrastructure.

Celerra Replicator can maintain up to 1024 replication sessions. Administrators can protect extremely large IP storage deployments or segment their information with finer granularity based on RTO/RPO requirements.

Users can implement policy-based, adaptive QoS by specifying a schedule of times, days, and bandwidth limits on the source to destination IP network interconnects.

Celerra Replicator supports 1-to-N replication and cascading. With 1-to-N replication, an object can be replicated from a single source to up to four remote locations. With cascading, a single object is replicated from a source site (Site A) to a secondary site (Site B) and from there to a tertiary site (Site C). Cascading replication is typically used in a multi-tiered disaster recovery strategy. The first, local hop allows operational recovery with a short Recovery Point Objective; where there is network locality, a local office can actually run applications from the Celerra at the secondary location. Typically, these RPOs are on the order of minutes. The tertiary location is used for major, wide-reaching disaster scenarios and protects the local disaster recovery site; RPOs from the secondary site to the tertiary site are on the order of hours.

To achieve the best possible performance when recalling files from the Centera, the active Centera is reconfigured to be the disaster recovery Centera. The disaster recovery FMA is used for recalling files.

Primary Centera failure

Centera asynchronous replication is the process whereby data is automatically copied from a source cluster to a replica cluster. If an entire cluster becomes inoperable, the application can fail over its operations to the replica cluster until the source cluster is repaired. If the production site Centera fails, data is recalled from the disaster recovery Centera. In this scenario, the active Celerra is the production Celerra and the active FMA is the production site FMA.

Replication complements Content Protection Mirrored (CPM) and Content Protection Parity (CPP) by putting copies of data on another cluster. C-Clips written to a source cluster will first be queued on the source cluster before they are written to the replica cluster (asynchronous replication). It is possible to replicate all data on a cluster or only data belonging to one or more pools (selective replication).

File Management Appliance failure

The validated solution includes an FMA-HA at the production and disaster recovery sites. When an FMA-HA is deployed alongside an FMA, it leverages the underlying APIs of Celerra to create a highly available environment for data recall. Should the FMA fail, the FMA-HA will be used to recall data from the Centera secondary storage. When the secondary storage is a Celerra using FileMover, the FMA and FMA-HA are not required to recall archived data.

Should the entire production site become unavailable, the active FMA will be the disaster recovery FMA.

Hardware and software resources

Hardware resources

The Rainfinity File Management Appliance on EMC Celerra NS Series solution uses the hardware resources listed in Table 2.

Table 2 Hardware specifications

Equipment                      Quantity              Configuration
Dell 1750 servers              Three                 • Dual 2.79-GHz Xeon processor
                                                     • 2 GB of memory
                                                     • One 36 GB 10k internal SCSI disk
                                                     • Two onboard 10/100/1000 Mb Ethernet NICs
Gigabit Ethernet switch        Three                 • VLAN support
                                                     • Optional jumbo frame support
                                                     • Optional LACP or EtherChannel support
Celerra NS40G Network Server   Two (one for each     • Two Data Movers
                               site)                 • DART OS 5.6
                                                     • Four GbE network connections per Data Mover
                                                     • One Control Station
CLARiiON CX3-80                Two (one per NS40G)   • One FC shelf
                                                     • One SATA shelf (five 750 GB ATA disks)
                                                     • RAID 5 RAID group
FMA                            Two (one for each     • Six 160 GB (7.2k) SATA disks
                               site)
FMA-HA                         Two (one for each     • Six 160 GB (7.2k) SATA disks
                               site)
Centera                        Two (one for each     • Four access nodes
                               site)


Software resources

The list of software resources used for the Rainfinity File Management Appliance on EMC Celerra NS Series solution is shown in Table 3.

Table 3 Software specifications

Software title       Software revision
Celerra NS40G        DART OS 5.6
Centera              3.1.1
FMA & FMA-HA         7.2.2
Celerra Replicator   V2


Chapter 3 Solution Best Practices

This chapter presents these topics:

File Management environment ..................................................... 26
Archiving and recalling data .................................................... 35
Implementing an archiving strategy .............................................. 41
Managing archived data .......................................................... 50
Disaster Recovery—High availability and failover ................................ 53
FMA Backup ...................................................................... 74
The FMA database ................................................................ 74


This chapter discusses the best practices for running Rainfinity File Management Appliance using a Celerra as the source data server and Centera as the secondary storage for archival of source data. The topics covered include recommendations for configuration, installation, and usage of the following areas:

♦ File Management environment

♦ Archiving and recalling data

♦ Implementing an archiving strategy

♦ Managing archived data

♦ Disaster recovery—high availability and failover

File Management environment

This section describes the File Management environment and the best practices for configuring the FMA, the Celerra, and the Centera.

Appliances of the File Management product line

The File Management product line consists of two appliances. The capabilities and features available on these appliances differ and one or more of each type may be deployed to create a full solution.

The File Management Appliance is the foundation of every deployment. It provides the full range of features available from the product line, including the ability to archive and recall data, perform orphan file management, and recover stub files. It features a robust reporting interface that provides valuable insight into the efficacy of archiving policies. A File Management Appliance is created by loading the fm_clean ISO image onto an EMC-supplied File Management hardware platform.

The File Management High Availability (FMA-HA) appliance complements an existing FMA by adding high availability and load balancing capabilities when recalling archived data to primary storage. FMA-HA appliances can only be used for recall and cannot be used for archiving or orphan file management. An FMA-HA appliance is created by loading the fmha_clean ISO image onto an EMC-supplied FMA-HA hardware platform.

An optional method for achieving high availability is to leverage the Rainfinity Global File Virtualization appliance.

Configuring the File Management Appliance

Software requirements

The software versions of all appliances must match. For example, you should not run a configuration where the FM appliance is running 7.2b43 while the FMA-HA appliance providing high availability is running 7.2b44.

Network requirements

The File Management technology is IP-based and does not have any non-standard networking requirements. As with any client that needs to connect to a Celerra or Centera system, the FMA and FMA-HA appliances must be able to establish IP connectivity to the systems with which they will interact.

Specific sites may have performance requirements for archiving and recall speed. In such cases, ensure that the FMA has adequate bandwidth to both the primary and secondary storage tiers, and that network latency is low enough to produce acceptable performance. As a best practice, Rainfinity FMA and FMA-HA appliances should be deployed at the same site as the primary and secondary storage tiers that are part of the HSM solution, and they should be connected to a switch that is close to the storage systems whenever possible.

Note: The word close in the previous sentence is meant to imply that there are relatively few switch/router hops separating the devices.

Specific care should be taken when the File Management appliances are not at the same site as the primary and secondary storage tiers. In such scenarios you should evaluate the WAN performance carefully to determine if it will meet your performance requirements. Customers interested in such a deployment should submit an RPQ for solution validation.

Configuring the FM appliance

When the FMA or FMA-HA is initially installed, a setup utility prompts you to configure each of the network interfaces. If the interface configuration later requires changes, the rfhsetup utility is launched when you log in to the FM appliance CLI using a utility such as PuTTY. This utility allows you to configure each of the network interfaces of the FM appliance.

Note: If additional network interfaces are added to the machine after the initial ISO installation occurs, the rfhsetup utility will not allow you to configure them until you have run the command touch /etc/sysconfig/network-scripts/ifcfg-ethX where X is the interface name you have added.

Link aggregation is not currently supported for FM appliance network interfaces.

Configuring the FMA-HA appliance

The network configuration process for an FMA-HA appliance is identical to that of an FM appliance.

Configuring Celerra Data Movers as an archiving source

Required ports and FileMover settings

In order to archive any data from a Celerra Data Mover, File Management appliances will require access to the FileMover API (TCP port 5080).

To archive NFS data, File Management appliances require:

♦ Portmap v2 RPC server (TCP port 111)

♦ Mount v3 RPC service

♦ NFS v3 RPC service


♦ NLM v4 RPC service

♦ Root and read/write export permissions for all NFS data that will be archived

To archive CIFS data, File Management appliances require:

♦ SMB over NetBIOS (TCP port 139)

Direct command line access to the Celerra Control Station is not used by File Management appliances.
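Reachability of the ports listed above can be checked from an appliance before configuration begins. This is a hedged, dry-run sketch: the Data Mover hostname is a placeholder, and the probe commands are printed rather than executed.

```shell
# Dry-run sketch: the TCP ports an FM appliance must reach on a Data Mover,
# per the lists above: 5080 (FileMover API), 111 (portmap, NFS archiving),
# and 139 (SMB over NetBIOS, CIFS archiving). The hostname is illustrative;
# remove the 'echo' to probe with nc for real.
for port in 5080 111 139; do
    echo nc -z -w 5 datamover1.example.com "$port"
done
```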

When configuring a Celerra Data Mover in FMA, you will need to supply:

♦ IP addresses of all the Data Mover network interfaces

♦ Credentials for a FileMover API user

♦ Credentials for local administrator access through CIFS (for CIFS archiving only)

♦ The NetBIOS name of the Celerra (for CIFS archiving only)

To create the FileMover API user, you need to log in to the Celerra Control Station CLI as root and run the command:

/nas/sbin/server_user <movername> -add -md5 -passwd <user>

Allow the IP addresses of the FM appliances to open connections to the FileMover interface. While logged in to the Celerra Control Station as an administrator, run the following command for each IP address of the Rainfinity appliances that will perform archiving or service recall requests for the Data Mover:

server_http <movername> -append dhsm -users <user> -hosts <ip_address>
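Where several appliances will archive from or service recalls for the same Data Mover, the command above can be repeated for each appliance address. This is a hedged, dry-run sketch: the user name and IP addresses are placeholders, and the commands are printed rather than executed.

```shell
# Dry-run sketch: authorize every Rainfinity appliance IP against the
# FileMover (dhsm) HTTP interface of Data Mover server_2. 'fmauser' and the
# addresses are placeholders; remove the 'echo' to run on the Control Station.
for ip in 10.0.0.100 10.0.0.101 10.0.0.200 10.0.0.201; do
    echo server_http server_2 -append dhsm -users fmauser -hosts "$ip"
done
```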

Note that a single Celerra Data Mover can be configured in multiple FM appliances as an archiving source, but more than one FM appliance should never be used to archive data from a single file system.

Prepare the file systems for archiving by enabling DHSM (FileMover) for the specific file systems. Log in to the Celerra Control Station and enter the following command:

fs_dhsm -modify <primary_fs> -state enabled

Create one or more connections from the Data Mover to the secondary storage location(s) for each file system that will be archived. When archiving any type of data to Centera CAS, recall requests will flow from the Data Mover to FMA. FMA will then service the recall by reading the data stored on Centera and writing it back to primary storage. To create the connection, log in to the CLI of the Celerra Control Station and run the command:

fs_dhsm -connection <primary_fs> -create -type http -secondary 'http://<fqdn_of_fm_appliance>/fmroot' -httpPort 8000 -cgi n
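Enabling FileMover and creating the HTTP connection can be scripted across several file systems in one pass. This is a hedged, dry-run sketch: the file system names and the FMA FQDN are placeholders, and the commands are printed rather than executed.

```shell
# Dry-run sketch: enable DHSM and create an HTTP connection to the FMA for
# several file systems. File system names and the FMA FQDN are placeholders;
# remove the 'echo' to execute on the Control Station.
for fs in eng_fs mkt_fs fin_fs; do
    echo fs_dhsm -modify "$fs" -state enabled
    echo fs_dhsm -connection "$fs" -create -type http \
        -secondary "http://fma01.example.com/fmroot" -httpPort 8000 -cgi n
done
```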


When archiving CIFS data to Celerra, you need to archive to a CIFS repository configured in FMA. To create the connection, log in to the CLI of the Celerra Control Station and run the command:

fs_dhsm -connection <primary_fs> -create -type cifs -admin '<fqdn>\<domain_administrator>' -secondary '\\<fqdn_of_secondary_server>\<repository_path>' -local_server <local_cifs_server>

When archiving NFS data to Celerra, you need to archive to an NFS repository configured in FMA. To create the connection, log in to the CLI of the Celerra Control Station and run the command:

fs_dhsm -connection <primary_fs> -create -type nfsv3 -secondary '<secondary_server>:/<repository_path>' -proto TCP -useRootCred

Note: The FQDN, or fully qualified domain name, is case sensitive.

When archiving to Centera, FMA is configured with Centera as an archiving destination, and a connection string consisting of the Centera access node IP addresses establishes the relationship between the FMA and the Centera.

Note: FLR-enabled file systems cannot be used as an archiving source.

FMA and FMA-HA appliances that need to service Celerra recall requests for data stored on Centera must run the celerracallback service. If the celerracallback service on the appliance has never been initialized, connect to the appliance through a utility such as PuTTY and run the following command:

/opt/rainfinity/filemanagement/bin/ccdsetup init_rffm

You will be prompted as follows:

By default the Celerra Callback Daemon will connect to the File Management service on the local machine.

Do you wish to configure another File Management machine? (y/n)

If the command is run on the FMA-HA, enter y, and then enter the IP address of the primary FMA.

If the command is run on the FMA, enter n.


Additionally, entries must be added to the local hosts file of each Celerra Data Mover hosting a primary file system to direct the Data Mover to use the FMA servers at the local site. The Data Mover hosts file must contain the IP address and hostname for the FMA and FMA-HA at both the production and disaster recovery sites. The following commands provide an example of how to add entries to the Data Mover hosts file.

Log in to the production site Celerra Control Station as the nasadmin user and type the following:

$ server_file server_2 -get hosts hosts

This command will retrieve the hosts file from the Data Mover and place it on the Control Station.

Next, use the vi command to enter the IP address and hostname for the production and disaster recovery FMA and FMA-HA.

$ vi hosts

# <IP address> <hostname>

10.0.0.100 rainccd.domain # HTTP server 1 - Production site FMA

10.0.0.101 rainccd.domain # HTTP server 2 - Production site FMA-HA

10.0.0.200 rainccd.domain # HTTP server 1 - Disaster Recovery site FMA

10.0.0.201 rainccd.domain # HTTP server 2 - Disaster Recovery site FMA-HA

When the hosts file has been modified, enter the following command to replace the hosts file on the Data Mover with the modified hosts file from the Control Station.

$ server_file server_2 -put hosts hosts

Repeat the above process for the disaster recovery Celerra. Log in to the disaster recovery site Celerra Control Station as the nasadmin user and enter the following:

$ server_file server_2 -get hosts hosts

This command will retrieve the hosts file from the Data Mover and place it on the Control Station.

Next, use the vi command to enter the IP address and hostname for the disaster recovery and production FMA and FMA-HA.

$ vi hosts

# <IP address> <hostname>

10.0.0.200 rainccd.domain # HTTP server 1 - FMA

10.0.0.201 rainccd.domain # HTTP server 2 – FMA-HA

10.0.0.100 rainccd.domain # HTTP server 1 – FMA-DR

10.0.0.101 rainccd.domain # HTTP server 2 – FMA-HA DR


When the hosts file has been modified, enter the following command to replace the hosts file on the Data Mover with the modified hosts file from the Control Station.

$ server_file server_2 -put hosts hosts

FMA settings for a source Celerra

Configure the FMA at the production and DR sites to archive data from the production Celerra:

1. From the Rainfinity Home Page, select File Management and then click the Configuration tab.

2. From the Server Configuration options, select File Servers as shown in Figure 3.

Figure 3 FMA Configuration

3. Click New to create a new source file server as shown in Figure 4.


Figure 4 Create New File Server

4. From the File Server Properties window, select Celerra as the source for archiving, as shown in Figure 5.

Figure 5 Select Celerra as the File Server


5. The Celerra Properties window will be displayed as shown in Figure 6.

Figure 6 Celerra Properties

6. Type the name of the Celerra CIFS Server in the Name field. If archiving multiple CIFS servers, repeat this procedure, starting with step 1, for each CIFS server.

7. Enter the DART version in DART version field.

8. Add the Celerra interface IP address that corresponds with the CIFS server. For load balancing and increased bandwidth, you must add the IP address of additional Celerra interfaces connected to the network.

9. If the CIFS server is mounted on a VDM, the Virtual Data Mover box must be selected.

10. Update the CIFS Specific settings with the domain name, domain username, and password.

11. Select Celerra as the archiving source.


12. In the FileMover Settings, enter the DHSM username and password.

13. Enter the hostname of the FMA devices as recorded in the Data Mover hosts file. The FMA hostname should be identical for each FMA device in the configuration.

14. Click Commit.

The advantage of configuring the Callback agent is that it keeps the Rainfinity FMA device in the callback path. The recall process is triggered by read and write access to archived files. During archiving of a CIFS share or NFS export, FMA is used to convert file data on primary storage to stub data. Data archived to Centera requires the involvement of FMA/FMA-HA appliances during recall because Data Movers do not currently support the Centera SDK. Celerra Data Movers open an HTTP connection to an FMA/FMA-HA appliance (TCP port 8000) and perform read operations. The appliance then translates those read requests to retrieve the data using the Centera SDK and passes it back to the Data Mover.

Configuring Celerra Virtual Data Movers as an archiving source

All of the previous information regarding Celerra Data Movers also applies to Virtual Data Movers, including file systems mounted on VDMs through NFS. However, NFS export management for such file systems must be done through the physical Data Mover hosting the VDM.

Configuring Centera as an archiving destination

FMA settings for archiving to Centera

Configure the FMA at the production and DR sites for archiving to a Centera.

1. From the File Server List, as shown in Figure 4, select New.

2. Select Centera as the destination for file archiving as shown in Figure 7.

Figure 7 Centera File Server

After selecting Centera as the destination File Server, the Centera Properties window will be displayed as shown in Figure 8.


Figure 8 Centera Properties

3. In the Name field, enter a name that will be common to the production and DR Centeras.

4. In the Connection String field, enter the IP address of the Centera Access Nodes at the production site. When configuring the FMA at the DR site, enter the Centera Access Nodes for the DR Centera.

5. Select the desired authentication method. To archive data to Centera, the FM appliance must have access to a storage pool. Centera clusters support three methods of controlling access to a storage pool: a PEA file, user credentials, or anonymous access. The chosen access details are passed to the FMA-HA appliances responsible for processing recall operations.

6. Click Commit to save the configuration.

Archiving and recalling data FMA Software Architecture

The FM appliance is installed with software referred to as the File Management Application (FMA). The FMA consists of a Linux-based operating system with specialized software installed.

Aside from the operating system, the core component of FMA is the File Management Daemon (FMD), a process which is part of the group of components referred to as the filemanagement service. The FMD accepts input through an XML-RPC interface from a handful of sources including the CLI, GUI, and other processes running on the system.

NOTE: The FMD does not monitor the CLI and GUI for events. The CLI and GUI components send requests to the XML-RPC interface of the FMD.

Archiver System Architecture

The archiver is a software component that runs only on the FM appliance and is responsible for carrying out archiving tasks. FMA-HA appliances cannot archive files and do not run this component.

The archiver is a component of the filemanagement service which is spawned and controlled by the FMD. The archiver itself is broken down into two major components: the filewalker and a pool of archiving threads.

When an archiving task is run, the FMD creates a thread which spawns and manages an archiver process. The archiver will instruct the filewalker to collect and analyze metadata from files within the archiving source. Filewalking threads will use CIFS or NFS operations based upon the protocol specified for the archiving task. Filewalking threads compare the file metadata to the archiving policy and note files which should be archived by creating an entry in a queue in the appliance memory. Archiving threads monitor the queue and are responsible for carrying out an archiving process detailed in the later sections of this guide.

When multiple archiving tasks run concurrently, they compete for the threads in these two pools. The FMA software is designed with performance in mind: provided there are no other bottlenecks in the environment, the entire system's resources can be dedicated to completing an archiving task quickly. Archiving performance can typically be increased by running multiple archiving tasks concurrently. This is a balancing act, however, and an administrator should closely monitor the CPU and memory resources of the appliance and other equipment in the environment when running concurrent tasks. As a general best practice, do not run more than five tasks of any type (archiving, data collection, stub scanner, or policy preview) concurrently.

Note that files and data streams smaller than 8 KB will not be archived unless the minimum file size override option is enabled. From the File Management Appliance console, enter the following command to allow files smaller than 8 KB to be archived:

rffm minFileOverride [--Enable=ENABLE]

This rule prevents FMA from archiving data to secondary storage and then replacing the data on primary storage with a stub file that takes up the same number of blocks on disk as the original.
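The arithmetic behind this rule can be sketched as follows; the 8 KB figure is taken from the text above, and the function names are illustrative only:

```python
# Illustrative sketch (not FMA code): why archiving files smaller than the
# minimum allocation unit saves no space on primary storage. Assumes an
# 8 KB allocation unit, which a stub file also occupies.

BLOCK_SIZE = 8 * 1024  # bytes in the smallest on-disk allocation

def blocks_used(size_bytes: int, block: int = BLOCK_SIZE) -> int:
    """Number of blocks a file of the given size occupies on disk."""
    return max(1, -(-size_bytes // block))  # ceiling division, minimum one block

def blocks_saved_by_stubbing(size_bytes: int) -> int:
    """Blocks freed on primary storage when the file is replaced by a stub."""
    return blocks_used(size_bytes) - 1  # the stub itself still occupies one block

print(blocks_saved_by_stubbing(4 * 1024))   # 0: stubbing a 4 KB file frees nothing
print(blocks_saved_by_stubbing(64 * 1024))  # 7: blocks freed for a 64 KB file
```
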

Archiving a Celerra source

Celerra provides the FileMover API, a purpose-built interface designed to facilitate Distributed Hierarchical Storage Management (DHSM). FileMover provides a range of API calls that standardize the creation and management of stub files. This standard interface for controlling stub files, together with a standard format for stub data, makes the FileMover interface and DHSM architecture well suited to archiving.

Authentication and authorization

In order to archive data from CIFS shares, FMA requires CIFS administrative access over Data Movers and Virtual Data Movers that will be the source of archiving tasks. When configuring a Data Mover/Virtual Data Mover in FMA you need to supply CIFS credentials for a user in the local Administrators group.

FMA uses NTLMv2 for authentication when opening CIFS connections. It does not currently support Kerberos; Kerberos support will be added to FMA in a future release. Clients can connect to primary storage using any authentication method supported by the file server, including Kerberos.

The FMA does not need write access to the Celerra production file system to archive data from it or for recall to work. However, the FMA does need read/write access to the Celerra file system to perform stub file re-creation. In that case the IP addresses of the FM appliance network interfaces must be given root and read/write permissions on the production file system on the Celerra. For CIFS, the user credentials supplied when a Celerra is configured in FMA are passed to the FMA-HA appliances that service recall requests.

FMA also requires access to the FileMover interface of the defined Celerra Data Movers. Access is obtained using the credentials defined with the server_http <movername> -append dhsm -users <user> -hosts <ip_address> command described in Configuring Celerra Data Movers as an archiving source. Note that Celerra Virtual Data Movers (VDMs) do not have a FileMover interface; FM appliances connect to the FileMover interface of the hosting Data Mover.

Some of these requirements are checked when an administrator first attempts to define a server and an error will be returned to inform the administrator that nothing was added to the FMA configuration. Some problems cannot be detected until an archiving or data collection task is run or until a file needs to be recalled. As a best practice, before archiving any file from a production file system, you should create, archive, and recall a test file on the same file system. This will allow you to detect configuration problems that prevent data recall without impacting users in the production environment.

Also note that when archiving to Celerra, Celerra Data Movers and VDMs recall archived data directly from NAS repositories using CIFS and NFS. When preparing a Celerra for archiving, supply CIFS credentials that will be used to recall archived data, and add the IP addresses of all Celerra Data Mover/VDM network interfaces to the read export permission list of NFS NAS repositories.

Overview of the archiving process

Delay stubbing

Rainfinity guards against data unavailable (DU) situations at the location where FMA archiving runs by completing the write to secondary storage before the file is replaced by a stub on primary storage. Because replication between Centera devices can be delayed, especially in large environments and where sites are separated by great distances, there is a risk that replication of the stub on primary storage completes before replication of the file on secondary storage. Delayed stubbing resolves this issue by splitting the archiving operation into two passes: in the first pass, the file is archived to Centera and the action is logged in the FMA, but the file on primary storage is not stubbed; on a later pass, once the delay interval has elapsed, the file on primary storage is replaced with a stub.

When a file is archived to an EMC Celerra, it is immediately stubbed on the source Celerra; delayed stubbing is not supported when archiving to a Celerra. This feature will be available in a future release. To guard against potential data loss when using a Celerra as the archiving target, consider data protection options such as replicating the archiving Celerra to a disaster recovery Celerra after an archive session has completed. You could also back up the archiving Celerra using an NDMP backup application.

Archiving threads perform the following actions when delayed stubbing is disabled:

1. An exclusive lock is obtained on the source file to be archived.

2. The source filename and CIFS DACL or NFS owner ID, GID, and mode bits are read.

3. The source file data is read.

4. Alternate data streams are read (CIFS only).

5. The timestamps of the source file are read.

6. The FileMover API is used by FMA to convert the source file into a stub file and this also results in the CIFS offline bit being set (but not directly by FMA).

7. An entry is inserted into the FMA database to record the success of the archiving operation.
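The seven steps above can be sketched as a toy, in-memory model (all names are hypothetical; the real implementation uses CIFS/NFS operations and the FileMover API, and step 4, alternate data streams, is omitted for brevity):

```python
import threading

# Toy stand-ins for primary storage, secondary storage, and the FMA database.
primary = {"a.doc": {"data": b"contents", "dacl": "acl-bytes", "mtime": 1000, "stub": False}}
secondary = {}
fma_db = []
_lock = threading.Lock()

def archive_file(name):
    with _lock:                                   # 1. exclusive lock on the source
        f = primary[name]
        meta = {"name": name, "dacl": f["dacl"]}  # 2. read filename and security info
        data = f["data"]                          # 3. read file data
        # 4. alternate data streams (CIFS only) omitted in this toy model
        mtime = f["mtime"]                        # 5. read timestamps
        secondary[name] = {"data": data, "meta": meta, "mtime": mtime}
        f["data"] = b""                           # 6. replace data with a stub
        f["stub"] = True                          #    (CIFS offline bit set by FileMover)
        fma_db.append(("archived", name))         # 7. record success in the FMA database

archive_file("a.doc")
```
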

With delayed stubbing enabled, an archiving thread queries the FMA database to determine whether a file matching the archiving policy also matches the stubbing policy. The stubbing policy classifies files into three categories based upon the last modified timestamp, file size, filename, and location within the source share/export. The FMA database is searched for an entry matching these attributes, which indicates that the file has already been copied to the secondary storage location.

If no matching entry is present in the FMA database, this version of the file has never been copied to the secondary storage location and the following sequence of events will occur:

1. An exclusive lock is obtained on the source file to be archived.

2. The source filename and DACL or owner ID/GID/mode bits are read.

3. The source file data is read.

4. Alternate data streams are read (CIFS only).

5. The timestamps of the source file are read.

6. An entry is inserted into the FMA database to record the date/time when the file data was originally copied to the secondary storage location.

If a matching entry exists in the FMA database, it will be associated with a timestamp that indicates the date on which the file was originally copied to the secondary storage location. If the time difference between the system clock of the FMA appliance and the timestamp exceeds the delayed stubbing period, the following sequence of events will occur to stub the file:

1. An exclusive lock is obtained on the source file to be stubbed.

2. The FileMover API is used by FMA to convert the source file into a stub file and this also results in the CIFS offline bit being set (but not directly by FMA).

3. An entry is inserted into the FMA database to record the success of the stubbing operation.

If a matching entry exists in the FMA database and the time difference does not exceed the delayed stubbing period, then no further action will be taken.
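The three-way decision described above can be summarized in a short sketch (a hypothetical structure; the real FMA consults its internal database and configured delay interval):

```python
import time

DELAY = 7 * 24 * 3600  # assumed delayed-stubbing period of 7 days (illustrative)

def delayed_stub_action(db_entry, now=None):
    """Decide which pass to perform for a file that matches the archiving policy."""
    now = time.time() if now is None else now
    if db_entry is None:
        return "copy"   # first pass: copy data to secondary and log the timestamp
    if now - db_entry["copied_at"] > DELAY:
        return "stub"   # delay elapsed: replace the primary file with a stub
    return "wait"       # still inside the delay window: take no further action
```
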

Overview of the recall process

The process of recalling files is triggered by reads and writes to files which have been archived. During archiving of a CIFS share or NFS export, the FileMover API is used to convert file data on primary storage to stub data. The stub data indicates the protocol and the unique path or ID that the Data Mover should use to access the archived data. When a Celerra is used as the secondary storage, file data archived to CIFS shares or NFS exports on the Celerra is read back directly by the Celerra Data Movers when a recall is triggered, without the involvement of FMA or FMA-HA appliances. Data archived to Centera requires the involvement of FMA/FMA-HA appliances during recall because Data Movers do not currently support the Centera SDK.

Entries must be added to the local hosts file of each Celerra Data Mover hosting a primary file system to direct the Data Mover to use secondary servers at the local site. The Data Mover hosts file must contain the IP address and hostname of the FMA and FMA-HA appliances at both the production and disaster recovery sites.
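As an illustration only (the hostname and addresses below are invented), the Data Mover hosts file entries might look like the following. The single shared hostname resolving to several addresses is what allows the Data Mover to fail over between appliances:

```
192.168.10.21   fma01    # FMA, production site
192.168.10.22   fma01    # FMA-HA, production site
192.168.20.21   fma01    # FMA, DR site
192.168.20.22   fma01    # FMA-HA, DR site
```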

Celerra Data Movers open an HTTP connection to an FMA or FMA-HA appliance when data must be recalled from Centera storage. The Data Mover establishes the connection by resolving a fully qualified domain name and connecting to the IP addresses returned from a query of the Data Mover hosts file. The IP addresses of multiple FMA/FMA-HA appliances can be returned in response to the query, thus allowing for high availability.

If the file has been archived to Centera storage, the Data Mover performs a query of the Data Mover hosts file to resolve the IP address of an FMA or FMA-HA appliance. The Data Mover then uses HTTP to read the archived data through the appliance, which in turn reads it from Centera using the Centera SDK. If an appliance does not respond to HTTP read requests, the Data Mover uses an alternate IP address of another appliance listed in the hosts file.

In response to read operations, Data Movers offer fine-grained control over how file data is recalled when it is not stored locally and whether it is saved back to primary storage. This option is referred to as the read policy, and there are four methods for processing recall events:

None – Use the Full, Passthrough, or Partial method as indicated in the stub data, connection string, file system, or Data Mover configuration.

Full – All file data is copied back from secondary storage to primary storage.

Passthrough – File data is read from secondary storage and passed to clients without being written to primary storage.

Partial – Only the blocks of file data requested by clients are transferred from secondary storage to primary storage.

Note: Write operations always trigger a Full recall.

After a file has undergone a Full recall, the offline attribute will be unset by the Data Mover.
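The four read-policy behaviors can be summarized in a sketch (a simplified model, not the Data Mover implementation):

```python
def recall(policy, requested_blocks, all_blocks):
    """Return (blocks delivered to the client, blocks written back to primary)."""
    if policy == "full":
        return all_blocks, all_blocks              # everything copied back to primary
    if policy == "passthrough":
        return requested_blocks, []                # nothing written back to primary
    if policy == "partial":
        return requested_blocks, requested_blocks  # only requested blocks land on primary
    # "none" defers to the stub data, connection string, file system,
    # or Data Mover configuration, so it has no fixed behavior here.
    raise ValueError(policy)

# Write operations always trigger a Full recall, per the note above.
assert recall("partial", [0, 1], [0, 1, 2, 3]) == ([0, 1], [0, 1])
```
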

The FileMover API allows files from a single share/export to be archived to multiple secondary storage locations.

Archiving to Centera

Centera authentication and authorization

In order to archive data to Centera, the FM appliance must have access to a storage pool. Centera clusters support three methods of controlling access to a storage pool, and a particular method is specified when the cluster is configured in FMA: a PEA file, user credentials, or anonymous access. The PEA file, credentials, or anonymous access details are passed to the FMA-HA appliances that will be responsible for processing recall operations.

Centera archiving process overview

When archiving to Centera, the archiving threads utilize the Centera SDK to create C-Clips and blobs.

When archiving an NFS file, a blob is created to hold the file data and it is attached to a C-Clip. The clip has keys which hold the original last modified timestamp of the source file, as well as the owner UID, GID, and mode bits.

For CIFS files, a blob is created to hold file data and it is linked to a C-Clip. The last modified timestamp is created as a key in the clip. The NTFS ACL and alternate data streams are created as embedded blobs inside the clip as long as they are less than 100 KB. When alternate data streams or the NTFS ACL exceed 100 KB, they are written into additional blobs and attached to the C-Clip.
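The embedded-versus-separate blob rule can be modeled in a short sketch (the names are illustrative; this is not the Centera SDK API):

```python
EMBED_LIMIT = 100 * 1024  # 100 KB threshold from the text above

def build_clip(file_data: bytes, acl: bytes, streams: dict) -> dict:
    """Model a C-Clip: file data as a blob; ACL and streams embedded if small enough."""
    clip = {"keys": {"mtime": 0}, "embedded": {}, "blobs": {"data": file_data}}
    for name, content in [("acl", acl), *streams.items()]:
        target = "embedded" if len(content) < EMBED_LIMIT else "blobs"
        clip[target][name] = content
    return clip

# A small ACL is embedded; a 200 KB alternate data stream becomes its own blob.
clip = build_clip(b"x" * 10, b"a" * 10, {"ads1": b"s" * (200 * 1024)})
```
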

Centera recall process overview

Data archived to Centera is always recalled by an FM/FMA-HA appliance using the Centera SDK. Stub files on primary storage contain the Centera Clip ID which uniquely identifies the object containing the archived file data.

Archiving to Celerra

Celerra authentication and authorization

In order to archive data to a Celerra CIFS share, FMA requires CIFS administrative access over the Data Movers and Virtual Data Movers that will be the destinations of archiving tasks. When configuring a Data Mover/Virtual Data Mover in FMA, you need to supply CIFS credentials for a user in the local Administrators group. These credentials are passed to the FMA-HA appliances which service recall requests, allowing those appliances to read data from secondary storage.

FMA uses NTLMv2 for authentication when opening CIFS connections. It does not currently support Kerberos.

In order to archive data to a Celerra NFS export, the IP addresses of the FM appliance network interfaces will need to be given root and read/write access.

Some of these requirements are checked when an administrator first attempts to define a server and an error will be returned to inform the administrator that nothing was added to the FMA configuration. Some problems cannot be detected until an archiving or data collection task is run or until a file needs to be recalled. As a best practice, before archiving any file from a production file system, you should create, archive, and recall a test file on the same file system. This will allow you to detect configuration problems that prevent data recall without impacting users in the production environment.

Celerra archiving process overview

Archiving threads perform the following actions when a file needs to be archived to a NAS repository on a Data Mover or a Virtual Data Mover:

1. A new file is created in the repository.

2. Retention is set if the destination is an FLR-enabled file system accessed using CIFS/NFS.

3. The name of the file and its location in the repository are generated by an algorithm; the original name and location are not used.

4. The CIFS DACL is set identically to that of the source file for a CIFS archiving task; the owner ID, GID, and mode bits are set for NFS archiving.

5. File data is copied from the source to the new file.

6. Alternate data streams are copied from the source to the new file (CIFS only).

7. The last modified timestamp is copied from the source file to the new file.

8. If the NAS repository was created on an FLR-enabled Celerra file system, the retention period is set according to the archiving policy.
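Step 3 notes that the repository name and location are generated by an algorithm. As an assumption for illustration, a hash-based scheme such as the following would achieve that; FMA's actual algorithm is not documented here:

```python
import hashlib

def repository_path(source_path: str, task_id: int) -> str:
    """Derive a repository location that never reuses the source name or path."""
    digest = hashlib.sha1(f"{task_id}:{source_path}".encode()).hexdigest()
    # fan out into two levels of subdirectories to keep directories small
    return f"{digest[:2]}/{digest[2:4]}/{digest}"

p = repository_path("/fs1/repository/X1234.img", task_id=7)
```
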

Celerra recall process overview

File data archived to a NAS repository on a Celerra Data Mover/Virtual Data Mover is recalled by the Data Mover using CIFS or NFS operations.

Multi-protocol interoperability

The archiving and recall architecture of File Management preserves file data. During archiving, file data and metadata are read using CIFS or NFS and stored on a secondary storage tier. During recall, only the file data is retrieved from secondary storage; the metadata is not used. In this respect, the File Management software supports multi-protocol data.

However, when using features such as stub re-creation, file security and last modified timestamps are restored to primary storage from the metadata stored on secondary storage. Because the saved metadata is single-protocol, the File Management software does not fully support multi-protocol data in this case; core archiving and recall features remain fully functional.

Implementing an archiving strategy

There are a number of common motives for implementing tiered storage, but most are based on reducing the Total Cost of Ownership (TCO) of storage equipment while maintaining or increasing service levels and usable disk space. File archiving is one of the most common and efficient methods for implementing tiered storage, allowing individual files in a large dataset to transition to a storage tier that meets the unique performance requirements of particular stages of the information lifecycle.

The quality of an archiving strategy is often judged on how it affects users of primary storage tiers, since file data reaching the end of its lifecycle is likely to be migrated or accessed elsewhere. Key factors include the number and percentage of archived files a user must work with, and the impact on the average service response time for operations the user performs on those files. While the performance aspect is based mostly upon environmental factors, the number of archived files a user has to deal with is directly related to the quality of the archiving policies created by storage administrators. Good archiving policies qualify the probability that a file will be used in the future, while bad archiving policies result in excessive amounts of data being recalled from secondary storage.

Good archiving strategies ensure that the performance impact associated with accessing the archived data results in the minimal impact to users of primary storage. File data should be kept on storage with appropriate performance levels based upon the stage in the information lifecycle that it has reached. As data transitions through the stages of the lifecycle, the performance requirements decrease.

The File Management technology allows administrators to model and then implement archiving strategies to create tiered storage systems and realize lower TCO. It can be used to:

♦ Classify files sprawled across a NAS environment into distinct datasets

♦ Manage the locations where individual file data is stored

♦ Shrink large datasets and file systems by migrating file data to alternate locations, hence decreasing backup window requirements

Using a fast and transparent archiving and recall architecture leads to benefits without causing significant impact to the performance experienced by end users.

Developing archiving policies

Classifying datasets

The key to designing a good archiving policy is the ability to determine the probability that a file will need to be read or written over time. To make this determination, we must define the boundaries of a dataset on the primary storage location and classify the files it contains. This may be an iterative process: while classifying the files in a dataset, it may become apparent that there is more than one distinct dataset, forcing the boundaries to be reset and the classification to begin again.

Classification of files in a dataset requires knowledge from two sources and goes hand-in-hand with the creation of archiving policies. First, the metadata maintained on the primary storage tier provides valuable information about the last time files were modified and accessed as well as file sizes, names, and locations in a dataset. Second, the end users of the storage can provide insight into how the dataset is used in general. This information will provide guidance to an administrator when designing archiving policies.

As an example, consider the NAS environment of a typical hospital. The first step in designing an archiving policy is to examine the layout of the existing primary storage tier, identify the contacts which have requested storage and determine how the end users are accessing their storage. During this step, you may find that various departments throughout the hospital are using NAS to store digital images of X-rays, MRIs, and CT scans. Other departments are using it to store copies of patient bills and payments or medical records. Still others are using the Celerra to hold general home directory data.

In further discussion with the contacts from the hospital, it is discovered that the X-ray, MRI, and CT images are used for very different purposes and are typically needed for different lengths of time. Due to the ways in which the hospital applies X-ray technology, these images are typically required for the treatment of patients for about 6 months, whereas MRI and CT images are required for treatment only over a period of a few days. Due to regulatory requirements, the hospital must keep bills and payments on file for 5 years; however, once a bill has been paid, the finance department reviews the copy at the end of the month and the copy is not typically accessed again. The stored medical records are not needed unless a patient visits the hospital again, and typically only the last 2 years of records are required. Because the immediate availability of patient records is critical, the hospital contact requests that only medical records older than 2 years be archived.

Given these conditions, a good archiving policy will need to account for one or more of the following:

♦ X-ray images must not be archived until they are 6 months old.

♦ MRI and CT images must not be archived until they are 2 weeks old.

♦ Copies of bills and payments must not be archived until they are 45 days old.

♦ Medical records must not be archived until they are 2 years old.

♦ Data in user home directories should not be archived if it has been accessed recently.

Note that the first four conditions are all based upon the length of time since a file was last modified and that the last condition is based upon when a file was last accessed. This is because the images and records affected by the first four conditions are static files which will not be changed over time, but the home directories contain dynamic files. Therefore, in designing an archiving policy for dynamic files, we assume that the more recently a file was accessed, the more likely it is to be accessed again soon. When designing an archiving policy for static files, we assume that as data ages it becomes less likely to be accessed again soon.

Metadata stored on Celerra will typically be required in order to translate the conditions listed above into an archiving policy. As an example, let us assume that the X-ray, MRI, and CT images are stored on a CIFS share of file system “fs1” inside a directory named “repository”. Filenames for X-ray images begin with the letter ‘X’, MRIs begin with the letter ‘M’, and CT images begin with the letter ‘C’. Copies of bills and payments are stored in the “scans” directory in the same share on “fs1”. Medical records are stored in the “mr” directory and home directories are stored in the “hd” directory on file system “fs2” which is both shared and exported. The medical records are written and accessed through NFS while the home directories are accessed through CIFS.

In this scenario, a CIFS archiving task would be developed for the “repository” directory on the “fs1” file system utilizing an archiving policy with the following rules:

♦ Archive a file if the name begins with “X” and the last modified timestamp is > 6 months old.

♦ Archive a file if the name begins with “M” or “C” and the last modified timestamp is > 2 weeks old.

♦ Or else do not archive the file.

A second CIFS task would be developed for the “scans” directory on the “fs1” file system utilizing an archiving policy with the following rules:

♦ Archive a file if the last modified timestamp is > 45 days old.

♦ Or else do not archive the file.

An NFS task would be developed for the “mr” directory on the “fs2” file system utilizing an archiving policy with the following rules:

♦ Archive a file if the last modified timestamp is > 2 years old.

♦ Or else do not archive the file.

In order to prepare a task for the user home directories, we must qualify what it means for a file to be accessed “recently.” We define recent as a particular length of time since a file was last accessed and we then classify files in a dataset based on whether the last accessed timestamp falls within that length of time. As we increase or decrease the length of time we will select a larger or smaller percentage of the files in the dataset. With this premise in mind, we would develop a third CIFS task for the “hd” directory on the “fs2” file system using an archiving policy with the following rules:

♦ Archive a file if the last accessed timestamp is > some period of time.

♦ Or else do not archive the file.

Now we are faced with the challenge of quantifying some period of time. A common method involves choosing a percentage of the total number of files or the total amount of file data within a dataset that should be archived, and then adjusting the length of time used in the policy until that percentage is reached. This method utilizes data collection tasks and previews and is discussed elsewhere in this guide. It does not necessarily result in optimal performance for users or maximum reduction in TCO; however, for very dynamic datasets (where any file data can change or be accessed at any time with no discernible trends) it may be preferred for the sake of simplicity.

A more advanced method involves the analysis of the last accessed times of all files within a dataset in order to gain a more accurate understanding of how the age of a last accessed timestamp correlates to the likelihood that a file will be used again in the future. For typical datasets there is an exponential relationship between the age of a last accessed timestamp and the probability of future use of a file.

An example of a dataset with this type of exponential relationship would be digital copies of newspapers and other periodic publications such as magazines. There may be many users who access the latest publication of a newspaper or magazine. However, when a publication is released for a new period, nearly all users will cease accessing the old publication (who reads yesterday’s newspaper?). Some users will continue to access the old publication (perhaps to catch a story they missed) but the number of those users will steadily decrease. As a counterpoint, certain publications become significant and even as the content ages, they are still referenced heavily by users. This is an example of information lifecycle.

Given such a dataset, we can be very precise and design an archiving policy that balances both the age of a file and how recently it was accessed. As an example, consider a repository of magazines where it is likely that users will want to access any issue published in the last 12 months but it is very unlikely that a user will access issues older than 12 months. We may design an archiving policy with this logic:

♦ Do not archive a file if the last modified timestamp is < 12 months old.

♦ Archive a file if the last accessed timestamp is > 1 month old.

♦ Or else do not archive the file.
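The three-step logic above can be sketched as a decision function. This is a minimal illustration of the policy's reasoning, not FMA's policy syntax; the attribute names and the 365-day/30-day thresholds are assumptions made for the sketch.

```python
from datetime import datetime, timedelta

def should_archive(last_modified, last_accessed, now):
    """Magazine-repository policy: keep current issues, archive idle old ones."""
    if last_modified > now - timedelta(days=365):    # modified < 12 months ago
        return False             # still in its active publication window
    if last_accessed < now - timedelta(days=30):     # not accessed for > 1 month
        return True              # old issue with no recent readers: archive it
    return False                 # old issue, but recently read: keep on primary

now = datetime(2008, 8, 1)
# A two-year-old issue last read six weeks ago falls through to the second rule:
print(should_archive(datetime(2006, 8, 1), now - timedelta(weeks=6), now))  # True
```

Note that the order of the rules matters: the "do not archive recent issues" check shields current publications from the last-accessed test entirely.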

44 Rainfinity File Management Appliance EMC Celerra NS Series

Solution Guide


Solution Best Practices

Performance requirements

The users of a Celerra system typically have some form of performance requirement. As an example, a user trying to read back a video file encoded at 256 KB/s from Celerra is unlikely to accept a data transfer rate of 5 KB/s. However, if archiving is not applied correctly, this could be the end result.

The key to meeting performance requirements is to ensure that data is stored on an appropriate tier of storage based upon its progress through the information lifecycle. Referring back to the example of a repository of newspapers, you would never want to store today’s paper in a location that takes users a long time to reach, as this will have a significant cumulative impact.

Archiving policies should be developed to select files at particular stages of the information lifecycle and place them on appropriate storage tiers. You should determine the performance requirements for data that will be selected as you develop the archiving policy. Keep in mind the following considerations:

♦ Will the disk storage provide sufficient performance when clients need to access archived data?

♦ Will the primary and secondary storage servers and FM/FMA-HA appliances have sufficient system resources (CPU, RAM) to provide the required performance levels for recall operations?

♦ Will the network environment separating primary and secondary storage servers and FM/FMA-HA provide sufficient bandwidth? Will latency cause an excessive negative effect?

Creating file matching expressions

A File Matching Expression (FME) is one or more conditions used by an FM appliance when data is being archived. A statement in an FME consists of an attribute, an operator, and a value. The attributes supported by the FM policy engine are the CIFS/NFS last accessed and last modified timestamps, the size of the file data, and the filename.

Operators are applied against the specified file attribute and compared against a value in order to return a true or false value. As an example, operators that can be applied to file size are greater than (>), less than (<), greater than or equal to (>=) and less than or equal to (<=). The value supplied when evaluating the file size attribute is a number of bytes. This allows rules to be crafted that select files of varying ranges of sizes.

These same operators can be applied to the last accessed and last modified timestamps but the value will be applied as a period of time, allowing rules to be crafted that select files that have not been accessed or modified during a period of time.

When using the filename attribute, operators allow an exact filename to be specified (“equals”) or the filename to be compared against a regular expression (“matches regex”).

An FME can consist of one or more statements. When multiple statements are part of a single FME, the statements will be logically combined using the AND operator. Therefore, the evaluation of an FME will be true if and only if all statements within the FME evaluate to be true.
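The AND semantics can be illustrated with a small evaluator. The attribute keys and the (attribute, operator, value) encoding here are assumptions made for the sketch, not FMA's internal representation.

```python
import operator

# Operators the text describes for size and timestamp attributes.
OPS = {">": operator.gt, "<": operator.lt,
       ">=": operator.ge, "<=": operator.le, "==": operator.eq}

def fme_matches(file_attrs, statements):
    """An FME is true only if every (attribute, operator, value) statement is."""
    return all(OPS[op](file_attrs[attr], value) for attr, op, value in statements)

# Two statements combined with AND: files larger than 1 MB AND idle 90+ days.
fme = [("size_bytes", ">", 1_048_576), ("idle_days", ">=", 90)]
print(fme_matches({"size_bytes": 2_000_000, "idle_days": 120}, fme))  # True
print(fme_matches({"size_bytes": 2_000_000, "idle_days": 10}, fme))   # False
```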


Archiving policies and rules

An archiving policy consists of a collection of ordered rules and an archiving destination. When creating archiving tasks, an archiving policy is applied against a source dataset to define which files are to be archived and to what location.

A rule, the building block of an archiving policy, consists of an FME and an action indicating whether a file matching the FME should be archived. Administrators will define a number of rules and then order them to build an archiving policy. When a policy is applied to a source dataset by running an archiving task, files will be compared to the rules in order. As soon as a file matches the FME of a rule, it will not be checked against any further rules and the indicated action will be taken to either archive the file or skip it. When a file does not match any FME in the rule base, the default action will be taken and the file will not be archived.

As an example, consider the following rule base:

1. Archive files that are larger than 1 MB and the last accessed timestamp is > 3 months old.

2. Archive files that are larger than 512 KB and the last accessed timestamp is > 4 months old.

3. Do not archive files that have a last accessed timestamp < 5 months old.

4. Archive files that are larger than 256 KB.

5. <Implied> Do not archive all files.

If we applied these rules against a file that was 600 KB and has not been accessed for the last 4.5 months, the file would not match rule 1 because it is not larger than 1 MB, but it would match the conditions of rule 2 and be archived. Note that rule 3 states that no file with a last accessed time of less than 5 months should be archived. However, the file matched a previous rule and rule 3 was never applied.
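The first-match behavior of this rule base can be modeled directly. The 30-day month approximation and the attribute names are assumptions made for the sketch; FMA's actual rule evaluation is configured through its GUI, not code.

```python
MB, KB = 1_048_576, 1_024

# (predicate, archive?) pairs, evaluated in order; months approximated as 30 days.
rules = [
    (lambda f: f["size"] > 1 * MB   and f["idle_days"] > 90,  True),   # rule 1
    (lambda f: f["size"] > 512 * KB and f["idle_days"] > 120, True),   # rule 2
    (lambda f: f["idle_days"] < 150,                          False),  # rule 3
    (lambda f: f["size"] > 256 * KB,                          True),   # rule 4
]

def evaluate(f):
    for predicate, archive in rules:
        if predicate(f):        # first matching rule decides;
            return archive      # later rules are never consulted
    return False                # implied rule 5: default is not to archive

# The 600 KB file idle for 4.5 months misses rule 1 but matches rule 2,
# so it is archived and rule 3 is never reached.
print(evaluate({"size": 600 * KB, "idle_days": 135}))  # True
```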

In addition to rules and a destination, an archiving policy allows you to specify a retention period (in days, weeks, months, or years) that will be set when archiving to a Celerra FLR-enabled file system or a Compliance Edition version of Centera CentraStar. If archiving to Centera, you can also specify a delayed stubbing period (in days) that will allow file data migrated to secondary storage to be backed up/replicated prior to removing the file data from primary storage.

Data collection tasks and previews

FMA allows administrators to simulate the effects of applying an archiving policy against a dataset to determine how many files, how much file data, and which specific files would have been archived. This allows administrators to judge the efficacy of an archiving policy and decide whether it will provide the desired results.

It is important that the statistics and information provided by this feature be interpreted carefully. A typical goal when implementing an archiving policy is to reduce file data on primary storage by a particular percentage, and it would be very simple to design a policy that meets this requirement. However, the quality of an archiving policy lies in its ability to meet this requirement by archiving only the least utilized file data.

Depending upon the design of the archiving policy and the type of data being archived, the overall statistics for the number of files and bytes of file data to be archived may not be the most relevant for determining efficacy. For well-classified datasets, administrators will typically want to analyze the list of individual files which would have been archived. In either case, this is typically an iterative process in which the archiving policy is modified and a new preview is generated until the desired results are achieved.


When using FMA, simulating the effects of applying an archiving policy to a dataset is referred to as a preview. A preview is logically very similar to an archiving task, with two particular differences. First, a preview identifies whether a file should be archived in a manner identical to an archiving task, but it will not archive any data. Second, previews apply a policy against file metadata that has been saved to the FM database as a point-in-time copy, whereas archiving tasks query Celerra servers using CIFS/NFS. Administrators should note that because policies are applied against a point-in-time view of file metadata, actually applying a policy may not produce results identical to the preview. For example, with a policy that archives files not accessed for 3 months, a user might access a very old file after the preview is generated, preventing that file from matching the rule when the archiving task runs.

In order to generate a preview, file metadata must first be collected for a dataset and saved to the FM database. This process is performed by data collection tasks. Similar to archiving tasks, a data collection task will read file metadata for a source dataset and can be scheduled to run periodically. Data collection tasks consist of a source path on a Celerra server and a protocol that will be used to read file metadata (CIFS or NFS).

Note that the creation of a preview is not instantaneous. When simulating the effects of applying a policy to a large dataset, it may take several minutes to generate a preview. Because all types of tasks use the same database and FM system resources, particular care should be taken to schedule tasks and generate previews at times when no other tasks, such as the stub scanner, data collection, or archiving tasks, are already running.

Creating and monitoring archiving tasks

Creating and running a new archiving task

An archiving task in FMA applies a policy to a source dataset and results in selected files being archived to a secondary storage tier. Creating an archiving task is simple and requires the specification of a source dataset, an archiving policy, and a schedule for running the task. Tasks can be run as soon as they are submitted or run on a periodic schedule. After a task is submitted it can be run at any time without the need to wait for a scheduled time to pass.


Figure 9 Create an archiving task

Monitoring the progress of a running task

The status of a running task can be obtained through the Rainfinity GUI. In most cases it is not necessary to perform any monitoring of tasks as they are running. Administrators will, however, want to review the final report for archiving tasks which have run, detailing the number of files/bytes that were archived as well as any archiving operations which did not complete successfully. Failed archiving operations will not result in data being unavailable but administrators will want to review log files to determine the root cause of the failure.


Figure 10 Task schedule and detailed report

A log file is created for every execution of every archiving task and the logs are stored in the /var/log/rainfinity/filemanagement/fws/support directory.

Administrators can halt running tasks using the GUI. An example of when an administrator may choose to stop a task would be when all archiving operations are failing against a large dataset. To save time in such a scenario an administrator may not want to wait for the task to complete before beginning the troubleshooting process.

When running multiple concurrent tasks, an administrator will want to monitor the system performance and resources closely. In particular, the CPU utilization should be monitored using the CLI top command and free RAM should be monitored using the CLI free command.

In addition, you should avoid overloading the FMA database with excessive numbers of connections, which can result from running many tasks concurrently. As a general guideline and best practice, avoid running more than five concurrent tasks of any type (data collection, stub scanner, policy preview, or archiving).
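The two CLI checks named above can be taken as a quick, non-interactive snapshot from the appliance shell. The `-b`/`-n` (batch mode, one iteration) and `-m` (megabytes) flags are standard procps options; exact output columns vary by distribution.

```shell
# One-shot snapshot of CPU load and the busiest processes, followed by
# free RAM in megabytes, while archiving tasks are running.
top -b -n 1 | head -n 12
free -m
```

Capturing these at intervals (for example, via cron) gives a simple baseline to compare against when multiple tasks run concurrently.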

Scheduling archive tasks

When an administrator creates a new archiving task, it is recommended to run the task and review the archiving summary in order to ensure that the policy can be applied successfully and does not result in excessive numbers of recall operations (indicating the policy is not archiving the least utilized data). Once these points have been confirmed, the policy can be scheduled to run periodically.


When deciding to schedule a policy to recur, it is important to set an appropriate frequency and to avoid the possibility that other scheduled tasks will overlap in the times they are running. Running an archiving policy too frequently, such as every day, is not useful as it will require a large amount of scanning and processing time that results in very few new files being archived. Tasks should be scheduled to run such that the amount of data archived is an acceptable tradeoff for the amount of time and system resources that are required to execute the task.

It is also important to note and estimate the amount of time it will take for an archiving task to complete. This will help prevent concurrent execution of tasks. As an example, if a task is scheduled for 11 P.M. and another new task that takes 10 hours to complete needs to be scheduled as well, the new task should be scheduled to start at 1 P.M. or earlier.
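The arithmetic is simply the existing task's start time minus the new task's estimated duration. A sketch, with illustrative times:

```python
from datetime import datetime, timedelta

def latest_safe_start(existing_start, new_task_hours):
    """Latest start time so a new task finishes before an existing task begins."""
    return existing_start - timedelta(hours=new_task_hours)

# An existing task starts at 11 P.M.; a new 10-hour task must start by 1 P.M.
start = latest_safe_start(datetime(2008, 8, 1, 23, 0), 10)
print(start.strftime("%I:%M %p"))  # 01:00 PM
```

In practice the duration estimate should come from the reports of previous runs of the task, with headroom added for dataset growth.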

Managing archived data

The management of archived data refers to a handful of activities. Stub re-creation manages stub data on primary storage, orphan file management controls file data stored on secondary storage, and the stub scanner manages the relationship between the two tiers. In addition, data archived to a Celerra or Centera will typically need to be backed up on a regular basis.

Managing Celerra repositories

A common reason for implementing tiered storage through file archiving is to shrink the size of a large dataset and to lower required backup windows. The primary storage shrinks and can be backed up more quickly. The secondary storage now requires backup as well.

Data in a Celerra repository is only modified by certain features of an FM appliance. After running an archiving task, new data will be created in the repository and a backup will be required. During orphan file management, data may be deleted from the repository, again requiring backup. In this latter case a backup is less critical as failure to complete the backup would not result in data loss; it would result in excess data being left in the backup copy.

Other than these activities, the data in a Celerra repository will not be changed and a backup would be unnecessary.

When scheduling archiving tasks it may be desirable to have the expected time of completion coincide with the start of a backup process.

Managing Centera data

As with Celerra repositories, the data archived to Centera should be replicated. Centera protects the data using the CPM or CPP protection schemes, and as such there is no need to back up the data stored on Centera on a regular basis.

You can replicate data from one Centera to another, using the Centera CentraStar Replication feature, an asynchronous technology that replicates new data typically within a few hours of its creation.

As with Celerra repositories, the data stored on a Centera will not be modified by anything other than archiving tasks and orphan file management.

To ensure that Centera data is replicated (using Centera Replicator) prior to removing it from primary storage, the delayed stubbing option should be used for archiving tasks. Administrators should also monitor the replication of new Centera data periodically, as FMA does not ensure that data has been successfully replicated prior to removing it from primary storage.


Figure 11 Delay Stubbing

Stub re-creation

Stub re-creation is a feature which allows administrators to quickly restore access to archived data through the primary storage tier. When a file is archived, an FM appliance replaces an existing file with a stub file containing stub data. FM appliances save all the information required to re-create the stub file at a later time and the stub re-creation feature allows administrators to select individual files to restore on primary storage.

When stubs are re-created, FM appliances will create a file at the same location as the original, with the same security settings (NTFS ACL, UNIX mode bits) and the same last modified time. This action can only be performed by the FM appliance which originally created the stub, or by a new appliance configured through the use of the fmbackup/fmrestore utilities.


The process for re-creating a stub file is as follows:

1. An administrator selects a stub to re-create.

2. The original location of the file on primary storage is checked to determine if there will be a naming conflict, in which case additional strings will be attached to the re-created stub filename.

3. The new stub file is created and locked.

4. The original security settings for the file set at the time of archiving are read back from secondary storage and applied to the new file.

5. The stub data is written into the new file and the file is marked as a stub using the FileMover API.

6. The last modified timestamp is set for the new file identically to the timestamp from when it was originally archived.

Stub scanner

The stub scanner is a task that detects stub files, which can trigger file recalls, on all datasets that have been involved in archiving across all configured primary storage servers. If file data has been archived to secondary storage and no stub file exists which references that data, the file data on secondary storage is considered to be an orphan file.

The stub scanner task uses filewalking threads to check file metadata and identify stub files. Stub data is then read from any identified stub file. The stub data indicates the location of associated file data on secondary storage. The FM database is then queried to determine if the file data on secondary storage was archived by a particular FM appliance. If an FM appliance identifies a stub file pointing to data that it archived, it will update a timestamp in its database to indicate that a stub was found that referenced a particular piece of archived data.
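Conceptually, orphan detection reduces to a set difference between what the FM database recorded at archive time and what the scanner found referenced by stubs. The content-address strings below are purely illustrative:

```python
# Locations recorded in the FM database when the data was archived (illustrative):
archived = {"addr-001", "addr-002", "addr-003"}

# Locations referenced by stub files found during the stub scanner's filewalk:
referenced_by_stubs = {"addr-001", "addr-003"}

# Archived data with no referencing stub is an orphan candidate on secondary storage.
orphans = archived - referenced_by_stubs
print(sorted(orphans))  # ['addr-002']
```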

The stub scanning task is created and scheduled automatically and does not require any configuration by an administrator. It runs automatically at 6 P.M. every Friday.


Figure 12 Stub scanner

Orphan file management

Orphan file management allows administrators to manage the data stored on secondary storage from the FM appliance. Through the use of the stub scanner, file data which is no longer needed is identified on secondary storage, and the orphan file management feature allows that data to be removed. This frees up space on secondary storage to allow for more data to be archived.

Note that data stored on a Celerra FLR-enabled file system or a Compliance Edition Centera cannot be deleted until the retention period has expired. Data stored on a Governance Edition Centera can be deleted before the retention period has expired, provided the application uses the Privileged Delete function. This is not controlled by the FM appliance; the secondary storage servers will not accept operations to delete such data.

Disaster Recovery—High availability and failover

High Availability (HA)

High Availability is the ability to continue processing, after the failure of a component, with very little or no interruption. An HA system can have brief periods of data unavailability (DU) that will be no more than a few seconds with no data loss (DL) at all.


Disaster Recovery (DR)

Disaster Recovery (DR) is the ability to restart processing after the failure of critical components, which might include the loss of all processing capabilities at a given location. Disaster recovery requires that copies of business data be maintained so that after a loss of processing capability, the copy of the data can be used to restart processing. The copy may be maintained continuously through a process of simultaneous writes to more than one storage device (known as synchronous copying) or by delayed copying of the data. Delayed, or asynchronous, data copying includes writing to online storage devices at a secondary location. For further protection, the data can be copied to removable media and stored offline. Asynchronous data copying can have very short or very long delays depending on the amount of changed data and the type, speed, and availability of the network.

Disaster recovery systems can be designed to have very small periods of data unavailability, sometimes as little as a few seconds, but more commonly DR systems will be designed with DU intervals ranging from a few minutes to several hours or longer. This document is focused on environments where there is an FM appliance installed at the production site with DR protection provided by an FMA-HA and by continual asynchronous replication of the source and archived data for the Celerra and Centera.

Disaster recovery systems may experience data loss if the copying of data is not synchronous. The amount of data lost is entirely dependent on the rate of change and the delay interval length of the DR system design.

The role of FMA in HA and DR environments

Rainfinity FMA is designed to be highly available. Should the primary FMA fail, the FMA design allows files to be recalled from the Centera by accessing an alternate FMA such as an FMA-HA device. When EMC Celerra archives to another Celerra, access to the FMA for file recall is not necessary.

Rainfinity FMA does not cache any user data and is not part of the replication process. When FMA is involved in file recall, as is the case when archiving to Centera, it is important to carefully follow the guidance of this document in addition to the guidance included in the EMC Rainfinity User Guide and the Rainfinity Release Notes in order to maintain the DU interval of the disaster recovery design. Both the production and archive data will be replicated, but the configuration of FMA does not depend on the type of replication used.

This document assumes that regaining access to both primary and archived storage data is the first priority and that resuming archiving operations is a secondary priority, but required. There is no replication between FMA devices, so immediately after DR procedures are implemented there will be access to both primary and archive storage, but additional steps will be required before archiving operations can resume at the DR site.

Failover best practices

The failover best practices described in this section address configuration requirements necessary to recall data should the FMA, Celerra, or Centera fail at the production site or should the entire production site fail.

As shown in Figure 1, the validated solution was configured for high availability and disaster recovery. This section describes the steps necessary to recall data should one of the components or the entire site fail.


FMA devices FMA_A and FMA_B are configured to archive data to Centera with the IP address that is appropriate for the specific site. In practice, each Centera will have multiple IP addresses, and all or most of the Centera IP addresses at a site will be configured in the FMA at that site. Although the production and disaster recovery sites in the tested solution are connected by way of Ethernet, disaster recovery configurations will likely be installed at a remote location accessible by way of a WAN. The WAN environment will impose potential performance limitations for recall and archiving of files because of WAN bandwidth and latency. It is important to assess network availability when planning for disaster recovery. For instance, if the production Centera fails, users can continue to recall files from the disaster recovery Centera, but archiving should be postponed until the production Centera is available, since writing to the disaster recovery Centera will consume more bandwidth and be more sensitive to latency than reading data during the recall process.

Production FMA failover and failback

Should the production FMA fail, archived data can be recalled through the FMA-HA. As described in Overview of the recall process, the IP addresses of all FMA and FMA-HA appliances must be included in the Data Mover hosts file. The Data Mover queries its hosts file to resolve the IP address of an FMA or FMA-HA appliance. The Data Mover will use HTTP to read the archived data from the appliance, which in turn reads it from Centera using the Centera SDK. If the primary FMA does not respond to the HTTP read requests, the Data Mover will use an alternate IP address of another appliance, the FMA-HA, configured in the hosts file. Transition from FMA to FMA-HA is instantaneous.

After failing over to the FMA devices at the DR site, recalls will be successful but no archiving of the production file systems from the production site will take place. If FMA functionality will not be restored at the production site before the next desired archiving cycle, the DR FMA device must be reloaded in order to restore the production site FMA database to the DR FMA.

Production Celerra failover and failback

In the validated solution, Celerra Replication software was configured with a 30-minute recovery-point-objective (RPO), ensuring that all data changes on the production Celerra are synchronized with the disaster recovery Celerra within a 30-minute window. Data is transferred across Data Mover interconnects, which are configured with bandwidth schedule settings. An interconnect is defined per primary/secondary Celerra Data Mover relationship, and is shared by all the objects being replicated. The Celerra Replicator adaptive scheduler determines the size and frequency of updates based on the bandwidth settings, the incoming data loads, and the concurrency of data transfers to ensure this recovery-point-objective service level is maintained. Your specific RPO will vary based on the recovery needs of your organization.

In a CIFS environment, Celerra can replicate a file system or a Virtual Data Mover (VDM). A VDM is an EMC Celerra Network Server software feature that enables the grouping of CIFS file systems and servers into virtual containers. Each VDM contains all the data necessary to support one or more CIFS servers and their file systems. The servers in a VDM store their dynamic configuration information such as local groups, shares, security credentials, audit logs, and so on in a configuration file system. A VDM can be replicated to a remote Data Mover as an autonomous unit. The CIFS servers, their file systems, and configuration data are available in one virtual container. VDMs are not required for NFS only environments.


Common configuration steps

The following steps outline the initial requirements to set up replication regardless of whether the replication is of an NFS file system, a CIFS file system, or a CIFS VDM. This process assumes that the Celerra Replicator software license is enabled on the production and disaster recovery Celerras and that CIFS and NFS file systems are already created on the primary Data Mover:

1. Create a Celerra Network Server on the production and the disaster recovery Celerras to establish a trusted relationship between the two Celerras. This step must be performed on the production Celerra and repeated on the disaster recovery Celerra.

Figure 13 Create a Celerra Network Server

2. On the production Celerra, enter the Network Server Name of the DR Celerra.

3. On the production Celerra, enter the IP address of the DR Celerra.


4. Enter a passphrase. The secure passphrase is used for connection validation between Celerras. It must be the same on both sides of the connection.

Repeat steps 1 through 4 on the DR Celerra. Enter the name and IP address of the production Celerra and enter the same passphrase.

Create a new Data Mover Interconnect from the production Celerra to the disaster recovery Celerra and another from the disaster recovery Celerra to the production Celerra. The interconnect is only considered established after it is set up on both sides. Once the interconnect is established for a Data Mover pair, replication sessions can use the interconnect.

5. From the Replication page, select Data Mover Interconnects and the New Data Mover Interconnect window will appear as shown in Figure 14.

Figure 14 New Data Mover Interconnect

6. Enter the Celerra Network Server name of the production Celerra.

7. Enter an Interconnect Name.

8. Select a Data Mover for the local side of the interconnect.

9. Select at least one IP address or name service interface name that is available for the local side of the interconnect.

The Service Interface Names field is optional. It is a comma-separated list of the name service interface names available for the local side of the interconnect. These hostnames must resolve to a single IP address. Although not required, these names should be fully qualified names.

10. Select the Data Mover for the peer side of the interconnect.

11. Select at least one IP address or peer name service interface name that is available for the peer side of the interconnect.

12. Modify the bandwidth schedule to accommodate your organization’s needs. You may choose to reduce interconnect bandwidth during certain peak hours to reduce network impact on the users. Alternatively, you may determine that a small reduction in user performance is an acceptable tradeoff to ensure that changed data is replicated as fast as possible.

File system replication

Setting up a file system replication is described in the following steps. These steps apply equally to CIFS and NFS file systems. When the configuration is complete, Celerra Replication begins replicating the file system to the DR Celerra and attempts to keep the data on the destination synchronized with the data on the source, within the desired recovery point objective (RPO) as represented by the Time Out of Sync value defined during replication session creation. For example, if this value is set to 30 minutes, the content of (and latest checkpoint on) the destination should be no more than 30 minutes old, and that checkpoint time is identical to the time of the corresponding checkpoint on the source.

To create a file system replication session, perform the following steps:

1. Select Replications from the Celerra Manager drop-down menu and select New. The New Replication window will appear as shown in Figure 15.

2. Select Replicate a File System and click Continue.

3. The New File System Replication window will be displayed as shown in Figure 16.


Figure 15 New Replication


Figure 16 New File System Replication

4. Enter a replication name.

5. Select a Source File System name from the list box.

6. Select a Storage Pool from the list box.

7. Select the Destination Celerra Network Server for the file system replication.

8. Select the Source Data Mover Interconnect name created earlier.

9. Select any interface defined for the source and destination local side of the interconnect, or select a specific IP address or name service interface name to use for this session.

10. Select to either create the destination file system using a storage pool (the default), or use an existing file system at the destination. If you specify an existing file system, you can select to overwrite any changes that may have occurred at the destination since the last update. An existing destination file system must be read-only and the same size as the source.

11. Select the storage pool to use to create the destination file system.

12. Select the storage pool to use to create checkpoints at the destination.

13. Indicate whether to update the file system on the destination manually by selecting Manual Refresh (default), or automatically by selecting the Time Out of Sync option. If you select Time Out of Sync, specify a value in the Time Out of Sync field.

Specify the elapsed-time window within which the system attempts to keep the data on the destination synchronized with the data on the source. For example, if this value is set to 30 minutes, the content of (and latest checkpoint on) the destination should be no more than 30 minutes old, and that checkpoint time is identical to the time of the corresponding checkpoint on the source.
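The Time Out of Sync semantics above can be expressed as a simple compliance check. This Python sketch is illustrative only; the checkpoint comparison is an assumption about how an administrator might verify the RPO, not Celerra's internal logic:

```python
from datetime import datetime, timedelta

TIME_OUT_OF_SYNC = timedelta(minutes=30)  # RPO chosen at session creation

def destination_within_rpo(source_ckpt: datetime, dest_ckpt: datetime,
                           now: datetime) -> bool:
    """Destination is compliant if its latest checkpoint matches a source
    checkpoint that is no older than the Time Out of Sync window."""
    return dest_ckpt == source_ckpt and (now - dest_ckpt) <= TIME_OUT_OF_SYNC

now = datetime(2008, 8, 1, 12, 0)
ckpt = datetime(2008, 8, 1, 11, 45)             # 15 minutes old
print(destination_within_rpo(ckpt, ckpt, now))  # within the 30-minute window
print(destination_within_rpo(ckpt, datetime(2008, 8, 1, 11, 0), now))
```

A destination checkpoint that is older than the window, or that does not match the corresponding source checkpoint, indicates the session is out of sync.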

File system failover

Failing over a file system is described in the following steps. These steps apply equally to CIFS and NFS file systems. When the failover completes, users will continue to access their data as if it were on the production Celerra.

The failover process changes the destination object from read-only to read/write and stops the transmission of replicated data. The source file system, if available, becomes read-only. When the production Celerra goes offline, failover must be initiated from the DR Celerra.

VDM replication is not available for NFS file systems; therefore, file systems that will be replicated to the DR Celerra must be exported with an available interface on the DR Celerra, and the NFS client must be updated with the new mount point.

When the DR Celerra becomes the active Celerra, users will be able to recall files from the FMA at the DR site. This is possible because a recall request is resolved through the hosts file on the DR Data Mover, which contains the IP addresses of the FMA and FMA-HA located at the DR site.

The following process shows how to fail over an existing file system. In the tested solution, the DR Celerra is named rtpsol11. Figure 17 lists the file systems and VDMs available for failover.

To fail over a particular file system, perform the following steps:

1. From the DR Celerra Replications page of Celerra Manager, click the name of the file system to failover.

2. Click Failover at the bottom of the screen as shown in Figure 17.


Figure 17 File System Failover

3. When the file system has successfully failed over, the Status column will change from OK to Failed Over.

Successful client access to failed-over file systems requires that you maintain DNS, Active Directory, user mapping, and network support at the disaster recovery (DR) site. The Celerra Network Server depends on these systems at your destination site to function correctly upon failover.


VDM replication

Prior to creating a VDM replication session, the following prerequisites must be met:

♦ The production and DR side interface names must be the same for CIFS servers to transition.

♦ The production and DR side mount points must be the same for the share names to resolve correctly. This will ensure that CIFS shares can recognize the full path to the share directory and users will be able to access the replicated data after failover.

♦ When the production replicated file system is mounted on a VDM, the DR file system should also be mounted on the VDM that is replicated from the same source VDM.

♦ The local groups in the production VDM are replicated to the DR side so that the access control lists (ACLs) on the DR file system are complete.

♦ A Data Mover Interconnect must be set up for use by the failover session as shown in Figure 14.

♦ A Virtual Data Mover must be created.
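The first two prerequisites above (matching interface names and mount points) lend themselves to a quick sanity check before creating the session. This Python sketch is purely illustrative; the interface and mount-point values are hypothetical:

```python
def check_vdm_prereqs(prod: dict, dr: dict) -> list:
    """Return a list of prerequisite violations (empty means OK)."""
    problems = []
    # Interface names must match so CIFS servers can transition.
    if set(prod["interfaces"]) != set(dr["interfaces"]):
        problems.append("interface names differ between production and DR")
    # Mount points must match so share paths resolve after failover.
    if prod["mount_points"] != dr["mount_points"]:
        problems.append("mount points differ; share paths will not resolve")
    return problems

# Hypothetical production and DR configurations.
prod = {"interfaces": ["cge0"], "mount_points": {"fs01": "/root_vdm_1/fs01"}}
dr   = {"interfaces": ["cge0"], "mount_points": {"fs01": "/root_vdm_1/fs01"}}
print(check_vdm_prereqs(prod, dr))  # empty list: prerequisites satisfied
```

Any non-empty result means users would lose access to the replicated data after failover.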

To create a Virtual Data Mover, perform the following steps:

1. Select Data Movers from the Celerra Manager list box.

2. Click Virtual Data Movers and click New as shown in Figure 18.

3. Select the Data Mover on which the Virtual Data Mover will reside.

4. Enter a unique VDM name.

5. Selecting the Default Storage Allocation option instructs the system to take a 128 MB slice from the best matching storage pool. If you select the Storage Pool option, the page changes to display a list of available storage pools that can be used to create a new Virtual Data Mover.


Figure 18 Create a VDM


Next, create a VDM replication session using the following steps:

1. Select Replications from the Celerra Manager list box.

2. Select New. The New Replication window will appear as shown in Figure 19.

3. Select Replicate a Virtual Data Mover and click Continue. The New VDM Replication screen will appear as shown in Figure 20.

Figure 19 Replicate a VDM


Figure 20 New VDM Replication

4. Enter a Name for this replication session.

5. Select the Source VDM created above.

6. Select the Destination Celerra Network Server.

7. Select the locally defined interconnect to use for this VDM replication session.

8. Select any interface defined for the local-side interconnect.

9. Select any destination interface defined for the peer-side interconnect.

10. Select to either create the destination VDM using a storage pool (the default), or use an existing VDM at the destination. If you specify an existing VDM, you can select to overwrite any changes that may have occurred at the destination since the last update.


11. For Time Out of Sync, specify the elapsed time window within which the system attempts to keep the data at the DR site synchronized with the data at the production site.

After creating the session to replicate the VDM, you can create a session to replicate the file system mounted to the VDM.

Figure 21 shows the steps required to create a file system replication. The required parameters are the same as when creating any other file system replication, except for the Select a VDM entry. For this entry, you must select an existing VDM on which to mount the DR file system if the production file system is mounted on a VDM.

Figure 21 Create a File System Replication for VDM

VDM failover

This operation performs a failover of the selected destination-side replication sessions, with possible data loss, to handle the disaster recovery scenario in which the source becomes unavailable. After the failover, the destination object is read/write. When communication is re-established between the source and destination, the source becomes read-only.

This failover operation terminates the transfer of data if there is a transfer in progress, causing a loss of any data not yet transferred to the destination. The destination is restored to a consistent state. When the source site becomes available again, replication can be restarted. If the source site is still available when you perform a failover, the system attempts to change the source file system from read/write to read-only.


When a replication relationship involves a VDM, perform the failover in this order:

1. Fail over the VDM. The system changes the VDM state from unloaded to OK on the DR site. If the production site is still available, the system attempts to change the VDM on the production site from OK to unloaded.

2. Fail over the user file systems contained in that VDM.

Note: Successful access to CIFS servers, when failed over or switched over, requires that you maintain DNS, Active Directory, user mapping, and network support at the disaster recovery (DR) site. The Celerra Network Server depends on these systems at your destination site to function correctly upon failover.

Celerra failback

When failover is complete, the status shown in the Replications window will be “Failed Over” both on the production and the DR Celerras as shown in Figure 22.

Figure 22 Celerra failover status

When the production Celerra functionality is restored, fail back the VDM and its file systems (or only the file systems, if using NFS) from the DR Celerra to the production Celerra. Before performing the failback, you must initiate replication of the file system or VDM from the DR Celerra to the production Celerra.


Initiate replication from the DR Celerra to the production Celerra with the following steps:

1. From the DR Celerra Replications page, select the VDM or file system to be replicated back.

2. Click Start as shown in Figure 23.

Figure 23 Start a replication

3. The Start Replication window will open as shown in Figure 24.


Figure 24 Start a replication

4. Select Discard Changes on Destination Since Last Copy to overwrite any changes made to the destination object since the last update.

5. Click OK to begin the replication.

When the DR-to-production Celerra replication is complete, you must restart replication from the production Celerra to the DR Celerra through a feature called replication reversal, which reverses the direction of replication. The file system on the DR Celerra becomes read-only and the file system on the production Celerra becomes read/write.

Reverse replication is initiated by selecting a VDM or file system and clicking Reverse on the DR Celerra, as shown in Figure 25.


Figure 25 Reverse a replication

After the Reverse operation is initiated from the DR Celerra, the status of the replication appears as OK on both the production and DR Celerras, as shown in Figure 26.


Figure 26 Replication status

Centera replication

Prior to enabling Centera replication, the FMA at the DR site must be configured with the IP addresses of the DR Centera Access Nodes, as shown in FMA settings for archiving to Centera.

Centera replication is enabled using CLI commands. The set cluster replication command sets the IP address of the target (replica) cluster and enables or disables replication. Replication should be configured with multiple IP addresses; if only one IP address is configured, replication stops when the node with that address goes down. Also, do not leave the IP addresses of nodes that will be unavailable for an extended period in the configured list.

To launch the CLI, use the Centera Viewer tool, or go to the Start menu and select Programs > Centera Console > CLI. Enter your username and password.

Enter the set cluster replication command as:

Config# set cluster replication
Replication Enabled? (yes, no) [no]: yes
Replication Address [10.6.29.189:3218]: 10.6.125.144:3218

This is the target cluster to which data is replicated. The address consists of the hostname or IP address, followed by the port number of the replication cluster (3218). Separate multiple addresses with commas. The following are examples of replication addresses:


Host name: NY1.customer.com:3218, NY2.customer.com:3218

IP address: 10.5.51.12:3218, 10.5.50.12:3218

EMC recommends that a minimum of two and a maximum of three replication addresses be used. Adding more than three replication addresses may cause significant timeouts on the source cluster.
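The address-list rules above (host-or-IP:port entries, comma-separated, two to three entries recommended) can be sketched as a small validator. This is illustrative Python, not a Centera tool:

```python
import re

# host-or-IP followed by a port, e.g. NY1.customer.com:3218 or 10.5.51.12:3218
ADDR_RE = re.compile(r"^[A-Za-z0-9.\-]+:\d+$")

def parse_replication_addresses(value: str) -> list:
    """Split the comma-separated Replication Address value and enforce the
    recommended two-to-three entry range."""
    addrs = [a.strip() for a in value.split(",") if a.strip()]
    for a in addrs:
        if not ADDR_RE.match(a):
            raise ValueError(f"malformed replication address: {a}")
    if not 2 <= len(addrs) <= 3:
        raise ValueError("EMC recommends two to three replication addresses")
    return addrs

print(parse_replication_addresses("10.5.51.12:3218, 10.5.50.12:3218"))
```

A single address would be rejected here for the availability reason given earlier: replication stops if the one configured node goes down.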

Replicate Delete? (yes, no) [yes]: yes

This option propagates deletes and privileged deletes of C-Clips on the source cluster to the target cluster. The corresponding pool must exist on the target cluster; replication/restore is automatically paused if a C-Clip is replicated for which the pool does not exist on the target cluster.

Replicate incoming replicated Objects? (yes, no) [yes]: no

This must be enabled on the middle cluster in a Chain topology. It does not have to be enabled in a Unidirectional, Bidirectional, or Star topology.

Replicate System Pools? (no, current, all or <list>) [current]: no

♦ No: Turn replication for the System Pool off.
♦ Current: Leave the setup as it currently is.
♦ All: Replicate all pools in the System Pool.
♦ List: List individual pools (in the System Pool) to replicate.

Replicate Application Pools? (no, current, all or <list>) [current]: default

♦ No: Turn replication for all application pools off.
♦ Current: Leave the setup as it currently is.
♦ All: Replicate all application pools.
♦ List: List individual application pools to replicate.

Profile Name: The profile name on the target cluster. The export/import poolprofilesetup command enables you to export pools or profiles to another cluster. Wildcards can be used here to list all profiles (*) or all profiles beginning with the same letters (ab*). The Profile Name was not used in our test configuration.

Location of .pea file: If a PEA file was used, enter its location on your local machine (for example, C:\Centera\replication.pea). Enter the full pathname to this location, or press Enter to get a popup asking for the password of the profile.

Issue the command? (yes, no) [no]: yes

Once replication is set up, data archived to the production Centera is automatically replicated to the DR Centera.

Centera failover and failback

If the primary Centera fails, files will be recalled from the DR Centera. Users will continue to access data on the production Celerra.


To enable recall from the DR Centera, the IP addresses of the production FMA and FMA-HA must be deleted or commented out of the production Data Mover hosts file, leaving only the IP addresses of the DR FMA and FMA-HA.
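The hosts-file change above can be sketched as follows. The IP addresses and hostnames here are hypothetical, and in practice the edit is made directly in the Data Mover hosts file; this Python sketch only illustrates the transformation:

```python
# Hypothetical production FMA and FMA-HA addresses to be disabled.
PRODUCTION_FMA_IPS = {"10.6.29.50", "10.6.29.51"}

def fail_over_hosts(lines):
    """Comment out production FMA entries so recalls resolve to the DR FMA."""
    out = []
    for line in lines:
        fields = line.split()
        ip = fields[0] if fields else ""
        if ip in PRODUCTION_FMA_IPS and not line.startswith("#"):
            out.append("# " + line)   # disable the production entry
        else:
            out.append(line)          # keep DR entries and comments as-is
    return out

hosts = [
    "10.6.29.50  fma-prod",
    "10.6.29.51  fma-ha-prod",
    "10.6.125.50 fma-dr",
]
print(fail_over_hosts(hosts))
```

After the change, only the DR FMA and FMA-HA entries remain active, which is what directs recall requests to the DR site.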

After failing over to the FMA devices at the DR site, recalls will succeed, but no archiving of the production file systems will take place. If full functionality will not be restored at the production site before the next desired archiving cycle, the DR FMA device must be reloaded to restore the production-site FMA database to the DR FMA.

When the production Centera is fully functional, repeat the set cluster replication command to replicate from the DR to the production Centera.

Production site failover

Should the production site experience a catastrophic failure, the FMA, the Celerra, and the Centera fail over to the DR site using the steps and commands detailed in the previous sections of this guide.

FMA backup

Although the information stored in the database of an FMA device is not required to recall files from secondary storage, it is required for the other functions of the FMA device. When planning an FMA implementation in a DR environment, it is important that a backup of the production FMA database be available at the DR site after failover of the FMA. Although we do not specify how or where to place backups of the FMA database, a replicated file system on the Celerra is an excellent location. An NFS export can be mounted on the FMA device and the database copied to the exported file system; alternatively, FTP can be used to put the backup onto the Celerra. The backup can be restored from the Celerra if needed. The backup script is on the FMA device at /opt/rainfinity/filemanagement/bin/fmbackup. In the event of a disaster, the fmrestore command is used to load the output of the fmbackup script onto a new appliance.
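As a sketch of the backup staging described above, the following Python simulates copying an fmbackup output file (.tgz) onto an NFS-mounted replicated file system under a timestamped name. The paths and file contents are simulated stand-ins; only the fmbackup script location comes from the guide:

```python
import shutil
import tempfile
import time
from pathlib import Path

# fmbackup itself runs on the FMA at
# /opt/rainfinity/filemanagement/bin/fmbackup; here we only simulate its
# .tgz output and the NFS export mounted from a replicated Celerra file system.
def stage_backup(backup_tgz: Path, nfs_mount: Path) -> Path:
    """Copy the backup archive onto replicated storage with a timestamped name."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = nfs_mount / f"fma-backup-{stamp}.tgz"
    shutil.copy2(backup_tgz, dest)   # lands on replicated storage -> DR site
    return dest

# Simulated environment standing in for the FMA output and the NFS export.
work = Path(tempfile.mkdtemp())
src = work / "fmbackup-output.tgz"
src.write_bytes(b"placeholder archive")
mount = work / "nfs_export"
mount.mkdir()
print(stage_backup(src, mount).name)
```

Because the staging area is a replicated file system, the copy is carried to the DR site by the same Celerra Replication sessions described earlier.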

The FMA database

The purpose of the File Management Appliance database is to store information about the tiered storage implementation and archiving activities; it is used extensively within the FMA software. All major functions of the tiered storage infrastructure other than file recall use the FMA database by creating, modifying, or querying entries.

Most of the features accessed through the GUI utilize the FMA database. This includes:

♦ Creating or listing schedules using the “Schedule” page

♦ Generating reports or managing archived files using the “Archived Files” page

♦ Creating or viewing policies using the “Policies” page

♦ Creating/viewing file servers or NAS repositories from the “Configuration” page

When using the command line, nearly all options of the rffm command make extensive use of the FMA database. In addition, the scheduler component of the FM appliance triggers tasks to run at specific times. When a task runs (regardless of the type) it reads or modifies the contents of the database.


Users interact with the FMA database indirectly by using the GUI and CLI and the database does not require any specific maintenance to be performed by administrators.

Note that when you upgrade the FM appliance from a CD, the table indices are rebuilt to ensure optimal performance. This can take anywhere from a few minutes to a few hours.

The FMA database is not an integral part of recalling archived data and thus the loss of the FMA database will not inherently cause data unavailability. However, the FMA database stores metadata describing the states of files on primary and secondary storage as well as their relationships. The loss of this information will affect most features of the appliance as previously described. It is therefore desirable to protect the FMA database through periodic backups.

There is no specific recommended backup strategy, but consideration should be given to the types of events that modify the contents of the FMA database. In particular, archiving, data collection, and stub scanning tasks can generate large numbers of database changes, so it is advisable to schedule backups after these types of tasks have completed.

The entire configuration of an FM appliance including the database is backed up using the fmbackup utility from the FM CLI. This utility generates an output file (.tgz) which should be copied to a safe location for use in the event that a disaster occurs.

The output file from the fmbackup utility can be restored on an FM appliance using the fmrestore utility from the FM CLI. This erases any existing configuration on the FM appliance and replaces it with the configuration and database contents stored in the backup file.
