2258 An Alternative to NDMP for Network-Attached Storage Backup

Joseph King, CTO, CAS Severn

Lars Henningsen, CTO, General Storage

#ibmedge © 2016 IBM Corporation

Post on 14-Mar-2018



Presenters

Joseph King, CTO, VP Presales and Technical Services, CAS Severn, 443-668-0888

Lars Henningsen, CTO, General Storage Software GmbH, +49 151 67 31 30 13

Agenda

• History of NDMP

• Challenges with NDMP

• MAGS – An Alternative to NDMP


History of NDMP

• Co-Developed by PDC/Legato and Network Appliance in 1995

• First Specification Submitted – October 1996

• NDMP v4 – Approximately 2000/2001
  • Allowed Proprietary Extensions

• NDMP v5 – Can’t find much past 2003-2006
  • Mostly Extensions (Security, DataMovers, etc.)

• Storage Networking Industry Association Manages Specification


History of NDMP – Data Movement

• Started with Direct Attached Tape

• Progressed to 3-way Backup
  • Backup Server Received Data or Redirected Data

• Waning use of Tape Libraries – Enter the VTL

• Rise of VTL with Deduplication


Questions

• How often do you do full backups?

• How long does it take?

• Have you ever done a restore?

• Have you ever done a restore of an entire system?

• Have you given up on NAS backup?


Challenges with NDMP

File server backup

How are really big file servers backed up? (in ascending order of popularity)

4.) NDMP (Slow. Doesn’t scale well. Requires regular full backups.)

3.) SnapMirror to Tape (NetApp only. Mostly faster than NDMP but still requires full backups.)

2.) SnapDiff (NetApp only. Needs new baseline from time to time. Errors difficult to sort out. Doesn’t help with restore. Similar situation with all forms of journal based backup on other platforms.)

1.) Not at all.

*) Combinations of various kinds of mirroring and snapshot technologies often substitute for what is usually considered to be backup. Almost all these methods are proprietary (they can’t simply be reused after migrating to another file server technology), require disk for all data, and get very expensive very fast (or even unusable) when more historic data has to be kept.

File server backup

In an ideal world, you could simply use your backup tool (well, TSM since it’s still the only one really doing “incremental forever”) and its existing infrastructure and operational integration to back up file servers of any size with any number of objects. Just like you do with any other file system in your environment. You wouldn’t have to worry when migrating your file services from NetApp to EMC to IBM to Microsoft to xyz and back again, because the backup method and the existing backup data stay the same (i.e. “as seen by the user” and not “as seen by a specific file server”).

File server incremental backup scenario with TSM

[Diagram: NAS, File System, File Server, TSM Client, TSM Server, DB]

dsmc incr \\myfileserver\mytopshare

TSM client/server/DB: Metadata gets sent to the client for comparison. The client sends changed/new data (if/when it finds any). All in bulk and rather efficient – to a point.

TSM client/file server/file system: The client looks up directory and file information to compare with the metadata received from the server, and reads changed/new data (if/when it finds any). All of that one by one.


File server incremental backup scenario with TSM

Let’s say the process of looking up an object in the file system and deciding whether or not to back it up takes 2 ms on average…

… which means scanning 500 objects/second,

which means just 1,800,000 in an hour,

which means just 43,200,000 in a full day,

which means just 302,400,000 in a week,

which means most file servers simply cannot be backed up in this way. Too many objects – not enough time.
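The arithmetic on this slide is easy to reproduce. The following is only a sketch: the function name `objects_scanned` is made up for illustration, and the 2 ms per object is the assumed average from the slide, not a measurement.

```python
def objects_scanned(latency_ms, seconds):
    """Objects one sequential stream can check in `seconds`,
    if each lookup-and-decide step takes `latency_ms` on average."""
    per_second = 1000 / latency_ms
    return int(per_second * seconds)


HOUR = 3600
DAY = 24 * HOUR
WEEK = 7 * DAY

# Assumed average from the slide: 2 ms per object.
for label, seconds in [("second", 1), ("hour", HOUR), ("day", DAY), ("week", WEEK)]:
    print(f"per {label}: {objects_scanned(2, seconds):,}")
```

Plugging in your own measured per-object latency shows quickly whether a single-stream incremental can ever fit into your backup window.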

MAGS – An Alternative to NDMP


The major challenge you face when backing up file servers incrementally is latency.


You could try to bring latency down a bit (use InfiniBand, keep metadata on SSD or in RAM, use faster disks, faster CPUs, faster everything) but that wouldn’t really help. Speed it up by a factor of two (ambitious) and you still end up with something that is probably orders of magnitude too slow.


However, using two threads rather than just one achieves practically the same as cutting latency in half.


Using four threads has basically the same effect as cutting latency to 25%, and so on.
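The effect of threads on latency-bound work can be tried out with a small simulation. This is a sketch under stated assumptions, not anything taken from MAGS or TSM: `check_object` and `scan` are hypothetical names, and `time.sleep` merely stands in for the time a client spends waiting for the file server to answer one metadata lookup.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_object(latency_s):
    """Stand-in for one metadata lookup: the time is spent waiting, not computing."""
    time.sleep(latency_s)


def scan(num_objects, latency_s, threads):
    """Return the wall-clock seconds needed to check num_objects using `threads` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Each lookup blocks for latency_s; with more threads, the waits overlap.
        list(pool.map(check_object, [latency_s] * num_objects))
    return time.perf_counter() - start
```

Calling `scan(200, 0.002, 1)` and then `scan(200, 0.002, 4)` should show the four-thread run finishing in roughly a quarter of the time, although exact numbers depend on the timer resolution of the machine.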

MAGS

So the solution is running many “incrementals” in parallel, which is what MAGS does automatically.

• MAGS is a program which runs on the same machine as your (Windows) TSM client

• Splits a file system into hundreds or thousands of more or less equal chunks and scans them in as many parallel streams (TSM client runs) as you have licensed (20 streams per license package)

• Makes sure there is no overlapping and nothing left out

• Works as a single, scheduler-friendly job with a beginning and an end and a single return code

• Does not interfere with data at all. Only the regular TSM client handles files and directories. MAGS merely points the client at the right directories at the right time.
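To make the idea of splitting without overlap more concrete, here is a minimal sketch of one way a directory tree could be cut into work items and scanned in parallel. It is not MAGS's actual algorithm, which the slides do not describe. All names (`split_tree`, `scan_chunk`, `parallel_scan`) are hypothetical, the chunks are not balanced for size, and listing files stands in for a real TSM client run on that directory.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_tree(root, depth):
    """Return (path, recursive) work items that cover every file under root exactly once.

    Directories above `depth` become non-recursive items (only their own files);
    directories at `depth` become recursive items (their whole subtree).
    Symbolic links to directories are not followed.
    """
    items = []

    def visit(path, level):
        if level == depth:
            items.append((path, True))
            return
        items.append((path, False))
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    visit(entry.path, level + 1)

    visit(root, 0)
    return items


def scan_chunk(item):
    """Stand-in for one incremental run: list the files belonging to one work item."""
    path, recursive = item
    if recursive:
        return [os.path.join(d, f) for d, _, files in os.walk(path) for f in files]
    with os.scandir(path) as entries:
        return [e.path for e in entries if not e.is_dir()]


def parallel_scan(root, depth=2, streams=20):
    """Scan all work items with up to `streams` parallel workers; return a sorted file list."""
    items = split_tree(root, depth)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        results = pool.map(scan_chunk, items)
    return sorted(f for chunk in results for f in chunk)
```

Because every directory ends up in exactly one work item, either alone or as part of a recursive subtree, nothing is scanned twice and nothing is left out, which is the property the bullet list above asks for.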

So instead of this…

[Diagram: File System, File Server, TSM Client, TSM Server, DB, MAGS]

… you get this

[Diagram: File System, File Server, TSM Client, TSM Server, DB, MAGS]

MAGS

• Download and install MAGS on the Windows machine running your TSM client

• Log on to the MAGS web interface and configure which file servers and which shares to back up

• Run or schedule MAGS

Deployments

Chemical / Germany

[Diagram labels: MAGS, TSM Servers, \weird-application\ with subdirectories \00, \01, \03, \..., 60,000,000 files]

Financial / Italy

[Diagram labels: MAGS, TSM Servers, 110,000,000 files]

Automotive / Germany

[Diagram labels: MAGS, TSM Servers, 210,000,000 files]

[Diagram labels: 20 Isilon X nodes, 400,000,000 files; 24 Isilon X nodes, 650,000,000 files; 10 Isilon NL nodes (twice); 10 Isilon HD nodes (twice); GSCC Cluster (IP-only / internal Flash / Linux TSM Servers); dsmISI parallel access; MAGS]

MAGS FAQ

So how fast is it?

It depends, of course, but in most cases it scales almost linearly with the number of streams, so 20 streams are practically 20 times as fast as the same TSM client without MAGS. From experience, the overhead for establishing new sessions with the TSM server has a negative impact with “smaller” file systems. So if you only have 30 million files and a backup without MAGS takes 20 hours, you may expect two or three hours rather than just one. Rule of thumb: the bigger the file system, the closer to linear scalability you can expect.
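The 30-million-file example can be turned into a speedup and an efficiency figure. This is just a sketch of the arithmetic: the function names are made up, and the inputs are the numbers quoted above (20 hours without MAGS, two to three hours with 20 streams).

```python
def effective_speedup(baseline_hours, observed_hours):
    """How many times faster the parallel run is than the single-stream run."""
    return baseline_hours / observed_hours


def parallel_efficiency(baseline_hours, observed_hours, streams):
    """Fraction of the ideal (perfectly linear) speedup that was actually reached."""
    return effective_speedup(baseline_hours, observed_hours) / streams


for observed in (1, 2, 3):
    speedup = effective_speedup(20, observed)
    efficiency = parallel_efficiency(20, observed, streams=20)
    print(f"{observed} h: speedup {speedup:.1f}x, efficiency {efficiency:.0%}")
```

For the two-hour case this gives a speedup of 10 and an efficiency of 50%, which is the gap the session overhead accounts for on a "smaller" file system.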


What about compatibility?

As already mentioned, MAGS doesn’t handle the data as such. From a TSM client and server perspective, data look exactly like they would if you had backed up without MAGS. You can back up incrementally without MAGS based on MAGS backups, back up incrementally with MAGS on the basis of backups originally done without MAGS, restore data backed up with MAGS without using MAGS for the restore and (especially useful) restore data with MAGS which you haven’t backed up with MAGS.


Wait a minute…. you said “restore”?

Yes. Restoring with MAGS is as much of an accelerator as backing up with MAGS. Even if data come from tape (if you have more than one drive and more than one cartridge holding the data you want to restore), it is usually a lot faster than it would be without MAGS. Record so far is restoring half a petabyte (250 million files) of NetApp data to an Isilon in less than 6 days (about 1 GB/s).
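The quoted record can be sanity-checked with a little arithmetic. The sketch below assumes decimal units (half a petabyte = 0.5 × 10^15 bytes) and uses the full six days, so the rates it prints are lower bounds for a restore that took "less than 6 days". The function name `restore_rates` is made up for illustration.

```python
def restore_rates(total_bytes, total_files, days):
    """Return (bytes per second, files per second) averaged over the restore."""
    seconds = days * 86_400
    return total_bytes / seconds, total_files / seconds


# Figures from the slide; decimal petabyte and exactly 6 days are assumptions.
bytes_per_s, files_per_s = restore_rates(0.5e15, 250e6, 6)
print(f"{bytes_per_s / 1e9:.2f} GB/s, {files_per_s:.0f} files/s")
```

The result is a little under 1 GB/s and a little under 500 files per second, consistent with the "about 1 GB/s" on the slide.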


What about options x, y and z? Are they supported?

Yes. Every TSM client session started by MAGS uses the options you specified in your dsm option file, cloptset, include-exclude list etc.


Are snapshots supported?

Yes. Backing up from snapshots is possible without compromising a name space. That snapshot can reside in the original file server or in a synchronized, secondary file server. To speed things up even further, MAGS can spread the load across multiple file servers if they hold copies of the same snapshot. With Isilons it can, on top of that, spread the load evenly across all nodes of one or even two clusters. Individual latencies are taken into consideration – so every additional source of data makes the entire process faster – even if that additional source is slower than other sources.


Speaking of loads… doesn’t the file server break down during backup?

No. Most file servers have no problem with hundreds of clients browsing through metadata. They’re usually optimized for handling many, many requests at the same time rather than trying to speed up a single, big one (which is why they don’t perform well with a single-threaded TSM backup). The first, full backup is a different matter and may require some caution.


What about our TSM server? Will that suffer?

There will certainly be more load than for the same job without MAGS but for a shorter time. Keep an eye on stuff like maxnummp, number of mountpoints in device classes, maxsession etc. On the other hand, issues you may have with locking are usually a lot less persistent because a typical MAGS session only lasts minutes rather than days.


How many client machines do I need and how do I size them?

Probably fewer than you may think. During a regular backup, most of the time is wasted on waiting for the NAS server(s) to respond. With MAGS, you can actually use all the resources you have. Typically, you can calculate somewhere between 4 and 12 streams per Xeon core, depending on latency. 2,000 files per second per core means 48,000 files per second for a 24-core machine, which means up to 170 million files per hour. That is more than most file servers can provide, so your file server is more likely to be the limiting factor than the machine doing the backup. If in doubt (every environment is different), use the free trial period to test it.

In terms of RAM, you’ll need a lot less than for a regular backup. Normally, the client requests metadata about big parts of a file system, which then builds up as a big chunk waiting to be worked through. With MAGS, there are smaller chunks which disappear as soon as the corresponding, smaller job has finished. There is a RAM limit setting in MAGS which prevents swapping if memory gets too tight. 64 – 128 GB overall should work nicely for most users.
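The CPU sizing rule of thumb can be written down as a small calculation. This is a sketch only: the function names are hypothetical, and the inputs (2,000 files per second per core, 4 to 12 streams per core, a 24-core machine) are the figures quoted above rather than guarantees for any particular environment.

```python
def files_per_hour(files_per_sec_per_core, cores):
    """Upper bound on files the client machine can process per hour."""
    return files_per_sec_per_core * cores * 3600


def stream_range(cores, min_per_core=4, max_per_core=12):
    """Range of parallel streams a machine with `cores` cores might sustain."""
    return cores * min_per_core, cores * max_per_core


print(f"{files_per_hour(2000, 24):,} files per hour")
print("streams for 24 cores:", stream_range(24))
```

For the 24-core example this works out to 172,800,000 files per hour, which the slide rounds to "up to 170 million".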


What about NFS?

Not yet. MAGS currently supports only CIFS/Windows. NFS V3 and V4 support is coming with MAGS V1.2 which will be available in late Q3 or early Q4/2016. It requires ssh access to a Linux machine (any distribution supported by TSM) in addition to the Windows machine running MAGS. In mixed environments, NFS will require a separate TSM node name (so you can start today with backing up all CIFS data via MAGS and your NFS data via a regular TSM client which can then be controlled by MAGS once the functionality becomes available).


Nobody really understands TSM PVU licensing. How does backing up a file server affect that?

Neither do we. From our understanding, you’ll have to license PVUs for the file server, NOT for the TSM client machine actually running the backup, unless you also back up parts of that client machine’s disks. Consider whether Front-End licensing makes better sense.

MAGS Wrapup

• There is an alternative to NDMP

• Eliminates the occasional full backup headache

• NAS backups can finish on time

• We use proxies for VMware backup, why not use them for NAS?

Thank you.
