#ibmedge © 2016 IBM Corporation
2258 An Alternative to NDMP for Network-Attached Storage Backup
Joseph King – CTO, CAS Severn
Lars Henningsen – CTO, General Storage
Presenters
Joseph King: CTO, VP Presales and Technical Services, CAS Severn, 443-668-0888
Lars Henningsen: CTO, General Storage Software GmbH, +49 151 67 31 30 13
History of NDMP
• Co-developed by PDC/Legato and Network Appliance in 1995
• First specification submitted: October 1996
• NDMP v4: approximately 2000/2001
  • Allowed proprietary extensions
• NDMP v5: little to be found beyond 2003-2006
  • Mostly extensions (security, data movers, etc.)
• The Storage Networking Industry Association (SNIA) manages the specification
History of NDMP – Data Movement
• Started with direct-attached tape
• Progressed to 3-way backup
  • Backup server received data or redirected data
• Waning use of tape libraries: enter the VTL
• Rise of VTL with deduplication
Questions
• How often do you do full backups?
• How long does it take?
• Have you ever done a restore?
• Have you ever done a restore of an entire system?
• Have you given up on NAS backup?
File server backup
How are really big file servers backed up? (in ascending order of popularity)
4.) NDMP (slow, doesn't scale well, requires regular full backups)
3.) SnapMirror to Tape (NetApp only; mostly faster than NDMP but still requires full backups)
2.) SnapDiff (NetApp only; needs a new baseline from time to time; errors are difficult to sort out; doesn't help with restore. The situation is similar with all forms of journal-based backup on other platforms.)
1.) Not at all.*
*) Combinations of various kinds of mirroring and snapshot technologies are often substituted for what is usually considered to be backup. Almost all of these methods are proprietary (they can't simply be reused after migrating to another file server technology), require disk for all data, and get very expensive very fast (or even unusable) when more historical data has to be kept.
File server backup
In an ideal world, you could simply use your backup tool (well, TSM, since it's still the only one really doing “incremental forever”) and its existing infrastructure and operational integration to back up file servers of any size with any number of objects, just as you do with any other file system in your environment. You wouldn't have to worry when migrating your file services from NetApp to EMC to IBM to Microsoft to xyz and back again, because the backup method and existing backup data stay the same (i.e. “as seen by the user” and not “as seen by a specific file server”).
File server incremental backup scenario with TSM
(Diagram: NAS file server with its file system, TSM client, TSM server with its DB)
dsmc incr \\myfileserver\mytopshare
TSM client/server/DB: Metadata gets sent to the client for comparison. The client sends changed/new data (if/when it finds any). All in bulk and rather efficient, to a point.
TSM client/file server/file system: The client looks up directory and file information to compare with the metadata received from the server, and reads changed/new data (if/when it finds any). All of that one by one.
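To make the comparison step concrete, here is a simplified model in Python. It is an illustration, not TSM's actual implementation: it assumes the server's inventory is just a mapping from relative path to (size, modification time), whereas a real TSM incremental compares more attributes and handles directories, ACLs and expiration in its own way. The part that hurts on a NAS is `scan()`, because every `os.stat()` is a separate round trip to the file server.

```python
import os
from typing import Dict, List, Tuple

# Simplified per-object metadata (an assumption): (size in bytes, mtime in ns).
Meta = Tuple[int, int]


def scan(root: str) -> Dict[str, Meta]:
    """Walk the tree and stat every file, one object at a time.

    On a network share each os.stat() is a separate round trip, which is
    where the per-object latency comes from.
    """
    result: Dict[str, Meta] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            result[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return result


def incremental_plan(
    server_inventory: Dict[str, Meta], local: Dict[str, Meta]
) -> Tuple[List[str], List[str]]:
    """Compare the inventory received from the server with a fresh scan.

    Returns (to_send, to_expire): files that are new or changed, and files
    that are in the inventory but no longer exist on the file system.
    """
    to_send = sorted(p for p, m in local.items() if server_inventory.get(p) != m)
    to_expire = sorted(p for p in server_inventory if p not in local)
    return to_send, to_expire
```

The comparison itself is cheap; the time goes into building `local`, one lookup per object.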
Let's say the process of looking up an object in the file system and deciding whether or not to back it up takes 2 ms on average…
… which means scanning 500 objects/second,
which means just 1,800,000 in an hour,
which means just 43,200,000 in a full day,
which means just 302,400,000 in a week. Which means most file servers simply cannot be backed up this way: too many objects, not enough time.
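The arithmetic behind these numbers as a small Python helper. The 2 ms per object is the talk's example figure, not a measurement; plug in the latency you actually observe against your own file server.

```python
SECONDS_PER_HOUR = 3600


def objects_scanned(latency_ms: float, hours: float) -> int:
    """Objects a single-threaded scan can examine in `hours` at `latency_ms` per object."""
    objects_per_second = 1000 / latency_ms
    return round(objects_per_second * hours * SECONDS_PER_HOUR)


for label, hours in [("hour", 1), ("day", 24), ("week", 24 * 7)]:
    print(f"per {label}: {objects_scanned(2, hours):,}")
# per hour: 1,800,000
# per day: 43,200,000
# per week: 302,400,000
```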
The major challenge you face when backing up file servers incrementally is latency.
You could try to bring latency down a bit (use InfiniBand, keep metadata on SSD or in RAM, use faster disks, faster CPUs, faster everything), but that wouldn't really help. Speed it up by a factor of two (ambitious) and you still end up with something probably orders of magnitude too slow.
However, using two threads rather than just one achieves practically the same as cutting latency in half.
Using four threads has basically the same effect as cutting latency to 25%, and so on.
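A rough way to see this effect without a NAS is to simulate each metadata lookup as a fixed wait. In this Python sketch, `time.sleep()` stands in for a 2 ms network round trip (an assumption, matching the 2 ms example used earlier). Because the threads spend nearly all of their time waiting rather than computing, throughput grows almost linearly with the thread count until something else, such as the file server itself, becomes the bottleneck.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.002  # assumed 2 ms per metadata lookup


def lookup(_obj: int) -> None:
    """Stand-in for one metadata round trip to the NAS: the caller just waits."""
    time.sleep(LATENCY_S)


def scan_rate(n_objects: int, threads: int) -> float:
    """Objects per second when n_objects lookups are spread over `threads` workers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(lookup, range(n_objects)))
    return n_objects / (time.perf_counter() - start)


if __name__ == "__main__":
    for t in (1, 2, 4, 8):
        print(f"{t} thread(s): ~{scan_rate(400, t):,.0f} objects/s")
```

Exact numbers depend on the machine's timer resolution, but each doubling of threads should roughly double the rate.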
MAGS
So the solution is running many “incrementals” in parallel, which is what MAGS does automatically:
• MAGS is a program which runs on the same machine as your (Windows) TSM client
• Splits a file system into hundreds or thousands of more or less equal chunks and scans them in as many parallel streams (TSM client runs) as you have licensed (20 streams per license package)
• Makes sure there is no overlapping and nothing left out
• Works as a single, scheduler-friendly job with a beginning and an end and a single return code
• Does not interfere with data at all. Only the regular TSM client handles files and directories. MAGS merely points the client at the right directories at the right time.
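The “no overlapping and nothing left out” requirement is the subtle part of splitting a file system. The Python sketch below shows one way to achieve it; it is a hypothetical illustration, not MAGS's actual algorithm, and the `Dir` tree with its per-directory file counts is an assumed input (in practice such counts would have to be estimated). A subtree that is small enough becomes one recursive chunk; a big directory is split into its subdirectories, and the files sitting directly in it get their own non-recursive chunk so they are not lost. Each chunk then maps onto an ordinary TSM client run using the client's `-subdir` option.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

# A chunk is (path, recurse); recurse=False means "only the files directly in path".
Chunk = Tuple[str, bool]


@dataclass
class Dir:
    """Hypothetical directory tree with known (or estimated) file counts."""
    path: str
    own_files: int  # files directly inside this directory
    children: List["Dir"] = field(default_factory=list)

    def total(self) -> int:
        return self.own_files + sum(c.total() for c in self.children)


def split(d: Dir, max_objects: int) -> List[Chunk]:
    """Cover every file under d exactly once, aiming for at most max_objects per chunk."""
    if d.total() <= max_objects or not d.children:
        return [(d.path, True)]  # small enough (or a leaf): one recursive chunk
    chunks: List[Chunk] = [(d.path, False)]  # files directly in d, no recursion
    for child in d.children:
        chunks.extend(split(child, max_objects))
    return chunks


def walk(d: Dir) -> Iterator[Dir]:
    yield d
    for c in d.children:
        yield from walk(c)


def covered_objects(root: Dir, chunks: List[Chunk]) -> int:
    """Files the chunks would back up; equals root.total() when the split is correct."""
    by_path: Dict[str, Dir] = {d.path: d for d in walk(root)}
    return sum(by_path[p].total() if rec else by_path[p].own_files for p, rec in chunks)


def dsmc_command(path: str, recurse: bool) -> str:
    """Roughly how one chunk could be handed to an ordinary TSM client run."""
    return f'dsmc incr "{path}\\*" -subdir={"yes" if recurse else "no"}'
```

The resulting chunks would then be queued and handed out, one at a time, to as many parallel streams as are licensed.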
MAGS
• Download and install MAGS on the Windows machine running your TSM client
• Log on to the MAGS web interface and configure which file servers and which shares to back up
• Run or schedule MAGS
Deployments
• Chemical / Germany (diagram). Labels: MAGS; TSM servers; \weird-application\ with subdirectories \00, \01, \03, \... (60,000,000 files); 20 Isilon X nodes (400,000,000 files); 24 Isilon X nodes (650,000,000 files); 10 Isilon NL nodes (×2); 10 Isilon HD nodes (×2); GSCC Cluster (IP-only / internal Flash / Linux TSM Servers); dsmISI parallel access
• Automotive / Germany
MAGS FAQ
So how fast is it?
It depends, of course, but in most cases it scales almost linearly with the number of streams, so 20 streams are practically 20 times as fast as the same TSM client without MAGS. In our experience, the overhead of establishing new sessions with the TSM server has a negative impact on “smaller” file systems. So if you only have 30 million files and a backup without MAGS takes 20 hours, you may expect two or three hours rather than just one. Rule of thumb: the bigger the file system, the closer to linear scaling you can expect.
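One way to reason about this rule of thumb is a simple model in which every chunk pays a fixed session setup cost. The Python sketch below is our own illustration under that assumption, not a measured characteristic of MAGS or TSM, and the parameter values are made up. Because the setup cost grows with the number of chunks rather than the number of files, it weighs less the bigger the file system is, so the speedup approaches the number of streams.

```python
SECONDS_PER_HOUR = 3600


def mags_hours(serial_hours: float, streams: int, chunks: int, session_setup_s: float) -> float:
    """Hypothetical wall-clock estimate with parallel streams.

    Assumes the single-stream scan time divides evenly across streams and that
    every chunk adds a fixed session setup cost, which the streams also share.
    """
    setup_hours = chunks * session_setup_s / SECONDS_PER_HOUR
    return (serial_hours + setup_hours) / streams


def speedup(serial_hours: float, streams: int, chunks: int, session_setup_s: float) -> float:
    return serial_hours / mags_hours(serial_hours, streams, chunks, session_setup_s)


# Illustrative, made-up parameters: 20 streams, 1,000 chunks, 60 s setup per chunk.
for serial in (20, 200):
    print(f"{serial} h without MAGS -> {mags_hours(serial, 20, 1000, 60):.1f} h "
          f"(speedup {speedup(serial, 20, 1000, 60):.1f}x)")
```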
MAGS FAQ
What about compatibility?
As already mentioned, MAGS doesn't handle the data as such. From a TSM client and server perspective, the data look exactly as they would if you had backed up without MAGS. You can back up incrementally without MAGS on top of MAGS backups, back up incrementally with MAGS on top of backups originally made without MAGS, restore data backed up with MAGS without using MAGS for the restore, and (especially useful) use MAGS to restore data which you didn't back up with MAGS.
MAGS FAQ
Wait a minute… you said “restore”?
Yes. Restoring with MAGS is as much of an accelerator as backing up with MAGS. Even if the data come from tape (provided you have more than one drive and more than one cartridge holding the data you want to restore), it is usually a lot faster than it would be without MAGS. The record so far is restoring half a petabyte (250 million files) of NetApp data to an Isilon in less than 6 days (about 1 GB/s).
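A quick sanity check of the record figures in Python, assuming decimal units (1 PB = 10^15 bytes) and exactly 6 days. Since the restore took less than 6 days, these averages are lower bounds.

```python
from typing import Tuple

SECONDS_PER_DAY = 86_400


def average_rates(total_bytes: float, files: int, days: float) -> Tuple[float, float]:
    """Average throughput over the whole restore, in bytes/s and files/s."""
    seconds = days * SECONDS_PER_DAY
    return total_bytes / seconds, files / seconds


bytes_per_s, files_per_s = average_rates(0.5e15, 250_000_000, 6)
print(f"{bytes_per_s / 1e9:.2f} GB/s, {files_per_s:.0f} files/s")
# 0.96 GB/s, 482 files/s
```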
MAGS FAQ
What about options x, y and z? Are they supported?
Yes. Every TSM client session started by MAGS uses the options you specified in your client options file (dsm.opt), client option set (cloptset), include-exclude list etc.
MAGS FAQ
Are snapshots supported?
Yes. Backing up from snapshots is possible without compromising the namespace. The snapshot can reside on the original file server or on a synchronized, secondary file server. To speed things up even further, MAGS can spread the load across multiple file servers if they hold copies of the same snapshot. With Isilons it can, on top of that, spread the load evenly across all nodes of one or even two clusters. Individual latencies are taken into account, so every additional source of data makes the entire process faster, even if that additional source is slower than the others.
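Why even a slower extra source helps follows from throughput adding up. The Python sketch below is a hypothetical model, not MAGS's internal scheduler: each source's rate is its number of streams divided by its latency, the combined rate is the sum, and chunks are shared out in proportion to each source's rate so that all sources finish at about the same time. The source names, latencies and stream counts are made up.

```python
from typing import Dict, Tuple

# Hypothetical sources: name -> (average metadata latency in ms, concurrent streams).
Sources = Dict[str, Tuple[float, int]]


def source_rate(latency_ms: float, streams: int) -> float:
    """Objects/s one source can serve if its streams only wait on latency."""
    return streams * 1000 / latency_ms


def combined_rate(sources: Sources) -> float:
    return sum(source_rate(lat, n) for lat, n in sources.values())


def chunk_shares(sources: Sources) -> Dict[str, float]:
    """Fraction of the chunks each source should get so all finish at about the same time."""
    total = combined_rate(sources)
    return {name: source_rate(lat, n) / total for name, (lat, n) in sources.items()}


primary: Sources = {"primary": (2.0, 20)}
both: Sources = {"primary": (2.0, 20), "secondary": (5.0, 20)}
print(combined_rate(primary), combined_rate(both))  # 10000.0 14000.0
print(chunk_shares(both))
```

A shared work queue, from which every stream pulls its next chunk as soon as it is free, arrives at roughly the same proportional split without measuring latencies up front.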
MAGS FAQ
Speaking of loads… doesn’t the file server break down during backup?
No. Most file servers have no problem with hundreds of clients browsing through metadata. They're usually optimized for handling many, many requests at the same time rather than trying to speed up a single, big one (which is why they don't perform well with a single-threaded TSM backup). The first, full backup is a different matter and may require some caution.
MAGS FAQ
What about our TSM server? Will that suffer?
There will certainly be more load than for the same job without MAGS, but for a shorter time. Keep an eye on settings such as MAXNUMMP, the number of mount points in device classes, MAXSESSIONS etc. On the other hand, any locking issues you may have are usually a lot less persistent, because a typical MAGS session only lasts minutes rather than days.
MAGS FAQ
How many client machines do I need and how do I size them?
Probably fewer than you may think. During a regular backup, most of the time is wasted waiting for the NAS server(s) to respond. With MAGS, you can actually use all the resources you have. Typically, you can plan for somewhere between 4 and 12 streams per Xeon core, depending on latency. 2,000 files per second per core means 48,000 files per second for a 24-core machine, which means up to 170 million files per hour. That is more than most file servers can deliver, so your file server is more likely to be the limiting factor than the machine doing the backup. If in doubt (every environment is different), use the free trial period to test it.
In terms of RAM, you'll need a lot less than for a regular backup. Normally, the client requests metadata about big parts of a file system, which then builds up as a big chunk waiting to be worked through. With MAGS, there are smaller chunks, which disappear as soon as the corresponding smaller job has finished. There is a RAM limit setting in MAGS which prevents swapping if memory gets too crammed. 64 to 128 GB overall should work nicely for most users.
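The sizing arithmetic as a small Python helper. The 2,000 files per second per core and the 4 to 12 streams per core are the rules of thumb quoted in the answer, so replace them with figures from your own trial. As a cross-check, 4 streams each waiting 2 ms per object (the latency example used earlier in this talk) would also come to 2,000 files per second.

```python
from typing import Tuple

SECONDS_PER_HOUR = 3600


def files_per_hour(files_per_s_per_core: float, cores: int) -> float:
    """Scan capacity of the backup machine, ignoring file server limits."""
    return files_per_s_per_core * cores * SECONDS_PER_HOUR


def stream_range(cores: int, low: int = 4, high: int = 12) -> Tuple[int, int]:
    """Streams to plan for, using the 4 to 12 streams per core rule of thumb."""
    return low * cores, high * cores


print(f"{files_per_hour(2_000, 24):,.0f} files/hour")  # 172,800,000 files/hour
print(stream_range(24))  # (96, 288)
```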
MAGS FAQ
What about NFS?
Not yet. MAGS currently supports only CIFS/Windows. NFS v3 and v4 support is coming with MAGS V1.2, which will be available in late Q3 or early Q4 2016. It requires SSH access to a Linux machine (any distribution supported by TSM) in addition to the Windows machine running MAGS. In mixed environments, NFS will require a separate TSM node name (so you can start today by backing up all CIFS data via MAGS and your NFS data via a regular TSM client, which MAGS can then control once the functionality becomes available).
MAGS FAQ
Nobody really understands TSM PVU licensing. How does backing up a file server affect that?
Neither do we. As we understand it, you'll have to license PVUs for the file server, NOT for the TSM client machine actually running the backup, unless you also back up parts of that client machine's disks. Consider whether front-end licensing makes more sense.
MAGS FAQ
Where can I get MAGS?
You can try to write down this URL: http://www.concat.de/leistungen/it-infrastrukturen/backup_disaster_recovery/tsm-isilon/dsmisi-mags/
or just google “tsm mags download”
MAGS Wrapup
• There is an alternative to NDMP
• Eliminates the occasional full backup headache
• NAS backups can finish on time
• We use proxies for VMware backup, so why not use them for NAS?
Thank you.
References
• http://www.snia.org/sites/default/files/technical_work/NDMP/NDMP%20White%20Paper.pdf
• http://www.snia.org/about/news/newsroom/pr/snia-releases-ndmp-software-nas-file-servers-and-appliances
• https://en.wikipedia.org/wiki/NDMP
• http://www.nfsv4bat.org/Documents/ConnectAThon/2001/ndmp_overview.pdf