what - static.spiceworks.com · web viewfile-level backups are the most traditional and commonly...

File-Level versus Block-Based Backup Software: A Technical White Paper

One of the long-standing debates in data protection has been whether to use file-level or block-based backup technologies. Which one is better? As with many technical questions, the answer is "It depends". Of course, vendors will ensure the answer aligns with their offerings. The objective of this white paper is not to provide a definitive answer, but to expose the intricacies of each of these technologies so that readers can make an informed decision on which method is better suited to their environment.

What are File-Level backups?

File-level backups are the most traditional and commonly used method of performing backups. As the word traditional implies, it all started with file-level backups. In this method, the backup software crawls through the entire file system, identifying files that need to be backed up based on defined criteria or a backup policy. Then, the backup software creates a copy of those targeted files at a pre-defined destination. Traditionally, that destination has been tape; however, as disk prices have come down and disk-to-disk backups have become affordable, many customers choose disk for their file-level backups, with tape often used as a tier-3 or archival target.

The main advantage of the file-level backup approach is the ability to back up each and every file irrespective of the underlying storage or disk format. Since every file is accessed through the operating system, the backup software does not need to understand how the data is laid out on disk. That said, this technique also has several drawbacks that impede backup performance.

File-level backups: Why are they slow?

As part of the backup process, the file-level backup software scans the entire file system to identify files matching pre-defined criteria, which is a very time-consuming process. Additionally, scanning the file system every time as part of a backup is very taxing on the client resources.
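
The scanning phase can be illustrated with a minimal Python sketch. It assumes a hypothetical policy of "back up anything modified since the last backup" and uses modification time as the only criterion; real backup products apply richer policies. The function name `files_changed_since` is invented for this illustration.

```python
import os
import tempfile


def files_changed_since(root, last_backup_time):
    """Walk the whole tree under root; return paths modified after last_backup_time."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Every file costs one metadata lookup, whether or not it changed.
            if os.stat(path).st_mtime > last_backup_time:
                changed.append(path)
    return sorted(changed)


# Usage: a tiny tree with one stale file and one recently modified file.
with tempfile.TemporaryDirectory() as root:
    for name, mtime in (("old.txt", 1000), ("new.txt", 3000)):
        path = os.path.join(root, name)
        with open(path, "w") as handle:
            handle.write("data")
        os.utime(path, (mtime, mtime))  # set access and modification times
    changed = [os.path.basename(p) for p in files_changed_since(root, 2000)]

print(changed)
```

Note that the loop visits every file even though only one qualifies, which is why scan time grows with the total number of files rather than with the amount of changed data.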

Imagine scanning a 1 TB drive with hundreds of millions of files every time you need to run a backup. The scan time adds directly to the backup time, making it very difficult for file-level software to meet backup windows. For this reason, organizations using file-level backup software rely heavily on the classic weekly full and daily incremental backup scheme. This scheme was designed to work around the scanning limitation and served well in the pre-virtualization era. Today, the average number of servers in an organization keeps growing (thanks in part to virtualization) and companies measure their data footprint in PB rather than GB or TB, so organizations struggle to meet their backup windows, and ultimately their Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO), with such traditional techniques.

What's wrong with the weekly full and daily incremental backup scheme?

A full backup involves backing up the entire data set, which can be very time consuming. An incremental backup, on the other hand, backs up only files that have been created or modified since the last backup. Since full backups take more time to complete, IT centers use incremental backups during the work week so they can still meet their backup window. However, they are still forced to do a full backup on a weekly basis. Why? To understand this, we need to discuss recovery performance. File-level incremental backups help meet the backup window, but each one contains only the files changed since the previous backup. A full restoration therefore needs the last full backup plus every incremental taken after it, and the longer that chain grows, the slower the restore. The weekly full backup caps the length of the chain.
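
The dependency between a restore and the backups taken since the last full can be sketched as follows. This is an illustrative model, not any product's catalog format: backups are hypothetical `(day, kind)` tuples, and the schedule is assumed to start with a full backup.

```python
def restore_chain(backups, target_day):
    """Return the backups needed to restore the state as of target_day.

    backups is a list of (day, kind) tuples sorted by day,
    where kind is "full" or "incr".
    """
    chain = []
    for day, kind in backups:
        if day > target_day:
            break
        if kind == "full":
            chain = [(day, kind)]       # a full backup restarts the chain
        else:
            chain.append((day, kind))   # every incremental since the full is required
    return chain


# Weekly fulls on day 1 and day 8, daily incrementals in between.
schedule = [(day, "full" if day in (1, 8) else "incr") for day in range(1, 11)]

print(len(restore_chain(schedule, 7)))  # full plus six incrementals
print(restore_chain(schedule, 9))       # the new full shortens the chain
```

Restoring on day 7 needs seven backup sets, while restoring on day 9 needs only two, which is the reason the weekly full is kept in this scheme.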

Keep in mind that the backup time for a file-level backup includes the time it takes to scan the entire file system (which might hold millions of files) in addition to the time it takes to transfer the data over the network and write it to the backup destination. Hence, with file-level backup software, incremental backups also tend to take a long time, limiting the ability to run multiple backups in a single day. This in turn impacts an organization's agility in meeting decreasing Recovery Point Objectives (RPOs) or a service provider's ability to prove adherence to stringent Service Level Agreements (SLAs).

Despite these concerns, for most environments incremental backups are the only way to perform a daily backup to ensure the backup window is met while meeting the RPO (Recovery Point Objective) requirements. But how efficient are the file-level incremental backups? Can they really help meet RPO requirements? Is a daily incremental backup enough? What if the backup fails?

File-level Incremental Backups - Are they truly incremental?

Though a daily incremental backup sounds good, there are several drawbacks to this approach. For example, it is very difficult to meet decreasing RPOs using a daily backup. What does this mean exactly? RPOs are decreasing because we live in a 24x7 world where being online with data accessible at all times is becoming the norm. With this in mind, recognize that the most common reason for recovering a file is human error. If a user deletes a file, the expectation is to get that file back in the least amount of time possible, which is the RTO (Recovery Time Objective). IT centers find it very challenging to meet decreasing RPO and RTO requirements using file-level backup software.

The incremental backup technique is very different in file-level backup software. An entire file which has changed since the last backup will be backed up as part of an incremental backup irrespective of the amount of changes to that file. One might wonder what's wrong with that. Let’s understand this using an example.

Example

Consider two 10 GB files that are backed up every day. The first backup of these files, on day 1, is a full backup of the entire data set, so all 20 GB is transferred across the network and stored at the destination (backup target). After the first backup, the user modifies one of the files, increasing its size by 1 GB. On day 2, an incremental backup picks up only the modified file, but file-level backup software backs up that entire file, i.e. 11 GB. So, on day 2, 11 GB is transferred across the network and stored at the destination.
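
The arithmetic of this example can be checked with a short Python sketch. The file names are hypothetical, sizes are in GB, and a size difference stands in for change detection purely to keep the illustration small; real software typically relies on timestamps or archive attributes.

```python
def file_level_incremental_gb(previous, current):
    """Total size of files that are new or changed; each one is copied whole."""
    return sum(
        size for name, size in current.items() if previous.get(name) != size
    )


day1 = {"file_a": 10, "file_b": 10}   # two 10 GB files
day2 = {"file_a": 11, "file_b": 10}   # file_a grew by 1 GB

full_backup_gb = sum(day1.values())
incremental_gb = file_level_incremental_gb(day1, day2)

print(full_backup_gb, incremental_gb)
```

Although only 1 GB of data is new, the incremental carries the whole 11 GB file.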

And what happens if a server fails between backups? Due to the nature of file-level backups, administrators take fewer backups, which means the chance of losing data is much greater. If a server fails, they have only the last backup to restore from. And if users have files open, those files might not be backed up at all. If users are working extended hours and the backup window keeps shrinking, as discussed earlier, this is a recipe for disaster in the form of files that are never backed up. Clearly, file-level backups are inefficient in environments with performance requirements, growing data volumes, and shrinking backup windows.

Summary of file-level backups

Decreases network performance as the server must scan the entire file system during every backup, taxing client resources

Slower backups due to file system scanning

Requires more network bandwidth due to inefficient incremental backups

Relies on the outdated weekly full and daily incremental backup scheme, leaving exposure to data loss

Slower recoveries

Uses excessive space on the destination (backup target) by storing an entire file as part of an incremental backup

Makes it difficult to meet decreasing RPO and RTO requirements

May not back up open files

Higher risk of losing data

What are Block-based backups?

Block-based backup was introduced to the market in the mid-1990s. However, it was not widely adopted, or even accepted as a method of backup, until the mid-2000s. Growing data volumes coupled with the move to virtualized environments drove new requirements for data protection.

Block-based backup software tracks the block changes that happen during disk writes and backs up only the "modified blocks" as part of the incremental backups. This eliminates the file system scanning phase that occurs in file-level backup software during backup operations.

All files are written as bits (binary digits 0 and 1) on the disk, and block-based backup software uses this basic concept to minimize the amount of data transferred as part of incremental backups. The software keeps track of block changes as they happen (disk writes) using a change journal. At the time of the scheduled backup, the journal is referenced to identify the changed blocks (the incremental) and only those blocks are transferred to the destination.
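
A change journal of this kind can be modeled in a few lines of Python. This is a toy in-memory sketch under stated assumptions: `TrackedVolume`, its methods, and the 4096-byte block size are all hypothetical, and production software tracks writes in a driver below the file system rather than in application code.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


class TrackedVolume:
    """Toy volume that journals which blocks were written since the last backup."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks
        self.journal = set()  # indexes of blocks written since the last backup

    def write_block(self, index, data):
        self.blocks[index] = data
        self.journal.add(index)  # record the change as the write happens

    def incremental_backup(self):
        # No scan: the journal already lists exactly what changed.
        changed = {i: self.blocks[i] for i in sorted(self.journal)}
        self.journal.clear()
        return changed


vol = TrackedVolume(num_blocks=8)
vol.write_block(2, b"a" * BLOCK_SIZE)
vol.write_block(5, b"b" * BLOCK_SIZE)
vol.write_block(2, b"c" * BLOCK_SIZE)  # rewriting a block adds no second entry

first = vol.incremental_backup()
second = vol.incremental_backup()
print(sorted(first), len(second))
```

The first incremental carries blocks 2 and 5 only, and a second backup with no intervening writes carries nothing.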

Example

Let us look at the previously considered scenario from a block-based backup software perspective.

The first backup is the same regardless of the type of software: a full backup of the entire data set. On day 2, however, an incremental backup transfers only the updated data blocks over the network, i.e. 1 GB. Compare this with the 11 GB in the case of a file-level incremental backup: a huge saving of network bandwidth and of disk space on the destination (backup target).
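
The size of that saving follows directly from the two figures in the example. The sketch below assumes, as the example does, that the 1 GB of growth is the only data written; edits to existing blocks would add to the block-level figure.

```python
def reduction(file_level_gb, block_level_gb):
    """Fraction of transfer avoided by sending changed blocks instead of whole files."""
    return 1 - block_level_gb / file_level_gb


file_level_gb = 11   # whole modified file, from the file-level example
block_level_gb = 1   # changed blocks only

saved = reduction(file_level_gb, block_level_gb)
print(f"{saved:.1%}")
```

For this scenario the day-2 transfer shrinks by roughly 91 percent.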

Also, the file-based backup required a weekly full backup. A block-based backup requires a full backup only the first time. After that, all backups are incremental forever. Just as your operating system keeps track of all the bits that make up your files, block-based backup software does essentially the same thing for backups. Block-based backups thus rely on the same kind of technology that has been in use since the first hard drive was introduced. These principles were simply not applied to backups at first, and so file-based backups became the standard way to back up files from the 1980s through the early 2000s.

Further complicating the backup space is the fact that not all software is created equal. Even if a backup product claims to be block-based, it must also have been designed to support incremental-forever backups. Otherwise, users will still bear the heavy impact of weekly full backup schemes.

Block-based backup software: Impact on RPOs and RTOs

A major advantage of block-based backup software comes from the way it performs incremental backups. Since only modified blocks are transferred as part of incremental backups, network bandwidth utilization is optimized and target disk space is used efficiently. Additionally, since less data is transferred over the network and no file system scanning is involved, block-based software can easily meet backup windows, which makes it possible to perform multiple incremental backups in a day. What does this mean to a user? More recovery points, and therefore a shorter achievable RPO. As discussed earlier, users expect to have their data recovered, and this approach to backup increases the chances of being able to recover it. Furthermore, even open files can be backed up with the block-based approach, further diminishing the chances of data loss.
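
The link between backup frequency and the recovery point can be made concrete with a small calculation. It is a simplification that assumes backups are evenly spaced and that each one completes successfully; the function name is hypothetical.

```python
def worst_case_data_loss_hours(backups_per_day):
    """Longest gap between evenly spaced backups, i.e. the most work that can be lost."""
    return 24 / backups_per_day


gaps = {n: worst_case_data_loss_hours(n) for n in (1, 4, 24)}
for per_day, hours in gaps.items():
    print(per_day, "backups per day ->", hours, "hours at risk")
```

Moving from one backup a day to four cuts the worst-case exposure from 24 hours to 6.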

And when it comes to Recovery Time Objectives (RTO), block-based backup software can perform far better than file-level backup solutions. This is very important for service companies that must meet their service level agreements (SLAs) for data accessibility. For example, a file that could take hours to recover with file-based backups could be recovered in less than two minutes. A server that would have taken days to recover with the file-based approach can be recovered in less than 30 minutes with block-based technology. Clearly, block-based backups are highly efficient: they avoid degrading network performance while reducing backup times, improving recovery times, and increasing the number of recovery points.

Summary of block-based backups

Faster backups as only changed blocks need to be backed up

Efficient use of network bandwidth as incremental backups are truly incremental

Efficient use of target disk space

Allows incremental forever type of backup schemes

Better RPOs by allowing multiple backups in a day

Shorter RTOs due to efficient restores

Provides better SLAs for backups and recoveries

Are Block-based backups always better?

Here comes the million-dollar question: which backup method is better? As mentioned earlier, "It depends". Block-based backup software also has limitations. Since it relies on block-level access to the storage, it cannot be used everywhere: the backup software must be able to see the blocks in which the underlying file system stores its data. There are environments where only file-level backup software works. For example, when data is accessed through a network file protocol such as CIFS or NFS, the client sees files rather than blocks, so it is critical to ensure compatibility with the backup software.

But, in most instances block-based backup solutions outperform file-level software. An ideal solution would be to use software that provides both file-level and block-based options allowing users to choose the appropriate method based on the backup job. Not many backup software products in the market provide both these capabilities. And if they claim to, then it is critical to examine the details. As mentioned previously, block-based backups should provide incremental forever schemes, yet some backup software still requires base backups often. So the users are advised to pick the right solution that meets their needs by conducting research and assessing the solutions.

Can File-Level Software also perform block-based backups?

Most file-level software in the market cannot provide the same features or performance as block-based backups due to the architectural differences between these two backup types. Block-based backup software operates at the file system level, tracking the blocks, whereas file-level backup software operates at the level of files and folders.

However, a few block-based backup software products can provide options for file-based backups also. The DPX software, offered by Catalogic, combines the best of both worlds by allowing administrators to perform both block-level and file-level backups using a single product and user interface.

Catalogic DPX and The NSB Solution

Among the few backup products in the market that can provide both file-level and block-based backup options, Catalogic DPX software has proven its efficiency for more than 20 years.

Catalogic Software partnered with NetApp to extend the capabilities of the ONTAP operating system to deliver a full data protection solution. The NSB Solution combines the Catalogic DPX software with the high performance NetApp storage systems to provide a world-class data protection solution that works across physical, virtual, and cloud environments providing both block-based and file-level backup capabilities for heterogeneous storage environments.

With a partnership that goes back fifteen years, Catalogic fully understands NetApp's highly efficient ONTAP® platform and developed DPX software to fully complement and maximize the capabilities of NetApp storage systems for an optimum data protection solution. For the customer, the complete integration of Catalogic software with NetApp storage translates into a comprehensive data protection solution with a full set of capabilities including backup, disaster recovery, copy data management, bare metal restore, file or application restores, tape backups, tiered backups, archiving, ROBO and more. And while the solution requires ONTAP as a target device, it does not limit what can be backed up. DPX will back up any heterogeneous storage, physical server, or virtual machine. As part of the NSB solution, DPX leverages the high-performance NetApp storage systems to optimize storage utilization on the backup destination by using techniques like deduplication, compression, snapshots, FlexClone®, and other ONTAP functionality.

While the NSB solution focuses on NetApp ONTAP systems, Catalogic has also partnered with NetApp to deliver the same solution for NetApp’s E-Series and EF-Series.

Customers that prefer other target devices will find this option is also available as Catalogic DPX can provide one data protection solution across physical, virtual and cloud eliminating the need for point solutions that focus on individual applications or environments.

Catalogic DPX

Catalogic has enhanced the performance of your data protection environment by:

Providing faster and more consistent backups by leveraging block-level and snapshot techniques

Allowing faster and more innovative recoveries using its proprietary architecture

Enhancing integration with applications allowing more frequent backups and a wide range of recoveries - granular to server-level

Supporting physical and virtual environments with a single, easy-to-use solution

DPX's proprietary architecture leverages snapshot techniques and change journal mechanisms to perform block-level incremental forever backups allowing efficient utilization of network bandwidth by transferring only changed blocks as part of incremental backups.

DPX software also tightly integrates with hardware and software vendors’ native technologies like Microsoft VSS, NetApp storage system's snapshot and FlexClone technologies, and Windows deduplication techniques for efficient backup and recovery performance. In instances where organizations have proprietary, outdated, or unsupported applications/software/servers that need to be protected, DPX customers can take advantage of scripted backup techniques using application-specific or array-based snapshot mechanisms. Or, users can also easily switch to the file backup mode to perform traditional, file-level backups for such specific applications/servers.

DPX customers have reduced their backup times by more than 90% while achieving a backup success rate of more than 99%. Alongside this backup performance, DPX also offers advanced recovery options like Instant Access, Instant Virtualization, Full Virtualization, Rapid Return to Production, and Bare Metal Recovery. Users can recover an entire server as a virtual machine within minutes for near-immediate access to data and applications, or they can use the Bare Metal Recovery feature, which allows users to recover a server as a physical machine in three easy steps.

Catalogic DPX can perform fast recoveries at granular levels such as files, entire applications, Exchange object levels, SharePoint farms, SQL or Oracle databases, entire servers, or entire data centers.

DPX performs application-consistent, block-level backups for applications like Microsoft Exchange, Microsoft SQL, Microsoft SharePoint, and Oracle. While taking advantage of the Block-Level Incremental Forever type of backup schemes, backup administrators can perform more frequent backups of the mission critical data hosted by these applications thereby improving RPOs and SLAs. Furthermore users can still perform granular recoveries like individual messages, calendar items, notes, tablespaces, documents, and so on.

DPX - An easier way to achieve global deduplication

By transferring only changed blocks as part of incremental backups, DPX is performing Source-Side Data Reduction. Combining this with the deduplication and compression features on the target NetApp storage system, the NSB solution provides an alternative to global deduplication without the limitations of resource contention (high CPU usage) on the client or complex architecture. For other target storage devices, Windows deduplication features are leveraged providing yet another option for users to align the technology to their environment.
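
For readers unfamiliar with target-side deduplication, the general idea can be sketched with a content-addressed store. This is a generic illustration of the technique, not a description of how DPX, ONTAP, or Windows deduplication is implemented; `DedupStore` and the 4096-byte blocks are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical blocks are kept only once."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def put(self, block):
        digest = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault(digest, block)  # store only if not seen before
        return digest  # the backup keeps this reference instead of the data


store = DedupStore()
incoming = [b"A" * 4096, b"B" * 4096, b"A" * 4096]  # first and third are identical
refs = [store.put(block) for block in incoming]

print(len(incoming), "blocks received,", len(store.blocks), "blocks stored")
```

Three blocks arrive but only two are stored, because the repeated block resolves to the same digest.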

Summary of Catalogic DPX

Catalogic DPX allows organizations to leverage the best of both block-based and file-level backup worlds in a single product. While achieving a backup success rate of more than 99%, DPX reduces the backup window by more than 90%. By optimizing network bandwidth utilization, DPX allows organizations to easily meet the ever decreasing RPO and RTO requirements. With its tight integration with critical applications like Exchange, SQL, SharePoint and Oracle, DPX allows users to perform application-consistent backups more frequently and granular recoveries instantaneously. Using its unique and advanced recovery options like Instant Access, Instant Virtualization, Full Virtualization, Rapid Return to Production, and Bare Metal Recovery, DPX allows administrators to easily (with a few mouse clicks) and quickly (within minutes) restore files and servers, both physical and virtual.

| Major Differentiators | Traditional Backup | DPX based NSB (Block-level, snapshot-based) | DPX with Any Storage (Block-level, snapshot-based) |
|---|---|---|---|
| Backup model | File | Block and File Level | Block level |
| Backup time / frequency (RPO) | One > 10 hrs * | Multiple < 2 hours * | Multiple < 2 hours * |
| Application integration | Yes | Yes | Yes |
| Application server impact | High | Low | Low |
| Recovery model | File Restore | Multi-level restore (File/Volume/App/Server) | Multi-level restore (File/Volume/App/Server) |
| Recovery time (RTO) | Hours / Days | 2 - 5 minutes; < 30 min. for entire server | 2 - 5 minutes; < 30 min. for entire server |

| Infrastructure and Architecture | Traditional Backup | DPX based NSB (Block-level, snapshot-based) | DPX with Any Storage (Block-level, snapshot-based) |
|---|---|---|---|
| Media servers | Multiple | None | DPX Open Storage Server |
| Data Movement on Network | High | Low | Low |
| Requires single purpose HW | Yes | No | No |
| Total Cost of Ownership | High | Lower | Lowest |

* Representative backup times based on customer experiences in production environments

This chart compares the traditional, file-level backup model against DPX using block-level, snapshot-based backups with a NetApp target device.

Catalogic DPX integrates backup, disaster recovery, bare metal recovery and copy data management into one easy to use solution that performs the fastest backups and recoveries with more efficient usage of storage, network bandwidth, and resources. DPX is truly a universal data protection solution for the modern day IT data centers.

www.catalogicsoftware.com
