  • White Paper

    Abstract

    Users are faced with many options and tradeoffs when choosing a backup strategy for Microsoft SQL Server databases. This white paper maps out those choices and examines how EMC Data Domain deduplication storage preserves data integrity, meets stringent RTO/RPO objectives, and integrates easily into a multitude of active SQL or third-party backup environments. November 2010

    MICROSOFT SQL SERVER BACKUP AND RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN

  • 2 Microsoft SQL Server Backup and Recovery Best Practices with EMC Data Domain

    Copyright 2010 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. All other trademarks used herein are the property of their respective owners. Part Number h8116

    Table of Contents

    Executive summary
        Audience
    Introduction
        Additional concepts
    Summary of best practices
    SQL background
        Recovery models
        Recovery techniques
    Terminology
        Types of backups
        COMPRESSION
        BLOCKSIZE
        BUFFERCOUNT
        MAXTRANSFERSIZE
        Stripes
        BACKUP command
    EMC Data Domain product background
        Advantages of EMC Data Domain in a SQL environment
        Data transfer rates
    Integration
        Planning
        Important options
            Backup types
            Compression
            Multiplexing
            Network
        Third-party backup applications
    Microsoft recommendations
    Conclusion
    Appendix A: Additional resources
        Microsoft links
        EMC Data Domain links
    Appendix B: Backup compression
        Bottlenecks addressed by compression
        Compression challenges
            Compression and deduplication
    Appendix C: Index fragmentation
        Addressing the challenge

    Executive summary

    Many database administrators prefer native Microsoft SQL Server backups written directly to disk over third-party backup applications. With native SQL Server backup, there is no reliance on the backup administrative team to perform backups or play a role in database recovery. Additionally, the database administrator no longer needs to become proficient in deploying, configuring, administering, or maintaining third-party backup applications.

    Historically, native SQL backups have been the target of some criticism for a couple of reasons:

    Native SQL backup facilities do not provide automated media management capabilities. While backups performed to disk media eliminated the challenge of manually managing tape cartridges, this method also introduced the need for additional disk. The cost of disk versus removable tape media was significant.

    In addition, backup to disk did not meet the requirement of retaining an offsite copy of database backups as part of a disaster recovery strategy. Native backup to disk fell short of providing a viable solution for this requirement.

    Deployed as database backup media, EMC Data Domain deduplication storage systems address the historical pitfalls of performing native database backups:

    EMC Data Domain storage systems optimize storage capacity, making retention and replication of backup data extremely cost- and bandwidth-efficient by providing 10-30x data reduction.

    EMC Data Domain systems are simple to integrate utilizing traditional backup software, but also offer an alternative with high-speed, cost-effective backup directly to a CIFS network share, utilizing native SQL backup. Users have the choice to eliminate any need for third-party SQL Server backup application agents and their associated operational costs and maintenance fees.

    EMC Data Domain replication software empowers users to create offsite backup copies utilizing bandwidth-efficient wide area network (WAN) links, enabling faster time-to-DR.

    This white paper provides information about the use of EMC Data Domain deduplication storage as backup media for Microsoft SQL Server backups.

    Audience

    The target audience includes data protection architects, SQL Server database administrative staff, and backup administrators seeking information about integrating EMC Data Domain deduplication storage as a component in a comprehensive data protection strategy.

    Figure 1. The native database backup tool performs a full database backup to disk. The tool is easy to use and provides a feature set that addresses many business requirements.

    Figure 2. The NetBackup MS SQL Client graphical user interface is an example third-party backup application that uses VDI to interface with SQL Server.

    Introduction

    Microsoft SQL backup methodology falls into one of two generic categories. The first consists of native SQL Server database backups. This backup technique creates SQL database backups using tools and utilities native to Microsoft SQL Server and does not rely on third-party backup application software (Figure 1).

    Benefits include the use of backup and recovery interfaces familiar to the database administrative staff. This ability is included with Microsoft SQL Server, and there are no additional third-party software license fees.

    The second backup methodology uses third-party backup application software that integrates with Microsoft SQL Server to perform SQL database backups based on the Virtual Device Interface (VDI). This solution is typically packaged as a database agent specifically for Microsoft SQL Server and a particular backup application. When VDI is used, the backup application allows setting customized backup and recovery parameters similar to those that can be employed when using native Microsoft SQL tools and utilities.

    Third-party backup software may also use available snapshot technologies designed to enhance functionality or otherwise add value to backup and recovery processes (Figure 2).

    When the snapshot type is based on Microsoft Volume Shadow Copy Service (VSS), the backup application is the VSS requestor, the SQL Server is the VSS writer, and backup is coordinated with a VSS provider. Advanced backup and recovery features such as disk staging and instant recovery may be available with these implementations depending on the backup application and agent being used.

    Drawbacks to this strategy may include a user interface foreign to the database administrative staff and substantial third-party backup application license fees.

    Additional concepts

    Many customers utilizing the native Microsoft SQL Server database backup methodology augment the solution with third-party backup client agents that effectively protect the native backup data as a flat file. This two-phased methodology is effectively backing up a backup. Among the perceived benefits of the augmented solution is that it allows segregation of the SQL database administrative staff from the data protection staff while providing means to retain database backups in conformance with sound business practices (Figure 3).

    Figure 3. The methodology of an augmented backup solution uses two backup solutions in conjunction to satisfy business objectives. SQL native database backups are performed to a Data Domain system and are subsequently backed up by a third-party backup application and written to the same Data Domain system.

    Summary of best practices

    For those already well briefed on both Microsoft SQL Server and EMC Data Domain, Table 1 presents a summary of the suggested best practices. Explanations and reasoning for these suggestions are discussed later in this paper.

    Table 1. Summary of recommended settings

    PARAMETERS AFFECTING DEDUPLICATION PERFORMANCE
        SQL Server 2008 native compression: NO_COMPRESSION
        Third-party backup application SQL Server local compression: Disabled
        Third-party backup application multiplexing: None

    PARAMETERS AFFECTING BACKUP AND RECOVERY PERFORMANCE
        BLOCKSIZE: Default 512 byte or higher based on performance improvements
        BUFFERCOUNT: Minimum 2 buffers per stripe (requires additional memory based on the MAXTRANSFERSIZE value)
        MAXTRANSFERSIZE: 4194304 (requires available memory based on the BUFFERCOUNT value)
        Stripes: Consider the use of multiple stripes to improve backup and restore data transfer rates
        Server Disk Subsystem: Database and log files should be placed on disk storage with performance attributes facilitating required transaction and backup performance metrics
        IP Network: Dedicated backup network that meets or exceeds bandwidth requirements for the desired data transfer rate
        EMC Data Domain System: Sized to meet or exceed ingest rate and backup retention capacity requirements

    MOUNT OPTIONS
        When performing native database backups: UNC path
        When using a third-party backup server: Dependent on backup application and server OS type

    MISCELLANEOUS OPTIONS
        Co-mingling native and third-party backup application database backups: Yes (this should have a negligible impact on deduplication ratios)
        Replication: Yes, use the EMC Data Domain system to replicate database backups to the remote DR site

    Figure 4. The master, model, msdb, and tempdb system databases as shown in the Microsoft SQL Server Management Studio interface.

    SQL background

    A Microsoft SQL Server instance includes system and user databases. System databases are created at installation and include:

    The master database, which records all system-level information for a Microsoft SQL Server instance. It contains records for all login accounts and all system configuration settings. The master database records the existence and location of all other databases.

    The model database, which is used as a template that contains the default settings for all databases created within the Microsoft SQL Server instance.

    The msdb database, which is used for scheduling, alerts, and jobs.

    The tempdb database, which serves as a global resource that contains all temporary tables and temporary stored procedures. It is re-created every time the Microsoft SQL Server instance is started.

    Data protection strategies for the system databases are dependent on the database being protected. For instance, transaction log backups are not supported for the master database. The master database cannot be recovered if a functional version of it does not already exist. Recovery procedures for the master database may include re-installing Microsoft SQL Server such that a backup of the pre-disaster master database can then be restored.

    The model and msdb databases can contain customized data such as user-specific templates, scheduling information, as well as backup and restore history information. Without a data protection strategy, these items will need to be manually reconstructed in the event of a disaster.

    The tempdb database is empty when the SQL instance is shut down, and does not require protection as it is re-created at startup.
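
    As a minimal sketch of protecting the system databases described above, the statements below take a full backup of master, model, and msdb. The share path \\dd01\sqlbackup and the file names are hypothetical placeholders, not values from this paper.

    ```sql
    -- Hypothetical UNC path; substitute the CIFS share exported by the backup target.
    -- master supports full backups only, so no log backup is attempted for it.
    BACKUP DATABASE master
        TO DISK = N'\\dd01\sqlbackup\system\master_full.bak'
        WITH INIT, CHECKSUM;

    BACKUP DATABASE model
        TO DISK = N'\\dd01\sqlbackup\system\model_full.bak'
        WITH INIT, CHECKSUM;

    BACKUP DATABASE msdb
        TO DISK = N'\\dd01\sqlbackup\system\msdb_full.bak'
        WITH INIT, CHECKSUM;

    -- tempdb is intentionally omitted: it cannot be backed up and is
    -- re-created each time the instance starts.
    ```

    INIT overwrites any backup sets already in the target file; omit it if each file should accumulate backup sets instead.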

    Recovery models

    Microsoft SQL Server includes three recovery models: simple, bulk-logged, and full. The desired recovery model can be deployed based on requirements. Functionally, each recovery model differs with regard to how backup and recovery strategies are executed.

    Figure 5. Selection of the desired recovery model via the Database Properties dialog box.

    The full recovery model requires log backups. This model typically has no exposure to data loss. Point-in-time recovery is possible, up to and including the last committed transaction.

    The bulk-logged recovery model requires log backups. This model permits high-performance bulk copy operations. Recovery to the end of any backup is possible; point-in-time recovery is not supported.

    The simple recovery model does not use log backups. Recovery relies on full backups, optionally supplemented by differential backups. In the event database recovery is required, the most recent full backup can be restored. Any changes that occurred after the most recent backup must be redone. From a transactional perspective, the database can only be recovered to the end of the most recent backup.
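
    The recovery model can also be inspected and changed with T-SQL rather than through the dialog box. The sketch below uses the Test_One database name that appears in this paper's figures; substitute the database of interest.

    ```sql
    -- List the current recovery model of every database on the instance.
    SELECT name, recovery_model_desc
    FROM sys.databases
    ORDER BY name;

    -- Switch one database to the full recovery model.
    ALTER DATABASE [Test_One] SET RECOVERY FULL;

    -- Other valid settings:
    --   ALTER DATABASE [Test_One] SET RECOVERY BULK_LOGGED;
    --   ALTER DATABASE [Test_One] SET RECOVERY SIMPLE;
    ```

    After switching from simple to full, a full (or differential) database backup is needed before log backups can form a usable chain.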

    Recovery techniques

    The technique used to restore a database will vary based on the recovery model being used as well as the backup types being performed. Figures 6-9 provide a brief look at restoring a database that was protected using the full recovery model with full and transaction log backups. A single full backup was performed, followed by five transaction log backups.

    Figure 6. Restore Database dialog box and general database restore attributes. By default the full backup and subsequent transaction log backups are all selected. Clicking the OK button would initiate recovery to the most recent possible point in time. Alternatively, recovery to a specific point in time is also possible.

    Figure 7. Restore database options and available database recovery options. By default an existing database will not be overwritten. Also note that by default the recovery state is RESTORE WITH RECOVERY, which leaves the recovered database in an online and usable state after the restore process completes.

    Figure 8. An example of a recovery query that restores the initial full backup followed by the first transaction log backup. The remaining transaction log restores were omitted from this query for brevity.
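
    Because the original figure is not reproduced here, the following is a hedged sketch of what such a restore sequence typically looks like for one full backup followed by five log backups. The share path and file names are hypothetical; only the Test_One database name comes from this paper.

    ```sql
    -- Restore the full backup but leave the database in the restoring state
    -- so that log backups can still be applied.
    -- If the database already exists, a tail-log backup (or WITH REPLACE)
    -- may be required first.
    RESTORE DATABASE [Test_One]
        FROM DISK = N'\\dd01\sqlbackup\Test_One_full.bak'
        WITH NORECOVERY;

    -- Apply the first transaction log backup.
    RESTORE LOG [Test_One]
        FROM DISK = N'\\dd01\sqlbackup\Test_One_log_01.trn'
        WITH NORECOVERY;

    -- ... repeat RESTORE LOG ... WITH NORECOVERY for logs 02 through 04 ...

    -- Apply the last log backup and bring the database online.
    RESTORE LOG [Test_One]
        FROM DISK = N'\\dd01\sqlbackup\Test_One_log_05.trn'
        WITH RECOVERY;

    -- Point-in-time alternative for the final step (example timestamp only):
    --   RESTORE LOG [Test_One]
    --       FROM DISK = N'\\dd01\sqlbackup\Test_One_log_05.trn'
    --       WITH STOPAT = N'2010-11-01T12:00:00', RECOVERY;
    ```

    Log backups must be applied in the order they were taken; WITH RECOVERY is used only on the final step.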

    Third-party backup applications will each have a unique recovery interface for databases. Many automate and coordinate the recovery of full and transaction log backups similar to the way native Microsoft SQL Server tools and utilities do.

    Figure 9. In this example from the NetBackup MS SQL client restore GUI, a single full database backup and 5 transaction log backups are available for recovery.

    Terminology

    Entire databases, specific database files, file groups, and transaction log backups are among the supported backup types with Microsoft SQL Server. This section defines the terminology associated with a given backup type.

    Types of backups

    Database backups

    Database Backup: This is a full backup of an entire database and represents the state of the database at the point when the backup is completed.

    Differential Database Backup: This is a backup of all the files within a database, and contains only the extents modified since the most recent full backup of each file. Restoring a database protected with full and differential backups to the most recent point in time requires restoring the most recent full backup followed by the most recent differential backup.

    Partial backups

    Partial Backup: Partial backups provide flexibility for backing up databases that contain some number of read-only filegroups. A partial backup contains all data in the primary filegroup, each read/write filegroup, and any optionally specified read-only files or filegroups.

    Differential Partial Backup: This backup contains only the extents modified since the prior partial backup of the same set of filegroups.

    File backups

    File Backup: This consists of a full backup of all data in one or more files or filegroups.

    Differential File Backup: This is a backup of one or more files containing data extents changed since the prior full backup of each file.

    Transaction log backups

    Regular transaction log backups are required when using the full or bulk-logged recovery models. This backup contains all log records that have not been backed up previously.

    Copy-Only backups

    Backups usually change the database in some way that affects later backups and restores. For example, a full database backup resets the differential base, and a transaction log backup truncates the inactive portion of the log. Copy-Only backups can be used in cases where a backup of a database is required without affecting the regular backup sequence.
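
    The backup types defined above map onto variations of the BACKUP statement. The sketch below is illustrative only: the share path and file names are hypothetical, and NO_COMPRESSION follows the recommendation in Table 1.

    ```sql
    -- Full database backup.
    BACKUP DATABASE [Test_One]
        TO DISK = N'\\dd01\sqlbackup\Test_One_full.bak'
        WITH NO_COMPRESSION;

    -- Differential database backup (extents changed since the last full backup).
    BACKUP DATABASE [Test_One]
        TO DISK = N'\\dd01\sqlbackup\Test_One_diff.bak'
        WITH DIFFERENTIAL, NO_COMPRESSION;

    -- Partial backup: primary filegroup plus all read/write filegroups.
    BACKUP DATABASE [Test_One] READ_WRITE_FILEGROUPS
        TO DISK = N'\\dd01\sqlbackup\Test_One_partial.bak'
        WITH NO_COMPRESSION;

    -- File backup of a single filegroup (PRIMARY exists in every database).
    BACKUP DATABASE [Test_One] FILEGROUP = N'PRIMARY'
        TO DISK = N'\\dd01\sqlbackup\Test_One_fg_primary.bak'
        WITH NO_COMPRESSION;

    -- Transaction log backup (full or bulk-logged recovery model only).
    BACKUP LOG [Test_One]
        TO DISK = N'\\dd01\sqlbackup\Test_One_log_01.trn'
        WITH NO_COMPRESSION;

    -- Copy-only full backup: does not disturb the regular backup sequence.
    BACKUP DATABASE [Test_One]
        TO DISK = N'\\dd01\sqlbackup\Test_One_copyonly.bak'
        WITH COPY_ONLY, NO_COMPRESSION;
    ```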

    Figure 10. In SQL Server 2008 native compression, the Compress backup server-level property determines the behavior of backup jobs that do not explicitly enable or disable compression.

    COMPRESSION

    Specific to SQL Server 2008 Enterprise and later versions, backup compression can be enabled or disabled. The default product installation does not compress backups. A server-level setting (backup compression default) can be applied to alter this default behavior. Specifying the COMPRESSION keyword in a BACKUP statement explicitly enables backup compression for that backup; specifying the NO_COMPRESSION keyword explicitly disables it.
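
    A short sketch of both levels of control follows: the server-level default via sp_configure, and the per-backup override. The share path is a hypothetical placeholder.

    ```sql
    -- Show the current server-level default (run_value 0 = do not compress).
    EXEC sp_configure 'backup compression default';

    -- Set the server-level default to "do not compress".
    EXEC sp_configure 'backup compression default', 0;
    RECONFIGURE;

    -- Override explicitly for a single backup, regardless of the server default.
    BACKUP DATABASE [Test_One]
        TO DISK = N'\\dd01\sqlbackup\Test_One_full.bak'
        WITH NO_COMPRESSION;
    ```

    Stating NO_COMPRESSION in each backup job keeps the behavior independent of later changes to the server-level default.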

    BLOCKSIZE

    The BLOCKSIZE keyword can be used to alter the physical block size used when writing to backup media. By default the backup process automatically selects a block size appropriate for the backup device. Supported sizes are 512, 1024, 2048, 4096, 8192, 16384, 32768, and 65536 (64 KB) bytes. The default value used for disk backup is 512 bytes.

    The default 512-byte size yields excellent performance with EMC Data Domain systems. Third-party backup applications may substitute their own default value. The fact that this parameter can be adjusted is included as reference. The use of larger sizes may improve or degrade performance. Users are encouraged to investigate further to determine what value may provide optimal results in their environment.

    BUFFERCOUNT

    The BUFFERCOUNT keyword specifies the total number of I/O buffers used for the backup process. Any positive integer value can be specified. The practice of using a minimum of two buffers per stripe is recommended.

    This practice simultaneously provides one buffer that can be written into from the database (a reader thread) and one buffer that can be read out of for data transfer to a storage device (a writer thread). Buffers consume memory on Microsoft SQL Server based on the BUFFERCOUNT and MAXTRANSFERSIZE keyword parameters.

    Figure 11. A full backup of the Test_One database using the optional BUFFERCOUNT keyword with a parameter value equal to 1. This query executed in 118.238 seconds with a data transfer rate equal to 49.748 MB/s.

    Figure 12. The same full database backup represented in Figure 11 using the optional BUFFERCOUNT keyword with a parameter value equal to 2. The use of two buffers increased backup data transfer performance by approximately 9% when compared to using a single buffer.

    MAXTRANSFERSIZE

    The MAXTRANSFERSIZE keyword specifies the unit of transfer in bytes used between SQL Server and the backup media. Values are multiples of 64 KB (65536 bytes) and can range from 64 KB to 4 MB (4194304 bytes). Larger units of transfer are generally preferred to smaller values.

    Excessive use of buffers combined with larger units of transfer consumes Microsoft SQL Server memory. Care should be taken to avoid memory-related errors as the result of using these parameters.
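
    As a rough guide, the memory used for backup buffers is approximately BUFFERCOUNT multiplied by MAXTRANSFERSIZE. The sketch below estimates that figure and then applies the values from Table 1 to a single-stripe backup. The variable names and the share path are illustrative assumptions.

    ```sql
    -- Estimate buffer memory: total buffers x transfer size.
    DECLARE @stripes            int = 1,
            @buffers_per_stripe int = 2,
            @max_transfer_bytes int = 4194304;   -- 4 MB

    SELECT (@stripes * @buffers_per_stripe * @max_transfer_bytes) / 1048576
           AS approx_buffer_mb;                   -- 8 MB for these values

    -- Single-stripe backup using two buffers and 4 MB transfers.
    -- BLOCKSIZE is left at its default, as suggested in Table 1.
    BACKUP DATABASE [Test_One]
        TO DISK = N'\\dd01\sqlbackup\Test_One_full.bak'
        WITH NO_COMPRESSION,
             BUFFERCOUNT = 2,
             MAXTRANSFERSIZE = 4194304,
             STATS = 10;                          -- progress message every 10%
    ```

    BUFFERCOUNT is the total for the whole backup operation, so it should be scaled with the number of stripes.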

    Stripes

    While not a keyword within the context of Microsoft SQL Server, the term stripes correlates to the number of simultaneous backup streams to be created for a given backup operation. In the case of disk backups with SQL Server, multi-streamed backups are performed by specifying a number of backup disk targets with the BACKUP command.

    BACKUP command

    The recommended use of SQL stripes is as a speed matching technology. Multiple backup streams from a given database can be simultaneously written to a target EMC Data Domain system in an effort to achieve an aggregate data transfer rate that aligns with business requirements.

    Figure 13. A multi-striped database backup that uses eight stripes in an effort to improve backup data transfer rate performance. Multiple stripes can be used to better match data transfer rate capabilities between source and destination media.
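
    Since the figure itself is not reproduced, here is a hedged sketch of an eight-stripe backup: each DISK target becomes one stripe. The share path and file names are hypothetical, and BUFFERCOUNT = 16 simply applies the two-buffers-per-stripe guideline (about 64 MB of buffer memory at a 4 MB transfer size).

    ```sql
    BACKUP DATABASE [Test_One]
        TO DISK = N'\\dd01\sqlbackup\Test_One_s1.bak',
           DISK = N'\\dd01\sqlbackup\Test_One_s2.bak',
           DISK = N'\\dd01\sqlbackup\Test_One_s3.bak',
           DISK = N'\\dd01\sqlbackup\Test_One_s4.bak',
           DISK = N'\\dd01\sqlbackup\Test_One_s5.bak',
           DISK = N'\\dd01\sqlbackup\Test_One_s6.bak',
           DISK = N'\\dd01\sqlbackup\Test_One_s7.bak',
           DISK = N'\\dd01\sqlbackup\Test_One_s8.bak'
        WITH NO_COMPRESSION,
             BUFFERCOUNT = 16,            -- 2 buffers x 8 stripes
             MAXTRANSFERSIZE = 4194304,
             STATS = 10;
    ```

    A striped backup set can only be restored when every stripe file is available, so all eight files must be retained and replicated together.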

    EMC Data Domain product background

    EMC Data Domain deduplication storage systems minimize backup and recovery times, storage and network bandwidth, and risk of data loss. EMC Data Domain offers a comprehensive range of products to meet the backup and archive storage needs of companies of all sizes as they seek to reduce costs and simplify data management.

    EMC Data Domain systems also offer replication that is extremely easy to deploy. The primary advantage of EMC Data Domain system replication is that the data is all deduplicated and compressed prior to being sent over the network.

    Advantages of EMC Data Domain in a SQL environment

    EMC Data Domain systems can be directly integrated into Microsoft SQL Server environments as disk backup media. In addition, EMC Data Domain systems support all leading enterprise backup and archive applications for seamless integration into existing IT infrastructures.

    The use of different backup methodologies with Microsoft SQL Server and EMC Data Domain systems typically has a negligible effect on overall data deduplication ratios. This enables performing native database backups in conjunction with database backups controlled by a third-party backup application without affecting deduplication efficiency. This includes third-party backup applications that use an SQL agent, with or without VSS snapshots. Additionally, the use of different numbers of stripes or different BLOCKSIZE values also has a negligible impact on deduplication ratios.

    Figure 14. Database backup to a null device (part 1). The results of the query indicate that the theoretical maximum rate at which the SQL Server backup function can extract data from this database using a single stripe is approximately 80 MB/s. Regardless of the data transfer rate at which backup media can accept data, backing up this database as it currently stands will be speed limited to 80 MB/s when using a single stripe.

    EMC Data Domain replication can be used to create offsite copies of SQL backups faster and more economically than legacy tape-based strategies. EMC Data Domain replication makes advanced disaster recovery preparedness for SQL Server a reality.

    Data transfer rates

    Multiple business objectives are considered when determining required backup and recovery data transfer rates. Decision criteria include backup window duration, log growth, and recovery time.

    By definition, slow backups are those that fail to meet or exceed business objectives. Understanding factors that can affect performance is critical to removing them from the environment.
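
    One way to see whether recent backups meet a required transfer rate is to query the backup history that SQL Server records in msdb. This is a minimal sketch; the database name filter and the row limit are illustrative choices.

    ```sql
    -- Recent backups of one database with size and approximate throughput.
    SELECT TOP (20)
           bs.database_name,
           bs.type,                               -- D = full, I = differential, L = log
           bs.backup_start_date,
           bs.backup_finish_date,
           CAST(bs.backup_size / 1048576.0 AS decimal(18, 1)) AS backup_mb,
           CAST(bs.backup_size / 1048576.0
                / NULLIF(DATEDIFF(second, bs.backup_start_date,
                                          bs.backup_finish_date), 0)
                AS decimal(18, 1))                AS mb_per_sec
    FROM msdb.dbo.backupset AS bs
    WHERE bs.database_name = N'Test_One'
    ORDER BY bs.backup_finish_date DESC;
    ```

    NULLIF guards against division by zero for backups that finish in under one second; those rows report NULL for the rate.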

    A reasonable place to start any backup performance investigation is to understand the theoretical maximum speed at which SQL Server can process a given database backup. Performing a database backup to a null disk device provides an estimate of that maximum achievable speed in a given environment (Figure 14).
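
    A null-device test of this kind can be sketched as follows. NUL is the Windows null device, so nothing is stored and the result measures only how fast SQL Server can read the database. The tuning values in the second statement are examples, not the settings used for the figures in this paper.

    ```sql
    -- Baseline: single stripe to the null device with default settings.
    -- COPY_ONLY keeps the test from resetting the differential base.
    BACKUP DATABASE [Test_One]
        TO DISK = N'NUL'
        WITH COPY_ONLY, NO_COMPRESSION, STATS = 10;

    -- Same test with non-default buffer settings for comparison.
    BACKUP DATABASE [Test_One]
        TO DISK = N'NUL'
        WITH COPY_ONLY, NO_COMPRESSION,
             BUFFERCOUNT = 2,
             MAXTRANSFERSIZE = 4194304,
             STATS = 10;
    ```

    The completion message reports elapsed seconds and MB/sec. Avoid running BACKUP LOG to the null device on a production database, since that discards log records and breaks the log backup chain.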

    Figure 15. Database backup to a null device (part 2). We see improved results as the single stripe database backup to a null disk device now executes at more than twice the initial data transfer rate.

    Figure 16. Similar to multi-striped backups, the use of multiple null disk devices increases the number of readers used during the backup process. Consider the use of non-default values for BUFFERCOUNT and MAXTRANSFERSIZE in addition to the use of multiple backup stripes when investigating database backup performance with one or more null disk devices (see above). Once an acceptable null device backup data transfer rate is achieved, additional steps can be taken to understand and remove other bottlenecks from the remainder of the backup process.

    Figure 17. Nominal database backup performance improvement. This query shows a moderately tuned eight-stripe SQL database backup with an aggregate data transfer rate of approximately 172 MB/s, indicating that the network-attached backup devices are now limiting throughput.

    Integration

    Direct integration with Microsoft SQL Server, where the EMC Data Domain system is used as disk backup media, is accomplished by using the Data Domain system as a CIFS share. As a general rule, the UNC path to the share should be used instead of a mapped drive because (a) scheduled backups may execute when no user is logged in to the server, and (b) when Sqlservr.exe runs as a service, it is not associated with any login session, so drive letters mapped in a user's session are not available to it.
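
    The difference between the two addressing styles can be sketched as below. The host name dd01, the share name sqlbackup, and the drive letter Z: are hypothetical; the SQL Server service account is assumed to have write permission on the share.

    ```sql
    -- Preferred: address the CIFS share by its UNC path.
    BACKUP DATABASE [Test_One]
        TO DISK = N'\\dd01\sqlbackup\Test_One_full.bak'
        WITH NO_COMPRESSION, CHECKSUM, STATS = 10;

    -- Avoid: a drive letter mapped in an interactive session is usually
    -- not visible to the SQL Server service.
    --   BACKUP DATABASE [Test_One] TO DISK = N'Z:\Test_One_full.bak';

    -- Optional check that the backup file is readable and its checksums are valid.
    RESTORE VERIFYONLY
        FROM DISK = N'\\dd01\sqlbackup\Test_One_full.bak'
        WITH CHECKSUM;
    ```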

    Third-party backup applications used to protect Microsoft SQL Server can also take advantage of EMC Data Domain systems employed as backup media. Data Domain systems are easily configured as supported backup media types including VTL, CIFS share, NFS mount, or an OpenStorage disk pool (OpenStorage requires an OST-compliant backup application such as Veritas NetBackup from Symantec). Additionally, OpenStorage adds enhanced backup image replication capabilities known as optimized duplication.

    In this scenario, backup images are replicated from one EMC Data Domain system to another under the direct control of NetBackup. NetBackup monitoring, reporting, and cataloging of duplicates can be used to architect a comprehensive disaster recovery plan.

    Planning

    Capacity and performance planning play a critical role in both successful deployment and ongoing production usage of an EMC Data Domain system. A detailed capacity analysis should be performed by a knowledgeable Data Domain system engineer. The analysis considers database sizes, growth rates, change rates, and retention periods as input criteria. Performance analysis considers data points such as the required aggregate data transfer rate for backups, connection topology requirements to support the data transfer rate, and the Data Domain system required to meet or exceed the required data transfer rate.

    Beyond capacity and performance planning are additional considerations for EMC Data Domain system replication.

    What database backups should be replicated?

    Replicating all database backups is certainly possible. However, many users will want to implement replication at a more granular level. Production database backups are usually excellent replication candidates, whereas development and test database backups are less critical. An analysis of network bandwidth and destination disk space requirements should be performed by a knowledgeable EMC Data Domain system engineer.

    Will database backups be replicated to a disaster recovery site, or between multiple production sites?

    Backups are typically replicated to serve as a second backup copy for recovery in the event of a disaster. When backups from a primary site are being replicated to a secondary site, planning is relatively straightforward. Users with multiple primary sites may decide to implement a bidirectional replication solution where database backups from either site are replicated to the alternate site. Proper planning should render an outline detailing which database backups are being replicated to each location.
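One way to capture such an outline is as a small data structure that can be checked mechanically. The Python sketch below assumes a hypothetical two-site bidirectional layout; the site names, database names, and helper functions are invented for illustration and do not correspond to any Data Domain configuration format.

```python
# Hypothetical sites and database names, used only for illustration.
replication_plan = {
    "site_a": {"replicate_to": "site_b", "databases": ["sales_prod", "orders_prod"]},
    "site_b": {"replicate_to": "site_a", "databases": ["hr_prod"]},
}


def validate_plan(plan):
    """Return a list of problems found in a replication outline."""
    problems = []
    for site, entry in plan.items():
        target = entry["replicate_to"]
        if target == site:
            problems.append(f"{site} replicates to itself")
        if target not in plan:
            problems.append(f"{site} replicates to unknown site {target}")
        if not entry["databases"]:
            problems.append(f"{site} lists no databases to replicate")
    return problems


def inbound(plan, site):
    """Databases whose backups arrive at `site` from other sites."""
    return sorted(
        db
        for source, entry in plan.items()
        if entry["replicate_to"] == site and source != site
        for db in entry["databases"]
    )


print(validate_plan(replication_plan))   # an empty list means no problems were found
print(inbound(replication_plan, "site_b"))
```

Listing the inbound databases per site also gives a first view of the destination disk space each location must provide.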

    Will tape-based backup copies be required?

    Some users replicate backup images to a central location for disaster recovery purposes while also using the solution as a vehicle that enables centralized tape creation. The third-party backup application used to create tape-based backup copies will dictate any additional considerations or restrictions that this solution involves. A knowledgeable EMC Data Domain system engineer will be able to assist with this planning task.

    Important options

    Backup types

    The goal of backups is to satisfy recovery time objectives (RTO) and recovery point objectives (RPO). Outlining a strategy of full, differential, and transaction log backups is beyond the scope of this paper. That said, a few key points are worth noting:

    Performing full backups frequently with EMC Data Domain deduplication storage does not create a storage usage penalty, as redundant database segments do not consume additional disk space. Although this makes more frequent full backups practical from a storage standpoint, the load that full backups place on the SQL server and on the connection topology to the EMC Data Domain system should still be taken into consideration.

    When split-mirror or snapshot backups are performed and controlled by a third-party backup application, the EMC Data Domain system is easily integrated as a backup storage device. The features provided by these backup techniques (low-impact backups, instant recovery, and so on) do not preclude the use of EMC Data Domain technology.

    Compression

    EMC Data Domain recommends not using Microsoft SQL Server-based compression in conjunction with backups written to EMC Data Domain systems. This topic is covered in greater detail in Appendix B: Backup compression.

    Multiplexing

    When the EMC Data Domain system is integrated as a backup device with a backup application that supports multiplexed backups, Data Domain recommends disabling multiplexed backups. Multiplexing limits the ability of the EMC Data Domain system to deduplicate incoming data.

    Multiplexing was historically used as a speed-matching solution, in which multiple slower data streams were combined into a single stream to keep a faster tape drive busy. Backups to disk derive no advantage from multiplexing. Whether deployed as a CIFS share, NFS mount, VTL, or OpenStorage disk pool, EMC Data Domain systems accommodate writing multiple backup streams in parallel without multiplexing.

    Network

    When EMC Data Domain systems are deployed as a CIFS backup share, Data Domain recommends interconnecting SQL servers and EMC Data Domain systems using a dedicated backup area network. When deployment is in conjunction with a backup application as a CIFS share, NFS mount, or OpenStorage disk pool, EMC Data Domain similarly recommends interconnecting backup application media servers and EMC Data Domain systems using a dedicated backup area network.

    Whenever possible, the network used for backup and recovery communications should be segregated from other production networks. This best practice helps ensure that sufficient network bandwidth is available for backup and restore jobs to meet or exceed business objectives.

    Network bandwidth requirements may dictate the need for a topology that supports data transfers in excess of 125 MB/s. All EMC Data Domain systems support the use of multiple GbE network interfaces, and many support the use of 10 GbE network interfaces.
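The 125 MB/s figure is the theoretical maximum of a single GbE link (1000 Mb/s divided by 8 bits per byte). The short Python sketch below shows how a required transfer rate translates into a number of GbE interfaces. The `efficiency` derating factor and the function name are assumptions for illustration; real throughput depends on protocol overhead and the environment.

```python
import math

GBE_MB_S = 1000 / 8        # 1 GbE: 1000 Mb/s = 125 MB/s theoretical maximum
TEN_GBE_MB_S = 10000 / 8   # 10 GbE: 1250 MB/s theoretical maximum


def gbe_links_needed(required_mb_s, efficiency=1.0):
    """Number of 1 GbE interfaces needed for a given aggregate rate.

    `efficiency` is an assumed fraction of line rate that is actually usable.
    """
    usable = GBE_MB_S * efficiency
    return math.ceil(required_mb_s / usable)


for rate in (100, 300, 900):
    links = gbe_links_needed(rate)
    note = "consider 10 GbE" if rate > 4 * GBE_MB_S else "GbE is sufficient"
    print(f"{rate} MB/s -> {links} x GbE ({note})")
```

The cutoff of four GbE links used in the loop is an arbitrary illustrative threshold, not a Data Domain recommendation.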

    A knowledgeable EMC Data Domain system engineer will be able to assist with planning the deployment based on user requirements and available resources.

    Third-party backup applications

    When EMC Data Domain systems are integrated with third-party backup applications, Microsoft SQL Server backup parameters are handled in the same way as in a native SQL Server backup implementation. The COMPRESSION, BLOCKSIZE, BUFFERCOUNT, and MAXTRANSFERSIZE keywords, as well as any striping, are still valid parameters. However, some of these settings may not be exposed by a given third-party backup application (Figure 18 and Figure 19).

    Figure 18. Third-party backup applications like NetBackup 6.5.3 may allow the use of keyword parameters similar to native SQL Server backup tools. The figure shows the NetBackup MS SQL Client interface. Note that as of NetBackup version 6.5.3 there is no ability to override the SQL Server 2008 server-level compression setting.
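For reference, the native keywords can be combined as in the following Python sketch, which simply composes a T-SQL BACKUP DATABASE statement as text. The helper name, database name, and UNC paths are hypothetical, and the parameter values are placeholders rather than tuning recommendations. The NO_COMPRESSION and COMPRESSION options assume SQL Server 2008 or later, and inputs are assumed to be trusted (no escaping is performed).

```python
VALID_BLOCKSIZES = {512, 1024, 2048, 4096, 8192, 16384, 32768, 65536}


def build_backup_statement(database, targets, compression=False,
                           blocksize=None, buffercount=None, maxtransfersize=None):
    """Compose a T-SQL BACKUP DATABASE statement; each target is one stripe."""
    if not targets:
        raise ValueError("at least one backup target is required")
    options = ["COMPRESSION" if compression else "NO_COMPRESSION"]
    if blocksize is not None:
        if blocksize not in VALID_BLOCKSIZES:
            raise ValueError("BLOCKSIZE must be a power of two from 512 to 65536")
        options.append(f"BLOCKSIZE = {blocksize}")
    if buffercount is not None:
        options.append(f"BUFFERCOUNT = {buffercount}")
    if maxtransfersize is not None:
        # SQL Server accepts multiples of 64 KB up to 4 MB for MAXTRANSFERSIZE.
        if maxtransfersize % 65536 or not 65536 <= maxtransfersize <= 4194304:
            raise ValueError("MAXTRANSFERSIZE must be a multiple of 65536, at most 4194304")
        options.append(f"MAXTRANSFERSIZE = {maxtransfersize}")
    disks = ",\n   ".join(f"DISK = N'{t}'" for t in targets)
    return f"BACKUP DATABASE [{database}]\nTO {disks}\nWITH {', '.join(options)};"


# Hypothetical CIFS share on a Data Domain system, striped across two files.
print(build_backup_statement(
    "SalesDB",
    [r"\\dd01\backup\SalesDB_1.bak", r"\\dd01\backup\SalesDB_2.bak"],
    buffercount=64,
    maxtransfersize=4194304,
))
```

Compression defaults to off in this sketch, in line with the recommendation in this paper to let the Data Domain system perform compression.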

    Figure 19. With Backup Exec 12.5, you can set SQL Server 2008 compression on a per job basis. Other parameters such as BUFFERCOUNT and MAXTRANSFERSIZE are absent from the Backup Job Properties dialog box.

    Users of third-party backup applications seeking to exploit the full complement of available Microsoft SQL Server backup options should contact their software provider in the event additional information is required.

    Microsoft recommendations

    A comprehensive collection of resources that address Microsoft SQL Server backup and restore is available online. This section includes a brief sampling of technical articles that can be referenced as required.

    Backing Up and Restoring Databases in SQL Server from SQL Server 2005 Books Online: http://msdn.microsoft.com/en-us/library/ms187048(SQL.90).aspx

    Backing Up and Restoring Databases in SQL Server from SQL Server 2008 Books Online: http://msdn.microsoft.com/en-us/library/ms187048.aspx

    Optimizing Backup and Restore Performance in SQL Server from SQL Server 2005 Books Online: http://msdn.microsoft.com/en-us/library/ms190954(SQL.90).aspx

    Conclusion

    An EMC Data Domain system makes an excellent target for Microsoft SQL Server backups because it:

    Integrates easily and seamlessly into existing SQL Server environments

    Allows the database administrative team to retain a greater number of full backup images online, thereby optimizing recovery options while occupying minimal footprint in the data center

    Greatly reduces dependence on tape

    Appendix A: Additional resources

    Microsoft links

    Microsoft SQL Server Community

    http://technet.microsoft.com/en-us/sqlserver/bb671048.aspx

    EMC Data Domain links

    EMC Backup and Recovery for Microsoft Applications Deduplication Enabled by EMC CLARiiON and Data Domain white paper

    http://www.emc.com/collateral/software/white-papers/h7051-backup-recovery-microsoft-deduplication-clariion-wp.pdf

    EMC Data Domain Family products and deduplication technology

    http://www.emc.com/products/family/data-domain-family.htm

    EMC Data Domain Global Deduplication Array

    http://www.emc.com/products/detail/hardware/data-domain-global-deduplication-array.htm

    EMC Data Domain Boost Software

    http://www.emc.com/products/detail/software/data-domain-boost.htm

    EMC Data Domain SISL Scalability Architecture A Detailed Review white paper

    http://www.emc.com/collateral/hardware/white-papers/h7221-data-domain-sisl-sclg-arch-wp.pdf

    EMC Data Domain Replicator Software A Detailed Review white paper

    http://www.emc.com/collateral/software/white-papers/h7082-data-domain-replicator-wp.pdf.pdf

    EMC Data Invulnerability Architecture: Ensuring Data Integrity and Storage System Recoverability white paper

    http://www.emc.com/collateral/software/white-papers/h7219-data-domain-data-invul-arch-wp.pdf

    Appendix B: Backup compression

    Performing compression on the SQL server when backups are executed can provide benefit by reducing the overall size of the backup. Smaller backups require fewer I/O operations to write to backup devices, consume less backup media, and may execute faster when compared to uncompressed backups.

    Figure 20. The top figure shows CPU utilization without server-based compression. The middle figure shows CPU utilization with SQL Server 2008 native compression. The bottom figure shows CPU utilization with a third-party solution using level 5.

    This appendix examines the tradeoffs associated with server-based compression, and the use of Microsoft SQL Server CPU cycles versus implementing a backup infrastructure that reduces impact to transactional processing performance.

    Bottlenecks addressed by compression

    Examining the backup data transfer path helps identify the backup bottlenecks that compression is able to circumvent:

    SQL Server to directly connected disk storage

    SQL Server to directly connected tape storage

    SQL Server network connected to a backup application media server

    Bandwidth constraints between the SQL server and destination storage device are mitigated with server-based compression as less data is being transferred between the server and storage device. Likewise, write speed limitations of the storage device are also mitigated by writing less data during a backup.

    Compression challenges

    While the benefits of compression are understood, there are potential drawbacks that should be noted:

    Compression consumes SQL Server resources

    Disk I/O related to reading database content is unchanged

    CPU resources on the SQL server are used to accomplish compression at backup time. If online transactions are impacted by the backup process, adding compression to the equation may induce a severe performance impact.

    Figure 20 details % Processor Time for the same database backup on a dual quad-core 2.66 GHz server platform with Microsoft SQL Server 2008. Data transfer rates and CPU usage vary widely in these examples. No SQL transactions or other activity beyond the single backup were executing at the time these metrics were captured. Note that all three backups used the same EMC Data Domain system as a disk backup device. Also note that the sampling rate used for the performance monitor for Figure 17 on page 18 was decreased to accommodate the longer-running backup job.

    With server-based compression, CPU usage is elevated to a point where non-backup transactions may take longer to complete. Backing up multiple databases simultaneously may not be practical with server-based compression.

    Compression and deduplication

    Native Microsoft SQL Server 2008 compression, or compression provided by a third-party backup application, occurs on the Microsoft SQL Server platform. EMC Data Domain deduplication technology is different in that Microsoft SQL backups are compressed and deduplicated on the EMC Data Domain system.

    Figure 21. Sample output of the sysstat command on a Data Domain system captured during a database backup. When compression and deduplication are performed on the Data Domain system, CPU usage on the SQL Server platform is greatly reduced when compared to the use of server-based compression. Also worth noting is that the Net in data transfer rate is near the theoretical maximum that can be achieved with two GbE network connections. The next logical step to eliminate this bottleneck would be to use additional GbE interfaces or employ a single 10 GbE network connection.

    Pick one form of compression, but not both

    The recommended best practice is to architect a solution that compresses database backup data once. There are two main reasons for this. First, compressing data that has already been compressed usually produces a larger data set than compressing the data once. Second, multiple compression operations have a negative impact on EMC Data Domain deduplication, reducing the efficiency with which disk is used.

    In short, the EMC Data Domain system is designed specifically to optimize compression and deduplication. To get the full value from the EMC Data Domain system, let it perform the compression.
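Both effects can be observed with a small experiment. The Python sketch below is a toy model under stated assumptions: zlib stands in for server-based compression, fixed-size chunk hashing stands in for deduplication, and the page contents are synthetic. It does not reproduce SQL Server backup compression or the Data Domain deduplication algorithm.

```python
import hashlib
import random
import zlib

PAGE = 8192  # SQL Server page size in bytes; also used as the toy chunk size


def make_page(rng):
    """Build one 8 KB page of compressible, text-like filler."""
    rows, size = [], 0
    while size < PAGE:
        row = f"{rng.randrange(10**6):06d},customer,{rng.randrange(10**4):04d};"
        rows.append(row)
        size += len(row)
    return "".join(rows).encode("ascii")[:PAGE]


def chunk_hashes(data, size=PAGE):
    """Hash fixed-size chunks (a simplification of real segmenting)."""
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def shared_fraction(old, new):
    """Fraction of the chunks in `new` that already exist in `old`."""
    known = set(chunk_hashes(old))
    hashes = chunk_hashes(new)
    return sum(h in known for h in hashes) / len(hashes)


rng = random.Random(0)
pages = [make_page(rng) for _ in range(200)]
monday = b"".join(pages)
pages[0] = make_page(rng)          # a single page changes between backups
tuesday = b"".join(pages)

once = zlib.compress(monday)
twice = zlib.compress(once)        # compressing already-compressed data
print("raw, compressed once, compressed twice:", len(monday), len(once), len(twice))
print("shared chunks, uncompressed backups:", shared_fraction(monday, tuesday))
print("shared chunks, compressed backups:  ",
      shared_fraction(zlib.compress(monday), zlib.compress(tuesday)))
```

In this model the second compression pass gains little or nothing, and a one-page change leaves almost all uncompressed chunks reusable while the compressed streams share far fewer chunks.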

    Appendix C: Index fragmentation

    Index fragmentation affects I/O performance of queries whose data pages do not reside in the Microsoft SQL Server data cache. A variety of techniques are commonly used to reduce index fragmentation, including but not limited to DBCC INDEXDEFRAG, DBCC DBREINDEX, and CREATE INDEX WITH DROP_EXISTING.

    While these techniques are effective in reducing index fragmentation, they can also have a negative impact on deduplication. Database administrative teams that routinely defragment all indexes at some predetermined frequency may notice reduced data deduplication rates on their EMC Data Domain systems. The end result is reduced storage efficiency.

    Index defragmentation has the effect of reorganizing the pages within a database such that EMC Data Domain deduplication sees the backup data stream as new, unique data. In addition to the inefficient use of backup device storage space, this can also impact the ability to replicate database backups using EMC Data Domain replication.

    A greater quantity of unique data blocks equates to replicating a greater quantity of data over what may be a bandwidth-limited WAN.
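The effect on replication can be estimated with simple arithmetic. In the Python sketch below, the backup size, the unique-data percentages, and the WAN speed are hypothetical values chosen only to illustrate the relationship; they are not measurements.

```python
def replication_hours(unique_gb, wan_mbit_s, utilization=1.0):
    """Hours needed to send `unique_gb` of new data over a WAN link (decimal units)."""
    megabits = unique_gb * 1000 * 8
    return megabits / (wan_mbit_s * utilization) / 3600


backup_gb = 100   # hypothetical full backup size
wan = 45          # hypothetical WAN link speed in Mb/s

# Hypothetical unique-data fractions for a normal day and after a full index rebuild.
for label, unique_fraction in (("typical day", 0.05), ("after index rebuild", 0.60)):
    hours = replication_hours(backup_gb * unique_fraction, wan)
    print(f"{label}: {hours:.2f} hours to replicate")
```

Because replication time scales linearly with the amount of unique data, any routine that makes most of the backup stream look new lengthens replication proportionally.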

    Database administrative teams may find themselves in a situation where index fragmentation impacts query performance, and frequent index defragmentation impacts backup storage device performance in terms of deduplication and replication rates.

    Addressing the challenge

    EMC Data Domain recommends addressing these challenges with a balanced approach. For instance, instead of defragmenting all indexes based on a schedule, consider defragmentation based on thresholds. Additionally, EMC Data Domain recommends the use of index keys that are less prone to fragmentation in the first place.

    Is index fragmentation the only issue impacting transaction performance?

    I/O subsystem performance, memory usage, and CPU utilization can all have a negative impact on query performance. These issues should be diagnosed and resolved directly, rather than relying on frequent automatic index defragmentation to improve performance.

    File fragmentation can also impact performance. Many small databases sharing the same logical disk volume combined with the use of the autogrowth property can cause logically sequential database files to allocate non-sequential physical storage on disk. Ideally, administrators should set the size of database files at deployment to accommodate potential future growth.

    While it may be impossible to anticipate the size of a given database three years into the future, presizing database files helps to reduce the possibility that file fragmentation will impact query performance. If automatically growing database files is a requirement, consider growing in large chunks rather than small chunks. It may be impractical to locate each database on a unique logical volume, but consider doing so for databases that are expected to grow considerably over time. Finally, disk file fragmentation can be reduced by Windows file system defragmentation utilities such as the Windows Disk Defragmenter.
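The benefit of larger growth increments can be seen by counting growth events, each of which may allocate space that is not physically contiguous with the rest of the file. The Python sketch below uses hypothetical file sizes and increments, and the function name is invented for illustration.

```python
import math


def growth_events(initial_mb, target_mb, increment_mb):
    """Number of autogrowth events needed to grow a file from initial to target size."""
    if increment_mb <= 0:
        raise ValueError("increment must be positive")
    if target_mb <= initial_mb:
        return 0
    return math.ceil((target_mb - initial_mb) / increment_mb)


# Hypothetical data file growing from 1 GB to 50 GB.
for increment in (1, 64, 1024):
    events = growth_events(1024, 51200, increment)
    print(f"{increment:>5} MB increments -> {events} growth events")
```

Presizing the file at deployment corresponds to the case in which the target size never exceeds the initial size, so no growth events occur.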

    Do all indexes need to be defragmented or just a subset?

    EMC Data Domain recommends the use of index defragmentation tools based on thresholds and limits, rather than automatically defragmenting every index on every table whether it is required or not. The suggestion is to understand which indexes, at which fragmentation levels, actually impact performance.

    These indexes should be monitored for a specific fragmentation threshold, and action taken to defragment these indexes only when necessary. Selective index defragmentation will have less impact on production and will assist in preserving the ability to efficiently deduplicate database backups.
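A threshold-based selection can be sketched as follows. This Python example is a minimal illustration, not a prescribed procedure: the 5% and 30% thresholds follow commonly cited Microsoft guidance for reorganizing versus rebuilding, the minimum page count is an assumed rule of thumb, and the table names, index names, and statistics are hypothetical. In practice the inputs would come from a source such as the sys.dm_db_index_physical_stats dynamic management function or DBCC SHOWCONTIG.

```python
from dataclasses import dataclass

REORGANIZE_PCT = 5.0   # assumed: at or below this, leave the index alone
REBUILD_PCT = 30.0     # assumed: above this, rebuild instead of reorganizing
MIN_PAGES = 1000       # assumed: ignore small indexes entirely


@dataclass
class IndexStat:
    table: str
    index: str
    fragmentation_pct: float
    page_count: int


def plan_action(stat):
    """Return 'none', 'reorganize', or 'rebuild' for one index."""
    if stat.page_count < MIN_PAGES or stat.fragmentation_pct <= REORGANIZE_PCT:
        return "none"
    if stat.fragmentation_pct <= REBUILD_PCT:
        return "reorganize"
    return "rebuild"


# Hypothetical statistics for illustration only.
stats = [
    IndexStat("Orders", "IX_Orders_Date", 42.0, 50000),
    IndexStat("Customer", "IX_Customer_Name", 3.1, 120000),
    IndexStat("Lookup", "IX_Lookup_Code", 80.0, 40),
    IndexStat("Invoice", "IX_Invoice_Cust", 12.5, 9000),
]

for s in stats:
    print(f"{s.table}.{s.index}: {plan_action(s)}")
```

Only the indexes that cross a threshold are touched, which limits the amount of page reorganization and therefore the amount of new, unique data presented to the deduplication system at the next backup.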

    Figure 22. DBCC SHOWCONTIG command output. This figure includes extent scan fragmentation data indicating that index C_Customer_I1 does not require defragmentation at this time.

    Structuring indexes and keys so as to minimize fragmentation may or may not be realistic in all cases, but it should be considered as it potentially reduces the need to defragment indexes frequently. Index and key inserts that occur at the end of the table and index are likely to reduce fragmentation. Deletes that occur in contiguous chunks also assist in reducing fragmentation.
