/// IDG Tech Dossier

HP CONVERGED STORAGE: Advances in Deduplication Help Tame Big Data

CONVERGENCE HELPS ORGANIZATIONS MASTER THE ART OF DEDUPLICATION



IN TODAY’S HYPERCONNECTED WORLD, with its multiple mobile devices, ubiquitous Internet access and pervasive social media platforms, people expect immediate access to information and services. These expectations are increasingly felt in corporate IT departments, where business units demand instant applications and turn-on-a-dime services.

Virtualization and cloud computing can help corporate IT meet these demands by helping it become more flexible and agile. But the ultimate solution is to transform the way IT is delivered. Many enterprises have already started on the journey toward a full IT as a service (ITaaS) model.

As organizations travel this road, however, they often run into a wall. Actually, several walls, including those between the server, storage and networking functions. The traditional IT infrastructure is often too rigid to enable companies to fully utilize their IT resources. In many cases, servers, storage and networking have been built and managed separately, creating functional silos. And within the storage architecture, an explosion in the amount and types of data—coupled with new demands from the virtualization of servers and clients—has made storage increasingly inflexible and complicated to manage.

These factors stand in the way of the kind of adaptability, agility and integrated management that the efficient enterprise requires. If organizations are to continue toward the goal of delivering ITaaS, they need to break down these barriers and lay the groundwork for a next-generation architecture.

/// THE LIMITS OF TRADITIONAL STORAGE

The typical storage architecture was designed 20 years ago, when workloads were predictable and data was structured. But today companies are dealing with an unprecedented amount of information, including unstructured data such as audio and video, which requires massive capacities. Storage systems must accommodate many different types of workloads with different performance requirements. Add to the mix increasingly demanding applications, distributed data center environments, legacy business processes that must be supported and nonstandard infrastructure inherited through acquisitions, and you get a gerrymandered architecture comprising many discrete storage resources that must be managed individually. Such an architecture is disruptive to scale, expensive to own and operate and increasingly difficult and labor-intensive to manage.

THE JOURNEY TO AN EFFICIENT ENTERPRISE
Organizations typically pass through five phases as they transform their traditional operations into an IT-as-a-service model:

1. Standardize and consolidate
2. Virtualize and automate
3. Self-provision services on demand
4. Aggregate internal and external services
5. Become an IT service bureau

ITaaS requires a pool of storage that’s flexible and fungible. The IT staff must be able to quickly configure storage for a particular need and then just as quickly reconfigure it so it can be used again elsewhere. The storage must be malleable so that capacity can be quickly expanded, data and applications can be easily and securely migrated and workloads can be automatically rebalanced. Applications need to be online 24/7/365, so high availability is paramount. Finally, management of the entire storage pool, as well as coordination with virtualized servers and networking, should be streamlined and simplified.

/// THE PATH TO IT AS A SERVICE

Organizations need a strategy for rearchitecting storage so that it enables, rather than constricts, the delivery of IT services. According to HP, it’s all about Converged Storage, which breaks through the barriers, reducing complexity so that IT can expand storage on a pay-as-you-grow basis. It involves the creation of a pool of storage based on modular building blocks that can be moved and reconfigured on the fly to meet a range of needs. In fact, HP’s Converged Storage approach incorporates several core capabilities:

• MULTITENANCY: the ability to securely host many different applications in a single pool of storage, delivering the appropriate level of resources and performance for each application

• FEDERATION: the ability to geographically distribute storage resources and move data among those resources without disrupting user access to that data

• EFFICIENCY: the ability to allocate resources in the most cost-effective manner through thin provisioning and other techniques

• AUTONOMIC MANAGEMENT: the ability of the storage pool to reconfigure itself, balancing workloads and determining the appropriate tiering of data without manual intervention

All companies need to protect their data with solutions that have these characteristics, incorporating technologies such as deduplication, which removes redundant data for better capacity utilization and enables companies to deal effectively with “big data” requirements.

/// DATA DEDUPLICATION 2.0

A converged storage strategy can help companies more easily deal with the various storage challenges they face, including the increasing amounts of unstructured data they must manage. Also known as big data, unstructured data includes any data that is not in a structured database format. That’s everything from e-mail to Microsoft Word and PowerPoint documents, to video and audio recordings.

The increasing amounts of big data plaguing companies result in one or more of four pain points:

• Shrinking backup windows amid ever-increasing amounts of data

• Increasingly difficult disaster recovery processes, including the use of tape that must be transported to and from a backup site

• Lights-out data protection requirements for remote and branch offices where no IT personnel are onsite

• The need for rapid file restores, which with tape involves finding the right tape and matching files with compatible backup systems

For all of these reasons, it makes sense for companies to try to reduce the amount of data they store and, hence, have to back up. One way a converged storage infrastructure helps do that is through advanced data deduplication technology.



THE POWER OF DEDUPLICATION SOFTWARE

HP Data Protector software, powered by HP StoreOnce deduplication, enables clients to

• Maximize IT staff resources through remote deployment and management of deduplication stores from a central data center

• Control licensing costs by redeploying deduplication agents on application or backup servers, at no cost to existing customers with Advanced Backup to Disk licenses

• Reduce compliance risk in a small or standalone office by automating retention times for different data types and removal of expired data

/// DEDUPLICATION 1.0

For several years, storage systems have offered deduplication technology that helps address today’s challenges by eliminating duplicate occurrences of data, thus reducing the volume of data that companies must store. These solutions identify where duplicate data exists and then write it only once while creating an index of pointers that indicate where the duplicate blocks should live in various files so they can be rebuilt as needed.
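The write-once-plus-pointer-index scheme described above can be sketched in a few lines of Python. This is a toy illustration of block-level deduplication, not HP’s actual StoreOnce implementation; the class, method names and fixed block size are all invented for the example:

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch

class DedupStore:
    def __init__(self):
        self.blocks = {}   # hash -> block bytes (each unique block stored once)
        self.files = {}    # file name -> ordered list of block hashes

    def write(self, name, data):
        pointers = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if it has never been seen before.
            if digest not in self.blocks:
                self.blocks[digest] = block
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name):
        # Rebuild the file from its index of pointers.
        return b"".join(self.blocks[h] for h in self.files[name])

store = DedupStore()
payload = b"A" * 8192          # two identical 4 KB blocks
store.write("a.bin", payload)
store.write("b.bin", payload)  # a duplicate file adds no new blocks
assert store.read("b.bin") == payload
print(len(store.blocks))       # prints 1: one unique block for 16 KB of logical data
```

Writing the same 8 KB payload as two files stores a single 4 KB block plus two pointer lists, which is the whole trick: physical capacity scales with unique data, not with logical data.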

Specifically, deduplication goes right to the heart of the four pain points outlined above. The technology improves backup speed by reducing the amount of data that needs to be backed up. The systems can deal with multiple simultaneous streams of backup to a single device at rates of up to 28 TB per hour. Numerous individual backup streams from a range of heterogeneous platforms can be consolidated onto a single disk-based backup device—which not only improves performance but also ensures that all backup data is in the same place, regardless of the platform it came from. Deduplication also lowers overall storage costs, by decreasing the amount of data that needs to be backed up and by increasing efficiency.

By dramatically reducing the amount of data that needs to be backed up—by up to 95 percent in some cases—deduplication makes data replication to a remote disaster recovery site a practical alternative to using tape-based backups. Once a data set is established at the backup site, only changes to the data need to be replicated over the WAN. Similarly, data restores are much faster, since they’re handled over the WAN and don’t involve finding and transporting tapes. And the entire process can be automated, managed centrally via a single pane of glass. All of this dramatically increases the reliability of data backups.
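The change-only replication described here can be sketched as a hash exchange: the source ships only the chunks whose hashes the disaster recovery site does not already hold. A hypothetical Python illustration (function and variable names are invented), assuming both sites chunk and hash data the same way:

```python
import hashlib

def chunk_hashes(chunks):
    return [hashlib.sha256(c).hexdigest() for c in chunks]

def replicate(source_chunks, target_store):
    """target_store: dict of hash -> chunk already present at the DR site.
    Returns how many chunks actually crossed the WAN."""
    sent = 0
    for chunk, digest in zip(source_chunks, chunk_hashes(source_chunks)):
        if digest not in target_store:   # only changes cross the WAN
            target_store[digest] = chunk
            sent += 1
    return sent

target = {}
day1 = [b"base" * 1024, b"logs" * 1024]
day2 = [b"base" * 1024, b"new " * 1024]   # one chunk changed overnight
assert replicate(day1, target) == 2       # initial seeding sends everything
assert replicate(day2, target) == 1       # later runs send only the delta
```

After the initial seeding, an unchanged data set costs almost no bandwidth, which is why deduplicated disk-to-disk replication can stand in for transporting tapes.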

Automated backup and disaster recovery also means there’s no need for operator intervention at remote sites, since data center staff can handle all tasks. Tape handling can also be eliminated, thus freeing up staffers’ time at remote sites.

Most backup applications can track where data is stored following a replication, making restores faster and easier. And by reducing data volumes, deduplication enables more data to remain on hand locally in near-term storage for even faster file restores. All of this once again results in lower costs by requiring less bandwidth for backups. Additionally, deduplication reduces restore complexity, because all data can be restored from the same backup device, regardless of the platform it came from.

/// NEXT-GENERATION DEDUPLICATION

Whereas first-generation deduplication technology represented a significant step forward in dealing with big data, next-generation products are now emerging that bring even greater benefits, including increased compatibility and availability as well as improved restore performance.

Initial deduplication products were focused on various vendors’ point storage solutions. As such, they achieved deduplication by using different, often incompatible, algorithms. So if data needed to be sent between storage systems, it often needed to be reconstituted and then deduped again on the target system.

Next-generation deduplication products, or Dedupe 2.0 systems, use a common deduplication algorithm across all storage systems—whether they’re smaller systems in branch offices or large data center storage facilities. That means no more reconstituting data as it traverses different storage systems, which saves bandwidth and improves performance.

First-generation deduplication technology was also focused more on backup performance than restore performance. That’s a growing issue as companies deal with increasing amounts of big data. The more data that’s backed up, the faster restores need to be. Dedupe 2.0 products can deliver restore speeds that are just as fast as backup speeds.

Dedupe 2.0 products also deliver high availability, which is increasingly important in helping companies back up more data in the same or shortened backup windows. Under such circumstances, companies can’t afford to have a backup process fail at 3 a.m. and require a restart. Some Dedupe 2.0 systems can now be configured to have a backup storage system kick in if a primary system fails, all without operator intervention. That means there’s no single point of failure in the backup process—a crucial consideration for massive storage systems that have to back up hundreds or thousands of servers on a routine basis.

/// ONE STEP AT A TIME

Many of these Dedupe 2.0 technologies were developed by HP Labs and are now included in the HP StoreOnce family of dedupe appliances. One example is the common deduplication algorithm that enables all systems to deal with deduplication in the same way—a concept HP calls federated deduplication. Federation means that data never has to be rehydrated as it passes from one system to another, enabling companies to save time and money on backups, since they don’t need as much local- or wide-area bandwidth.

HP Labs has also developed specialized large container technology that improves data layouts in a storage system and enables restores to occur just as fast as backups—up to 28 TB per hour with the HP B6200 Backup System. That kind of performance is crucial in helping companies meet aggressive recovery time objectives (RTO) after a failure or a disaster.

HP’s pioneering advancements have resulted in technology with no single point of failure across nodes, controllers, cache, disks, paths, power and cooling, because each node is paired with a partner node that can take over if its companion fails. And, when used with certain backup applications, intelligent storage systems can automatically detect certain failures and take the necessary corrective actions, including restarts—all without operator intervention. All of that while operating up to twice as fast as most competing systems and with three times the capacity.

Companies enjoying the benefits of Dedupe 1.0 technology will immediately see the added value that Dedupe 2.0 products can bring to their big data storage. Those that have yet to introduce deduplication in their environment can skip the 1.0-generation products altogether and immediately garner the benefits next-generation technology brings to a converged storage environment.

By using these concepts as a base, organizations can develop an ideal storage platform to support virtual and cloud computing. Indeed, HP’s Converged Storage will enable organizations to deploy storage 40 percent faster, reduce the time it takes to deliver IT services from weeks to minutes, reduce energy use and physical space requirements by 50 percent, and cut the time and expense of managing storage systems.

For more information on HP’s deduplication technology and products, click here.

1 Source: “Complete Storage and Data Protection Architecture for VMware vSphere,” 2011, HP. http://h20195.www2.hp.com/v2/getdocument.aspx?docname=4AA3-5141ENW.pdf

2 Source: “Top 10 Reasons Why You Should Choose HP StoreOnce” solution brief, 2010. http://h20195.www2.hp.com/V2/GetPDF.aspx/4AA3-2347ENW.pdf


BIG-TIME TCO SAVINGS

Data deduplication with HP StoreOnce affords multiple opportunities for savings:

• Using HP Labs innovations such as sparse indexing means less backup data stored on disk—as much as 95 percent less. Such algorithms, combined with the cost-effectiveness of the HP storage appliances that include HP StoreOnce technology, deliver a superior solution at appreciably lower cost than comparable competitive offerings.

• HP customer studies have shown that HP StoreOnce backup systems generate 50 percent TCO savings versus a traditional backup infrastructure—with the additional benefit of the faster recovery that disk-based backup provides.1

• HP StoreOnce allows for faster, more cost-effective configuration. According to an independent evaluation conducted in 2010 by the Evaluator Group, HP StoreOnce deduplication technology required fewer steps to configure—in one situation, just 11 steps, versus 33 steps for the competition. “It did not even require looking at the manuals,” according to the Evaluator Group.2


////////////

Extend your data center’s life expectancy
Companies can extend the life of their data centers by two to five years through a combination of IT strategies

By Sandra Gittlen Computerworld

This year marks the 10th anniversary of the 1,200-square-foot data center at the Franklin W. Olin College of Engineering -- that means the facility has been operating three years longer than CIO and vice president of operations Joanne Kossuth had originally planned. Now, even though the school needs a facility with more capacity and better connectivity, Kossuth has been forced to set aside the issue because of the iffy economic times.

“Demand has certainly increased over the years, pushing the data center to its limits, but the recession has tabled revamp discussions,” she says.

Like many of her peers, including leaders at Citigroup and Marriott International, Kossuth has had to get creative to eke more out of servers, storage, and the facility itself. To do so, she’s had to re-examine the life cycle of data and applications, storage array layouts, rack architectures, server utilization, orphaned devices and more.

Rakesh Kumar, research vice president at Gartner, says he’s been bombarded by large organizations looking for ways to avoid the cost of a data center upgrade, expansion or relocation. “Any data center investment costs at minimum tens of millions, if not hundreds of millions, of dollars. With a typical data center refresh rate of five to 10 years, that’s a lot of money, so companies are looking for alternatives,” he says.

While that outlook might seem gloomy, Kumar finds that many companies can extract an extra two to five years from their data center by employing a combination of strategies, including consolidating and rationalizing hardware and software usage;

Suggested Reading
These additional resources include business white papers and previously published articles from IDG Enterprise.

Read the full article

////////////

Recoup with data dedupe
Eight products that cut storage costs through data deduplication

By Logan G. Harbaugh Network World

Backing up servers and workstations to tape can be a cumbersome process, and restoring data from tape even more so. While backing up to disk-based storage is faster and easier, and probably more reliable, it can also be more expensive.

One way to get the best of both worlds is to back up to disk-based storage that uses deduplication, which increases efficiency by storing only one copy of each unique piece of data.

While the process was originally used at the file level, many products now work at the block or sub-block (chunk) level, which means that even files that are mostly the same can be deduplicated, saving the space consumed by the parts that are the same.

For instance, say someone opens a document and makes a few changes, then sends the new version to a dozen people. With file-level deduplication, the old and new versions are different files, though only one copy of the new version is stored. With block-level or sub-block-level deduplication, only the first document and the changes between the first document and the second are stored.

There is some debate about the optimum granularity: file-level deduplication is the least efficient, block-level more so, and chunk-level more still. However, the smaller the chunks, the more processing it takes, and the bigger the indices that keep track of duplicates become. Some systems use variable-size chunks to tune this trade-off, depending on the type of data being stored.
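Variable-size chunking can be sketched with a rolling checksum over a sliding window. This is a toy illustration under invented constants; production systems use tuned rolling hashes such as Rabin fingerprints, plus minimum and maximum chunk bounds. The point it demonstrates is that chunk boundaries depend on the content itself, so an insertion near the start of a file only disturbs the chunks around the edit instead of shifting every fixed-size block:

```python
import hashlib

WINDOW = 16   # sliding-window size in bytes (illustrative)
MASK = 0x1F   # cut a chunk when the low 5 bits of the window sum are zero

def chunks(data):
    out, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]   # slide the window forward
        if (rolling & MASK) == 0:         # content-defined cut point
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])          # final partial chunk
    return out

# Deterministic pseudo-random data, then the same data with one byte
# inserted at the front.
data = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))
edited = b"X" + data
a, b = chunks(data), chunks(edited)
print(len(set(a) & set(b)), "of", len(a), "chunks unchanged after the edit")
```

Because each boundary depends only on a small window of content, the chunker resynchronizes shortly after the inserted byte: most chunks still match their pre-edit counterparts and deduplicate away, whereas fixed-size blocks would all shift by one byte and match nothing.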

The good news is that deduplication works well - in our tests, all of the products were able to create a second copy of a volume and use less than 1% additional space, and to back up a copy of the test volume with 4,552 files changed totaling 31.7 GB and use no more than 32 GB of additional space, and in some cases a …

Read the full article


////////////

The Emergence of a New Generation of Deduplication Solutions: Comparing HP StoreOnce vs. EMC Data Domain

By Edison Group

The HP StoreOnce deduplication technology, launched in 2010, helps IT organizations address the challenge of protecting and recovering exponentially growing amounts of data in the face of stagnant or incrementally increasing IT budgets.

On November 29, HP launched the latest iteration of its StoreOnce deduplication portfolio. The HP B6200 StoreOnce Backup System provides enterprise-class scale-out capabilities and autonomic restart of backup jobs for high data availability. The autonomic restart feature, an important differentiator, is designed to eliminate failed backups by pairing nodes within a couplet – there are two nodes in each couplet – so the surviving node can take over when its companion node fails.

To help current and potential customers understand the value of the appliance and the StoreOnce strategy, technology research firm Edison Group compared HP’s B6200 StoreOnce offering to its nearest competitors, the EMC Data Domain 890 and Data Domain Global Deduplication Array. Edison considered a number of criteria that are of critical concern to today’s data center IT managers in evaluating products. These include scalability (including capacity and performance), high availability, architectural approach, pricing, and licensing.

In the course of its research Edison found that HP B6200 StoreOnce meets, and in many cases exceeds, Data Domain’s published specifications. Notably, Edison also found the HP B6200 StoreOnce to be the only enterprise-class deduplication appliance to offer an autonomic restart feature, which provides industry-leading availability for big-data backups.

////////////

HP StoreOnce: The Next Wave of Data Deduplication

By Enterprise Strategy Group

Leveraging deduplication in backup environments yields significant advantages. The cost savings in reducing disk capacity requirements change the economics of disk-based backup. For some organizations, it allows disk-based backup—and, importantly, recovery—to be extended to additional workloads in the environment. For others, deduplication makes it possible to introduce disk-based backup where it may not have been feasible before.

Deduplication in data protection is not new; however, it is being implemented in new ways. Its availability in secondary disk storage systems was the predominant delivery vehicle just a few years ago. Today, the technology is available as an integrated feature of backup software, cloud gateway, and software-as-a-service (SaaS) solutions, delivering bandwidth savings in addition to reduced storage capacity benefits. In addition to distributing deduplication processing across multiple points in the backup data path, there are many more deduplication techniques and approaches today too. Vendors are perfecting and optimizing algorithms that identify and eliminate redundancy to meet the ever-changing requirements driven by relentless data growth and IT’s desire to keep pace with the volume of data under management.

The evolution of deduplication is being driven by user requirements, as well as by improvements in IT infrastructure, including larger, faster disk drives and APIs facilitating better integration between data protection hardware and software components. IT organizations that have implemented or plan to implement deduplication want greater flexibility in how and where deduplication is deployed, tighter integration with the backup policy engine and backup catalog, faster performance for backup and recovery, and the ability to deduplicate within and across domains to gain more efficiency. And they want it all for the lowest cost possible.

Read the full article

4AA3-9132ENW