
©COPYRIGHT 2018 451 RESEARCH. ALL RIGHTS RESERVED.

Addressing the Changing Role of Unstructured Data With Object Storage

COMMISSIONED BY WESTERN DIGITAL

OCTOBER 2018


ABOUT THE AUTHOR

STEVEN HILL
SENIOR ANALYST, STORAGE

Steven Hill is a Senior Analyst of Storage technologies. He covers the latest generation of hyperconverged systems, cloud-based storage and business continuity/disaster recovery solutions for enterprise customers.


About this paper

A Black & White paper is a study based on primary research survey data that assesses the market dynamics of a key enterprise technology segment through the lens of the “on the ground” experience and opinions of real practitioners: what they are doing, and why they are doing it.


BLACK & WHITE | ADDRESSING THE CHANGING ROLE OF UNSTRUCTURED DATA WITH OBJECT STORAGE

Executive Summary

For the last decade, the nature and makeup of business data has been shifting from structured database information toward a vast array of unstructured data in the form of documents, media and dense data sets used by medical imaging, scientific modeling, engineering and other technical applications, which can generate massive quantities of information every time they’re used. On top of this explosion of unstructured data is the growing need to maintain and manage all that data for an extended period of time, either for use in future research or to categorize and protect it to meet legal or industry compliance requirements.

In 2017, we fielded the first 451 Research/Western Digital Object Storage Survey to examine the increasing importance of unstructured data and hybrid cloud storage, as well as the metadata capabilities of object storage. We published our results in a November 2017 paper titled The Growing Role of Object Storage in Solving Unstructured Data Challenges. The goal of this survey was to provide a current snapshot of the IT industry’s real-world storage problems, and to track customer awareness regarding object storage over time. Based on our findings in 2017, at the request of Western Digital we altered the 2018 target audience of the survey to focus exclusively on 200 enterprise customers (rather than 100 enterprises and 100 service providers, as in the 2017 survey), and we made some modifications to the questions in order to gain additional insight into the modern enterprise.

Key Findings

• Unstructured data continues to grow faster than traditional database data for customers in most vertical markets, and is rapidly exceeding the ability to manage it.

• Filesystems lack the rich metadata capabilities needed to identify, classify and contextualize many forms of unstructured data.

• The metadata capabilities of object storage provide a framework for identifying and contextualizing data that can be used to automate long-term data management.

• A growing number of applications either create or utilize extremely large and/or dense data sets that may exceed the capabilities of traditional filesystems.

• Artificial intelligence and machine learning platforms are evolving and becoming mainstream. They present new opportunities for extracting better and ongoing value from business data.

• New AI/ML platforms can also provide the tools for generating reliable, rich metadata about media-based objects that can then serve as criteria for policy-based, long-term data management.

• Privacy initiatives like GDPR and The California Consumer Privacy Act of 2018 will have a substantial impact on business data, requiring better identification and granular management of both structured and unstructured data.

• There is a growing need for personnel with the data science and business intelligence skills to gain useful and actionable insight from stored information.


Analytics Sets the Trend for Creation and Use of Unstructured Data

Tracking Enterprise Storage Patterns

One of the goals of the annual Object Storage Survey is to track the source of new data as well as examine the evolving uses of that data. As compute performance continues to grow and models for data analysis become more efficient and detailed, global businesses of all sizes are starting to recognize the valuable information that deep data analysis can generate – from both new and existing data. This has interesting ramifications for the IT industry because it validates the trend of continued storage growth and provides some justification for maintaining much of that data for an even longer period in order to extract its maximum value. In Figures 1 and 2, we compare enterprise customer responses from 2017 and 2018 to map out overall storage growth and the relative impact of unstructured data.

Figure 1: Total Enterprise Storage Growth – 2017-2018
Source: 451 Research and Western Digital custom survey
Q. How much overall storage growth are you experiencing annually?

Share of respondents, Enterprise 2017 (n=100) / Enterprise 2018 (n=200):
• Less than 20% per year: 13% / 11%
• 20-40% per year: 30% / 46%
• 40-60% per year: 31% / 33%
• 60-80% per year: 19% / 9%
• Greater than 80%: 7% / 3%


Figure 2: Relative Unstructured Data Growth – 2017-2018
Source: 451 Research and Western Digital custom survey
Q. How fast would you say your unstructured data is growing when compared to other business data?

As we stated earlier, for the 2018 Object Storage Survey, we chose to focus specifically on the enterprise storage customer, as opposed to the 50/50 mix of enterprise users and service providers in the 2017 survey. So for the purposes of data consistency, we’ve only included the 100 responses from the enterprise group in 2017 for comparison. Looking at total annual storage growth in the enterprise (Figure 1), the most noticeable decrease, in the 60% and higher growth range, is accompanied by an increase in the 20-40% category. While this variance could be attributed to a number of causes, we believe that total storage growth will continue to average between 40% and 50%, and the picture will become clearer over time.

Perhaps more interesting is the data from Figure 2, which shows a continuing trend of unstructured data growing faster than other forms of data. There is a marked increase in the ‘somewhat faster’ growth category, but the key takeaway is that roughly two-thirds of respondents in both 2017 and 2018 indicated that unstructured data represents an increasing majority share of their overall data growth. Whether it’s documents, images, video, audio or large data sets from medical, science or engineering sources, unstructured data is presenting an increasingly difficult challenge for IT managers. This challenge is due in part to unstructured data’s role in uncontrolled storage growth, and in part to the lack of a consistent model for identifying, classifying and securing this file-based data over time.

Figure 2 data, share of respondents, Enterprise 2017 (n=100) / Enterprise 2018 (n=200):
• Much Slower: 1% / 1%
• Somewhat Slower: 10% / 6%
• Same Rate: 22% / 23%
• Somewhat Faster: 38% / 55%
• Much Faster: 29% / 16%


HPC, Machine Learning and Analytics Change the Role of Object Storage

While database information continues to play a mission-critical role in the enterprise, there has been a dramatic increase in the use and number of applications that generate and utilize unstructured data. Most of these applications store this data in traditional file format, but the majority of file-based data formats offer little or no contextual information about the contents of that data. Object storage provides a solution to this problem because object-based data incorporates flexible and detailed metadata, which can be used to identify and categorize that information. Object storage also provides a model for policy-based data management that substantially increases visibility and can be used to automate governance.
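As a minimal sketch of how descriptive metadata can drive policy-based management, the Python below pairs an object with free-form metadata and derives a lifecycle action from that metadata alone. The class, the tag names (`created`, `retain_days`, `contains_pii`, `legal_hold`) and the policy rules are hypothetical illustrations, not taken from the survey or from any particular object storage product.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class StoredObject:
    """An opaque payload plus free-form descriptive metadata (hypothetical model)."""
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)


def retention_action(obj: StoredObject, today: date) -> str:
    """Pick a lifecycle action using only the object's metadata."""
    # Assumed rule: a litigation hold overrides every other policy.
    if obj.metadata.get("legal_hold") == "true":
        return "retain"
    created = date.fromisoformat(obj.metadata["created"])
    retain_days = int(obj.metadata.get("retain_days", "365"))
    if today - created > timedelta(days=retain_days):
        # Assumed rule: expired personal data is deleted, everything else is archived.
        return "delete" if obj.metadata.get("contains_pii") == "true" else "archive"
    return "retain"


scan = StoredObject(
    key="scans/0001.dcm",
    data=b"...",
    metadata={"created": "2017-01-15", "retain_days": "365", "contains_pii": "true"},
)
print(retention_action(scan, date(2018, 10, 1)))
```

The point of the sketch is that the decision never inspects the payload or a filename: the policy engine only needs the metadata to be present and reliable.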

While object storage has been in use for several decades, it was often relegated to static applications like long-term archives. But object storage is now undergoing a renaissance. Not only does the abstraction offered by the object model provide the core technology for many distributed, scale-out storage offerings, object storage also serves as the only model capable of managing exabytes of cloud storage worldwide. One of the goals of this annual Object Storage Survey is to track adoption rates of object storage for a number of common business use cases. Figure 3 shows how and at what rate our respondents are utilizing object storage in their business environment for 2017 and 2018.

Figure 3: Enterprise Object Storage Adoption – 2017-2018
Source: 451 Research and Western Digital custom survey
Q. Please select your planned or anticipated use cases and timeline for object storage adoption. * (Note: Machine Learning/AI response was added for 2018, so no data exists for 2017.)

[Chart: stacked bars for 2017 and 2018 per use case. Series: In use, 12 months, 24 months, No plan. Use cases: Archiving, Shared File/Object, Backup, Analytics (Big Data), HPC, Content distribution, IoT data, Data protection, Machine Learning/AI.]


There is a lot of information packed into Figure 3, and as mentioned, there were some changes and additions to the Object Storage Survey for 2018 that we believe will yield an even better picture over time. The only upward shift between 2017 and 2018 appeared in the High-Performance Computing (HPC) segment. The most common thread among companies already using object storage for HPC applications was that the majority reported unstructured data making up 40-60% of their total data storage. A deeper look at those respondents showed a surprisingly even mix of company size, vertical market and total storage capacity.

Although it goes beyond the scope of this survey to determine exactly what our respondents consider to be HPC, it’s not unreasonable to posit that a growing number of companies are leveraging either on-premises or cloud-based HPC services for applications like non-destructive testing, prototyping, molecular modeling, genomics and other use cases that combine extremely large and dense unstructured data sets with parallelized computing platforms. There is certainly a crossover between HPC and other application categories like Big Data Analytics and Machine Learning/Artificial Intelligence; but regardless of the specific computing model involved, we believe the data illustrates a current and continuing trend in the adoption of object storage to support far more dynamic workloads than simple archiving.

The power of analytics lies in the fact that it is, by its very nature, an evolutionary process. Many analytics initiatives today begin with a relatively well-defined intent, but that intent should also be adaptable based on information gained in the analytical process. To put it simply, we don’t know what we don’t know, so information gleaned from the analysis of a data set today may yield deeper and more useful insight when previous results are incorporated into future analytics models. It’s this potential that makes it worth keeping a growing number of data sets available for longer periods of time to support future analysis. Perhaps more importantly, it points to an even greater justification to adopt better models for identifying and managing unstructured data for the long term.


A Deeper Look at the Key Sources of Unstructured Data

The Changing Face of Business Data

Businesses have been dealing with unstructured data for decades, but have typically addressed it as a lesser part of the larger data storage problem. While large SAN and NAS storage systems provide the capacity and data protection needed for important business data, their hierarchical filesystems offer little or no useful information beyond a filename, extension, date and basic attributes.
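To make the limitation concrete, the short Python sketch below collects everything a typical filesystem reports about a file using the standard library's `os.stat` and `pathlib`. The file name and the `filesystem_view` helper are hypothetical; the output is limited to name, extension, size, timestamp and permission bits, with nothing about what the file actually contains.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def filesystem_view(path: Path) -> dict:
    """Collect the descriptive attributes a filesystem keeps for a file."""
    info = os.stat(path)
    return {
        "name": path.stem,
        "extension": path.suffix,
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "mode": oct(info.st_mode & 0o777),  # basic permission bits only
    }


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "scan_final_v2.dat"  # hypothetical user-chosen name
    sample.write_bytes(b"\x00" * 1024)
    view = filesystem_view(sample)
    print(view)
```

Whether the file is a medical image, a seismic trace or a spreadsheet export cannot be told from these attributes; that context lives only in the creator's memory or in the naming convention.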

Today, most business applications still depend on filesystems and user-based naming, a system that helps users remember what they’ve created and where it may be located, but this model couldn’t be worse when it comes to accurately identifying and contextualizing that data over time. As it stands, most file-based data rapidly becomes dark data when it is no longer directly connected to the user who created it. Given the substantial growth of unstructured data, the problems of long-term unstructured data storage will only increase.

An important component of the data management formula lies in understanding the business applications that create and utilize unstructured data. Simply adding capacity and accommodating backups were the primary goals of NAS platforms in the past, but as the value of the information contained in file-based data and the ramifications of improperly protecting it continue to grow, it will become increasingly important to adopt a more intelligent model for next-generation secondary storage.

Figure 4 identifies the current top sources of unstructured data for 200 enterprise customers across 10 vertical markets, and it paints an interesting picture of the growing number of applications that now depend on unstructured data.


Figure 4: Emerging Sources of Unstructured Data – 2018
Source: 451 Research and Western Digital custom survey
Q. What is the makeup of your unstructured data environment at present?

While documents and email continue to top the chart, they are closely followed by applications with very dense data formats that may be difficult or impossible to compress or de-duplicate. For example, a basic DNA scan can be nearly a GB of data; seismic survey data for oil and gas can range from dozens to thousands of GB of data per survey; and a 4K video stream could consume between 100 and 700GB per hour, depending on format and codec. Of course, this is merely the original, raw data and doesn’t take into account the additional storage needed to simply protect it. It also excludes the replication needed to support concurrent workloads and the storage space needed for any equally dense, iterative work product that may be spawned from the original raw data.
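A rough sizing sketch shows how quickly the raw figure grows once protection, working copies and derived work product are counted. Only the 100-700GB per hour range comes from the text above; the ten-hour workload, the number of protection and working copies, and the derived-data ratio are illustrative assumptions.

```python
def raw_capacity_gb(hours: float, gb_per_hour: float) -> float:
    """Raw capacity consumed by a video stream of the given length."""
    return hours * gb_per_hour


def total_footprint_gb(raw_gb: float, protection_copies: int,
                       working_copies: int, derived_ratio: float) -> float:
    """Raw data plus protection copies, working replicas and derived work product.

    derived_ratio is the size of derived output relative to the raw data
    (an assumed parameter, e.g. 0.5 means half the raw size again).
    """
    return raw_gb * (1 + protection_copies + working_copies + derived_ratio)


HOURS = 10  # assumed workload
for rate in (100, 700):  # GB per hour range quoted for 4K video
    raw = raw_capacity_gb(HOURS, rate)
    total = total_footprint_gb(raw, protection_copies=2, working_copies=1, derived_ratio=0.5)
    print(f"{rate} GB/h: raw {raw} GB, estimated footprint {total} GB")
```

Under these assumed multipliers the footprint is 4.5 times the raw capture, so ten hours of footage lands somewhere between roughly 4,500 and 31,500 GB rather than 1,000 to 7,000 GB.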

Support for very large data items and theoretically infinite scalability are only part of the opportunity enabled by modern object storage, and more and more enterprises are looking to leverage the advanced capabilities of object storage. The logical abstraction of object storage combined with the availability of rich metadata provides a flexible framework for data protection, governance, identification, classification, distribution and automation that’s appealing for nearly every use case, and these capabilities are perfectly aligned with the growing adoption of public, private and hybrid cloud initiatives worldwide.

Figure 4 data, share of respondents (n=200):
• User-created video: 46%
• Web content: 50%
• Audio recordings (call monitoring, dictation, etc.): 50%
• CAD data: 51%
• Commercial video (finished projects and/or production data): 52%
• IoT data (log and/or sensor data): 53%
• Video monitoring (security camera output): 53%
• Publishing work product (art, page elements): 56%
• Machine learning data sets: 57%
• Digital images (camera files and scanned documents): 57%
• Big Data Analytics: 58%
• Large-scale data for science, engineering and associated research: 59%
• Medical or industrial imagery: 59%
• Email and attachments: 64%
• Business Documents (word processing/spreadsheets): 65%


In Figure 5, we examine the major expectations and concerns of object storage customers and how they differed between 2017 and 2018. Again, this is a case where doubling the number of respondents between surveys may have had an impact, but there are some clearly defined trends.

Figure 5: Customer Needs and Expectations Around Object Storage – 2017-2018
Source: 451 Research and Western Digital custom survey
Q. How important are the following concerns for managing unstructured data using Object Storage?

The need for industry compliance rose to the top of the list of concerns in 2018, which is no surprise given the demanding requirements of initiatives like the EU’s General Data Protection Regulation (GDPR), which came into effect in May 2018. The GDPR sets a strict new set of rules for the management of private information, governing issues like user consent, breach notification, data portability, privacy by design, and the rights both to access personal data and to have it expunged on demand. This was followed in short order by The California Consumer Privacy Act of 2018, which follows similar guidelines and is set to take effect January 1, 2020.

Figure 5 data, share of respondents, Enterprise 2017 (n=100) / Enterprise 2018 (n=200):
• S3 compatibility: 50% / 54%
• eDiscovery protection/litigation hold: 53% / 57%
• Data visibility/classification: 56% / 59%
• Hybrid and multi-cloud support: 56% / 59%
• File interface (NFS/CIFS): 55% / 59%
• Continuous availability: 68% / 60%
• Data lifecycle management capabilities: 61% / 61%
• Scalability for both file/object size and quantity: 58% / 61%
• Operating Expense (Opex) pricing model (or "pay as you grow"): 48% / 61%
• Access control and data security: 60% / 61%
• Support for multiple geographies (Regions): 47% / 62%
• Data sovereignty compliance: 55% / 62%
• Multi-site data protection: 61% / 62%
• Ability to enforce policies on data: 52% / 62%
• Performance: 63% / 63%
• Industry compliance: 60% / 64%


What makes these regulations major concerns for businesses worldwide is that they not only apply to businesses located in those jurisdictions but also protect individuals living there, wherever the business handling their data is based. This means that a privacy violation of an EU resident’s information by a US company carries the same potential penalty, which for GDPR is a maximum fine of 4% of annual global revenue or €20m (whichever is greater). With the California initiative, the penalty can be as much as $7,500 for each violation, which may not seem that bad until you realize that a breach affecting half a million California-based users could theoretically result in a fine of $3.75bn. While these are worst-case scenarios, we believe regulations like these elevate the need for accurate and functional unstructured data identification to mission-critical status – and for the first time, retaining data becomes both an asset and a potential liability.
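The worst-case arithmetic above can be reproduced in a few lines of Python. This is a simplified sketch that uses only the ceilings quoted in the text (4% of annual global revenue or €20m, whichever is greater, and $7,500 per violation); the function names and the sample revenue figures are assumptions for illustration, and actual penalties depend on the regulator and the circumstances.

```python
def gdpr_max_fine_eur(annual_global_revenue_eur: float) -> float:
    """Upper bound as quoted in the text: the greater of 4% of revenue or EUR 20m."""
    return max(annual_global_revenue_eur * 4 / 100, 20_000_000)


def ccpa_max_fine_usd(violations: int, per_violation_usd: int = 7_500) -> int:
    """Theoretical ceiling when every affected user counts as one violation."""
    return violations * per_violation_usd


# Half a million affected California users, as in the example above.
print(ccpa_max_fine_usd(500_000))

# Two hypothetical companies: one small enough that the EUR 20m floor applies,
# one large enough that the 4% rule dominates.
print(gdpr_max_fine_eur(100_000_000))
print(gdpr_max_fine_eur(1_000_000_000))
```

The first call returns 3,750,000,000, matching the $3.75bn figure in the text. For GDPR, the €20m floor applies until annual global revenue exceeds €500m, after which the 4% rule sets the ceiling.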

This will be especially critical for companies whose primary business model is based on user information, but it could be an issue for companies that store any personal information in any manner. Looking at the overall trend in Figure 5, five of the top six concerns listed revolve around data governance, and even the fifth-highest concern, Support for Multiple Geographies (Regions), has data governance implications. Figure 6 offers a different insight into the business drivers for object storage in 2018.

Figure 6: Business Reasons for Long-Term Data Retention – 2018
Source: 451 Research and Western Digital custom survey
Q. What is your primary business driver for evaluating long-term storage, access and data analysis?

For 2018 we opted to look at the business concerns of C-level survey respondents vs. those of the rest of the respondents, revealing some interesting differences. As expected, cost savings made the top of the list for senior management, while operational improvements were by far the key concern for those further down the chain. These responses certainly followed the traditional separation of responsibilities, but they are actually interrelated. Operational efficiency has a direct relationship with cost, so streamlining and automating long-term data management has the potential to improve both. It’s a win-win, but the first challenge lies in changing the way we deal with unstructured data.

Figure 6 data, share of respondents, Total (n=200) / C-Level (n=40) / Other (n=160):
• Looking for new revenue opportunities: 12% / 8% / 13%
• Identifying storage cost savings: 13% / 25% / 10%
• Better protecting valuable business data: 17% / 15% / 18%
• Providing greater access to data resources: 17% / 13% / 18%
• Addressing changed data compliance issues: 13% / 23% / 10%
• Improving operations: 29% / 18% / 32%


Building an Inclusive Data Management Strategy

Matching Business Needs to Long-Term Data Management

For the most part, IT has been dealing with the unstructured data challenge by simply throwing more capacity at it – but this is a short-term fix that’s rapidly becoming untenable as data continues to grow at record rates, and it does little or nothing to address the challenges of long-term management.

We believe this problem is specific to the unstructured data that typically falls into the ‘secondary’ storage category. High-performance ‘primary’ storage remains best suited for highly transactional applications like OLTP databases and mission-critical systems of record. While secondary storage may not have the performance requirements expected of primary systems, secondary storage applications still merit high-end data protection as well as the ability to deliver storage to a greater number of use cases.

This ‘new’ secondary storage market offers a wide selection of options when it comes to physically hosting and delivering flexible, software-defined storage services in a variety of protocols and APIs. It’s no longer necessary to depend on proprietary hardware to deliver enterprise-class secondary storage, and secondary storage vendors across the industry have been changing their approach, with many now identifying as ‘information management’ providers rather than storage companies.

This is a reasonable claim, because many next-generation storage vendors are now leveraging the abstraction and metadata capabilities of object storage to move beyond rudimentary capacity management with the intent of delivering ‘intelligent’ storage that’s able to make decisions based on an awareness of the contents and relative value of the data being stored. But this enabling technology is only the first step; the next step lies in defining a model for metadata designation that’s simple, relevant and extensible across vendors.

Unfortunately, the creation of metadata is a blank slate, and the industry has yet to agree on the basic criteria that should serve as the foundation for policy-based data management. This is best illustrated in Figure 7, which shows that the majority of companies are looking for help from their vendors in defining their metadata needs.


Figure 7: Vendor Assistance Expectations for Metadata Definition – 2017-2018
Source: 451 Research and Western Digital custom survey
Q. How much assistance would you require from a vendor in determining your metadata needs?

Even though it’s only been a year since our last Object Storage Survey, there appears to be an upward trend in customer understanding of the challenges and opportunities offered by metadata-based management, and an increase in customers needing little or no vendor assistance. Regardless, over 70% of the respondents in 2018 still look to their vendors for guidance when it comes to defining their long-term metadata needs. In reality, any reliable metadata is better than no metadata at all, and the very nature of object metadata allows customers to evolve their metadata framework over time. That being said, the storage industry isn’t off the hook, and we still strongly advocate for the adoption of a common, simple and extensible metadata model that should be attached to all forms of business data.

Another growing concern for all forms of data, structured and unstructured alike, is data hygiene. In the case of unstructured data, this may be part of an eDiscovery process, or simply a desire to regularly detect and eliminate toxic data; but from a broader perspective, even structured data presents a substantial challenge when it comes to curating and normalizing data from multiple sources for analytics purposes. One of the keys to successful analytics lies in the process of preparing large quantities of data in the form of data lakes, or pools of data that will be used as the target. While analytics can be about finding that needle in a haystack, proper data hygiene can improve the analytics process by simply eliminating the haystacks that don’t contain needles and reducing the size of the ones that do.

Better data identification, via contextual metadata, will offer a new set of tools for sorting massive quantities of unstructured data. This will better establish what data should be included in or excluded from the process, and can help to provide the controls necessary to establish compliance with privacy initiatives like GDPR, which establishes “the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.”
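The idea of using contextual metadata to decide what enters an analytics run can be sketched as a simple catalog filter. Everything here is hypothetical: the catalog layout, the `source` and `contains_personal_data` tags, and the conservative rule that untagged objects are treated as if they held personal data. It illustrates the sorting step only and is not a compliance mechanism.

```python
def select_for_analysis(catalog: list, required: dict, exclude_personal: bool = True) -> list:
    """Return the keys of objects whose metadata qualifies them for an analytics run."""
    selected = []
    for entry in catalog:
        tags = entry["metadata"]
        # Assumed rule: anything not explicitly marked free of personal data is skipped.
        if exclude_personal and tags.get("contains_personal_data") != "false":
            continue
        if all(tags.get(name) == value for name, value in required.items()):
            selected.append(entry["key"])
    return selected


catalog = [
    {"key": "sensors/2018-09.parquet",
     "metadata": {"source": "iot", "contains_personal_data": "false"}},
    {"key": "calls/rec-1841.wav",
     "metadata": {"source": "call-monitoring", "contains_personal_data": "true"}},
    {"key": "sensors/2018-10.parquet",
     "metadata": {"source": "iot"}},  # untagged, so excluded by default
    {"key": "cad/bracket-7.step",
     "metadata": {"source": "cad", "contains_personal_data": "false"}},
]

iot_inputs = select_for_analysis(catalog, {"source": "iot"})
print(iot_inputs)
```

In haystack terms, the filter discards whole data sets before any analysis runs, and the untagged object shows why reliable metadata matters: without a tag, the safe choice is to leave the data out.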

Figure 7 data, share of respondents, Enterprise 2017 (n=100) / Enterprise 2018 (n=200):
• None: 3% / 6%
• Basic pre-sale discussion: 15% / 24%
• Help in defining a metadata framework: 28% / 31%
• Full business assessment and long-term metadata planning: 41% / 34%
• Migration services: 13% / 6%


Interest in analytics, and its availability, are rapidly becoming universal across all vertical markets and companies of nearly every size, creating the need for a new generation of data specialists with new capabilities for understanding the nature of data and translating business needs into actionable insight. Figure 8 illustrates some of the new positions and opportunities opening up in the information management and analytics space, which increasingly involves unstructured data.

Figure 8: New Skills Needed to Manage the Use and Growth of Unstructured Data
Source: 451 Research and Western Digital custom survey
Q. Please select your planned or anticipated hiring plans to keep up with the skills needed to support unstructured data growth for your key initiatives:

While some of the traditional positions in IT remain stable, there are new entries in the form of data scientists and engineers to focus on optimizing data for analytics and translating results into usable business intelligence. As a new question in 2018, this sets the baseline for future surveying, and we look forward to tracking this data as the challenge of unstructured data management evolves.

Figure 8 data, share of respondents, Now / 12 months / 24 months / No plan:
• Storage Administrators: 55% / 24% / 12% / 10%
• Storage Architects: 41% / 33% / 13% / 14%
• Backup & Archive Administrators: 53% / 24% / 12% / 11%
• Data Engineers: 46% / 29% / 12% / 14%
• Data Scientists: 37% / 36% / 8% / 20%
• IT Operations: 60% / 23% / 12% / 6%
• System Administrators: 56% / 23% / 11% / 10%
• Application Developers: 52% / 30% / 9% / 10%
• Networking Engineers: 57% / 24% / 9% / 11%


Conclusion

We believe that unstructured data management remains one of the largest challenges for IT going forward. Every indicator across the IT industry points to the need to better identify and classify unstructured data, but the adoption of new ideas – especially in the highly conservative storage industry – can be slow at times. This is as it should be; everything in IT ultimately begins and ends with storage, so protecting business data and ensuring its availability should be a primary concern for businesses, and by extension the IT that supports the business units. But we’re not talking about a new platform: object storage has several decades of validation, and serves as the basis for cloud storage offerings containing untold exabytes of data, with up to 19x9s of data resilience.

The storage industry as a whole has evolved dramatically over the last two decades, but it’s an evolution that mainly occurs behind the scenes. Storage management tasks that used to require extreme specialization, hours of work and a certain amount of risk are now simplified or automated, and the old idea of storage as a blind repository is no longer realistic. The next stage in storage evolution has to move beyond the simple nuts and bolts of storing data. It should focus on leveraging the information that data contains and contributing value through visibility, control and automation, regardless of where that data physically resides. We can no longer afford ‘dumb’ storage, and object storage currently offers the only model capable of bringing storage to the next level.

Recommendations

Managing unstructured data using object storage is essentially the easy part. What’s harder is determining the metadata that’s of value and then creating that metadata. This is something that should have been addressed long ago, but now the problem is reaching the point of unmanageability, with growing legal and business ramifications.

The following recommendations are critically important moving forward:

• Understand your data environment – Take the time to explore what types of unstructured data are growing fastest, what applications are creating it, and what it’s being used for.

• Talk with internal business stakeholders – Different segments within your business will have different needs; start drawing up an objective list of all of them.

• Consider what information is business- or mission-critical – Everyone believes their data is most important, but it helps to start with the data that has legal or compliance considerations.

• Reconsider the need to gather and store personally identifiable information (PII) – Several forms of PII merit caution (e.g., Social Security numbers, driver’s license numbers and credit card numbers). The list will vary based on vertical market and business.

• Start somewhere – Simply tagging object data with an accurate time-stamp, nation of origin and information about the author (based on Active Directory or LDAP credentials) can be a valuable starting point that associates the document with an individual and a role within the company.

• Think long-term – A substantial majority of unstructured data in the form of images, audio and video may be difficult to identify now, but new technologies are appearing that will eventually make it easier to build useful, contextualized metadata about dense media.
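As an illustration of the "Start somewhere" recommendation above, the following is a minimal Python sketch of how a starter metadata set might be assembled before an object is written. It is not drawn from the survey or from any specific product: the function name, the tag keys and the example values are all hypothetical, and the author's ID and role are assumed to have been looked up already in Active Directory or LDAP. The tags are kept as flat string key/value pairs because that is the form S3-compatible object stores generally accept for user-defined metadata.

```python
from datetime import datetime, timezone


def build_starter_metadata(author_id, author_role, country_code, created_at=None):
    """Return a flat dict of string tags for one object.

    All key names are illustrative assumptions, not a standard schema.
    author_id and author_role are assumed to come from a directory
    lookup (Active Directory or LDAP) performed elsewhere.
    """
    if created_at is None:
        created_at = datetime.now(timezone.utc)
    if created_at.tzinfo is None:
        # A naive time-stamp is ambiguous across sites, so reject it.
        raise ValueError("created_at must be timezone-aware")
    if len(country_code) != 2 or not country_code.isalpha():
        # Assumes two-letter country codes in the style of ISO 3166-1 alpha-2.
        raise ValueError("country_code must be a two-letter code")

    return {
        # Normalize to UTC so time-stamps compare consistently.
        "created-at": created_at.astimezone(timezone.utc).isoformat(timespec="seconds"),
        "origin-country": country_code.upper(),
        "author-id": author_id,
        "author-role": author_role,
    }


# Hypothetical usage: tags for a document written by a storage architect.
example_tags = build_starter_metadata(
    author_id="jdoe",
    author_role="storage-architect",
    country_code="us",
    created_at=datetime(2018, 10, 1, 12, 0, tzinfo=timezone.utc),
)
print(example_tags)
```

Validating the time-stamp and country code at creation time is a deliberate choice in this sketch: metadata that is wrong or ambiguous when written is much harder to correct later than metadata that is simply missing.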


About 451 Research

451 Research is a preeminent information technology research and advisory company. With a core focus on technology innovation and market disruption, we provide essential insight for leaders of the digital economy. More than 100 analysts and consultants deliver that insight via syndicated research, advisory services and live events to over 1,000 client organizations in North America, Europe and around the world. Founded in 2000 and headquartered in New York, 451 Research is a division of The 451 Group.

© 2018 451 Research, LLC and/or its Affiliates. All Rights Reserved. Reproduction and distribution of this publication, in whole or in part, in any form without prior written permission is forbidden. The terms of use regarding distribution, both internally and externally, shall be governed by the terms laid out in your Service Agreement with 451 Research and/or its Affiliates. The information contained herein has been obtained from sources believed to be reliable. 451 Research disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although 451 Research may discuss legal issues related to the information technology business, 451 Research does not provide legal advice or services and their research should not be construed or used as such.

451 Research shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The reader assumes sole responsibility for the selection of these materials to achieve its intended results. The opinions expressed herein are subject to change without notice.

NEW YORK
1411 Broadway, New York, NY 10018
+1 212 505 3030

SAN FRANCISCO
140 Geary Street, San Francisco, CA 94108
+1 415 989 1555

LONDON
Paxton House, 30, Artillery Lane, London, E1 7LS, UK
+44 (0) 203 929 5700

BOSTON
75-101 Federal Street, Boston, MA 02110
+1 617 598 7200