big data analytics || market and business drivers for big data analytics

Post on 27-Jan-2017


CHAPTER 1 Market and Business Drivers for Big Data Analytics

1.1 SEPARATING THE BIG DATA REALITY FROM HYPE

Few technology phenomena have taken both the technical and the mainstream media by storm the way “big data” has. From the analyst communities to the front pages of the most respected sources of journalism, the world seems to be awash in big data projects, activities, analyses, and so on. However, as with many technology fads, there is some murkiness in its definition, which leads to confusion, uncertainty, and doubt when attempting to understand how the methodologies can benefit the organization.

Therefore, it is best to begin with a definition of big data. The analyst firm Gartner can be credited with the most frequently used (and perhaps somewhat abused) definition:

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.[1]

For the most part, in popularizing the big data concept, the analyst community and the media have latched onto the alliteration that appears at the beginning of the definition, hyperfocusing on what is referred to as the “3 Vs”: volume, velocity, and variety. Others have built upon that meme to inject additional Vs such as “value” or “variability,” intended to capitalize on an apparent improvement to the definition.

The ubiquity of the Vs definition notwithstanding, it is worth noting that the concept is not new. It originated with analyst Doug Laney (then at Meta Group, now Gartner) in a 2001 research note about “3-D Data Management,” in which he noted:

[1] Gartner’s IT Glossary. Accessed from <http://www.gartner.com/it-glossary/big-data/> (last accessed 08-08-13).

Big Data Analytics. DOI: http://dx.doi.org/10.1016/B978-0-12-417319-4.00001-6 © 2013 Elsevier Inc. All rights reserved.

While enterprises struggle to consolidate systems and collapse redundant databases to enable greater operational, analytical, and collaborative consistencies, changing economic conditions have made this job more difficult. E-commerce, in particular, has exploded data management challenges along three dimensions: volumes, velocity and variety. In 2001/02, IT organizations must compile a variety of approaches to have at their disposal for dealing with each.[2]

The challenge with Gartner’s definition is twofold. First, the impact of truncating the definition to concentrate on the Vs effectively distills out two other critical components of the message:

1. “cost-effective innovative forms of information processing” (the means by which the benefit can be achieved);

2. “enhanced insight and decision-making” (the desired outcome).

The second is a bit subtler: the definition is not really a definition, but rather a description. People in an organization cannot use the definition to determine whether they are using big data solutions or even if they have problems that need a big data solution. The same issue impedes the ability to convey a value proposition because of the difficulty in scoping what is intended to be designed, developed, and delivered and what the result really means to the organization.

Basically, it is necessary to look beyond what is essentially a marketing definition to understand the concept’s core intent as the first step in evaluating the value proposition. Big data is fundamentally about applying innovative and cost-effective techniques for solving existing and future business problems whose resource requirements (for data management space, computation resources, or immediate, in-memory representation needs) exceed the capabilities of traditional computing environments as currently configured within the enterprise. Another way of envisioning this is shown in Figure 1.1.
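The working definition above can be read as a simple test: a business problem is a candidate for big data techniques when its requirements exceed what the current environment offers on at least one of the three named dimensions. The following is a minimal sketch of that test, not a method given in the text. The `Resources` class, the `exceeded_dimensions` function, the units, and all of the numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    """Resource figures along the three dimensions named in the text (hypothetical units)."""
    storage_tb: float      # data management space
    compute_hours: float   # computation resources per processing window
    memory_gb: float       # immediate, in-memory representation needs


def exceeded_dimensions(required: Resources, available: Resources) -> list[str]:
    """Return the names of the dimensions where requirements exceed capacity."""
    names = ("storage_tb", "compute_hours", "memory_gb")
    return [n for n in names if getattr(required, n) > getattr(available, n)]


# Hypothetical figures, for illustration only.
problem = Resources(storage_tb=500, compute_hours=40, memory_gb=2_000)
current_env = Resources(storage_tb=200, compute_hours=48, memory_gb=512)

gaps = exceeded_dimensions(problem, current_env)
is_big_data_candidate = bool(gaps)  # True when any single dimension is exceeded
print(gaps, is_big_data_candidate)
```

With these assumed figures the problem exceeds the environment on storage and memory but not on computation, which is enough to make it a candidate under the definition.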

[2] Doug Laney. Deja VVVu: others claiming Gartner’s construct for big data, January 2012. Accessed from <http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data/>.

To best understand the value that big data can bring to your organization, it is worth considering the market conditions that have enabled its apparently growing acceptance as a viable option to supplement the intertwining of operational and analytical business applications in light of exploding data volumes. Over the course of this book, we hope to quantify some of the variables that are relevant in evaluating and making decisions about integrating big data as part of an enterprise information management architecture, focusing on topics such as:

• characterizing what is meant by “massive” data volumes;

• reviewing the relationship between the speed of data creation and delivery and the integration of analytics into real-time business processes;

• exploring the reasons that the traditional data management framework cannot cope with growing data variability;

• qualifying the quantifiable measures of value to the business;

• developing a strategic plan for integration;

• evaluating the technologies;

• designing, developing, and moving new applications into production.

Qualifying the business value is particularly important, especially when the forward-looking stakeholders in an organization need to effectively communicate the business value of embracing big data platforms, and correspondingly, big data analytics. For example, a business justification might show how incorporating a new analytics framework can be a competitive differentiator. Companies that develop customer upselling profiles based on limited data sampling face a disadvantage when compared to enterprises that create comprehensive customer models encompassing all the data about the customer intended to increase revenues while enhancing the customer experience.

[Figure 1.1 Cracking the big data nut. Diagram labels: innovative techniques; cost effective; increased data volumes; increased computation need; increased analytics need; lowered barrier to entry and success; big data.]

Adopting a technology as a knee-jerk reaction to media buzz has a lower chance of success than assessing how that technology can be leveraged along with the existing solution base as a way of transforming the business. For that reason, before we begin to explore the details of big data technology, we must probe the depths of the business drivers and market conditions that make big data a viable alternative within the enterprise.

1.2 UNDERSTANDING THE BUSINESS DRIVERS

The story begins at the intersection of the need for agility and the demand for actionable insight as the proportion of signal to noise decreases. Decreasing “time to market” for decision-making enhancements to all types of business processes has become a critical competitive differentiator. However, the user demand for insight that is driven by ever-increasing data volumes must be understood in the context of organizational business drivers to help your organization appropriately adopt a coherent information strategy as a prelude to deploying big data technology.

Corporate business drivers may vary by industry as well as by company, but reviewing some existing trends for data creation, use, sharing, and the demand for analysis may reveal how evolving market conditions bring us to a point where adoption of big data can become a reality.

Business drivers are about agility in utilization and analysis of collections of datasets and streams to create value: increase revenues, decrease costs, improve the customer experience, reduce risks, and increase productivity. The data explosion bumps up against the requirement for capturing, managing, and analyzing information. Some key trends that drive the need for big data platforms include the following:

• Increased data volumes being captured and stored: According to the 2011 IDC Digital Universe Study, “In 2011, the amount of information created and replicated will surpass 1.8 zettabytes, . . . growing by a factor of 9 in just five years.”[3] The scale of this growth surpasses the reasonable capacity of traditional relational database management systems, or even typical hardware configurations supporting file-based data access.

• Rapid acceleration of data growth: Just 1 year later, the 2012 IDC Digital Universe study (“The Digital Universe in 2020”) postulated, “From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes (more than 5,200 gigabytes for every man, woman, and child in 2020). From now until 2020, the digital universe will about double every two years.”[4]

• Increased data volumes pushed into the network: According to Cisco’s annual Visual Networking Index Forecast, by 2016, annual global IP traffic is forecast to be 1.3 zettabytes.[5] This increase in network traffic is attributed to the increasing number of smartphones, tablets and other Internet-ready devices, the growing community of Internet users, the increased Internet bandwidth and speed offered by telecommunications carriers, and the proliferation of Wi-Fi availability and connectivity. More data being funneled into wider communication channels creates pressure for capturing and managing that data in a timely and coherent manner.

• Growing variation in types of data assets for analysis: As opposed to the more traditional methods for capturing and organizing structured datasets, data scientists seek to take advantage of unstructured data accessed or acquired from a wide variety of sources. Some of these sources may reflect minimal elements of structure (such as Web activity logs or call detail records), while others are completely unstructured or even limited to specific formats (such as social media data that merges text, images, audio, and video content). To extract usable signal out of this noise, enterprises must enhance their existing structured data management approaches to accommodate semantic text and content-stream analytics.

• Alternate and unsynchronized methods for facilitating data delivery: In a structured environment, there are clear delineations of the discrete tasks for data acquisition or exchange, such as bulk file transfers via tape and disk storage systems, or via file transfer protocol over the Internet. Today, data publication and exchange is full of unpredictable peaks and valleys, with data coming from a broad spectrum of connected sources such as websites, transaction processing systems, and even “open data” feeds and streams from government sources and social media networks like Twitter. This creates new pressures for rapid acquisition, absorption, and analysis while retaining currency and consistency across the different datasets.

• Rising demand for real-time integration of analytical results: There are more people, with an expanding variety of roles, who are consumers of analytical results. The growth is especially noticeable in companies where end-to-end business processes are augmented to fully integrate analytical models to optimize performance. As an example, a retail company can monitor real-time sales of tens of thousands of Stock Keeping Units (SKUs) at hundreds of retail locations, and log minute-by-minute sales trends. Delivering these massive datasets to a community of different business users for simultaneous analyses gives new insight and capabilities that never existed in the past: it allows buyers to review purchasing patterns to make more precise decisions regarding product catalog, product specialists to consider alternate means of bundling items together, inventory professionals to allocate shelf space more efficiently at the warehouse, and pricing experts to instantaneously adjust prices at different retail locations directly at the shelf, among other uses. The most effective uses of intelligence demand that analytical systems process, analyze, and deliver results within a defined time window.

[3] 2011 IDC Digital Universe Study: extracting value from chaos, <http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm>.

[4] The Digital Universe in 2020, <http://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf>.

[5] See Cisco Press Release of May 30, 2012, <http://newsroom.cisco.com/press-release-content?type=webcontent&articleId=888280>.
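The IDC growth figures quoted in the trends above can be checked for internal consistency with a little arithmetic. The sketch below takes only the numbers in the 2012 quotation (130 exabytes in 2005, 40,000 exabytes in 2020, more than 5,200 gigabytes per person) and derives the implied compound annual growth rate, doubling time, and 2020 population. It assumes decimal units (1 exabyte = 10^9 gigabytes) and smooth compound growth over the whole period, which is a simplification rather than anything the study states.

```python
import math

# Figures quoted from the 2012 IDC Digital Universe study.
start_eb, end_eb = 130, 40_000   # exabytes in 2005 and in 2020
years = 2020 - 2005
gb_per_person = 5_200            # "more than 5,200 gigabytes" per person in 2020

# Overall growth and the compound annual rate it implies (assumes smooth growth).
growth_factor = end_eb / start_eb
annual_rate = growth_factor ** (1 / years) - 1
doubling_years = math.log(2) / math.log(1 + annual_rate)

# Assumption: decimal units, 1 exabyte = 10**9 gigabytes.
GB_PER_EB = 10**9
total_gb = end_eb * GB_PER_EB
implied_population = total_gb / gb_per_person

print(f"growth factor:       {growth_factor:.0f}x")
print(f"annual growth rate:  {annual_rate:.1%}")
print(f"doubling time:       {doubling_years:.2f} years")
print(f"total gigabytes:     {total_gb:.2e}")
print(f"implied population:  {implied_population:.2e}")
```

Under these assumptions the quoted figures hang together: the growth factor is a little over 300, the doubling time comes out slightly under two years, and 40 trillion gigabytes spread at 5,200 gigabytes per person implies a population between 7 and 8 billion.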

1.3 LOWERING THE BARRIER TO ENTRY

Enabling business process owners to take advantage of analytics in many new and innovative ways has always appeared to be out of reach for most companies. And the expanding universe of created information has seemed to tantalizingly dangle broad-scale analytics capabilities beyond the reach of all but the largest corporations.

Interestingly, for the most part, much of the technology classified as “big data” is not new. Rather, what is new is the ability to package these techniques in ways that make them accessible to organizations that, until recently, had been limited by budget, resource, and skills constraints, which are typical of smaller businesses. What makes the big data concept so engaging is that emerging technologies enable a broad-scale analytics capability with a relatively low barrier to entry.

As we will see, facets of technology for business intelligence and analytics have evolved to a point at which a wide spectrum of businesses can deploy capabilities that in the past were limited to the largest firms with equally large budgets. Consider the four aspects in Table 1.1.

Table 1.1 Contrasting Approaches in Adopting High-Performance Capabilities

Aspect: Application development
Typical Scenario: Applications that take advantage of massive parallelism developed by specialized developers skilled in high-performance computing, performance optimization, and code tuning
Big Data: A simplified application execution model encompassing a distributed file system, application programming model, distributed database, and program scheduling is packaged within Hadoop, an open source framework for reliable, scalable, distributed, and parallel computing

Aspect: Platform
Typical Scenario: Uses high-cost massively parallel processing (MPP) computers, utilizing high-bandwidth networks, and massive I/O devices
Big Data: Innovative methods of creating scalable and yet elastic virtualized platforms take advantage of clusters of commodity hardware components (either cycle harvesting from local resources or through cloud-based utility computing services) coupled with open source tools and technology

Aspect: Data management
Typical Scenario: Limited to file-based or relational database management systems (RDBMS) using standard row-oriented data layouts
Big Data: Alternate models for data management (often referred to as NoSQL or “Not Only SQL”) provide a variety of methods for managing information to best suit specific business process needs, such as in-memory data management (for rapid access), columnar layouts to speed query response, and graph databases (for social network analytics)

Aspect: Resources
Typical Scenario: Requires large capital investment in purchasing high-end hardware to be installed and managed in-house
Big Data: The ability to deploy systems like Hadoop on virtualized platforms allows small and medium businesses to utilize cloud-based environments that, from both a cost accounting and a practical perspective, are much friendlier to the bottom line

The changes in the environment make big data analytics attractive to all types of organizations, while the market conditions make it practical. The combination of simplified models for development, commoditization, a wider palette of data management tools, and low-cost utility computing has effectively lowered the barrier to entry, enabling a much wider swath of organizations to develop and test out high-performance applications that can accommodate massive data volumes and broad variety in structure and content.

1.4 CONSIDERATIONS

While the market conditions suggest that there is a lowered barrier to entry for implementing big data solutions, it does not mean that implementing these technologies and business processes is a completely straightforward task. There is a steep learning curve for developing big data applications, especially when going the open source route, which demands an investment in time and resources to ensure the big data analytics and computing platform are ready for production. And while it is easy to test-drive some of these technologies as part of an “evaluation,” one should think carefully about some key questions before investing a significant amount of resources and effort in scaling that learning curve, such as:

• Feasibility: Is the enterprise aligned in a way that allows for new and emerging technologies to be brought into the organization, tested out, and vetted without overbearing bureaucracy? If not, what steps can be taken to create an environment that is suited to the introduction and assessment of innovative technologies?

• Reasonability: When evaluating the feasibility of adopting big data technologies, have you considered whether your organization faces business challenges whose resource requirements exceed the capability of the existing or planned environment? If not currently, do you anticipate that the environment will change in the near-, medium-, or long-term to be more data-centric and require augmentation of the resources necessary for analysis and reporting?

• Value: Is there an expectation that the resulting quantifiable value that can be enabled as a result of big data warrants the resource and effort investment in development and productionalization of the technology? How would you define clear measures of value and methods for measurement?

• Integrability: Are there any constraints or impediments within the organization from a technical, social, or political (i.e., policy-oriented) perspective that would prevent the big data technologies from being fully integrated as part of the operational architecture? What steps need to be taken to evaluate the means by which big data can be integrated as part of the enterprise?

• Sustainability: While the barrier to entry may be low, the costs associated with maintenance, configuration, skills maintenance, and adjustments to the level of agility in development may not be sustainable within the organization. How would you plan to fund continued management and maintenance of a big data environment?

In Chapter 2, we will begin to scope out the criteria for answering these questions as we explore the types of business problems that are suited to a big data solution.

1.5 THOUGHT EXERCISES

Here are some questions and exercises to ponder before jumping headfirst into a big data project:

• What are the sizes of the largest collections of data to be subjected to capture, storage, and analysis within the organization?

• Detail the five most challenging analytical problems facing your organization. How would each of these challenges be addressed if the volume of data increased by a factor of 10? By a factor of 100?

• Provide your own definition of what big data means to your organization.

• Develop a justification for big data within your organization in one sentence.

• Develop a single graphic image depicting what you believe to be the impact of increased data volumes and variety.

• Identify three “big data” sources, either within or external to your organization, that would be relevant to your business.
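For the exercise that asks about data volumes growing by factors of 10 and 100, it can help to estimate how soon such growth might arrive. The sketch below assumes, purely for illustration, that an organization's data doubles every two years, borrowing the pace IDC projects for the digital universe as a whole. Nothing in the chapter says an individual organization grows at that rate, and the function name `years_to_multiply` is hypothetical.

```python
import math


def years_to_multiply(factor: float, doubling_period_years: float = 2.0) -> float:
    """Years needed for a volume to grow by `factor`, given a fixed doubling period."""
    # Growth by `factor` takes log2(factor) doublings.
    return math.log2(factor) * doubling_period_years


for factor in (10, 100):
    print(f"{factor}x growth: about {years_to_multiply(factor):.1f} years")
```

Under that assumed doubling rate, tenfold growth takes between six and seven years and hundredfold growth a little over thirteen, so the factor-of-10 scenario in the exercise falls within a typical planning horizon.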
