Performing big data analytics using Hadoop: its complex ecosystem is limiting CSPs’ adoption

© Analysys Mason Limited 2016

RESEARCH STRATEGY REPORT

analysysmason.com

JUSTIN VAN DER LANDE

About this report

This report analyses the increasing use of the Apache Hadoop technology by communications service providers (CSPs). Although Hadoop is not the only big data analytics infrastructure available, it illustrates the shift that CSPs and vendors are making towards the adoption of new, low-cost and more powerful technology to enable them to store, compute and analyse big data sets.

Hadoop is not yet the dominant technology in the data infrastructure market, but CSPs and vendors increasingly consider it to be production-ready for telecoms applications. In addition, Hadoop has the support of a strong and active set of vendors, as well as the wider open-source community, and its highly active development teams can ensure that the technology’s future development will be supported.

This report provides recommendations for vendors that use Hadoop-based technology for analytics, and for CSPs that use Hadoop as part of their big data infrastructure.

The report is based on several sources:

Analysys Mason’s internal research, which draws on vendor engagements

interviews of stakeholders in the data infrastructure market.

KEY QUESTIONS ANSWERED IN THIS REPORT

How should Hadoop and related technologies be used to support big data analytics use cases for CSPs?

Which core technologies does Hadoop use?

Which companies supply, distribute and support the technology, and what do they provide?

Where have CSPs deployed Hadoop within their organisations?

Which business cases is Hadoop being used to address?

WHO NEEDS TO READ THIS REPORT

Vendors that are active in the provision of big data analytics systems, and need to understand their market.

Vendors that provide systems to CSPs that may need to integrate Hadoop-based systems to support their current applications.

CSPs considering big data systems and technology for analytics use cases.

CONTENTS

EXECUTIVE SUMMARY

HADOOP ARCHITECTURE AND COMPONENTS

KEY VENDOR SOLUTIONS

HADOOP IMPLEMENTATIONS

ABOUT THE AUTHOR AND ANALYSYS MASON

Executive summary

CSPs should increase their use of Hadoop to support specific data infrastructure use cases, but they must ensure that the components that they select are suitable. In addition, vendors can help CSPs to unlock Hadoop’s potential by productising Hadoop integrations with their applications.

CSPs have been performing big data analytics for many years, and Hadoop-based solutions have recently been deployed by CSPs because of their low cost and scalability. However, Hadoop is only suitable for certain business cases, and compared with more-established and more-mature solutions, it requires additional resources to support its complex components and rapidly changing ecosystem.

As a result, CSPs hesitate to deploy Hadoop, or they limit its use to only a part of their data infrastructure. Vendors’ solutions that are based on Hadoop are therefore being delayed by CSPs’ slow deployments.

This report provides:

an understanding of the Hadoop technology and why it is difficult for CSPs to adopt

an overview of how vendors have incorporated Hadoop technology into their solutions to address CSPs’ issues

a discussion of how, and why, CSPs are using Hadoop.

Figure 1: Flow diagram indicating how Hadoop is increasingly being used as part of CSPs’ data infrastructure, in conjunction with traditional data warehouse technology

[Diagram labels: OSS/BSS application data stores; Hadoop cluster; Enterprise data warehouse; Data consolidation. Annotations: “Operational data is increasingly stored in Hadoop and diverted from the Enterprise Data Warehouse (EDW)”; “Refinement and enhancement of current and new functionality and insights, driven by application vendors with close coupling of data”.]

Source: Analysys Mason

Hadoop adoption has been slow amongst CSPs because it is a complex technology and has a fragmented ecosystem of over 120 projects

CSPs must store and analyse large volumes of data to remain competitive, and Hadoop can help to address these needs with its powerful and low-cost technology. However, Hadoop is supplied as open-source software within a fragmented ecosystem that is not considered to be as robust as traditional technology.

Hadoop’s ecosystem consists of dozens of components that complement and compete with each other. The requirements of each CSP dictate which types of components are needed, and commercial considerations inform the selection of the supplier.

Vendors and CSPs should avoid selecting combinations of Hadoop components that may effectively become proprietary and would therefore restrict their ability to develop, support or purchase solutions that run on these components.

Vendors understand that they need to help support and integrate their solutions with CSPs’ changing data infrastructure. The many possible combinations of Hadoop components can create a fragmented data infrastructure, every variant of which would need to be supported by software solutions. As a result, development and implementation costs would increase because each variation requires porting and testing.

Figure 2: The shift to a hybrid data infrastructure to encompass Hadoop technology, driven by new data types and business requirements, is complex and slow

[Diagram labels, by layer. Applications: Embedded analytics tools and packaged solutions; Standalone analytics tools. Data infrastructure: NoSQL Hadoop-based data stores; Traditional data stores. Data sources: New data sources (Web logs, email, clickstream, social media, sensor data); Traditional data sources (RDBMS, OLTP, OLAP).]

Source: Analysys Mason

CSPs and vendors must understand the Hadoop ecosystem in order to implement the technology successfully

The Hadoop ecosystem can support any business case, but CSPs and vendors must understand the development history and the functionality associated with each component in order to create a combination of components that best supports their business needs.

In February 2015, the Open Data Platform initiative (ODPi) was set up to address the challenge of Hadoop’s fractured ecosystem, and to create a certification programme to test the conformity of new components. However, not all vendors participate in the initiative, and the non-participants include some of the main distributors.

This report examines Hadoop’s utility from three perspectives:

We provide an overview of Hadoop, including the creation and evolution of its ecosystem through different development projects, as well as its open-source method of going to market. We also explain the different components that make up its ecosystem and their functionality.

We examine different vendor approaches to using Hadoop to provide big data analytics systems. The report covers the three main distributors of the software, and provides examples of telecoms-specific vendors that use them.

We discuss how CSPs are using Hadoop within their organisations through a series of case studies, and how they have installed, purchased and integrated the technology.

Figure 3: Hadoop is an ecosystem of different projects that relate to each other

[Diagram labels. Hadoop Core: HDFS, MapReduce, YARN. Surrounding projects: Sqoop, Flume, Chukwa, Pig, Hive, HCatalog, Lucene, Crunch, Avro, Thrift, Mahout, Ambari, ZooKeeper, Hama, Oozie, Cassandra.]

Source: Analysys Mason

Recommendations

1. Vendors that use Hadoop need to decide which of its components and distributions they will use for their solutions in order to reduce development efforts and to provide the support required for a stable platform.

Vendors that adopt Hadoop as part of their product set must make informed decisions about which Hadoop components to select. This will enable them to provide a stable and consistent development environment for building, testing and deploying their product solutions. The choice of components must reflect the data requirements and needs of their target use cases and be acceptable to the CSPs that they are targeting.

2. Vendors should only adopt established Hadoop components that are available from all of the major distributions in order to ensure that they can address the widest possible market.

Many of Hadoop’s components are offered by multiple distributors. Some components are only available from a single distributor, and this is particularly the case for newer components that address near-real-time data requirements. Vendors should only select components that are available from all of the major distributions, ensuring that their solutions can be installed on the data infrastructure of the largest number of potential customers.

3. CSPs should establish their own Hadoop architecture to encourage vendors to meet CSPs’ requirements, including products that integrate legacy data components into Hadoop.

Where CSPs have not established their own Hadoop architecture, a different approach will be deployed with every vendor’s application or solution. This haphazard approach creates a complex environment that is difficult to support and whose performance is less predictable when supporting different business requirements.
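
The component-selection check in the second recommendation can be sketched as a simple set intersection. The following is a minimal illustration only: the distribution names (`distribution_a`, `distribution_b`, `distribution_c`), the availability assignments and both helper functions are hypothetical placeholders, not data or tooling from this report.

```python
# Hypothetical availability matrix: which distribution ships which component.
# The distribution names and the assignments below are illustrative
# placeholders, not findings from this report.
AVAILABILITY = {
    "distribution_a": {"HDFS", "MapReduce", "YARN", "Hive", "Pig", "Sqoop", "Oozie"},
    "distribution_b": {"HDFS", "MapReduce", "YARN", "Hive", "Pig", "Flume", "Oozie"},
    "distribution_c": {"HDFS", "MapReduce", "YARN", "Hive", "Sqoop", "Flume", "Oozie"},
}


def common_components(availability):
    """Return the components shipped by every distribution."""
    component_sets = list(availability.values())
    if not component_sets:
        return set()
    return set.intersection(*component_sets)


def portability_gaps(required, availability):
    """Map each required component to the distributions that lack it."""
    gaps = {}
    for component in sorted(required):
        missing = sorted(
            name
            for name, shipped in availability.items()
            if component not in shipped
        )
        if missing:
            gaps[component] = missing
    return gaps


if __name__ == "__main__":
    # Components a vendor could rely on across all listed distributions.
    print(sorted(common_components(AVAILABILITY)))
    # Components in a hypothetical solution stack that restrict portability.
    print(portability_gaps({"HDFS", "Hive", "Sqoop"}, AVAILABILITY))
```

With this placeholder data, the common set is HDFS, Hive, MapReduce, Oozie and YARN, and a stack that needs Sqoop could not be installed on `distribution_b`. A real check would need an availability matrix compiled from each distributor's current documentation.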

About the author

Justin van der Lande (Principal Analyst) leads the Analytics, Customer Experience Management and CSP IT Strategies research programmes, which are part of Analysys Mason’s Telecoms Software research stream. He specialises in business intelligence and analytics tools, the functionality of which cuts across all of the research programmes in this area. He also provides project management for large-scale projects within our Telecoms Software research. Justin has more than 20 years’ experience in the communications industry in software development, marketing and research. He has held senior positions at NCR/AT&T, Micromuse (IBM), Granite Systems (Telcordia) and at the TM Forum. Justin holds a BSc in Management Science and Computer Studies from the University of Wales.

About Analysys Mason

Knowing what’s going on is one thing. Understanding how to take advantage of events is quite another. Our ability to understand the complex workings of telecoms, media and technology (TMT) industries and draw practical conclusions, based on the specialist knowledge of our people, is what sets Analysys Mason apart. We deliver our key services via two channels: consulting and research.

Consulting

Our focus is exclusively on TMT.

We support multi-billion dollar investments, advise clients on regulatory matters, provide spectrum valuation and auction support, and advise on operational performance, business planning and strategy.

We have developed rigorous methodologies that deliver tangible results for clients around the world.

For more information, please visit www.analysysmason.com/consulting

Research

We analyse, track and forecast the different services accessed by consumers and enterprises, as well as the software, infrastructure and technology delivering those services.

Research clients benefit from regular and timely intelligence in addition to direct access to our team of expert analysts.

Our dedicated Custom Research team undertakes specialised and bespoke projects for clients.

For more information, please visit www.analysysmason.com/research

[Diagram labels: Consumer and SME services; Digital economy; Regional markets; Network technologies; Telecoms software; Strategy and planning; Transaction support; Performance improvement; Regulation and policy]

Research from Analysys Mason

We provide dedicated coverage of developments in the telecoms, media and technology (TMT) sectors, through a range of research programmes that focus on different services and regions of the world.

To find out more, please visit www.analysysmason.com/research

PROGRAMMES

Service Assurance

Customer Experience Management

Customer Care

Revenue Management

Analytics

Network Orchestration

Software-Controlled Networking

Service Delivery Platforms

Service Fulfilment

Telecoms Software Market Shares

Telecoms Software Forecasts

PROGRAMMES

Digital Economy Strategies

Digital Economy Platforms

Future Comms and Media

IoT and M2M Solutions

PROGRAMMES

Mobile Services

Mobile Devices

Fixed Broadband and Multi-Play

SME Strategies

PROGRAMMES

Fixed Networks

Wireless Networks

Spectrum

Consumer and SME services

Digital economy

Regional markets

Telecoms software

Network technologies

PROGRAMMES

Global Telecoms Forecasts

Asia–Pacific

The Middle East and Africa

European Country Reports

European Core Forecasts

European Telecoms Market Matrix

Research portfolio

Consulting from Analysys Mason

For 30 years, our consultants have been bringing the benefits of applied intelligence to enable clients around the world to make the most of their opportunities.

To find out more, please visit www.analysysmason.com/consulting

Consulting portfolio

Strategy and planning

Transaction support

EXPERTISE

Commercial due diligence

Regulatory due diligence

Technical due diligence

Regulation

EXPERTISE

Policy development and response

Margin squeeze tests

Analysing regulatory accounts

Expert legal support

Media regulation

Postal sector costing, pricing and regulation

Regulatory economic costing

Net cost of universal service

Performance improvement

EXPERTISE

Market research

Market analysis

Business strategy and planning

Market sizing and forecasting

Benchmarking and best practice

National and regional broadband strategy and implementation

EXPERTISE

Performance analysis

Technology optimisation

Commercial excellence

Transformation services

EXPERTISE

Radio spectrum auction support

Radio spectrum management

Spectrum policy and auction support

PUBLISHED BY ANALYSYS MASON LIMITED IN FEBRUARY 2016

Bush House • North West Wing • Aldwych • London • WC2B 4PJ • UK

Tel: +44 (0)20 7395 9000 • Email: [email protected] • www.analysysmason.com/research • Registered in England No. 5177472

© Analysys Mason Limited 2016. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means – electronic, mechanical, photocopying, recording or otherwise – without the prior written permission of the publisher.

Figures and projections contained in this report are based on publicly available information only and are produced by the Research Division of Analysys Mason Limited independently of any client-specific work within Analysys Mason Limited. The opinions expressed are those of the stated authors only.

Analysys Mason Limited recognises that many terms appearing in this report are proprietary; all such trademarks are acknowledged and every effort has been made to indicate them by the normal UK publishing practice of capitalisation. However, the presence of a term, in whatever form, does not affect its legal status as a trademark.

Analysys Mason Limited maintains that all reasonable care and skill have been used in the compilation of this publication. However, Analysys Mason Limited shall not be under any liability for loss or damage (including consequential loss) whatsoever or howsoever arising as a result of the use of this publication by the customer, his servants, agents or any third party.