Managing Growing Transaction Volumes Using Hadoop

Upload: arvind-purushothaman

Post on 01-Nov-2014

Category: Data & Analytics


DESCRIPTION

Practical approach to managing growing data volumes by leveraging Hadoop in your Information Architecture

TRANSCRIPT

Page 1: Managing Growing Transaction Volumes Using Hadoop

The entire contents of this document are subject to copyright with all rights reserved. All copyrightable text and graphics, the selection, arrangement and presentation of all information and the overall design of the document are the sole and exclusive property of Virtusa. Copyright © 2010 Virtusa Corporation. All rights reserved

2000 West Park Drive, Westborough MA 01581 USA

Phone: 508 389 7300 Fax: 508 366 9901

Managing Growing Transaction Volumes Using Hadoop

Arvind Purushothaman, Director, IM Practice

Page 2: Managing Growing Transaction Volumes Using Hadoop

© Virtusa Corporation ● Confidential

Agenda

• Context Setting

• CIO’s mandate

• Coexistence of architectures

• Evaluation

• Summary

Page 3: Managing Growing Transaction Volumes Using Hadoop

During this presentation...

In the Millennial World, 15 minutes is a long time:

• 1.8Mn tweets will be generated

• Apple will receive about 700,000 app downloads

• Brands and organisations will receive around 500,000 likes on Facebook

• Over 14Mn status updates will be posted on Facebook

• 54,000 photos will be shared on Instagram

• Over 13Mn pieces of new Facebook content will be created

• Over 3Bn email messages will be sent

• Google will receive over 30Mn search queries

• Over 8,000 new websites will be created

Sources: Forrester Research, Hubspot Centre for Social Media, The Social Skinny, AllTwitter

Page 4: Managing Growing Transaction Volumes Using Hadoop

During the course of this presentation...

• Consumers will spend over $5Mn shopping online

• 44% of companies who tweet acquired new customers

• Almost 8 new people come onto the internet every second

• 57% of companies who blog acquired new customers

• 61% of global internet users research products online

• 9 out of 10 mobile searches lead to action, and over half lead to a purchase

Sources: Forrester Research, Hubspot Centre for Social Media, The Social Skinny, AlTwitter

Page 5: Managing Growing Transaction Volumes Using Hadoop

BIG DATA

BIG NOISE?

BIG OPPORTUNITY

Technology enables you to make sense out of ALL available data

Page 6: Managing Growing Transaction Volumes Using Hadoop

How the Industry Defines Big Data

Gartner: Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

Forrester: The frontier of a firm’s ability to store, process and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.

IBM: “….Big data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make your business more agile, and to answer questions that were previously considered beyond your reach..”

Oracle: “…. Big Data refers to datasets that grow so large that it is difficult to capture, store, manage, share, analyze and visualize with the typical database software tools…”

• Website

• Network Switches

• Social Media

• RFID

• Transactional / operational systems

Page 7: Managing Growing Transaction Volumes Using Hadoop

CIO’s manifesto

• Support business growth through innovation

• Lower costs

Neither is optional: you need to lower costs and innovate at the same time

In the Information Management world, this means exponentially larger data volumes and more types of data

More investment in data storage, computing power, and licenses

What is the way forward?

Page 8: Managing Growing Transaction Volumes Using Hadoop

Hadoop As Data Transformation Platform

[Architecture diagram. Columns, left to right: Data Points, Data Stores, Access to BI Platform, Insight Generation]

Relational/Analytical:

• Data points: Financial Data, Marketing Data, Sales Data

• Data movement: ETL

• Data stores: Data Warehouse (Relational), Data Marts

• Access: Data Warehouse Access

Big Data:

• Data points: Transactions, Logs

• Data movement: Streaming, ETL, Open Source ETL

• Data stores: Big Data Cluster (Hadoop) with Raw Data, Master Data, Parsed data, and Analytic data sets; Real Time Store (NoSQL)

• Access: Big Data Access

Business Intelligence Platform:

• Parametric & Ad Hoc reporting, OLAP, Dashboards, Exploratory Visualization, Direct Data Access

• Statistical Analysis, Machine Learning
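The step from raw data to parsed data to analytic data sets can be illustrated with a minimal Python sketch. It is not taken from the presentation: the pipe-delimited log layout, the field names, and the per-customer total are assumptions made for illustration, and a real cluster would run this logic as a distributed job rather than in one process.

```python
from collections import defaultdict

# Hypothetical raw log layout (an assumption): "timestamp|customer_id|amount"
RAW_LOG = [
    "2014-10-01T09:15:00|C001|120.50",
    "2014-10-01T09:17:30|C002|35.00",
    "malformed line",
    "2014-10-02T11:00:00|C001|79.50",
]


def parse_line(line):
    """Turn one raw log line into a parsed record, or None if it is malformed."""
    parts = line.split("|")
    if len(parts) != 3:
        return None
    timestamp, customer_id, amount = parts
    try:
        return {
            "date": timestamp[:10],
            "customer_id": customer_id,
            "amount": float(amount),
        }
    except ValueError:
        return None


def build_analytic_set(records):
    """Aggregate parsed records into total spend per customer."""
    totals = defaultdict(float)
    for record in records:
        totals[record["customer_id"]] += record["amount"]
    return dict(totals)


# Raw data -> parsed data (malformed lines dropped) -> analytic data set
parsed = [r for r in map(parse_line, RAW_LOG) if r is not None]
analytic_set = build_analytic_set(parsed)
```

Keeping the raw lines untouched and deriving the parsed and aggregated sets from them means the transformation can be rerun if the parsing rules change.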

Page 9: Managing Growing Transaction Volumes Using Hadoop

Hybrid Architecture For A Telecom Client That Leverages HDFS, HBase, and Oracle 11g

[Architecture diagram]

• Source: Integration & Infrastructure Platform SDEDS (APP10765); Bill & Payments Platform ONM (APP10487)

• Hadoop cluster: HDFS, HBase, MapReduce

• Data in the cluster: Raw Call Data, CDR Store

• Categories: ICS, OCS, Answered, Unanswered, Diverted, Others

• REST Gateway

• UI Reports

• ETL

• Call Summary Data

• Oracle DB

• Level: Month, Date, Hour
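As a rough illustration of how raw call records might be rolled up into call summary data at the month, date, and hour levels, here is a map/reduce-style sketch in plain Python. The record layout, the sample values, and the function names are hypothetical; the client's actual MapReduce jobs are not described in the slides.

```python
from collections import Counter
from datetime import datetime

# Hypothetical CDR layout (an assumption): (call start time, call category)
RAW_CDRS = [
    ("2014-10-01 09:15:00", "Answered"),
    ("2014-10-01 09:40:00", "Answered"),
    ("2014-10-01 10:05:00", "Unanswered"),
    ("2014-10-02 09:20:00", "Diverted"),
]

# One time-bucket format per summary level named on the slide
LEVEL_FORMATS = {"month": "%Y-%m", "date": "%Y-%m-%d", "hour": "%Y-%m-%d %H"}


def map_cdr(cdr, level):
    """Map step: emit ((time bucket, category), 1) for one call record."""
    started, category = cdr
    parsed = datetime.strptime(started, "%Y-%m-%d %H:%M:%S")
    return (parsed.strftime(LEVEL_FORMATS[level]), category), 1


def summarize(cdrs, level):
    """Reduce step: sum the emitted counts for each (bucket, category) key."""
    counts = Counter()
    for cdr in cdrs:
        key, one = map_cdr(cdr, level)
        counts[key] += one
    return dict(counts)


hourly = summarize(RAW_CDRS, "hour")
monthly = summarize(RAW_CDRS, "month")
```

The summaries are far smaller than the raw records, which is what makes it practical to load them into a relational database for reporting while the detail stays in the cluster.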

Page 10: Managing Growing Transaction Volumes Using Hadoop

Technology Components Of Hadoop

• Core: HDFS + MapReduce

• Data Movement: Sqoop (relational database), Flume (real-time)

• NoSQL: HBase

• Scheduling: Oozie

• Analytics: Cloudera Impala, Tableau with Hive

• Machine Learning: Mahout

Page 11: Managing Growing Transaction Volumes Using Hadoop

3W’s – What, Where and When

• Traditional DW data

• Semi-structured and unstructured data

• Historical, infrequently accessed data

• Legal & regulatory

• Insights

• Post shelf life

• Post processing – DW

• 85% of tables and 50% of columns unused*

* Source: TDWI
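One way to act on the unused-tables observation is to flag warehouse tables that have not been queried for a long time as candidates for moving to Hadoop. The sketch below is illustrative only: the catalog contents, the one-year idle threshold, and the function name are assumptions, and a real warehouse would read last-access dates from its own query logs or system views.

```python
from datetime import date

# Hypothetical warehouse catalog (an assumption): table name -> date of last query
LAST_ACCESSED = {
    "sales_fact": date(2014, 10, 28),
    "orders_2009": date(2011, 3, 2),
    "audit_archive": date(2010, 7, 15),
    "customer_dim": date(2014, 10, 30),
}


def offload_candidates(last_accessed, as_of, max_idle_days=365):
    """Return the tables idle for longer than max_idle_days, sorted by name."""
    return sorted(
        name
        for name, last in last_accessed.items()
        if (as_of - last).days > max_idle_days
    )


candidates = offload_candidates(LAST_ACCESSED, as_of=date(2014, 11, 1))
unused_share = len(candidates) / len(LAST_ACCESSED)
```

The idle threshold is a policy choice; legal and regulatory retention rules may require keeping some rarely accessed tables queryable regardless of how old they are.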

Page 12: Managing Growing Transaction Volumes Using Hadoop

Decision Points

Source: Dr. Amr Awadallah and Dan Graham, “Hadoop and the Data Warehouse: When to Use Which”, copublished by Cloudera, Inc. and Teradata Corporation. *HBase.

Page 13: Managing Growing Transaction Volumes Using Hadoop

Cost Considerations

Item          ETL            Hadoop
Hardware      Expensive      Low
Software      Expensive      Low
Development   Medium         Medium
Maintenance   High           Low
Investment    High upfront   Invest as needed

Page 14: Managing Growing Transaction Volumes Using Hadoop

How Can You Get Started

• Hadoop as an Enterprise Data Management platform is here to stay

• Get started by either moving “unused data” to Hadoop or bringing in additional sources and types of data

• Beyond “back-end” functions, Hadoop provides analytical capabilities in its own right

• To start small, leverage Hadoop on the Cloud

• Co-Existence is going to be the key for successful adoption

Build a good use case before you start, build a POC, and evangelize it.

Page 15: Managing Growing Transaction Volumes Using Hadoop

US: Boston, New York; UK: Windsor, London; India: Hyderabad, Chennai; Sri Lanka: Colombo

www.virtusa.com

© 2010 All rights reserved. Virtusa and all other related logos are either registered trademarks or trademarks of Virtusa Corporation in the United States, the European Union, and/or India. All other company and service names are the property of their respective holders and may be registered trademarks or trademarks in the United States and/or other countries.