Managing Growing Transaction Volumes Using Hadoop
Practical approach to managing growing data volumes by leveraging Hadoop in your Information Architecture
The entire contents of this document are subject to copyright with all rights reserved. All copyrightable text and graphics, the selection, arrangement and presentation of all information and the overall design of the document are the sole and exclusive property of Virtusa. Copyright © 2010 Virtusa Corporation. All rights reserved
2000 West Park Drive, Westborough MA 01581 USA
Phone: 508 389 7300 Fax: 508 366 9901
Managing Growing Transaction Volumes Using Hadoop
Arvind Purushothaman – Director, IM Practice
2 © Virtusa Corporation ● Confidential
Agenda
• Context Setting
• CIO’s mandate
• Coexistence of architectures
• Evaluation
• Summary
During this presentation...
In the Millennial World 15 minutes is a long time...
• 1.8Mn tweets will be generated
• Apple will receive about 700,000 App downloads
• Brands & Organisations will receive around 500,000 likes on Facebook
• Over 14Mn status updates on Facebook
• 54,000 photos will be shared on
• Over 13Mn pieces of new Facebook content will be created
• Over 3Bn email messages will be sent
• Google will receive over 30Mn Search Queries
• Over 8,000 new websites will be created
Sources: Forrester Research, Hubspot Centre for Social Media, The Social Skinny, AlTwitter
During the course of this presentation...
• Consumers will spend over $5Mn online shopping
• 44% of companies who tweet acquired new customers
• Almost 8 new people come onto the internet every second
• 57% of companies who blog acquired new customers
• 61% of global internet users research products online
• 9/10 mobile searches lead to action, and over half lead to purchase
Sources: Forrester Research, Hubspot Centre for Social Media, The Social Skinny, AlTwitter
BIG DATA
BIG NOISE
?
BIG OPPORTUNITY
Technology enables you to make sense out of ALL Available Data
How the Industry Defines Big Data
Gartner: “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
Forrester: “The frontier of a firm’s ability to store, process and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.”
IBM: “…Big data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make your business more agile, and to answer questions that were previously considered beyond your reach…”
Oracle: “…Big Data refers to datasets that grow so large that it is difficult to capture, store, manage, share, analyze and visualize with the typical database software tools…”
Data sources shown on the slide:
• Website
• Network Switches
• Social Media
• RFID
• Transactional / operational systems
CIO’s manifesto
• Support business growth through innovation
• Lower costs
Neither is optional: you need to lower costs and innovate at the same time
In the Information Management world, this means exponentially growing data volumes and more types of data
That in turn means more investment in data storage, computing power, and licenses
What is the way forward?
Hadoop As Data Transformation Platform
[Architecture diagram; only its labels survive. Stages, left to right: Data Points, Data Stores, Access to BI Platform, Insight Generation.]
• Relational/Analytical data points: Financial Data, Marketing Data, Sales Data
• Relational data stores: Data Warehouse (Relational), Data Marts
• Big data data points: Transactions, Logs
• Big data stores: Big Data Cluster (Hadoop) with Raw Data, Master Data, Parsed data and Analytic data sets; Real Time Store (NoSQL)
• Data movement: ETL, Open Source ETL, Streaming
• Access: Data Warehouse Access, Direct Data Access, Big Data Access
• Business Intelligence Platform: Parametric & Ad Hoc reporting, OLAP, Dashboards, Exploratory Visualization, Statistical Analysis, Machine Learning
Hybrid Architecture For A Telecom Client That Leverages HDFS, HBase, and Oracle 11g
[Architecture diagram; only its labels survive.]
• Source systems: Integration & Infrastructure Platform SDEDS (APP10765); Bill & Payments Platform ONM (APP10487)
• Hadoop cluster: HDFS (Raw Call Data), HBase (CDR Store), MapReduce
• Call categories: ICS, OCS, Answered, Unanswered, Diverted, Others
• REST Gateway serving UI Reports
• ETL into Oracle DB (Call Summary Data) at Month, Date and Hour level
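The summarization step in this architecture (raw call data reduced to call summary data at month, date and hour level) can be sketched in a MapReduce style. This is a minimal illustration in plain Python, not the client's actual job: the record layout, the sample values, and the function names `map_call`, `reduce_counts` and `summarize` are all assumptions, and ICS/OCS are treated as opaque call-type labels because the slides do not define them.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw call records: (timestamp, call_type, status).
# The field layout is an assumption; the slides do not show the CDR format.
RAW_CALLS = [
    ("2013-05-01 09:15:02", "ICS", "Answered"),
    ("2013-05-01 09:30:00", "ICS", "Answered"),
    ("2013-05-01 09:47:30", "ICS", "Unanswered"),
    ("2013-05-01 09:59:59", "OCS", "Answered"),
    ("2013-05-01 10:03:11", "OCS", "Diverted"),
    ("2013-05-01 10:20:45", "ICS", "Answered"),
]


def map_call(record):
    """Map step: emit ((month, date, hour, call_type, status), 1) for one call."""
    timestamp, call_type, status = record
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    key = (ts.strftime("%Y-%m"), ts.strftime("%Y-%m-%d"), ts.hour, call_type, status)
    return key, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def summarize(records):
    """Run the map step over every record, then reduce to hourly call counts."""
    return reduce_counts(map_call(r) for r in records)


summary = summarize(RAW_CALLS)
for key in sorted(summary):
    print(key, summary[key])
```

In a real cluster the map and reduce functions would run in parallel across HDFS blocks; only the much smaller summary rows would then be loaded into the relational database for reporting.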
Technology Components Of Hadoop
Core
• HDFS + MapReduce
Data Movement
• Relational Database – Sqoop
• Real-time – Flume
NoSQL
• HBase
Scheduling
• Oozie
Analytics
• Cloudera Impala, Tableau with Hive
Machine Learning
• Mahout
3W’s – What, Where and When
Traditional DW data
Semi- and unstructured data
Historical, infrequently accessed
Legal & regulatory
Insights
Post shelf life
Post processing – DW
85% of tables and 50% of columns are unused*
* Source: TDWI
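One way to act on the "historical, infrequently accessed" category is to flag warehouse tables that have not been queried for a long time as candidates for moving to Hadoop. The sketch below is illustrative only: the table names, dates, the one-year threshold, and the function `offload_candidates` are hypothetical, and a real audit would read last-access information from the warehouse's own query logs.

```python
from datetime import date

# Hypothetical catalog: table name -> date it was last queried.
# Names and dates are illustrative only.
LAST_QUERIED = {
    "sales_fact": date(2013, 4, 28),
    "marketing_campaign_2009": date(2010, 1, 15),
    "billing_archive_2008": None,  # never queried in the audit window
    "customer_dim": date(2013, 4, 30),
}


def offload_candidates(last_queried, as_of, max_idle_days=365):
    """Return the sorted names of tables idle for more than max_idle_days."""
    candidates = []
    for table, last in last_queried.items():
        if last is None or (as_of - last).days > max_idle_days:
            candidates.append(table)
    return sorted(candidates)


candidates = offload_candidates(LAST_QUERIED, as_of=date(2013, 5, 1))
print(candidates)
```

The threshold is a policy choice; legal and regulatory retention rules may require keeping some idle data queryable, which is one reason to move it to a cheaper store instead of deleting it.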
Decision Points
Source: Dr. Amr Awadallah and Dan Graham, “Hadoop and the Data Warehouse: When to Use Which”, copublished by Cloudera, Inc. and Teradata Corporation.
* HBase.
Cost Considerations
               ETL             Hadoop
Hardware       Expensive       Low
Software       Expensive       Low
Development    Medium          Medium
Maintenance    High            Low
Investment     High upfront    Invest as needed
How Can You Get Started
• Hadoop as an Enterprise Data Management platform is here to stay
• Get started by either moving “unused data” or bringing in additional sources and types of data
• In addition to “back-end” type functions, it provides Analytical capabilities in its own right
• To start small, leverage Hadoop on the Cloud
• Co-Existence is going to be the key for successful adoption
Build a good use case before you start, build a POC, and evangelize it
US – Boston, New York
UK – Windsor, London
India – Hyderabad, Chennai
Sri Lanka – Colombo
www.virtusa.com
© 2010 All rights reserved. Virtusa and all other related logos are either registered trademarks or trademarks of Virtusa Corporation in the United States, the European Union, and/or India. All other company and service names are the property of their respective holders and may be registered trademarks or trademarks in the United States and/or other countries.