Business Analytics | © TechTarget
Wayne W. Eckerson
Director of Research, TechTarget
Founder, BI Leadership Forum
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
What comes next?
● Kilobyte (KB) – 10^3 bytes
● Megabyte (MB) – 10^6 bytes
● Gigabyte (GB) – 10^9 bytes
● Terabyte (TB) – 10^12 bytes
● Petabyte (PB) – 10^15 bytes
● Exabyte (EB) – 10^18 bytes
● Zettabyte (ZB) – 10^21 bytes
● Yottabyte (YB) – 10^24 bytes
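The unit ladder on this slide steps up by a factor of 10^3 each time. A minimal sketch of that arithmetic follows; the function name `format_bytes` is an illustrative assumption and does not come from the presentation.

```python
# Decimal storage units from the slide: each step is a factor of 10^3.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def format_bytes(n: int) -> str:
    """Express a byte count in the largest listed unit that keeps the value >= 1."""
    value = float(n)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the largest unit we know about.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


print(format_bytes(10**15))  # one petabyte
print(format_bytes(10**24))  # one yottabyte
```

Calling `format_bytes` with any power of 1000 from the list returns the matching unit name, which is a quick way to check the exponents above.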
Information explosion
Every 18 months, non-rich structured and unstructured enterprise data doubles
[Chart: enterprise data volume, 2005 to 2012; series: Unstructured & Content Depot; Structured & Replicated]
Source: IDC Digital Universe 2009; White Paper, Sponsored by EMC, May 2009
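The claim that enterprise data doubles every 18 months implies exponential growth: volume after t months is the starting volume times 2^(t/18). A small sketch of that projection follows. The 100 TB starting volume is a hypothetical figure chosen only for illustration, not a number from the slide.

```python
DOUBLING_PERIOD_MONTHS = 18  # doubling period stated on the slide


def projected_volume(start_volume: float, months: float) -> float:
    """Project data volume forward, assuming it doubles every 18 months."""
    return start_volume * 2 ** (months / DOUBLING_PERIOD_MONTHS)


# Hypothetical starting point of 100 TB (an assumption for illustration).
for years in (0, 3, 6):
    volume_tb = projected_volume(100.0, years * 12)
    print(f"after {years} years: {volume_tb:g} TB")
```

Under this assumption, three years is two doubling periods and six years is four, so the volume grows 4x and 16x respectively.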
Data deluge
● Structured data
- Call detail records
- Point of sale records
- Claims data
● Semi-structured data
- Web logs
- Sensor data
- Email, Twitter
● Unstructured data
- Video, Audio
- Images, Text
“A Sea of Sensors”, The Economist, Nov 4, 2010
Three “Big Data” revolutions
● Data warehousing (1995+)
● Analytical platforms (2005+)
● Hadoop ecosystem (2010+)
First revolution: data warehousing
[Diagram: Operational Systems, ETL, Data Warehouse, Data Mart, BI Server, Reports / Dashboards]
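The components named on this slide (operational systems, ETL, data warehouse) can be sketched as a tiny extract-transform-load job. This is a minimal illustration using Python's built-in `sqlite3` module with in-memory databases. The table names `orders` and `sales_by_day`, the columns, and the sample rows are all hypothetical and are not taken from the presentation.

```python
import sqlite3


def extract(source: sqlite3.Connection) -> list:
    """Extract: read raw order rows from the (hypothetical) operational system."""
    return source.execute(
        "SELECT order_date, region, amount FROM orders"
    ).fetchall()


def transform(rows: list) -> list:
    """Transform: normalize region names and total the sales per date and region."""
    totals = {}
    for order_date, region, amount in rows:
        key = (order_date, region.strip().upper())
        totals[key] = totals.get(key, 0.0) + amount
    return [(d, r, total) for (d, r), total in sorted(totals.items())]


def load(warehouse: sqlite3.Connection, rows: list) -> None:
    """Load: write the aggregated rows into a warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_day "
        "(order_date TEXT, region TEXT, total REAL)"
    )
    warehouse.executemany("INSERT INTO sales_by_day VALUES (?, ?, ?)", rows)
    warehouse.commit()


# Hypothetical operational system with inconsistent region spellings.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_date TEXT, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2012-04-01", "east", 10.0),
        ("2012-04-01", "East ", 5.0),
        ("2012-04-01", "west", 7.5),
    ],
)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))
result = warehouse.execute(
    "SELECT order_date, region, total FROM sales_by_day ORDER BY region"
).fetchall()
print(result)
```

The schema is fixed before any data is loaded, which is the "schema heavy" trait the deck later contrasts with Hadoop's approach.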
Second revolution: analytical platforms
1010data, Aster Data (Teradata), Calpont, Datallegro (Microsoft), Exasol, Greenplum (EMC), IBM SmartAnalytics, Infobright, Kognitio, Netezza (IBM), Oracle Exadata, Paraccel, Pervasive, Sand Technology, SAP HANA, Sybase IQ (SAP), Teradata, Vertica (HP)
Purpose-built database management systems designed explicitly for query processing and analysis that provide dramatically higher price/performance and availability compared to general-purpose solutions.
Deployment Options
- Software only (Paraccel, Vertica)
- Appliance (SAP, Exadata, Netezza)
- Hosted (1010data, Kognitio)
Game-changing technology
● Purpose built
- For analytics in general
- For specific analytic workloads
● Quicker to deploy
- Preconfigured and tuned
- Fast ROI
● Faster and more scalable
- Faster query response times
- Linear performance
● Built-in analytics
- Libraries of functions
- Extensible SDK
● Less costly
- Less power, cooling, space
- Fewer people to maintain
Business value of analytic platforms
● Kelley Blue Book – Consolidates millions of auto transactions each week to calculate car valuations
● AT&T Mobility – Tracks purchasing patterns for 80M customers daily to optimize targeted marketing
● CBS Interactive – Analyzes Web visitor behavior to optimize content/ad placement and revenue
MPP Analytical Database
Analytical appliance
Hadoop + Analytical database
Third Revolution - Hadoop
• Open source projects
• Hosted by Apache Foundation
• Initially developed by Google, Yahoo, etc.
• Offers scale-out architecture on commodity servers with direct-attached storage
Hadoop distilled
Open Source $$
MapReduce
“Schema at Read”
Unstructured data
BIG DATA
Distributed File System
Benefits
- Any data
- Agile
- Expressive
- Affordable
Drawbacks
- Immature
- Batch oriented
- Security, concurrency, metadata, etc.
- Expertise
- TCO?
Data scientist
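Two of the ideas named on this slide, MapReduce and "schema at read", can be illustrated together. The sketch below is a single-process Python imitation of the map, shuffle, and reduce steps. It is not Hadoop's API, and the log format, sample lines, and function names are hypothetical. The raw lines are stored without any schema; the structure is applied only when the mapper reads them.

```python
from collections import defaultdict

# Raw web-log lines stored as-is; no schema was imposed when they were written.
RAW_LOG_LINES = [
    "2012-04-01 /home 200",
    "2012-04-01 /pricing 200",
    "2012-04-02 /home 404",
    "malformed line",
    "2012-04-02 /home 200",
]


def map_line(line):
    """Map: apply the schema at read time and emit (page, 1) for successful hits."""
    fields = line.split()
    if len(fields) != 3:  # skip records that do not fit the assumed schema
        return
    _date, page, status = fields
    if status == "200":
        yield page, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


print(run_job(RAW_LOG_LINES))
```

In a real cluster the map and reduce steps run in parallel across nodes and the shuffle moves data between them; here everything runs in one process so the data flow is easy to follow.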
Hadoop hype
Gartner Group – Hype Cycle
Overheard
“Hadoop will replace relational databases.”
“Hadoop will replace data warehouses.”
“Hadoop has a superior query engine compared to analytical platforms.”
“Use Hadoop for any application that requires more than one node.”
Hadoop adoption rates
● No plans – 38%
● Considering – 32%
● Experimenting – 20%
● Implementing – 5%
● In production – 4%
Based on 158 respondents, BI Leadership Forum, April, 2012
Hadoop workloads
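● Staging area – 92% today, 92% in 18 months
● Online archive – 92% today, 92% in 18 months
● Transformation engine – 83% today, 92% in 18 months
● Ad hoc queries – 58% today, 67% in 18 months
● Scheduled reports – 42% today, 67% in 18 months
● Visual exploration – 25% today, 67% in 18 months
● Data mining – 58% today, 83% in 18 months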
Based on respondents that have implemented Hadoop. BI Leadership Forum, April, 2012
Hadoop’s impact on the data warehouse
● Replaces it – 0%
● Offloads existing workloads – 50%
● Handles new workloads – 67%
● Shares existing workloads – 33%
● Shares new workloads – 25%
● Don't know – 8%
Based on respondents that have implemented Hadoop. BI Leadership Forum, April, 2012
BI Framework 2020
[Diagram: four intelligence domains described across three layers (End-User Tools, Design Framework, Architecture). Labels recovered from the figure:]
● Business Intelligence – Data warehousing; reporting & analysis; reports and dashboards; MAD dashboards
● Analytics Intelligence – Analytic sandboxes; ad hoc SQL; ad hoc query, spreadsheets, Excel, Access, OLAP, data mining, visual analysis and exploration, analytic workbenches, Hadoop
● Continuous Intelligence – Event-driven alerts and dashboards; event detection and correlation; CEP, streams
● Content Intelligence – Keyword search, BI tools, XQuery, Hive, Java, etc.; MapReduce, XML schema, key-value pairs, graph notation, etc.; HDFS, NoSQL databases
BI Framework
● TOP DOWN – “Business Intelligence”
- Corporate Objectives and Strategy
- Reporting & Monitoring (Casual Users)
- Predefined Metrics
- Data Warehousing Architecture: Non-volatile Data
- Pros: Alignment; Consistency
- Cons: Hard to build; Politically charged; Hard to change; Expensive; “Schema Heavy”
● Analysis and Prediction (Power Users)
- Processes and Projects
- Exploration
- Ad hoc queries
- Analytics Architecture: Volatile Data
- Pros: Quick to build; Politically uncharged; Easy to change; Low cost
- Cons: Alignment; Consistency; “Schema Light”
● Reports Beget Analysis; Analysis Begets Reports
The new analytical ecosystem
[Diagram: the analytical ecosystem. Labels recovered from the figure:]
● Data sources: Operational Systems (structured data), Machine Data, Web Data, Documents & Text, External Data, Audio/video Data
● Data movement: Extract, Transform, Load (batch, near real-time, or real-time); Streaming/CEP Engine
● Platforms: Hadoop Cluster, Data Warehouse, Dept Data Mart, BI Server, analytic platform or non-relational database
● Sandboxes: Free-Standing Sandbox, Virtual Sandboxes, In-memory Sandbox
● Users: Casual User, Power User
● Architecture styles: Top-down Architecture, Bottom-up Architecture
www.bileadership.com
Analytical sandboxes
[Diagram: same analytical ecosystem components as the previous slide; the sandbox types shown are Free-Standing Sandbox, Virtual Sandboxes, and In-memory Sandbox]
Recommendations
● Your BI architecture is now an “analytical ecosystem”
● Deploy analytical platforms to turbo-charge performance
● Explore Hadoop for “big data”
● Reconcile top-down and bottom-up BI environments
Hadoop ecosystem
Courtesy, Hortonworks, 2012.