architecting for big data analytics and beyond - 1105 mediadownload.101com.com/pub/tdwi/files/tt...


Post on 15-Sep-2019


Page 1: Architecting for Big Data Analytics and Beyond - 1105 Mediadownload.101com.com/pub/tdwi/files/TT Roadshow - Architecting for Big... · - For specific analytic workloads Quicker to

Business Analytics | © TechTarget

Wayne W. Eckerson
Director of Research, TechTarget
Founder, BI Leadership Forum

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing


What comes next?

● Kilobyte (KB) – 10^3 bytes
● Megabyte (MB) – 10^6 bytes
● Gigabyte (GB) – 10^9 bytes
● Terabyte (TB) – 10^12 bytes
● Petabyte (PB) – 10^15 bytes
● Exabyte (EB) – 10^18 bytes
● Zettabyte (ZB) – 10^21 bytes
● Yottabyte (YB) – 10^24 bytes
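As a rough illustration of the decimal scale above, the sketch below formats a raw byte count with the largest prefix that fits. The names `UNITS` and `format_bytes` are illustrative and do not come from the slides.

```python
# Decimal storage units from the slide: each step up is a factor of 10^3.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def format_bytes(n: int) -> str:
    """Express n bytes in the largest decimal unit that keeps the value >= 1."""
    index = 0
    scale = 1
    # Integer arithmetic avoids floating-point drift at unit boundaries.
    while n >= scale * 1000 and index < len(UNITS) - 1:
        scale *= 1000
        index += 1
    return f"{n / scale:.1f} {UNITS[index]}"


if __name__ == "__main__":
    print(format_bytes(10**15))      # one petabyte
    print(format_bytes(3 * 10**21))  # three zettabytes
```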


Information explosion


Every 18 months, non-rich structured and unstructured enterprise data doubles

[Chart: enterprise data growth, 2005–2012, in two series: “Unstructured & Content Depot” and “Structured & Replicated”]

Source: IDC Digital Universe 2009; White Paper, Sponsored by EMC, May 2009
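To make the doubling claim concrete: if volume doubles every 18 months, the growth factor after t months is 2^(t/18). The sketch below is a back-of-the-envelope calculation, not a figure from the IDC report. The function name is illustrative, and the 2005 to 2012 span is taken from the chart's axis.

```python
def growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    """Multiplicative growth after `months`, assuming one doubling per period."""
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    # 2005 to 2012 spans 7 years, i.e. 84 months.
    print(f"{growth_factor(84):.1f}x")  # roughly a 25-fold increase
    print(f"{growth_factor(36):.1f}x")  # two doubling periods
```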


Data deluge

● Structured data
- Call detail records
- Point of sale records
- Claims data

● Semi-structured data
- Web logs
- Sensor data
- Email, Twitter

● Unstructured data
- Video, audio
- Images, text


“A Sea of Sensors”, The Economist, Nov 4, 2010


Three “Big Data” revolutions

● Data warehousing (1995+)
● Analytical platforms (2005+)
● Hadoop ecosystem (2010+)


First revolution: data warehousing


[Diagram: Operational Systems → ETL → Data Warehouse → ETL → Data Mart → BI Server → Reports / Dashboards]
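The two ETL hops in this architecture can be sketched in a few lines. This is a toy illustration under stated assumptions, not a production design: the source records, table names, and the `transform` and `run_etl` helpers are all made up, and an in-memory SQLite database stands in for both the warehouse and the mart.

```python
import sqlite3

# Hypothetical extracts from two operational systems (names and fields are made up).
ORDERS_SYSTEM = [
    {"order_id": 1, "region": "east", "amount": "120.50"},
    {"order_id": 2, "region": "WEST", "amount": "80.00"},
]
WEB_STORE_SYSTEM = [
    {"order_id": 3, "region": "East ", "amount": "42.25"},
]


def transform(row: dict) -> tuple:
    """Conform one source row: tidy the region code and cast the amount to a number."""
    return (row["order_id"], row["region"].strip().lower(), float(row["amount"]))


def run_etl(conn: sqlite3.Connection) -> None:
    # First ETL hop: operational systems -> data warehouse (detailed rows).
    conn.execute(
        "CREATE TABLE warehouse_orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    rows = [transform(r) for r in ORDERS_SYSTEM + WEB_STORE_SYSTEM]
    conn.executemany("INSERT INTO warehouse_orders VALUES (?, ?, ?)", rows)

    # Second ETL hop: data warehouse -> data mart (a summary for one subject area).
    conn.execute(
        "CREATE TABLE mart_sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM warehouse_orders GROUP BY region"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    # A BI server would issue queries like this one against the mart.
    for region, total in conn.execute(
        "SELECT region, total FROM mart_sales_by_region ORDER BY region"
    ):
        print(region, total)
```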


Second revolution: analytical platforms

1010data
Aster Data (Teradata)
Calpont
Datallegro (Microsoft)
Exasol
Greenplum (EMC)
IBM SmartAnalytics
Infobright
Kognitio
Netezza (IBM)
Oracle Exadata
Paraccel
Pervasive
Sand Technology
SAP HANA
Sybase IQ (SAP)
Teradata
Vertica (HP)

Purpose-built database management systems designed explicitly for query processing and analysis that provide dramatically higher price/performance and availability compared to general-purpose solutions.

Deployment options
- Software only (Paraccel, Vertica)
- Appliance (SAP, Exadata, Netezza)
- Hosted (1010data, Kognitio)


Game-changing technology

● Purpose built
- For analytics in general
- For specific analytic workloads

● Quicker to deploy
- Preconfigured and tuned
- Fast ROI

● Faster and more scalable
- Faster query response times
- Linear performance

● Built-in analytics
- Libraries of functions
- Extensible SDK

● Less costly
- Less power, cooling, space
- Fewer people to maintain


Business value of analytic platforms

● Kelley Blue Book – Consolidates millions of auto transactions each week to calculate car valuations

● AT&T Mobility – Tracks purchasing patterns for 80M customers daily to optimize targeted marketing

● CBS Interactive – Analyzes Web visitor behavior to optimize content/ad placement and revenue

MPP Analytical Database

Analytical appliance

Hadoop + Analytical database


Third revolution: Hadoop


● Open source projects
● Hosted by the Apache Software Foundation
● Initially developed by Google, Yahoo, etc.
● Offers scale-out architecture on commodity servers with direct-attached storage


Hadoop distilled

Open Source $$

MapReduce

“Schema at Read”

Unstructured data

BIG DATA

Distributed File System

Benefits
- Any data
- Agile
- Expressive
- Affordable

Drawbacks
- Immature
- Batch oriented
- Security, concurrency, metadata, etc.
- Expertise
- TCO?
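The MapReduce model named on this slide can be imitated in a single process to show its three steps: map, shuffle, reduce. This is a teaching sketch only. It does not use the Hadoop API, and all function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as a framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big analytics"]))
```

On a real cluster the map and reduce calls would run in parallel on different nodes; only the per-key grouping contract is the same.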

Data scientist
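“Schema at Read” means raw records are stored as they arrive and a structure is imposed only when an analysis reads them. A minimal sketch of the idea, assuming JSON lines as the raw format; the sample records and the `read_with_schema` helper are hypothetical.

```python
import json
from typing import Iterator, List

# Raw records are stored exactly as they arrived; no schema was enforced on write.
RAW_LINES = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/cart"}',                          # no "ms" field
    '{"user": "c", "page": "/home", "ms": "95", "ref": "ad"}',  # extra field, number as text
]


def read_with_schema(lines: List[str]) -> Iterator[dict]:
    """Apply a schema at read time: keep the fields this analysis needs and cast them."""
    for line in lines:
        record = json.loads(line)
        yield {
            "page": str(record["page"]),
            "ms": int(record["ms"]) if "ms" in record else None,
        }


if __name__ == "__main__":
    rows = list(read_with_schema(RAW_LINES))
    timed = [r["ms"] for r in rows if r["ms"] is not None]
    # Average latency over the records that carry an "ms" value.
    print(sum(timed) / len(timed))
```

A different analysis could read the same raw lines with a different schema, which is the agility the slide lists as a benefit.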


Hadoop hype

Gartner Group – Hype Cycle

Overheard

“Hadoop will replace relational databases.”

“Hadoop will replace data warehouses.”

“Hadoop has a superior query engine compared to analytical platforms.”

“Use Hadoop for any application that requires more than one node.”


Hadoop adoption rates


● No plans – 38%
● Considering – 32%
● Experimenting – 20%
● Implementing – 5%
● In production – 4%

Based on 158 respondents, BI Leadership Forum, April, 2012
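For a sense of scale, the percentages can be turned into approximate respondent counts. This sketch assumes the percentages pair with the labels in the order they appear on the slide. The counts are estimates from rounded percentages, not figures reported by the survey.

```python
RESPONDENTS = 158  # sample size stated on the slide

# Percentages as printed on the slide; they sum to 99 because of rounding.
ADOPTION_PCT = {
    "No plans": 38,
    "Considering": 32,
    "Experimenting": 20,
    "Implementing": 5,
    "In production": 4,
}


def approx_counts(pct_by_label: dict, total: int) -> dict:
    """Estimate respondents per category from rounded percentages."""
    return {label: round(total * pct / 100) for label, pct in pct_by_label.items()}


if __name__ == "__main__":
    for label, count in approx_counts(ADOPTION_PCT, RESPONDENTS).items():
        print(f"{label}: about {count} respondents")
```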


Hadoop workloads


Workload                 Today    In 18 months
Staging area             92%      92%
Online archive           92%      92%
Transformation engine    83%      92%
Ad hoc queries           58%      67%
Scheduled reports        42%      67%
Visual exploration       25%      67%
Data mining              58%      83%

Based on respondents that have implemented Hadoop. BI Leadership Forum, April, 2012


Hadoop’s impact on the data warehouse


● Replaces it – 0%
● Offloads existing workloads – 50%
● Handles new workloads – 67%
● Shares existing workloads – 33%
● Shares new workloads – 25%
● Don't know – 8%

Based on respondents that have implemented Hadoop. BI Leadership Forum, April, 2012


BI Framework 2020

[Diagram. Labels recovered from the slide:]

● Business Intelligence, Analytics Intelligence, Continuous Intelligence, Content Intelligence
● End-User Tools, Design Framework, Architecture
● Data Warehousing
● Reporting & Analysis
● Reports and Dashboards
● MAD Dashboards
● Ad hoc query, Spreadsheets, OLAP, Visual Analysis, Analytic Workbenches, Hadoop
● Analytic Sandboxes
● Ad hoc SQL
● Excel, Access, OLAP, Data mining, visual exploration
● Exploration
● Power Users
● Event-driven
● Event-Driven Alerts and Dashboards
● Dashboard Alerts
● Event detection and correlation
● CEP, Streams
● Keyword search, BI tools, XQuery, Hive, Java, etc.
● MapReduce, XML schema, key-value pairs, graph notation, etc.
● HDFS, NoSQL databases


Reporting & Monitoring (Casual Users)

Predefined Metrics

Corporate Objectives and Strategy

TOP DOWN - “Business Intelligence”

Processes and Projects

Analysis and Prediction (Power Users)

Ad hoc queries

Analysis Begets Reports

Reports Beget Analysis

Pros:
- Alignment
- Consistency

Cons:
- Hard to build
- Politically charged
- Hard to change
- Expensive
- “Schema Heavy”

Pros:
- Quick to build
- Politically uncharged
- Easy to change
- Low cost

Cons:
- Alignment
- Consistency
- “Schema Light”

Data Warehousing Architecture

Non-volatile Data

Analytics Architecture

Volatile Data


BI Framework


The new analytical ecosystem


[Diagram. Labels recovered from the slide:]

● Data sources: Operational Systems (structured data), Machine Data, Web Data, Documents & Text, External Data, Audio/video Data
● Extract, Transform, Load (batch, near real-time, or real-time)
● Streaming / CEP Engine
● Hadoop Cluster
● Data Warehouse
● Dept Data Mart
● Analytic platform or non-relational database
● Sandboxes: Free-Standing Sandbox, Virtual Sandboxes, In-memory Sandbox
● BI Server
● Casual User, Power User
● Top-down Architecture, Bottom-up Architecture

www.bileadership.com
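The “Streaming / CEP Engine” component detects patterns in events as they arrive rather than after loading. A minimal sliding-window sketch of that idea follows. It is not any vendor's CEP API: the event shape, the `detect_bursts` name, and the sample stream are assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, event type); a made-up event shape


def detect_bursts(events: Iterable[Event], event_type: str,
                  threshold: int, window_seconds: float) -> Iterator[float]:
    """Yield a timestamp whenever `threshold` matching events fall within the window."""
    recent: Deque[float] = deque()
    for timestamp, kind in events:
        if kind != event_type:
            continue
        recent.append(timestamp)
        # Drop events that have slid out of the time window.
        while recent and timestamp - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) >= threshold:
            yield timestamp


if __name__ == "__main__":
    stream = [
        (0.0, "login_failed"),
        (2.0, "login_ok"),
        (3.0, "login_failed"),
        (4.0, "login_failed"),
        (30.0, "login_failed"),
    ]
    # Alert when three failed logins occur within ten seconds.
    print(list(detect_bursts(stream, "login_failed", threshold=3, window_seconds=10.0)))
```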


Analytical sandboxes


[Diagram: same components as in “The new analytical ecosystem” above. Sandbox types shown: Free-Standing Sandbox, Virtual Sandboxes, In-memory Sandbox]


Recommendations

● Your BI architecture is now an “analytical ecosystem”

● Deploy analytical platforms to turbo-charge performance

● Explore Hadoop for “big data”

● Reconcile top-down and bottom-up BI environments


Questions?

● Wayne Eckerson
● [email protected]


Hadoop ecosystem


Courtesy, Hortonworks, 2012.