Big Data in Action – Real-World Solution Showcase

Grab some coffee and enjoy the preshow banter before the top of the hour!

Upload: inside-analysis

Post on 01-Jul-2015


Category:

Technology



DESCRIPTION

The Briefing Room with Radiant Advisors and IBM. Live Webcast on February 25, 2014.

Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=53c9b7fa2000f98f5b236747e3602511

The power of Big Data depends heavily upon the context in which it's used, and most organizations are just beginning to figure out where, how and when to leverage it. One key to success is integration with existing information systems, many of which still rely on relational database technologies. Finding ways to blend these two worlds can help companies generate measurable business value in fairly short order.

Register for this episode of The Briefing Room to hear Analysts Lindy Ryan and John O'Brien as they explain how the combination of traditional Business Intelligence with Big Data Analytics can provide game-changing results in today's information economy. They'll be briefed by Eric Poulin and Paul Flach of Stream Integration, who will share best practices for designing and implementing Big Data solutions. They'll discuss the components of IBM BigInsights, and explain how BigSheets can empower non-technical users who need to explore self-structured data.

Visit InsideAnalysis.com for more information.

TRANSCRIPT

Page 1: Big Data in Action – Real-World Solution Showcase

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Page 2: Big Data in Action – Real-World Solution Showcase

The Briefing Room

Big Data in Action: Real-World Solution Showcase

Page 3: Big Data in Action – Real-World Solution Showcase

Twitter Tag: #briefr

The Briefing Room

Welcome

Host: Eric Kavanagh

[email protected]

Page 4: Big Data in Action – Real-World Solution Showcase

Mission

•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today’s innovative technologies
•  Give vendors a chance to explain their product to savvy analysts
•  Allow audience members to pose serious questions... and get answers!

Page 5: Big Data in Action – Real-World Solution Showcase

Topics

This Month: BIG DATA

March: CLOUD

April: BIG DATA

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

Page 6: Big Data in Action – Real-World Solution Showcase

Big Data

Page 7: Big Data in Action – Real-World Solution Showcase

Analysts: Lindy Ryan and John O’Brien

Lindy Ryan is the Research Director for Radiant Advisors’ Data Discovery and Visualization practice and leads research and analyst activities in the confluence of data discovery, visualization, and data science from a business needs perspective. She also retains the role of Editor in Chief of RediscoveringBI Magazine. As Radiant Advisors’ Editor in Chief for three years, Lindy participated in in-depth discussions and analysis with industry thought leaders and vendors while maturing her position and perspectives in the BI industry.

John O’Brien is Principal and CEO of Radiant Advisors. With over 25 years of experience delivering value through data warehousing and BI programs, John’s unique perspective comes from the combination of his roles as a practitioner, consultant, and vendor in the BI industry. His knowledge in designing, building, and growing enterprise BI systems and teams brings real-world insights to each role and phase within a BI program. Today, through Radiant Advisors, John provides research and advisory services that guide companies in meeting the demands of next generation information management, architecture, and emerging technologies.

Page 8: Big Data in Action – Real-World Solution Showcase

IBM

•  IBM offers a full suite of Big Data solutions, including InfoSphere Streams, InfoSphere BigInsights and InfoSphere Data Explorer
•  IBM also offers a series of products designed to leverage the power of Hadoop
•  Stream Integration is a Premier Business Partner with IBM and focuses its consultancy exclusively on IBM products

Page 9: Big Data in Action – Real-World Solution Showcase

Guests:

Eric Poulin, VP of Business Analytics, Stream Integration

Paul Flach, VP of Enterprise Analytics, Stream Integration

Page 10: Big Data in Action – Real-World Solution Showcase

Big Data Performance for Analytics

Eric Poulin, VP, Analytics & Big Data [email protected]

Page 11: Big Data in Action – Real-World Solution Showcase

Agenda

•  Overview of Stream Integration
•  Big Data Performance for Analytics
•  Modular Analytics

Page 12: Big Data in Action – Real-World Solution Showcase

Company Overview

•  Award Winning Information Lifecycle Consultancy
•  Founded in 2000
•  IBM Premier Partner
•  Exclusively focused on IBM Information Management, Big Data and Analytics
•  Offices in North America, Caribbean, and Europe
•  Development and Support Centers in India and China

Copyright © 2014, Stream Integration Inc. All rights reserved.

Page 13: Big Data in Action – Real-World Solution Showcase

LINKING DATA TO THE BUSINESS REQUIREMENTS

[Diagram. Labels: Content; Structured Data; Data; Traditional Sources; External Information Sources; Streaming Information; Transactional & Collaborative Applications; Business Analytics Applications; Analyze; Integrate; Govern; Manage; Quality; Lifecycle Management; Security & Privacy; InfoSphere MDM; Information Server; Streams; Big Data; BigInsights; PureData/Netezza. Footer: Design ★ Deploy ★ Operate ★ Manage ★ Extend]

Page 14: Big Data in Action – Real-World Solution Showcase

Performance for the Future of Analytics

Paul Flach, Stream Integration

Page 15: Big Data in Action – Real-World Solution Showcase

Capabilities Required for Hadoop Style Workloads

•  Runtime
•  Cluster and Workload Management
•  Visualization & Discovery
•  Data Ingest
•  Analytics Engines
•  File System
•  Data Store
•  Application Support and Development Tooling
•  Security

Page 16: Big Data in Action – Real-World Solution Showcase

Big SQL provides native SQL for Hadoop

•  ANSI SQL 92+ support

Page 17: Big Data in Action – Real-World Solution Showcase

Next Gen Big SQL will provide first MPP query engine for Hadoop

[Architecture diagram. Labels: Head Node; Catalog Coordinator node; Host 1, Host 2, Host n; Hadoop Data Node(s); MapReduce; MPP Runtime n+1, n+2, n+n; User Data; temp(s); HDFS; SQL sub-sections; Cluster network; Local fs (temps); Local fs (catalog tables); Distributed fs; sync; Direct Hadoop data access; Big Acceleration; Query Optimizer; Common SQL: BigInsights – DB2 – Netezza; Oracle – Teradata]

Page 18: Big Data in Action – Real-World Solution Showcase

BigSheets provides business users with access to data without programming

•  Spreadsheet-style interface
•  Data Visualization & Graphs

Page 19: Big Data in Action – Real-World Solution Showcase

Watson Explorer included in BigInsights

•  Faceted Search, Navigation & Discovery

Page 20: Big Data in Action – Real-World Solution Showcase

Analytics Accelerators provide ability to extract insights more quickly

•  Text
•  Social Media
•  Machine Data

Page 21: Big Data in Action – Real-World Solution Showcase

App Store reduces development effort and enables reusability

•  Combine Hadoop Apps

Page 22: Big Data in Action – Real-World Solution Showcase

Open Source Hadoop Components

[Component diagram. Capability areas: Visualization & Discovery; Data Ingest; Cluster Optimization and Management; Runtime; Analytics Engines; File System; Data Store; Application Support and Development Tooling; Security. Open source components shown: Nutch, MapReduce, HDFS, HBase, Pig, Hive, ZooKeeper, Sqoop, HCatalog, Flume, Avro, Lucene, Oozie, Derby]

Page 23: Big Data in Action – Real-World Solution Showcase

BigInsights Enterprise Edition Components

[Component diagram: IBM InfoSphere BigInsights. Legend: IBM; Open Source. Capability areas: Visualization & Discovery; Data Ingest; Cluster Optimization and Management; Runtime; Analytics Engines; File System; Data Store; Application Support and Development Tooling; Security. Components shown: Streams, Netezza, Nutch, DB2, DataStage, Teradata, MapReduce, Adaptive MapReduce, HDFS, GPFS-FPO, HBase, Text Processing Engine and Extractor Library (AQL+HIL), JDBC, App infrastructure, Pig, Hive, Jaql, Splittable Text Compression, ZooKeeper, High Availability, Integrated Installer, Admin Console, Enhanced Monitoring, Sqoop, SystemML, Eclipse, Big SQL, HCatalog, R, Gnip, BoardReader, Guardium, Flume, Avro, BigSheets, Dashboard / visualization, Data Explorer, Lucene, Oozie, PAM, LDAP, Private firewall, Derby]

Page 24: Big Data in Action – Real-World Solution Showcase

Modular Analytics

Page 25: Big Data in Action – Real-World Solution Showcase

Platform Analytic Modules

[Diagram. Labels: Cloud Computing; Solutions; GIS Engine; Forecasting Engine; Routing Engine; Work Force Engine; Inventory Engine; Core Engine; IMDB; Column-Store; BigInsights; Streams; PureData; In-Flight Data; Self-Structured Data; Frequently Requested Summaries; Low Entropy Data; Mixed Workload Requests]

Page 26: Big Data in Action – Real-World Solution Showcase

Thank you!

Page 27: Big Data in Action – Real-World Solution Showcase

Perceptions & Questions

Analysts: Lindy Ryan and John O’Brien

Page 28: Big Data in Action – Real-World Solution Showcase

© Copyright 2014 Radiant Advisors. All Rights Reserved v1.10.000

BIG DATA IN ACTION

Real-World Solution Showcase with Stream Integration
Inside Analysis – The Briefing Room, February 25, 2014

Lindy Ryan | Research Director, Data Discovery & Visualization @lindy_ryan [email protected]

John O’Brien | Principal Analyst, Modern Data Platforms @obrienjw [email protected]

Page 29: Big Data in Action – Real-World Solution Showcase

MODERN DATA PLATFORM
Big Data in Action: Real-World Solutions

[Diagram. Classes: Flexibility Class; Reference Class; Optimized Class. Descriptions: "Discovery, Scalable, Programs"; "Stable, Context, SQL"; "Discovery & Analytics Oriented". Platform labels: Apache Hadoop; HDFS; Enterprise Data Warehouses; Master Reference Data; Highly Optimized for Analytics (In-memory, MOLAP, MPP, Columnar); Highly Specialized for Analytics (Graphs, Document Stores, Text Analytics). Access and flow labels: R programs; Hive SQL; PIG / Hive; MapReduce; Operational Systems, Big Data, Streams]

Extending SQL Access to Big Data and Hadoop via Hive and other HDFS SQL engines

Page 30: Big Data in Action – Real-World Solution Showcase

SQL-ON-HADOOP
Big Data in Action: Real-World Solutions

[Diagram of three stacks.
Apache Hadoop v1: PIG, Hive-QL, MapReduce, HCatalog, Hadoop HDFS.
Apache Hadoop v2: MapReduce, PIG, Hive-QL, YARN, HCatalog, Hadoop HDFS.
Hadoop Distributions and 3rd Party: PIG, Hive, MapReduce, YARN, Hadoop HDFS, HCatalog, MPP Engine (Impala, HAWQ, InfiniDB, Presto)]

Not all SQL-on-Hadoop is the same:
1.  SQL capabilities (SQL-92, Analytic functions SQL-2003? SQL-2011? UDF?)
2.  Scalability (not always the same as Hadoop scalability)
3.  Speed (flat out performance response time without caching)

File types: ORCFILE, SEQPART, Parquet
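The first point on the slide above, SQL capabilities, can be made concrete with a small sketch. It contrasts a SQL-92 style GROUP BY aggregate with a SQL:2003 style analytic (window) function. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in engine (assuming a bundled SQLite of version 3.25 or later, which is when window functions were added), not Big SQL, Hive, or any other engine named in the deck. The `sales` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data, invented for illustration only.
ROWS = [
    ("east", "2014-01", 100),
    ("east", "2014-02", 150),
    ("west", "2014-01", 80),
    ("west", "2014-02", 120),
]


def run_queries():
    """Run one SQL-92 style and one SQL:2003 style query on the same table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)

    # SQL-92 style: GROUP BY collapses each region into a single row.
    totals = conn.execute(
        "SELECT region, SUM(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    ).fetchall()

    # SQL:2003 style: a window function keeps every row and adds a
    # running total per region. An engine limited to SQL-92 rejects this.
    running = conn.execute(
        "SELECT region, month, "
        "SUM(amount) OVER (PARTITION BY region ORDER BY month) "
        "FROM sales ORDER BY region, month"
    ).fetchall()

    conn.close()
    return totals, running


if __name__ == "__main__":
    totals, running = run_queries()
    print(totals)
    print(running)
```

Running the second query against a candidate engine is one quick way to check whether its SQL support goes beyond SQL-92; the same kind of probe could be written for user-defined functions or SQL:2011 features.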

Page 31: Big Data in Action – Real-World Solution Showcase

TRADITIONAL FORMS OF DISCOVERY
Big Data in Action: Real-World Solutions

Spreadsheets
•  Most popular business “analytic” tool
•  Having access to the data is the value
•  Analysts can slice and dice data for insights

Basic Visualizations
•  Provide visual representations of data
•  Provide insights beyond plain text data
•  Simplify complex information & highlight trends

Page 32: Big Data in Action – Real-World Solution Showcase

ANALYTIC FORMS OF DISCOVERY
Big Data in Action: Real-World Solutions

Multi-Faceted, “Search Mode”
•  Discovery within structured & unstructured data
•  Mine through various forms of data at once
•  Google-like search to iterate and deep dive

Advanced Visualizations
•  Visualize clusters of data and correlations
•  Discover analytic models iteratively with data
•  Visual cues and cognitive sciences UX

Page 33: Big Data in Action – Real-World Solution Showcase

THANK YOU!

For more information

www.RadiantAdvisors.com

Twitter: @RadiantAdvisors #ModernBI #RediscoveringBI

RSS: feed://radiantadvisors.com/feed/

Email: [email protected]

LinkedIn: www.linkedin.com/company/radiant-advisors

Subscribe: Rediscovering BI quarterly e-magazine

www.radiantadvisors.com/rediscoveringbi


Page 34: Big Data in Action – Real-World Solution Showcase

ANALYST QUESTIONS Big Data in Action: Real-World Solutions

1.  How are you handling the performance or SQL capabilities in Hive with Big SQL?

2.  How do users define schema for Big SQL?

3.  Can you explain user roles, security, and metadata in the App Store? Who is the store administrator?

Page 35: Big Data in Action – Real-World Solution Showcase

Twitter Tag: #briefr

The Briefing Room

Page 36: Big Data in Action – Real-World Solution Showcase

Upcoming Topics

www.insideanalysis.com

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

This Month: BIG DATA

March: CLOUD

April: BIG DATA

Page 37: Big Data in Action – Real-World Solution Showcase

THANK YOU for your ATTENTION!