The Anywhere Enterprise – How a Flexible Foundation Opens Doors


Upload: inside-analysis

Post on 02-Jul-2015


Category:

Technology



DESCRIPTION

The Briefing Room with Dr. Robin Bloor and InfiniDB. Live webcast on August 12, 2014. Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=1e562c3a4b9e9cb9a054f0ec216d578b

Today’s organizations need all kinds of data, from a wide and growing array of sources. Marshaling all that data into one location can be difficult, even unrealistic. Increasingly, innovative companies are taking a much more distributed approach to storing and processing data. The end result is an information architecture that supports a broader range of business activities and reduces dependence on costly data movement.

Register for this episode of The Briefing Room to learn from veteran analyst Dr. Robin Bloor as he explains how a distributed approach to data management can open doors to new business opportunities. He’ll be briefed by Jim Tommaney of InfiniDB, who will explain how his company’s database has the flexibility to run on-prem, in the cloud, with cluster file systems or even Hadoop’s HDFS. He’ll also show how InfiniDB can serve as a conduit for companies looking to transform their information architecture to better satisfy changing market demands.

Visit InsideAnalysis.com for more information.

TRANSCRIPT

Page 1: The Anywhere Enterprise – How a Flexible Foundation Opens Doors

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Page 2

The Briefing Room

The Distributed Enterprise – How a Flexible Foundation Opens Doors

Page 3

Twitter Tag: #briefr

Welcome

Host: Eric Kavanagh

@eric_kavanagh

Page 4

Mission

• Reveal the essential characteristics of enterprise software, good and bad

• Provide a forum for detailed analysis of today’s innovative technologies

• Give vendors a chance to explain their product to savvy analysts

• Allow audience members to pose serious questions... and get answers!

Page 5

Topics

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

This Month: BIG DATA ECOSYSTEM

September: INTEGRATION & DATA FLOW

October: ANALYTIC PLATFORMS

Page 6

Executive Summary

• Information architectures are changing

• Hybrid architectures will dominate

• Flexibility will be increasingly critical

• Embedded database systems will expand

• Distributed computing is here to stay

Page 7

Analyst: Robin Bloor

Robin Bloor is Chief Analyst at The Bloor Group

@robinbloor

Page 8

InfiniDB

• InfiniDB is a scalable, columnar database built for big data analytics, business intelligence and data warehousing

• It is 100% open source and offers a MySQL interface

• InfiniDB for Apache Hadoop integrates with Hadoop’s file system (HDFS) to enable real-time analytics within the Hadoop cluster

Page 9

Guest: Jim Tommaney

Jim Tommaney is the Chief Technical Officer for InfiniDB, bringing 20+ years of enterprise data architecture and performance tuning experience to the team. His data warehouse architectures include clustered, large SMP, and distributed/partitioned systems for verticals including retail, web, and telecom. At InfiniDB he is responsible for delivering architecture and design for the InfiniDB product: a high-performance, horizontally scalable and cost-effective solution purpose-built for data warehousing and analytics. He holds a Master’s in Management Information Systems from the University of Texas at Dallas.

Page 10

The Briefing Room with InfiniDB

Page 11

InfiniDB Design Principles

Scalable

Fast

Simple

Copyright © 2014 InfiniDB. All Rights Reserved.

Page 12

Segment Overview: The Big Data Ecosystem

[Diagram labels: structured / unstructured; OLTP / analytics]

Business Intelligence and Visualization Products; Specialty Analytics Applications

InfiniDB, Vertica, Impala w/ Parquet

Infrastructure Products – Hadoop, Cloud/Hosting, Virtualization, Storage, etc.

NoSQL Products

Specialty Products (Splunk, etc.)

Traditional RDBMS Products

ETL, MDM Products

InfiniDB enables companies to analyze massive amounts of data in real time on both Hadoop and non-Hadoop environments to discover deep and wide insights.

Page 13

Radiant Advisors Open-Source SQL-on-Hadoop Benchmark Summary

[Benchmark chart not reproduced; recoverable axis label: Query]

Page 14

What is InfiniDB?

Open Source Core

• GPLv2 licensed scalable, MPP core

• No restrictions on performance, syntax or scale

MySQL Compatible

• “Drop-in” replacement for other MySQL storage engines

• Full SQL syntax and capabilities regardless of platform

Apache Hadoop Friendly

• Native HDFS integration leverages existing Hadoop deployments

• Best in class SQL analytics over Hadoop

Page 15

(My)SQL for Hadoop

InfiniDB uses standard “Engine=InfiniDB” syntax:

CREATE TABLE `game_warehouse`.`dim_title` (
  `id` INT,
  `name` VARCHAR(45),
  `publisher` VARCHAR(45),
  `release_date` DATE,
  `language` INT,
  `platform_name` VARCHAR(45),
  `version` VARCHAR(45)
) ENGINE=InfiniDB;

Page 16

InfiniDB Architecture

• User Module – Understands SQL Requests

• Performance Module – Distributed Processing Engine

Single Server or MPP

Page 17

InfiniDB for Hadoop

• User Module – maps to Name Node

• Performance Module – maps to Data Nodes

MPP

Hadoop Distributed File System

Hadoop Name Node

Hadoop Data Nodes

Page 18

InfiniDB DoW Differentiation

• User Module – Understands SQL Requests

• Performance Module – Distributed Processing Engine

Unique Distribution of Work (DoW):

• Move the processing to the data
  - Including complex queries, joins

• Primitives sent to distributed queues

• Primitives complete in sub-second time

• C++, purpose built

• High, standard, low priority queues

A primitive is a unit of work a single thread can accomplish without waits.

Single Server
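
Editor's sketch (not from the slides): the Distribution of Work idea, in which small primitives are placed on high, standard and low priority queues, can be illustrated roughly as follows. The names `Priority`, `PrimitiveQueue`, `scan_block` and `demo` are hypothetical, and the real engine is described as purpose-built C++ with distributed queues; this single-process Python version only shows the ordering behavior.

```python
import heapq
import itertools
from enum import IntEnum
from typing import Any, Callable, List, Tuple


class Priority(IntEnum):
    """Lower value runs first; mirrors the high/standard/low queues on the slide."""
    HIGH = 0
    STANDARD = 1
    LOW = 2


class PrimitiveQueue:
    """Hypothetical queue of primitives (small units of work that never wait)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Callable[[], Any]]] = []
        # The sequence number breaks ties, so primitives of equal priority run FIFO.
        self._seq = itertools.count()

    def submit(self, priority: Priority, primitive: Callable[[], Any]) -> None:
        heapq.heappush(self._heap, (int(priority), next(self._seq), primitive))

    def run_all(self) -> List[Any]:
        """Drain the queue, always taking the highest-priority primitive next."""
        results = []
        while self._heap:
            _, _, primitive = heapq.heappop(self._heap)
            results.append(primitive())
        return results


def scan_block(values: List[int], threshold: int) -> int:
    """Example primitive: count values above a threshold in one in-memory block."""
    return sum(1 for v in values if v > threshold)


def demo() -> List[Any]:
    queue = PrimitiveQueue()
    queue.submit(Priority.LOW, lambda: ("low", scan_block([1, 5, 9], 4)))
    queue.submit(Priority.HIGH, lambda: ("high", scan_block([2, 3], 4)))
    queue.submit(Priority.STANDARD, lambda: ("standard", scan_block([7, 8], 4)))
    return queue.run_all()
```

Calling `demo()` runs the high-priority primitive first even though it was submitted second. Because each primitive is short and never blocks, a worker can switch to more urgent work quickly without preempting anything.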

Page 19

Relational Foundations

✓ Can run in-memory, but not required

✓ Pure column storage for vertical partitioning

✓ Automatic horizontal partitioning (grid storage)

✓ Columnar compression

✓ Column-aware optimizer

✓ Transactional support

✓ Hadoop is a deployment option, not required
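
Editor's sketch (not from the slides): a minimal illustration of why column storage pairs well with compression. The helper names `rows_to_columns` and `run_length_encode` are hypothetical, and run-length encoding is only one possible scheme chosen for illustration; the slides do not say which compression method InfiniDB uses.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Vertical partitioning: store each column's values together.

    Assumes every row has the same keys.
    """
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(column: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeats of a value into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


rows = [
    {"id": 1, "platform_name": "PC"},
    {"id": 2, "platform_name": "PC"},
    {"id": 3, "platform_name": "PS4"},
    {"id": 4, "platform_name": "PS4"},
    {"id": 5, "platform_name": "PS4"},
]
columns = rows_to_columns(rows)
encoded = run_length_encode(columns["platform_name"])
```

Values within one column tend to repeat, so the five `platform_name` entries collapse to two pairs, while a query that needs only `id` never touches the other column.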

Page 20

Trans-Relational Features

✓ Shared meta-data layer allows for relational algebra to be applied at the plan level, and recursively to deliver partition elimination

✓ Primitive DoW avoids complexity of assigning “right” number of resources to a given query

The processing model is more like a storage device than a traditional database
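
Editor's sketch (not from the slides): partition elimination can be pictured as consulting small per-partition metadata before reading any data. The version below assumes each horizontal partition records the minimum and maximum of its values; that min/max layout, and the names `Extent` and `scan_range`, are illustrative assumptions rather than a description of InfiniDB's actual meta-data layer.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Extent:
    """Hypothetical horizontal partition of one column (values must be non-empty)."""
    values: List[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        # Metadata is computed once at load time and consulted at query time.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_range(extents: List[Extent], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] plus the number of extents actually scanned."""
    matches: List[int] = []
    scanned = 0
    for extent in extents:
        # Eliminate the partition if its value range cannot overlap the predicate.
        if extent.max_value < low or extent.min_value > high:
            continue
        scanned += 1
        matches.extend(v for v in extent.values if low <= v <= high)
    return matches, scanned


extents = [Extent([1, 4, 7]), Extent([10, 12, 15]), Extent([20, 22, 29])]
result = scan_range(extents, 11, 21)
```

For the range 11 to 21, the first extent is skipped from its metadata alone, so only two of the three extents are read.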

Page 21

Trans-Relational Features

✓ Distributed-system aware optimizer moves the processing to the data even in complex SQL operations

✓ N-Way join operation in primitive structure

✓ Flexible join/aggregation behavior can be PM based, UM based, or on disk

✓ Handles nested query constructs while still “moving the processing to the data”

Page 22

Trans-Relational Hadoop Features

✓ Parallel/scalable bulk loads – linear scale for load operations

✓ Parallel/scalable transform/aggregate operations

✓ Parallel/scalable extract

✓ Parallel/scalable input into R or other tools deliver open access to predictive analytics

Page 23

InfiniDB Design Principles

Scalable

Fast

Simple

Page 24

InfiniDB Customers

Page 25

InfiniDB Deployment Options

Cloud

• Amazon® machine image

Source Code

• https://github.com/infinidb/infinidb

Apache Hadoop Distros:

• Cloudera®

• Hortonworks®

• Apache Hadoop®

• Also MapR®, IBM Big Insights®

Page 26

Perceptions & Questions

Analyst: Robin Bloor

Page 27

Built To Scale?

Robin Bloor, PhD

Page 28

Hadoop and the Data Warehouse

Hadoop and its multitude of components will not supersede the data warehouse:

• The HDFS is suited to be a database data store (for column-stored data, but with row stores there’s a problem)

• MapReduce is NOT an appropriate algorithm for database optimization

• YARN is a useful capability for scheduling resource sharing

• What is required is a database architecture AND an optimizer AND a SQL capability

Page 29

The “Old” Data Warehouse

Data wrangling is also a workload!

Page 30

The “New” Data Warehouse

Data wrangling is a much more significant workload! Analytics is also a significant workload!

Page 31

Data Wrangling

Page 32

The Central Data Engine

At this point in time it looks reasonably certain that the CENTRAL DATA ENGINE will be a scale-out column-store SQL DBMS

Page 33

• In general what is the DBA overhead to an InfiniDB database compared with, say, Oracle?

• How does InfiniDB organize its data on HDFS? Is that different from the way it uses MySQL as a store?

• Is there any qualitative difference between the HDFS and MySQL versions of InfiniDB?

• Please explain the open source arrangement with InfiniDB.

Page 34

• What do you see as the sweet spot for this database?

• In respect to scale, what is your largest implementation by data volume?

• Does InfiniDB have specific support for analytical applications?

Page 35

Page 36

Upcoming Topics

www.insideanalysis.com

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

This Month: BIG DATA ECOSYSTEM

September: INTEGRATION & DATA FLOW

October: ANALYTIC PLATFORMS

Page 37

THANK YOU for your ATTENTION!

Opening slide image courtesy of Wikimedia Commons