Benefits of Transferring Real-Time Data to Hadoop at Scale

Upload: hortonworks

Post on 21-Jan-2018

TRANSCRIPT

Page 1: Benefits of Transferring Real-Time Data to Hadoop at Scale

© 2015 IBM Corporation

Welcome to the Waitless World

© 2017 IBM Corporation

Benefits of Transferring Real-time Data to Hadoop at Scale

Page 2: Benefits of Transferring Real-Time Data to Hadoop at Scale

Guest Speakers

Ali Bajwa, Principal Partner Solutions Engineer, Hortonworks

Steve Roberts, Offering Manager, Power Systems Big Data & Analytics Solutions, IBM

Dan Potter, VP of Product Management & Marketing, Attunity

Page 3: Benefits of Transferring Real-Time Data to Hadoop at Scale

The New Way of Business Is Fueled by Connected Data

Development

• Connected customers, vehicles, devices
• Socially crowd-sourced requirements
• Digital design and analysis
• Digital prototypes and tests (simulations)

Manufacturing

• Connected factories, sensors, devices
• Human-robotic interaction
• 3D-printing on demand

Distribution

• Connected trucks, inventory
• Location, traffic, weather-aware distribution
• Real-time inventory visibility
• Dynamic rerouting

Marketing/Sales

• Connected customers, devices
• Omni-channel demand sensing
• Real-time recommendations

Service

• Connected assets
• Remote service monitoring & delivery
• Predictive maintenance
• OTA updates

Page 4: Benefits of Transferring Real-Time Data to Hadoop at Scale

Technology Trends: Shifting the Data Paradigm

Artificial Intelligence, Internet of Things, Cloud Computing, Streaming Data

Industrial Internet

Connected business

Consumer devices

Smart devices

Autonomy

Prescriptive analytics

SaaS/PaaS applications

Ephemeral use cases

Operational efficiency

Collaboration

Real-time applications

Targeted retail

Recommendations

Industrial applications

Page 5: Benefits of Transferring Real-Time Data to Hadoop at Scale

Hortonworks: Enabling the Modern Data Architecture

• Our durable and reliable mission continues…

• Make Hadoop an enterprise-viable data platform

• Bring all data under management, all sources and types

• Enable pre- and post-transaction analysis

Hortonworks' consistent and continuous track record of innovation

Page 6: Benefits of Transferring Real-Time Data to Hadoop at Scale

Powering the Modern Data Architecture

DATA AT REST, DATA IN MOTION

ACTIONABLE INTELLIGENCE

COMPLETE DATA LIFECYCLE MANAGEMENT

RUN CONTAINERIZED APPLICATIONS CONCURRENTLY

EDGE, CLOUD, ON-PREMISES

HOLISTIC MANAGEMENT, GOVERNANCE AND SECURITY

MULTI-WORKLOADS, MULTI-TYPE, MULTI-TIER

Data Science SQL Query Engine

Page 7: Benefits of Transferring Real-Time Data to Hadoop at Scale

Hortonworks Value: Platform Flexibility

Sensors/Sources, Data Centers, Cloud

Constrained, high-latency, localized context

On-premise and cloud, low-latency, global context

Page 8: Benefits of Transferring Real-Time Data to Hadoop at Scale

Hortonworks DataFlow and Analytics Reference Platform

[Diagram labels]

Layers: Edge/Sensor/3rd Party; Data Flow and Streaming; Analytics and Data Science; Applications

Field Data Capture (Location 1 … Location N): Data Acquisition, Event Processing, Time Series Storage

Field sources: Industrial Protocols such as OPC; Files / Other Unstructured Data; Video; IoT Gateways; PLC / RTU; SCADA, DCS, Historians

Office, Datacenter or Cloud:

• Hortonworks DataFlow: Data Flow Management, Message Queues, Stream Processing, In-stream Analytics

• Hortonworks Data Platform: SQL, NoSQL, Machine Learning, Resource Management, Distributed File Storage, Structured Data Sets

Page 9: Benefits of Transferring Real-Time Data to Hadoop at Scale

Complementing Attunity and IBM Ecosystem

[Diagram labels]

Layers: Edge/Sensor/3rd Party; Data Flow and Streaming; Analytics and Data Science; Applications

Field Data Capture (Location 1 … Location N): Data Acquisition, Event Processing, Time Series Storage

Field sources: Industrial Protocols such as OPC; Files / Other Unstructured Data; Video; IoT Gateways; PLC / RTU; SCADA, DCS, Historians

Office, Datacenter or Cloud:

• Hortonworks DataFlow: Data Flow Management, Message Queues, In-stream Analytics

• Hortonworks Data Platform: SQL, NoSQL, Machine Learning, Resource Management, Distributed File Storage, Structured Data Sets

• IBM components shown on the diagram: IBM Stream Computing, IBM Bluemix, IBM Spectrum Scale, IBM Watson, IBM Big SQL

• DATA INGESTION

IBM also resells HDP and HDF

Page 10: Benefits of Transferring Real-Time Data to Hadoop at Scale

Next Chapter: Announcing Hortonworks DataPlane Service

Enabling the Modern Data Architecture

Hortonworks DataPlane Service is a common set of services that:

⬢ Supports enterprise deployment strategy and the move to the cloud

⬢ Addresses compliance and regulatory requirements for the enterprise

⬢ Eliminates policy silos and ensures security & governance move with the data

⬢ Simplifies data asset management and provides access for analysts and data scientists

⬢ Is extensible to new services: a services enablement layer brings new offerings to market rapidly

Page 11: Benefits of Transferring Real-Time Data to Hadoop at Scale

Enterprise Data Science at Scale

• Enterprise-Grade: Leverage enterprise-grade security, governance and operations

• Tools: Enhance productivity by enabling data scientists to use their favorite tools, technologies and libraries

• Deployment: Compress the time to insight by deploying models into production faster

• Data: Build more robust models by using all the data in the data lake

The Power of Data Science for your Enterprise

Page 12: Benefits of Transferring Real-Time Data to Hadoop at Scale

Bringing it all Together

Powering Modern Data Applications

• DATA-AT-REST: HDP® (Hortonworks Data Platform), powered by Apache Hadoop®

• DATA-IN-MOTION: HDF™ (Hortonworks DataFlow), powered by Apache™ NiFi

• DATA INGESTION

• IBM Analytics, Hortonworks resell: IBM DSX, IBM BigSQL

• IBM Analytics resell: BigInsights' existing customers migrated to HDP; IBM resells HDP & HDF

• IBM Systems co-sell: IBM Power Systems (compute), IBM Spectrum Scale (storage)

Page 13: Benefits of Transferring Real-Time Data to Hadoop at Scale

A new era of computing has emerged

[Diagram labels]

Centralized Mainframes; Personal Computer; Client/Server; Office Productivity; Distributed Computing; E-Business; Data Warehousing; Smarter Planet; Big Data & Predictive Analytics; Cloud; Cognitive; Cognitive Era

Data, Insight, Context: Transactional Database; Reporting; Business Intelligence; Big Data & Analytics; Actionable Insight in context

Page 14: Benefits of Transferring Real-Time Data to Hadoop at Scale

Accelerated compute and storage delivered on prem, in the cloud or via Watson

Power Systems is now part of Cognitive Systems

REINVENTING COMPUTING FOR DATA-INTENSIVE AND COGNITIVE WORKLOADS

Page 15: Benefits of Transferring Real-Time Data to Hadoop at Scale

Open to the core for true differentiation in performance & cost

315+ OpenPOWER members across 31 countries

Ecosystem-driven Customer Choice

Growing ecosystem of OpenPOWER Servers

Growing ecosystem of OpenPOWER Innovation

Page 16: Benefits of Transferring Real-Time Data to Hadoop at Scale

Power Systems S822LC for Big Data - Not Just Another Intel Server

• Linux by Red Hat: Red Hat 7.2 Linux OS

• Mellanox: InfiniBand/Ethernet connectivity in and out of server

• HGST: Optional NVMe adapters

• Alpha Data with Xilinx FPGA: Optional CAPI accelerator

• Broadcom: Optional PCIe adapters

• QLogic: Optional Fibre Channel PCIe

• Samsung: SSDs & NVMe

• Hynix, Samsung, Micron: DDR4

• NVIDIA: Tesla K80 GPU accelerator

• IBM: POWER8 CPU

Page 17: Benefits of Transferring Real-Time Data to Hadoop at Scale

Available until Dec 31, 2017!

Page 18: Benefits of Transferring Real-Time Data to Hadoop at Scale

TCO at Scale with HDP on Power Systems with Elastic Storage Server

• Up to 3X reduction of storage and compute infrastructure moving to Power Systems and Elastic Storage Server vs commodity scale-out x86

• More flexible and scalable vs EMC Isilon using IBM Spectrum Scale

• Position for future growth, avoid hitting the data center wall with cluster sprawl

[Diagram labels]

Scale Compute Nodes

• IBM Power Systems

• Only Hadoop services and HDFS client

Network: InfiniBand (RDMA) / 40 GigE / 10 GigE

Scale Storage as Required

• ESS: Elastic Storage Server (powered by Spectrum Scale and Power Systems)

Legend: C = Spectrum Scale Client; HDP = Hortonworks Data Platform; ESS = Elastic Storage Server

Page 19: Benefits of Transferring Real-Time Data to Hadoop at Scale

Artificial Intelligence and Cognitive Applications

Machine Learning

Deep Learning (Neural Networks)

The deeper you go, the more value you gain, and the more you know

Page 20: Benefits of Transferring Real-Time Data to Hadoop at Scale

simple machine learning

deep learning

Page 21: Benefits of Transferring Real-Time Data to Hadoop at Scale

accident risk rate

90% inspection times

10X number of inspections

Page 22: Benefits of Transferring Real-Time Data to Hadoop at Scale

• enterprise-ready software distribution built on open source

• tools for ease of development

• performance: faster training times for data scientists

Page 23: Benefits of Transferring Real-Time Data to Hadoop at Scale

Accelerated training: days become hours

[Figure labels]

9 days versus 4 hours per training run

54x learning runs with POWER8

Tasks shown: Recognition, Shape, Attenuation, Boundary

What will you do?

Iterate more and create more accurate models?

Create more models?

Both?

IBM S822LC for HPC
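The 54x figure is consistent with the two durations on the slide: a 9-day run is 216 hours, and 216 / 4 = 54. A minimal sketch of that arithmetic follows; the variable names are illustrative and only the 9-day and 4-hour values come from the slide.

```python
# Speed-up implied by the slide: a 9-day training run reduced to 4 hours.
HOURS_PER_DAY = 24

baseline_hours = 9 * HOURS_PER_DAY   # 9 days, from the slide
accelerated_hours = 4                # 4 hours, from the slide

# Ratio of the two run times.
speedup = baseline_hours / accelerated_hours

# Number of back-to-back 4-hour runs that fit in the original 9-day window.
runs_in_baseline_window = baseline_hours // accelerated_hours

print(f"speed-up: {speedup:.0f}x")
print(f"4-hour runs that fit in 9 days: {runs_in_baseline_window}")
```

Read this way, "54x learning runs" means 54 training iterations in the time one run used to take, which is the basis for the "iterate more or create more models" question.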

Page 24: Benefits of Transferring Real-Time Data to Hadoop at Scale

Data Integration for Modern Analytics

Data Ingestion Patterns

Page 25: Benefits of Transferring Real-Time Data to Hadoop at Scale

Data Integration for Modern Analytics

Page 26: Benefits of Transferring Real-Time Data to Hadoop at Scale

Modern Data Ingest

[Diagram labels]

Sources (CLOUD, ON PREM): WAREHOUSE, MAINFRAME, RDBMS, SAP

CHANGE DATA CAPTURE

Delivery: METADATA, HIVE OPTIMIZED, STREAM OPTIMIZED

• CDC (log-based) for high performance, low latency and low impact

• Single platform for all key enterprise systems

• Hive-optimized for HDP and stream-optimized for HDF

• Point-and-click with NO coding and NO agents

Page 27: Benefits of Transferring Real-Time Data to Hadoop at Scale

Real-Time Data Integration: Streaming Change Data Capture (CDC)

Flexible Real-Time Options

– Transactional CDC: apply transactions sequentially

– Batch CDC: stream batched changes

– Data Warehouse Ingest-Merge: integrate with DW native loaders to ingest and merge

– Message Encoded CDC: stream changes to Kafka message brokers

In Memory and File Optimized Data Transport
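The difference between applying transactions sequentially and applying batched changes can be sketched in a few lines. This is an illustrative model only, not Attunity Replicate's implementation: the `Change` record, the in-memory `dict` target, and the "keep the last change per key" compaction rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """Hypothetical CDC event captured from a source change log."""
    op: str               # "insert", "update" or "delete"
    key: int              # primary key of the affected row
    row: Optional[dict]   # new row image; None for deletes


def apply_transactional(target: Dict[int, dict], changes: List[Change]) -> int:
    """Apply every change in commit order; return the number of writes."""
    writes = 0
    for c in changes:
        if c.op == "delete":
            target.pop(c.key, None)   # deleting a missing key is a no-op
        else:
            target[c.key] = c.row     # insert and update are both upserts
        writes += 1
    return writes


def compact(changes: List[Change]) -> List[Change]:
    """Reduce a batch to its net effect: keep only the last change per key."""
    latest: Dict[int, Change] = {}
    for c in changes:
        latest[c.key] = c
    return list(latest.values())


def apply_batch(target: Dict[int, dict], changes: List[Change]) -> int:
    """Apply the compacted batch; return the number of writes."""
    return apply_transactional(target, compact(changes))


# Five captured changes touching two keys.
changes = [
    Change("insert", 1, {"v": "a"}),
    Change("update", 1, {"v": "b"}),
    Change("insert", 2, {"v": "x"}),
    Change("delete", 2, None),
    Change("update", 1, {"v": "c"}),
]

sequential_target: Dict[int, dict] = {}
batched_target: Dict[int, dict] = {}
sequential_writes = apply_transactional(sequential_target, changes)
batched_writes = apply_batch(batched_target, changes)

print(sequential_writes, batched_writes)
print(sequential_target == batched_target)
```

Both targets end in the same state, but the batched path issues fewer writes. The trade-off is that intermediate row states are never visible on the target, which is why a sequential mode is still useful when transactional consistency matters.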

Page 28: Benefits of Transferring Real-Time Data to Hadoop at Scale

Simplify Data Integration: Zero Footprint Architecture

Streamlined Process

– CDC identifies source updates by scanning change logs

– No software agents required on sources or targets

– Minimal administrative tasks

• Log-based CDC

• Source-specific optimization

Sources: Hadoop, Files, RDBMS, EDW, Mainframe

Targets: Hadoop, Files, RDBMS, EDW, Kafka
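Log-based CDC, as described on this slide, reads the source's change log instead of querying its tables. A minimal sketch of the idea follows, assuming a hypothetical log of `(lsn, operation, table, key)` tuples and a checkpoint that remembers the last log sequence number already processed. Real database logs are binary and vendor-specific, so this only illustrates the control flow.

```python
from typing import Iterator, List, Tuple

# Hypothetical change-log record: (log sequence number, operation, table, key).
LogRecord = Tuple[int, str, str, int]


def scan_changes(log: List[LogRecord], last_lsn: int) -> Iterator[LogRecord]:
    """Yield records written after last_lsn, in log order."""
    for record in log:
        if record[0] > last_lsn:
            yield record


def poll(log: List[LogRecord], checkpoint: int) -> Tuple[List[LogRecord], int]:
    """Return the new records and the checkpoint advanced past them."""
    new_records = list(scan_changes(log, checkpoint))
    if new_records:
        checkpoint = new_records[-1][0]
    return new_records, checkpoint


# The source database appends to its log as part of normal operation.
log: List[LogRecord] = [
    (101, "insert", "orders", 1),
    (102, "update", "orders", 1),
]

first_batch, checkpoint = poll(log, checkpoint=0)

log.append((103, "delete", "orders", 1))
second_batch, checkpoint = poll(log, checkpoint)

# Nothing new was written, so the third poll returns no records.
third_batch, checkpoint = poll(log, checkpoint)

print(len(first_batch), len(second_batch), len(third_batch), checkpoint)
```

Because the reader only consumes a log the database already writes, no triggers or agents are installed on the source and no extra queries run against its tables, which is the "low impact" property the slides refer to.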

Page 29: Benefits of Transferring Real-Time Data to Hadoop at Scale

Simplify Data Integration: Go Agile with Automation

– No manual coding

– Automated end-to-end

– Optimized and configurable

• Target schema creation

• Heterogeneous data type mapping

• Batch to CDC transition

• DDL change propagation

• Filtering

• Transformations

Sources: Hadoop, Files, RDBMS, Mainframe, EDW

Targets: Hadoop, Files, RDBMS, Kafka, EDW
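Target schema creation, heterogeneous data type mapping and DDL change propagation can be illustrated with a small sketch. The type table below is a hypothetical Oracle-to-Hive mapping chosen for the example (it is not the product's actual mapping), and the fallback to `STRING` for unknown types is likewise an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from Oracle-style source types to Hive types.
TYPE_MAP: Dict[str, str] = {
    "VARCHAR2": "STRING",
    "CLOB": "STRING",
    "NUMBER": "DECIMAL(38,10)",
    "DATE": "TIMESTAMP",
}

Column = Tuple[str, str]  # (column name, source type)


def map_type(source_type: str) -> str:
    """Translate a source type; unknown types fall back to STRING (assumption)."""
    return TYPE_MAP.get(source_type.upper(), "STRING")


def create_table_ddl(table: str, columns: List[Column]) -> str:
    """Target schema creation: build a Hive CREATE TABLE from source columns."""
    cols = ", ".join(f"{name} {map_type(t)}" for name, t in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols})"


def add_column_ddl(table: str, column: Column) -> str:
    """DDL change propagation: mirror a column added on the source."""
    name, source_type = column
    return f"ALTER TABLE {table} ADD COLUMNS ({name} {map_type(source_type)})"


# Initial load: derive the target table from the source definition.
print(create_table_ddl("orders", [("id", "NUMBER"), ("note", "VARCHAR2")]))

# Later, a column is added on the source and the change is propagated.
print(add_column_ddl("orders", ("shipped", "DATE")))
```

Generating both statements from one mapping table keeps the initial load and later schema changes consistent, which is the point of automating these steps rather than hand-coding them per source.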

Page 30: Benefits of Transferring Real-Time Data to Hadoop at Scale

Simplify Data Integration: Guided User Experience

– Intuitive web-based GUI

– Drag and drop, wizard-assisted configuration steps

– Consistent process for all sources and targets

Page 31: Benefits of Transferring Real-Time Data to Hadoop at Scale

Universal Data Integration

SOURCES

• RDBMS: Oracle, SQL Server, DB2 iSeries, DB2 z/OS, DB2 LUW, MySQL, PostgreSQL, Sybase ASE, Informix

• DW: Exadata, Teradata, Netezza, Vertica

• HADOOP: Hortonworks, Cloudera, MapR

• MAINFRAME: DB2 for z/OS, IMS/DB, VSAM, SQL/MP, Enscribe, RMS

• CLOUD: AWS RDS, Salesforce, Snowflake

• SAP: ECC on Oracle, ECC on SQL, ECC on DB2, SAP HANA

TARGETS

• RDBMS: Oracle, SQL Server, DB2 LUW, MySQL, PostgreSQL, Sybase ASE, Informix

• DW: Microsoft PDW, Exadata, Teradata, Netezza, Vertica, Sybase IQ, Amazon Redshift, Actian Vector, SAP HANA

• HADOOP: Hortonworks, Cloudera, MapR, Pivotal, Amazon EMR

• NOSQL: MongoDB

• CLOUD: Amazon RDS, Amazon Redshift, Amazon EMR, Google Cloud SQL, Google Cloud Dataproc, Azure SQL DW, Azure SQL DB

• STREAMING: Azure Event Hubs, Kafka, MapR

Page 32: Benefits of Transferring Real-Time Data to Hadoop at Scale

Feeding the Data Lake with Attunity Replicate

Fortune 100 auto maker

4500 applications (DB2 MF, SQL, Oracle)

Results

• Consolidating massive data volumes for global analytics

• Hadoop Data Lake with Kafka

• Minimizing labor and cost

• Realizing faster insights and competitive advantage

Page 33: Benefits of Transferring Real-Time Data to Hadoop at Scale

The results are impressive!

3x faster than alternative solutions

Page 34: Benefits of Transferring Real-Time Data to Hadoop at Scale

Q&A

Page 35: Benefits of Transferring Real-Time Data to Hadoop at Scale

REFERENCE CHARTS

Page 36: Benefits of Transferring Real-Time Data to Hadoop at Scale

Hortonworks HDP 3X POWER8 Price-Performance Guarantee

IBM Power Systems guarantees that the Power S822LC for Big Data system built with POWER8 delivers at least a 3X price-performance advantage vs. x86-based results when running a customer application/workload with Tez/Hive LLAP on Hortonworks HDP under the conditions noted below. A Worker Node is a server carrying out the HDP query functions, with one Worker Node per server.

3X price-performance means that the customer's documented throughput performance on the cluster of S822LC for Big Data Worker Nodes divided by the price of that cluster will be at least 3 times higher than the customer's documented throughput performance on the cluster of x86-based Worker Nodes divided by the price of the cluster of x86 Worker Nodes.

Example: If queries per second are 30,000 on the cluster of S822LC Worker Nodes and 10,000 on the cluster of x86-based Worker Nodes, while the price of the S822LC Worker Node cluster is $10,000 and the price of the x86-based Worker Node cluster is $10,000, then the throughput performance per price would be exactly 3 times higher and the guarantee would be met.
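The definition above reduces to a ratio of two throughput-per-price figures. The sketch below restates it and checks the slide's own example; the function names are illustrative and the input validation is an addition for safety, not part of the guarantee text.

```python
def throughput_per_price(throughput: float, cluster_price: float) -> float:
    """Documented throughput divided by the price of the Worker Node cluster."""
    if throughput <= 0 or cluster_price <= 0:
        raise ValueError("throughput and cluster_price must be positive")
    return throughput / cluster_price


def price_performance_ratio(
    power_throughput: float,
    power_price: float,
    x86_throughput: float,
    x86_price: float,
) -> float:
    """Power cluster throughput-per-price relative to the x86 cluster."""
    return (
        throughput_per_price(power_throughput, power_price)
        / throughput_per_price(x86_throughput, x86_price)
    )


# Values from the example: 30,000 vs 10,000 queries per second,
# with both clusters priced at $10,000.
ratio = price_performance_ratio(30_000, 10_000, 10_000, 10_000)
guarantee_met = ratio >= 3.0

print(f"price-performance ratio: {ratio:.1f}x, guarantee met: {guarantee_met}")
```

Because price appears in both denominators, a more expensive Power cluster needs proportionally more throughput to reach the same ratio: at twice the x86 price it would need six times the x86 throughput.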

Notes:

1. Client's Power S822LC for BD Worker Nodes and the x86 Worker Nodes must be running at similar utilization rates of at least 50%, using the same software stack as described in Note #4, and must be configured similarly.

2. Client's Power S822LC for BD performance cannot be constrained by the I/O subsystem. Specifically, the I/O subsystem on the Power S822LC for BD Worker Node must achieve I/O bandwidth and operations per second greater than or equal to those of the x86 Worker Node.

3. Client's Power S822LC for BD Worker Node's physical memory must be the same as or greater than the physical memory on the x86 Worker Node.

4. Applicable software stack is Tez/Hive LLAP on HDP 2.6 or later for both the Power S822LC and x86-based Worker Nodes.

5. Client is responsible for demonstrating a comparable, real-world representative workload between the Power S822LC for BD Worker Node and the x86 Worker Node through the use of the IBM-provided tools and comparable tools on x86 systems.

6. 3X guarantee is based on a list price for x86 servers from Dell, Cisco, HP or Lenovo based on E5-2600 v4 or earlier processor technology and the IBM S822LC for Big Data.

The IBM Power S822LC for Big Data servers (22-core/2.89 GHz) used as Worker Nodes must be purchased from IBM or an authorized IBM Business Partner prior to September 30, 2017. The guarantee period is valid for three (3) months from the date of purchase. The x86-based Worker Nodes must be comparably configured branded servers from Cisco, Dell, HP, or Lenovo, and the client is responsible for all Hortonworks licenses.

3X throughput performance per price means that the customer's documented throughput performance on the cluster of Power S822LC for BD Worker Nodes, based on either queries, operations or transactions per second, divided by the price of the cluster of Worker Nodes will be at least 3 times higher than the customer's same documented throughput performance on the cluster of x86 Worker Nodes divided by the price of said cluster of x86 Worker Nodes.

Remediation: IBM will provide additional performance optimization and tuning services consistent with IBM Best Practices, at no charge. If unable to reach the guaranteed level of price-performance, IBM will provide additional Worker Nodes, configured identically to those already purchased, to reach the guaranteed level of price-performance.

Only Available until Dec 31, 2017!