© 2015 IBM Corporation
Welcome to the Waitless World
© 2017 IBM Corporation
Benefits of Transferring Real-time Data to Hadoop at Scale
Guest Speakers
Ali Bajwa, Principal Partner Solutions Engineer, Hortonworks
Steve Roberts, Offering Manager, Power Systems Big Data & Analytics Solutions, IBM
Dan Potter, VP of Product Management & Marketing, Attunity
The New Way of Business Is Fueled by Connected Data
Development
• Connected customers, vehicles, devices
• Socially crowd-sourced requirements
• Digital design and analysis
• Digital prototypes and tests (simulations)
Manufacturing
• Connected factories, sensors, devices
• Human-robotic interaction
• 3D-printing on demand
Distribution
• Connected trucks, inventory
• Location, traffic, weather-aware distribution
• Real-time inventory visibility
• Dynamic rerouting
Marketing/Sales
• Connected customers, devices
• Omni-channel demand sensing
• Real-time recommendations
Service
• Connected assets
• Remote service monitoring & delivery
• Predictive maintenance
• OTA updates
Technology Trends: Shifting the Data Paradigm
Artificial Intelligence, Internet of Things, Cloud Computing, Streaming Data
Industrial Internet
Connected business
Consumer devices
Smart devices
Autonomy
Prescriptive analytics
SaaS/PaaS applications
Ephemeral use cases
Operational efficiency
Collaboration
Real-time applications
Targeted retail
Recommendations
Industrial applications
Hortonworks: Enabling the Modern Data Architecture
• Our durable and reliable mission continues…
• Make Hadoop an enterprise-viable data platform
• Bring all data under management: all sources and types
• Enable pre- and post-transaction analysis
Hortonworks' consistent and continuous track record of innovation
Powering the Modern Data Architecture
DATA AT REST, DATA IN MOTION
ACTIONABLE INTELLIGENCE
COMPLETE DATA LIFECYCLE MANAGEMENT
RUN CONTAINERIZED APPLICATIONS CONCURRENTLY
EDGE, CLOUD, ON-PREMISES
HOLISTIC MANAGEMENT, GOVERNANCE AND SECURITY
MULTI-WORKLOADS, MULTI-TYPE, MULTI-TIER
Data Science, SQL Query Engine
Hortonworks Value: Platform Flexibility
Sensors/Sources: constrained, high-latency, localized context
Data Centers and Cloud: on-premises and cloud, low-latency, global context
Hortonworks DataFlow and Analytics Reference Platform
[Diagram labels, recovered from the slide]
Column headings: Edge/Sensor/3rd Party; Data Flow and Streaming; Analytics and Data Science; Applications
Deployment: Field Data Capture; Office, Datacenter or Cloud
Data sources: Industrial Protocols such as OPC; Files / Other Unstructured Data; Video; IoT Gateways; PLC / RTU; SCADA, DCS, Historians
Location 1 … Location N (each): Time Series Storage, Data Acquisition, Event Processing
Platform components: Hortonworks DataFlow, Hortonworks Data Platform, Data Flow Management, Message Queues, Stream Processing, In-stream Analytics, SQL, NoSQL, Machine Learning, Resource Management, Distributed File Storage, Structured Data Sets
Complementing Attunity and IBM Ecosystem
[Diagram labels, recovered from the slide]
Column headings: Edge/Sensor/3rd Party; Data Flow and Streaming; Analytics and Data Science; Applications
Deployment: Field Data Capture; Office, Datacenter or Cloud
Data sources: Industrial Protocols such as OPC; Files / Other Unstructured Data; Video; IoT Gateways; PLC / RTU; SCADA, DCS, Historians
Location 1 … Location N (each): Time Series Storage, Data Acquisition, Event Processing
Platform components: Hortonworks DataFlow, Hortonworks Data Platform, Data Flow Management, Message Queues, In-stream Analytics, SQL, NoSQL, Machine Learning, Resource Management, Distributed File Storage, Structured Data Sets
IBM components: IBM Stream Computing, IBM Bluemix, IBM Spectrum Scale, IBM Watson, IBM Big SQL
DATA INGESTION
IBM also resells HDP and HDF
Next Chapter: Announcing Hortonworks DataPlane Service
Enabling the Modern Data Architecture
Hortonworks DataPlane Service is a common set of services that:
⬢ Supports enterprise deployment strategy and the move to the cloud
⬢ Addresses compliance and regulatory requirements for the enterprise
⬢ Eliminates policy silos and ensures security & governance move with the data
⬢ Simplifies data asset management and provides access for analysts and data scientists
⬢ Is extensible to new services: a services enablement layer brings new offerings to market rapidly
Enterprise Data Science at Scale
The Power of Data Science for your Enterprise
Enterprise-Grade: Leverage enterprise-grade security, governance and operations
Tools: Enhance productivity by enabling data scientists to use their favorite tools, technologies and libraries
Deployment: Compress the time to insight by deploying models into production faster
Data: Build more robust models by using all the data in the data lake
Bringing it all Together
Powering Modern Data Applications
DATA-AT-REST: HDP® (Hortonworks Data Platform), powered by Apache Hadoop®
DATA-IN-MOTION: HDF™ (Hortonworks DataFlow), powered by Apache™ NiFi
DATA INGESTION
IBM Analytics, Hortonworks Resell: IBM DSX, IBM BigSQL
IBM Analytics Re-sell: BigInsights' existing customers migrated to HDP; IBM resells HDP & HDF
IBM Systems Co-Sell:
• IBM Power Systems (Compute)
• IBM Spectrum Scale (Storage)
A new era of computing has emerged
[Figure: computing eras. Labels: Centralized Mainframes, Personal Computer, Office Productivity, Client/Server, Distributed Computing, E-Business, Data Warehousing, Smarter Planet, Big Data & Predictive Analytics, Cloud, Cognitive Era]
[Figure: Data, Insight, Context. Labels: Transactional Database, Reporting, Business Intelligence, Big Data & Analytics, Actionable Insight in context]
Accelerated compute and storage delivered on-premises, in the cloud or via Watson
Power Systems is now part of Cognitive Systems
REINVENTING COMPUTING FOR DATA-INTENSIVE AND COGNITIVE WORKLOADS
Open to the core for true differentiation in performance & cost
315+ OpenPOWER members across 31 countries
Ecosystem-driven Customer Choice
Growing ecosystem of OpenPOWER Servers
Growing ecosystem of OpenPOWER Innovation
Power Systems S822LC for Big Data - Not Just Another Intel Server
Linux by Red Hat: Red Hat 7.2 Linux OS
Mellanox: InfiniBand/Ethernet connectivity in and out of server
HGST: Optional NVMe adapters
Alpha Data with Xilinx FPGA: Optional CAPI accelerator
Broadcom: Optional PCIe adapters
QLogic: Optional Fibre Channel PCIe
Samsung: SSDs & NVMe
Hynix, Samsung, Micron: DDR4
NVIDIA: Tesla K80 GPU accelerator
IBM: POWER8 CPU
Available until Dec 31, 2017!
TCO at Scale with HDP on Power Systems with Elastic Storage Server
• Up to 3X reduction of storage and compute infrastructure moving to Power Systems and Elastic Storage Server vs commodity scale-out x86
• More flexible and scalable vs EMC Isilon using IBM Spectrum Scale
• Position for future growth, avoid hitting the data center wall with cluster sprawl
[Diagram: HDP compute nodes, each with a Spectrum Scale client, connected over InfiniBand (RDMA) / 40 GigE / 10 GigE to Elastic Storage Servers]
Scale Compute Nodes
• IBM Power Systems
• Only Hadoop services and HDFS client
Scale Storage as Required
Legend: ESS = Elastic Storage Server (powered by Spectrum Scale and Power Systems); C = Spectrum Scale client; HDP = Hortonworks Data Platform
Artificial Intelligence and Cognitive Applications
Machine Learning
Deep Learning (Neural Networks)
The deeper you go, the more value you gain, and the more you know
[Figure: simple machine learning vs. deep learning. Labels: accident risk rate; 90%; inspection times; 10X; number of inspections]
enterprise-ready software distribution built on open source
+
tools for ease of development
+
performance: faster training times for data scientists
Accelerated training: days become hours
[Figure: a training run of 9 days reduced to 4 hours; 54x learning runs with POWER8. Task labels: Recognition, Shape, Attenuation, Boundary]
What will you do?
Iterate more and create more accurate models?
Create more models?
Both?
IBM S822LC for HPC
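The 54x figure on the training slide is consistent with the two durations quoted there (9 days and 4 hours). A minimal arithmetic check, assuming 24-hour days and that both durations refer to the same training job (variable names are illustrative):

```python
HOURS_PER_DAY = 24

baseline_hours = 9 * HOURS_PER_DAY   # the 9-day training run, in hours
accelerated_hours = 4                # the accelerated training run

# Speedup of a single run, and how many accelerated runs fit
# into the time the original run needed.
speedup = baseline_hours / accelerated_hours
runs_in_baseline_window = baseline_hours // accelerated_hours
```

Read this way, the same 9-day window leaves room for 54 training iterations, which is what the slide's closing questions (iterate more, create more models, or both) allude to.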
Data Integration for Modern Analytics
Data Ingestion Patterns
Modern Data Ingest
[Diagram labels: CHANGE DATA CAPTURE; METADATA; HIVE OPTIMIZED; STREAM OPTIMIZED; CLOUD; ON PREM; WAREHOUSE; MAINFRAME; RDBMS; SAP]
• CDC (log-based) for high performance, low latency and low impact
• Single platform for all key enterprise systems
• Hive-optimized for HDP and Stream-optimized for HDF
• Point-and-click with NO coding and NO agents
Real-Time Data Integration
Streaming Change Data Capture (CDC)
Flexible Real-Time Options
– Apply transactions sequentially
– Stream batched changes
– Integrate with DW native loaders to ingest and merge
– Stream changes to Kafka message brokers
In Memory and File Optimized Data Transport
[Diagram labels: Transactional CDC; Batch CDC; Data Warehouse Ingest-Merge; Message Encoded CDC]
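The difference between applying transactions sequentially and streaming batched changes can be sketched in a few lines. This is an illustration only, not Attunity's implementation: the `Change` record, both function names and the in-memory "target table" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    """One captured change; the fields are illustrative, not a product format."""
    op: str               # "insert", "update" or "delete"
    key: int              # primary key of the affected row
    row: Optional[dict]   # new row image, None for deletes


def apply_transactional(target: Dict[int, dict], changes: Iterable[Change]) -> int:
    """Apply every change one by one, in source commit order."""
    statements = 0
    for change in changes:
        if change.op == "delete":
            target.pop(change.key, None)
        else:
            target[change.key] = change.row
        statements += 1
    return statements


def apply_batched(target: Dict[int, dict], changes: Iterable[Change]) -> int:
    """Collapse a batch to the net change per key, then apply it in one pass."""
    net: Dict[int, Change] = {}
    for change in changes:
        net[change.key] = change   # a later change to the same key wins
    for change in net.values():
        if change.op == "delete":
            target.pop(change.key, None)
        else:
            target[change.key] = change.row
    return len(net)


changes = [
    Change("insert", 1, {"id": 1, "qty": 5}),
    Change("update", 1, {"id": 1, "qty": 7}),
    Change("insert", 2, {"id": 2, "qty": 1}),
    Change("delete", 2, None),
]

table_a: Dict[int, dict] = {}
table_b: Dict[int, dict] = {}
ops_transactional = apply_transactional(table_a, changes)
ops_batched = apply_batched(table_b, changes)
```

Both targets end in the same state, but the batched path issues fewer operations. That is the usual trade-off: sequential apply preserves every intermediate state and transaction boundary, while batching gives up that fidelity for throughput.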
Simplify Data Integration
Zero Footprint Architecture
– CDC identifies source updates by scanning change logs
– No software agents required on sources or targets
– Minimal administrative tasks
• Log-based CDC
• Source-specific optimization
Streamlined Process
[Diagram: sources (Hadoop, Files, RDBMS, EDW, Mainframe) to targets (Hadoop, Files, RDBMS, EDW, Kafka)]
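Log-based CDC, as described on this slide, reads changes from the database's own change log rather than querying the tables. A rough sketch of that idea follows, assuming a hypothetical log of (sequence number, table, operation, key) tuples; real transaction logs are binary and vendor-specific, and none of these names come from the product.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# Hypothetical change-log entry: (log sequence number, table, operation, key).
LogEntry = Tuple[int, str, str, int]

CHANGE_LOG: List[LogEntry] = [
    (101, "orders", "insert", 1),
    (102, "customers", "update", 7),
    (103, "orders", "update", 1),
    (104, "orders", "delete", 1),
]


def read_changes(log: Iterable[LogEntry], after_lsn: int,
                 tables: Set[str]) -> Iterator[LogEntry]:
    """Yield entries newer than after_lsn that belong to replicated tables."""
    for entry in log:
        lsn, table, _op, _key = entry
        if lsn > after_lsn and table in tables:
            yield entry


def poll(log: List[LogEntry], checkpoint: int,
         tables: Set[str]) -> Tuple[List[LogEntry], int]:
    """Return the new changes plus a checkpoint advanced past everything scanned."""
    batch = list(read_changes(log, checkpoint, tables))
    new_checkpoint = max([checkpoint] + [entry[0] for entry in log])
    return batch, new_checkpoint


# First poll sees only the first two log entries; the second sees the full log.
first_batch, checkpoint = poll(CHANGE_LOG[:2], 100, {"orders"})
second_batch, checkpoint2 = poll(CHANGE_LOG, checkpoint, {"orders"})
```

Because the reader only tracks a position in the log, the source tables are never queried and nothing has to be installed next to them, which is the point of the "no agents" and "low impact" claims.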
Simplify Data Integration
Go Agile with Automation
– No manual coding
– Automated end-to-end
– Optimized and configurable
• Target schema creation
• Heterogeneous data type mapping
• Batch to CDC transition
• DDL change propagation
• Filtering
• Transformations
[Diagram: sources (Hadoop, Files, RDBMS, EDW, Mainframe) to targets (Hadoop, Files, RDBMS, EDW, Kafka)]
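Two of the automated steps listed above, target schema creation and heterogeneous data type mapping, amount to translating source column types and generating target DDL. The sketch below shows the idea for an Oracle source and a Hive target. The mapping table is an assumption chosen for illustration, not the mapping any particular tool uses, and `hive_ddl` is a hypothetical helper.

```python
from typing import List, Tuple

# Illustrative mapping from a few Oracle column types to Hive types (assumed).
ORACLE_TO_HIVE = {
    "VARCHAR2": "STRING",
    "NUMBER": "DECIMAL(38,10)",
    "DATE": "TIMESTAMP",
    "CLOB": "STRING",
}


def hive_ddl(table: str, columns: List[Tuple[str, str]]) -> str:
    """Build a Hive CREATE TABLE statement from (name, source type) pairs."""
    lines = []
    for name, source_type in columns:
        base = source_type.split("(")[0].strip().upper()   # drop length/precision
        hive_type = ORACLE_TO_HIVE.get(base, "STRING")      # fall back to STRING
        lines.append(f"  `{name.lower()}` {hive_type}")
    body = ",\n".join(lines)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n{body}\n) STORED AS ORC"


ddl = hive_ddl("orders", [
    ("ORDER_ID", "NUMBER(10)"),
    ("CUSTOMER", "VARCHAR2(50)"),
    ("PLACED", "DATE"),
    ("NOTES", "XMLTYPE"),   # unmapped type, falls back to STRING
])
```

A production mapping would also carry precision, nullability and keys across. The fallback to STRING here simply keeps the sketch from failing on types it does not know.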
Simplify Data Integration
Guided User Experience
– Intuitive web-based GUI
– Drag and drop, wizard-assisted configuration steps
– Consistent process for all sources and targets
Universal Data Integration
SOURCES
RDBMS: Oracle, SQL Server, DB2 iSeries, DB2 z/OS, DB2 LUW, MySQL, PostgreSQL, Sybase ASE, Informix
DW: Exadata, Teradata, Netezza, Vertica
HADOOP: Hortonworks, Cloudera, MapR
MAINFRAME: DB2 for z/OS, IMS/DB, VSAM, SQL/MP, Enscribe, RMS
CLOUD: AWS RDS, Salesforce, Snowflake
SAP: ECC on Oracle, ECC on SQL, ECC on DB2
TARGETS
RDBMS: Oracle, SQL Server, DB2 LUW, MySQL, PostgreSQL, Sybase ASE, Informix
DW: Microsoft PDW, Exadata, Teradata, Netezza, Vertica, Sybase IQ, Amazon Redshift, Actian Vector, SAP HANA
HADOOP: Hortonworks, Cloudera, MapR, Pivotal, Amazon EMR
NOSQL: MongoDB
CLOUD: Amazon RDS, Amazon Redshift, Amazon EMR, Google Cloud SQL, Google Cloud Dataproc, Azure SQL DW, Azure SQL DB
STREAMING: Azure Event Hubs, Kafka, MapR
SAP: HANA
Feeding the Data Lake with Attunity Replicate
Fortune 100 auto maker
4500 applications
DB2 MF, SQL, Oracle
Results
• Consolidating massive data volumes for global analytics
• Hadoop Data Lake with Kafka
• Minimizing labor and cost
• Realizing faster insights and competitive advantage
The results are impressive!
3x faster than alternative solutions
Q&A
REFERENCE CHARTS
Hortonworks HDP 3X POWER8 Price-Performance Guarantee
IBM Power Systems guarantees that the Power S822LC for Big Data system built with POWER8 delivers at least a 3X price-performance advantage vs. x86-based results when running a customer application/workload with Tez/Hive LLAP on Hortonworks HDP under the conditions noted below. A Worker Node is a server carrying out the HDP query functions, with one Worker Node per server.
3X price-performance means that the customer's documented throughput performance on the cluster of S822LC for Big Data Worker Nodes divided by the price of that cluster will be at least 3 times higher than the customer's documented throughput performance on the cluster of x86-based Worker Nodes divided by the price of the cluster of x86 Worker Nodes.
EX: If queries per second on the cluster of S822LC Worker Nodes are 30,000 and 10,000 on the cluster of x86-based Worker Nodes, while the price of the S822LC Worker Node cluster is $10,000 and the price of the x86-based Worker Node cluster is $10,000, then the throughput performance per price would be exactly 3 times higher and the guarantee would be met.
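The example above reduces to a ratio of two throughput-per-price figures. A minimal sketch of that calculation, using the numbers from the example (the function names are illustrative, not part of any IBM tooling):

```python
def price_performance(throughput: float, cluster_price: float) -> float:
    """Throughput (e.g. queries per second) per unit of cluster price."""
    if cluster_price <= 0:
        raise ValueError("cluster price must be positive")
    return throughput / cluster_price


def advantage(power_qps: float, power_price: float,
              x86_qps: float, x86_price: float) -> float:
    """How many times better the POWER8 cluster's price-performance is."""
    return (price_performance(power_qps, power_price)
            / price_performance(x86_qps, x86_price))


# Figures from the example: 30,000 vs 10,000 queries per second,
# with both clusters priced at $10,000.
ratio = advantage(30_000, 10_000, 10_000, 10_000)
guarantee_met = ratio >= 3.0
```

Because the metric is a ratio of ratios, the same threshold can be met by higher throughput at equal price, equal throughput at a third of the price, or any combination in between.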
Notes:
1. Client's Power S822LC for BD Worker Nodes and the x86 Worker Nodes must be running at similar utilization rates of at least 50%, must use the same software stack as described in Note #4, and must be configured similarly.
2. Client's Power S822LC for BD performance cannot be constrained by the I/O subsystem. Specifically, the I/O subsystem on the Power S822LC for BD Worker Node must achieve I/O bandwidth and operations per second greater than or equal to those of the x86 Worker Node.
3. Client's Power S822LC for BD Worker Node's physical memory must be the same as or greater than the physical memory on the x86 Worker Node.
4. Applicable software stack is Tez/Hive LLAP on HDP 2.6 or later for both the Power S822LC and x86-based Worker Nodes.
5. Client is responsible for demonstrating a comparable, representative real-world workload between the Power S822LC for BD Worker Node and the x86 Worker Node through the use of the IBM-provided tools and comparable tools on x86 systems.
6. The 3X guarantee is based on a list price for x86 servers from Dell, Cisco, HP or Lenovo based on E5-2600 v4 or earlier processor technology and the IBM S822LC for Big Data.
The IBM Power S822LC for Big Data servers (22-core/2.89 GHz) used as Worker Nodes must be purchased from IBM or an authorized IBM Business Partner prior to September 30, 2017. The guarantee period is valid for three (3) months from the date of purchase. The x86-based Worker Nodes must be comparably configured branded servers from Cisco, Dell, HP, or Lenovo, and the client is responsible for all Hortonworks licenses.
3X throughput performance per price means that the customer's documented throughput performance on the cluster of Power S822LC for BD Worker Nodes, based on either queries, operations or transactions per second, divided by the price of the cluster of Worker Nodes will be at least 3 times higher than the customer's same documented throughput performance on the cluster of x86 Worker Nodes divided by the price of said cluster of x86 Worker Nodes.
Remediation: IBM will provide additional performance optimization and tuning services consistent with IBM Best Practices, at no charge. If unable to reach the guaranteed level of price-performance, IBM will provide additional Worker Nodes, configured identically to those already purchased, to reach the guaranteed level of price-performance.
Only Available until Dec 31, 2017!