-
Public
Ruediger Karl
SAP HANA Product and Development Group, Walldorf
DMM100 SAP HANA Big Data Overview and Road Map
-
2014 SAP SE or an SAP affiliate company. All rights reserved. 2 Public
Disclaimer
This presentation outlines our general product direction and should not be relied on in making a
purchase decision. This presentation is not subject to your license agreement or any other agreement
with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or to
develop or release any functionality mentioned in this presentation. This presentation and SAP's
strategy and possible future developments are subject to change and may be changed by SAP at any
time for any reason without notice. This document is provided without a warranty of any kind, either
express or implied, including but not limited to, the implied warranties of merchantability, fitness for a
particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this
document, except if such damages were caused by SAP intentionally or grossly negligent.
-
Welcome to the World of Big Data
Every Minute of Every Day
72 hrs of YouTube video uploads
347 WordPress blog posts
571 new websites
20 million Flickr photo views
3,600 Instagram photos
204 million email messages
2 million Google searches
278 thousand Twitter tweets
By GE Look ahead, posted March 31, 2013
SOURCE: http://cult320spring2014.richardtoddstafford.com/?p=802
-
Turn new Signals into Business Value
A breakthrough in today's information processing architecture is needed
[Diagram labels: Social, Mobile, Real-Time Business, Cloud, Business Transactions, In-Memory, Advanced Analytics, Digital Connections, Collaborative Business, Big Data]
-
Big Data Processing Pipeline
Acquisition Recording
Cleaning Extraction
Modeling Analysis
Interpretation Visualization
Scalability, Performance, Heterogeneity, Security, System Architecture, Human Input
Volume: Explosion in the amount of data
Velocity: Fast collection, processing and consumption
Variety: Multiple data formats, structured and non-structured
Value: Hidden value of data is not always visible at first sight
Veracity: Data inconsistency, incompleteness, ambiguities
-
SAP HANA Platform for Big Data
-
Cobbling Disparate Tools together is NOT the Answer!
[Diagram: separate tools (interfaces, analytics, open source projects, infrastructure) contrasted with an integrated platform (interfaces, data sources, analytics, applications)]
-
SAP HANA Integrated Platform for Big Data
[Architecture diagram, labels grouped by layer]
Data sources: Log-Data, Hadoop, Events, RDBMS, Social Networks
Acquisition: Streaming, Data Services, Smart Data Integration and Quality
Store: Column Store, Row Store, Extended Storage
Process: Stored Procedures, Geospatial, Federation, Text, XS, AFL / PAL, Graph, Calculation Engine, SQL Engine, Query Compilation and Execution
Visualize: Lumira, InfiniteInsight, SAS, SAP BI, R, Non-SAP
-
SAP HANA smart data streaming (new: SPS09)
Receive events from various sources and capture data in HANA (> 1 million events/s)
Smart capture (transform, filter, aggregate, enrich)
Streaming nodes run on their own hosts but are administered via HANA Studio/Cockpit
Streaming nodes are supported on the same OS platforms as HANA
Various built-in input/output adapters, plus an Adapter Toolkit for custom adapters
Retention windows define how many events are kept, or for how long
Continuous query operators are applied to one or more input streams to produce a new stream
Query languages: CCL (SQL-like), CCLScript (script-like)
CCL defines the dataflow and the continuous queries
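The retention-window and continuous-query ideas above can be illustrated outside of CCL. The following is a minimal Python sketch, not CCL and not the smart data streaming API: the class `RetentionWindow`, the function `moving_average` and the sample readings are hypothetical names chosen for illustration. It shows a window bounded by event count or by event age, and a query that emits one output event per input event.

```python
from collections import deque


class RetentionWindow:
    """Keeps recent events, bounded by count and/or by age (hypothetical helper)."""

    def __init__(self, max_events=None, max_age_seconds=None):
        self.max_events = max_events
        self.max_age_seconds = max_age_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def insert(self, timestamp, value):
        self.events.append((timestamp, value))
        # Count-based retention: keep at most max_events events.
        if self.max_events is not None:
            while len(self.events) > self.max_events:
                self.events.popleft()
        # Time-based retention: drop events older than max_age_seconds.
        if self.max_age_seconds is not None:
            cutoff = timestamp - self.max_age_seconds
            while self.events and self.events[0][0] < cutoff:
                self.events.popleft()

    def values(self):
        return [value for _, value in self.events]


def moving_average(stream, window):
    """Continuous query: produces a new stream with one output per input event."""
    for timestamp, value in stream:
        window.insert(timestamp, value)
        kept = window.values()
        yield timestamp, sum(kept) / len(kept)


# Hypothetical sensor readings: (seconds, temperature)
readings = [(0, 20.0), (1, 22.0), (2, 24.0), (3, 26.0)]
averages = list(moving_average(readings, RetentionWindow(max_events=2)))
```

With a count-based window of two events, each output is the average of the current and the previous reading. A time-based window would be created with `RetentionWindow(max_age_seconds=...)` instead.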
[Diagram: streaming node connected to the SAP HANA Platform]
-
HANA smart data access
Ad-hoc access to remote data sources
Key Elements
Virtual tables enable access to remote data just like local tables
Supports data-location-agnostic development
Smart query processing, including query decomposition with predicate push-down
No special syntax to access heterogeneous data sources
Functional compensation allows customers to use the full power of HANA
Automatic data type translation enables remote data types to be mapped to HANA data types
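Predicate push-down and functional compensation, as listed above, can be pictured with a small simulation. This is a Python sketch under stated assumptions, not the smart data access implementation: `RemoteSource`, `query` and the sample rows are hypothetical. The remote source evaluates the predicates it supports, and anything it cannot evaluate is compensated locally after the rows have been shipped.

```python
import operator

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def matches(row, predicate):
    column, op, value = predicate
    return OPS[op](row[column], value)


class RemoteSource:
    """Stand-in for a remote database behind a virtual table (hypothetical)."""

    def __init__(self, rows, supported_ops):
        self.rows = rows
        self.supported_ops = supported_ops
        self.rows_shipped = 0  # counts rows transferred to the local side

    def scan(self, pushed_predicates):
        result = [r for r in self.rows
                  if all(matches(r, p) for p in pushed_predicates)]
        self.rows_shipped += len(result)
        return result


def query(source, predicates):
    # Query decomposition: push down what the remote source can evaluate.
    pushed = [p for p in predicates if p[1] in source.supported_ops]
    # Functional compensation: evaluate the remaining predicates locally.
    compensated = [p for p in predicates if p[1] not in source.supported_ops]
    shipped = source.scan(pushed)
    return [r for r in shipped if all(matches(r, p) for p in compensated)]


rows = [
    {"region": "EMEA", "amount": 50},
    {"region": "EMEA", "amount": 500},
    {"region": "APJ", "amount": 900},
]
source = RemoteSource(rows, supported_ops={"="})  # assumed: equality only
result = query(source, [("region", "=", "EMEA"), ("amount", ">", 100)])
```

In this example only the two EMEA rows cross the network. The `amount` filter, which the assumed source cannot evaluate, is applied locally.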
[Diagram: SAP HANA (Transactions + Analytics) with HANA tables and virtual tables; remote sources: DB2, Hadoop, Teradata, MaxDB, IQ, MSSQL, Oracle]
-
Query Federation Architecture
[Diagram: SQL Processor with access methods for row/column tables, views, virtual tables and the DB catalog; Virtual Access Layer with query federation (query optimization, remote query execution, data type conversion, remote SQL generation); Adapter Framework with built-in adapters and driver manager]
Primary Goals
Maximize performance by reducing data transfer and latency
Extend SAP HANA query processing capabilities to remote data
Optimizer Extensions
The optimizer knows whether certain operations can be shipped to a remote server/data source and can take this into account in its costing
The optimizer is able to generate valid plans in cases where HANA-specific features cannot be used
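The costing idea above, whether an operation can be shipped to the remote source and how that affects data transfer, can be sketched as a toy planner. This Python example is only an illustration under simplifying assumptions: the `RemoteStats` fields, the `plan_aggregation` function and the cost model (cost proportional to rows transferred) are hypothetical and much simpler than a real optimizer.

```python
from dataclasses import dataclass


@dataclass
class RemoteStats:
    row_count: int           # rows in the remote table
    group_count: int         # distinct groups produced by the aggregation
    supports_group_by: bool  # can the remote source run GROUP BY itself?


def plan_aggregation(stats, cost_per_row=1.0):
    """Return (plan_name, estimated_transfer_cost) for the cheapest valid plan."""
    # Pulling all rows and aggregating locally is always a valid plan.
    candidates = {"aggregate_locally": stats.row_count * cost_per_row}
    # Shipping the aggregation is only considered if the source supports it.
    if stats.supports_group_by:
        candidates["ship_to_remote"] = stats.group_count * cost_per_row
    best = min(candidates, key=candidates.get)
    return best, candidates[best]


capable = plan_aggregation(RemoteStats(1_000_000, 50, True))
limited = plan_aggregation(RemoteStats(1_000_000, 50, False))
```

When the source can aggregate, only the 50 group rows are transferred. When it cannot, the planner still returns a valid plan, at the cost of moving every row.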
-
SAP HANA Calculation Engine
Complex analytics on heterogeneous data
The SAP HANA calculation engine is a common execution runtime that combines data from different domain-specific models and languages in one execution plan
The HANA modelling environment, or the compilers of the different domain languages such as SQLScript, MDX, WIPE, FOX and Graph, generate calculation scenarios
Calculation scenarios are parameterized, directed acyclic graphs of calculation nodes, e.g. Script nodes, SQL nodes, Cpp nodes, R nodes and custom nodes
The calculation engine maps the logical nodes to physical operators, transforms the logical plan into an optimized physical execution plan (heuristic + cost-based) and coordinates the execution
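A parameterized, directed acyclic graph of calculation nodes can be pictured with a few lines of Python. This is a sketch of the general idea only and says nothing about how the calculation engine is implemented or optimized. The node names, the `run_scenario` function and the `min_amount` parameter are hypothetical; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def run_scenario(nodes, parameters):
    """Execute nodes in dependency order.

    nodes maps a node name to (list of input node names, compute function).
    Each compute function receives the input results and the parameters.
    """
    graph = {name: deps for name, (deps, _) in nodes.items()}
    results = {}
    # static_order() yields every node after all of its inputs;
    # it raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        deps, compute = nodes[name]
        results[name] = compute([results[d] for d in deps], parameters)
    return results


# Hypothetical scenario: source node -> filter node -> aggregation node
nodes = {
    "sales": ([], lambda inputs, p: [("EMEA", 120), ("APJ", 80), ("EMEA", 30)]),
    "filtered": (["sales"],
                 lambda inputs, p: [r for r in inputs[0] if r[1] >= p["min_amount"]]),
    "total": (["filtered"], lambda inputs, p: sum(amount for _, amount in inputs[0])),
}
results = run_scenario(nodes, {"min_amount": 50})
```

The same scenario can be re-run with different parameter values, which is the sense in which it is parameterized.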
-
In-Database Predictive Analytics
R Integration
Enables the use of the R open source environment (> 3,500 packages) in the context of the HANA in-memory database
R script is embedded within SQLScript (RLANG procedures)
AFL (Application Function Library)
On-demand library loading framework for registered and supported libraries
The Predictive Analysis Library (PAL) and the Business Function Library (BFL) have been released as pre-built AFL content
The Application Function Modeler provides a graphical editor
SAS integration
SAS Embedded Process enables utilization of native SAS predictive models for scoring and selected SAS algorithms inside SAP HANA
[Diagram: SAP HANA Platform with the In-Memory Processing Engine and the HANA Calculation Engine; SAS Embedded Process (EP) with SAS algorithms and SAS predictive models; Application Function Library (PAL, BFL, UDF, KXEN, SAS); SQL and SQLScript; SAP HANA Script Server; R-Script passed to the R-Engine on an R-Machine]
-
In-Database Geospatial Processing
[Diagram: SAP HANA in-database spatial processing. Inputs: real-time data, SAP data, non-SAP data, business data, spatial data. Components: spatial data types, spatial functions, calc model / views, geo-content, geo-services, columnar spatial storage. Platform capabilities: OLTP, Analytics, Planning, Predictive, Text, Spatial]
Store, process, manipulate, retrieve, and share spatial data
OGC compliant
Unified modeling platform
Optimized in-memory spatial processing
Esri support
3D point support
Spatial data types
New native content and services from Nokia/HERE
Native geocoding trigger
-
HANA Hadoop Integration
You can't have a conversation about Big Data without talking about Hadoop
-
Hadoop explained
Apache Hadoop is open source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers
Server Cluster
Designed for massive scale of processors, memory, and locally attached storage
Distributed Filesystem (HDFS)
Designed for reliable and fault-tolerant computing; works on file blocks (64 MB) that are distributed redundantly across the datanodes
Components: Namenode, Datanodes
Map-Reduce Framework
Designed for a massively parallel programming model; handles scheduling, dispatching, execution, monitoring, communication
Components: Job Tracker, Task Trackers
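The map-reduce programming model mentioned above can be shown with an in-process word count. This Python sketch simulates the map, shuffle and reduce phases in a single process. It is not Hadoop code, and the two input strings merely stand in for file blocks that a real cluster would read from different datanodes.

```python
from collections import defaultdict


def map_phase(block):
    """Mapper: emit a (word, 1) pair for every word in one block."""
    for word in block.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key."""
    return key, sum(values)


def run_job(blocks):
    pairs = [pair for block in blocks for pair in map_phase(block)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Two small strings stand in for separate file blocks.
blocks = ["big data big value", "data in memory"]
counts = run_job(blocks)
```

Because each mapper call only sees its own block and each reducer call only sees one key, both phases can in principle run in parallel on different nodes.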
-
Enterprise Readiness with Hadoop 2.0
[Stack diagram, labels grouped by area]
GOVERNANCE & INTEGRATION
Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, NFS, WebHDFS
DATA ACCESS
Batch: Map Reduce
Script: Pig
SQL: Hive/Tez, HCatalog
NoSQL: HBase, Accumulo
Stream: Storm
Search: Solr
Others: Spark, ISV engines
DATA MANAGEMENT
YARN: Data Operating System
HDFS (Hadoop Distributed File System)
SECURITY
Authentication, Authorization, Accounting, Data Protection
Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Knox
OPERATIONS
Provision, Manage & Monitor: Ambari, Zookeeper
Scheduling: Oozie
Deployment Choice: Linux, Windows, On-Premise, Cloud
-
HANA - Hadoop Integration
[Diagram: SAP HANA (Column Store, Row Store, Text, XS, AFL / PAL, Graph, Geospatial, Federation, Stored Procedures, Calculation Engine, SQL Engine, Query Compilation and Execution) together with Data Services, ESP, Studio (AFM), KXEN, Lumira, SAS, SAP BI 4.0 and R. Connections to Hadoop: Hive via ODBC/JDBC, File Adapter, and SQL UDF to execute MR jobs (new: SPS09)]
Design Goals
Flexible options for seamless access to data in a Hadoop cluster
Leverage Hive to provide SQL-based query processing from HANA
Provide native HDFS access and MR job execution from HANA for non-SQL processing in Hadoop
Co-innovate with major Hadoop distributors
-
Option 1: HANA SDA Integration with Hadoop (>=SPS06)
SAP HANA smart data access provides seamless ad-hoc query processing to Hadoop via Hive
Hive is an open source data warehouse solution on top of Hadoop with direct access to HDFS
[Diagram: an SAP HANA app issues SQL against a virtual table; the HANA federation layer connects via ODBC to Hive (command line interface, web interface, ODBC/JDBC, metastore, driver with compiler, optimizer and executor); Hive turns HiveQL into MR jobs on Hadoop]
Technical View:
Hive provides a mapping between unstructured file formats and a structured table format
Hive optimizes queries and initiates MR jobs in Hadoop
A HANA virtual table is mapped to a Hive table
HANA CalcViews support virtual tables
HANA provides local and remote caching capabilities (>=SPS07)
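The mapping between raw files and a structured table format can be illustrated by applying a schema at read time. This Python sketch is not Hive code: the schema, the comma delimiter and the sample lines are hypothetical, and turning unparsable or missing fields into `None` is an assumption made for this sketch.

```python
# Hypothetical table definition: column name and Python type per field
SCHEMA = [("device_id", str), ("temperature", float), ("status", str)]


def parse_row(line, schema, delimiter=","):
    """Map one raw text line to a structured row using the given schema."""
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for index, (name, cast) in enumerate(schema):
        try:
            row[name] = cast(fields[index])
        except (IndexError, ValueError):
            # Assumption for this sketch: bad or missing fields become NULL.
            row[name] = None
    return row


raw_lines = ["m1,71.5,OK", "m2,n/a,WARN", "m3,80.0"]
table = [parse_row(line, SCHEMA) for line in raw_lines]
```

The file itself is never rewritten. The structure exists only in the schema that is applied when the data is read, which is why the same file can be exposed as a table and then mapped to a virtual table.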
-
Option 2: HANA vUDF Integration with Hadoop (SPS09)
HANA virtual User-Defined Functions (vUDFs) leverage native features in Hadoop (Map/Reduce, HDFS access)
Map-Reduce jobs make it possible to analyze unstructured binary data in a Hadoop cluster
[Diagram labels: SAP HANA app, SQL, vUDF, SAP HANA, Hadoop, Agent, WebHCat, HTTP, MR Jobs, WebHDFS (native HDFS access)]
MR Job Invocation:
A HANA vUDF is a new type of user-defined table function and can be used wherever a normal UDF could be used (in any SQL fragment)
The vUDF is connected to a Hadoop destination (remote source)
The vUDF defines the mapper and reducer job classes, the input data file, the job package name, and the return (table) parameter
At the very first vUDF call, the associated job package is pushed to and deployed in the Hadoop destination
The map and reduce Java classes can be created and stored in the HANA development environment
-
Big Data Scenarios
-
Industry-wide opportunities through big data intelligence
Aerospace & Defense Automotive Consumer Products Financial Services
Healthcare & Life Science
Manufacturing Public Sector Retail
Sports & Entertainment Tech Utilities
-
Machine Data Insight
Predictive Maintenance
Location Based Services
Remote Asset Management
Distributed Process Monitoring
Insight from detailed data about physical assets
Integrated and connected in real-time
Predict and act with awareness of condition and environment
Lower Costs
New and Improved Services
New Business Models
-
Omni-Channel Data Insight
Unstructured Data Insight & Analytics
Simulation & Prediction
Performance Optimization
Insight from many data sources and types
Integrated and connected in real-time with enterprise data
Predict, forecast, and optimize for greater insights & performance
Maximize Business & IT Benefits
Optimize Organizational Performance
Increase Sales & Improve Customer Service
-
PdMS Sample Scenario: Machine Health Control Center
The Machine Health Control Center is the common entry point for warranty managers, service managers, engineers, dealers and fleet managers to get an overview of a population of vehicles or machines.
It highlights key information such as actual status, emerging issues, warranty claims, current locations and planned services in one single view. Visualization is based on a map where appropriate, or on a list format. Selection by region, vehicle or machine type, status, customer, dealer etc. helps to focus the user's activities and information needs.
For each individual vehicle or machine, users can drill down into the Material Health Fact Sheet, which provides any level of detail requested. Follow-up actions and workflows can be triggered for an individual vehicle or machine or for a group of them.
-
Outlook
-
Planned Innovations
SPS09
Develop and execute custom M/R jobs in HANA
  Execute custom map reduce methods from SAP HANA and consume the results from HANA queries via vUDF
  Develop and store MR Java classes in HANA Studio
  Direct HDFS access via WebHDFS without Hive/SDA or MR jobs
Performance
  SDA optimization for connectivity, semi joins and relocation when large amounts of data are queried from SAP HANA and the number of Hadoop nodes is increased
  Concurrent execution of Hive jobs from SAP HANA
Enterprise readiness
  Support Hortonworks HDP-2.0 and CDH-5.1 Hadoop distributions with SAP HANA
Future Direction
Extended Hadoop Integration
  Trigger and consume output of Pig jobs from HANA
  Support HBase as a store for HANA federated queries
  Leverage Hive ACID capabilities (> Hive 0.13)
  Leverage Apache Spark (in-memory SQL engine)
Performance
  Parallel connections from HANA to Hadoop for loading and queries
  Extend HANA compute engine to Hadoop
Enterprise readiness
  Use Hadoop as tiered storage
  Integrate Hadoop partner distros into HANA cockpit
  Common authentication and authorization
  Connect HANA admin console to Ambari
This is the current state of planning and may be changed by SAP at any time.
-
SAP d-code Virtual Hands-on Workshops and SAP d-code Online
Continue your SAP d-code education after the event!
SAP d-code Online
Access replays of keynotes, Demo Jam, SAP d-code live interviews, select lecture sessions, and more!
Hands-on replays
http://sapdcode.com/online
SAP d-code Virtual Hands-on Workshops
Access hands-on workshops post-event
Starting January 2015
Complimentary with your SAP d-code registration
http://sapdcodehandson.sap.com
-
Further Information
SAP Education and Certification Opportunities
www.sap.com/education
Watch SAP d-code Online
www.sapcode.com/online
SAP Public Web
scn.sap.com
www.sap.com
www.sapbigdata.com
www.sap.com/solution/big-data/software/overview.html
-
Feedback
Please complete your session evaluation for DMM100 SAP HANA Big Data Overview and Road Map.
Thanks for attending this SAP TechEd && d-code session.
-
2014 SAP SE or an SAP affiliate company. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an
SAP affiliate company.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE
(or an SAP affiliate company) in Germany and other countries. Please see http://global12.sap.com/corporate-en/legal/copyright/index.epx for additional trademark
information and notices.
Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.
National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its
affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or
SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing
herein should be construed as constituting an additional warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or
release any functionality mentioned therein. This document, or any related presentation, and SAP SE's or its affiliated companies' strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for
any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-
looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place
undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.