  • Public

    Ruediger Karl

    SAP HANA Product and Development Group, Walldorf

    DMM100 SAP HANA Big Data Overview and Road Map

  • Disclaimer

    © 2014 SAP SE or an SAP affiliate company. All rights reserved. Public

    This presentation outlines our general product direction and should not be relied on in making a purchase decision. This presentation is not subject to your license agreement or any other agreement with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or to develop or release any functionality mentioned in this presentation. This presentation and SAP's strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice. This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent.

  • Welcome to the World of Big Data

    Every Minute of Every Day (by GE Look ahead, posted March 31, 2013):

    72 hrs of YouTube video uploads

    347 WordPress blog posts

    571 new websites

    20 million Flickr photo views

    3,600 Instagram photos

    204 million email messages

    2 million Google searches

    278 thousand Twitter tweets

    SOURCE: http://cult320spring2014.richardtoddstafford.com/?p=802

  • Turn new Signals into Business Value

    A breakthrough in today's information processing architecture is needed

    Diagram labels: Social, Mobile, Real-Time Business, Cloud, Business Transactions, In-Memory, Advanced Analytics, Digital Connections, Collaborative Business, Big Data

  • Big Data Processing Pipeline

    Stages: Acquisition/Recording, Cleaning/Extraction, Modeling/Analysis, Interpretation/Visualization

    Scalability, Performance, Heterogeneity, Security, System Architecture, Human Input

    Volume: Explosion in the amount of data

    Velocity: Fast collection, processing and consumption

    Variety: Multiple data formats, structured and non-structured

    Value: Hidden value of data is not always visible at first sight

    Veracity: Data inconsistency, incompleteness, ambiguities

  • SAP HANA Platform for Big Data

  • Cobbling Disparate Tools together is NOT the Answer!

    Disparate tools: Interfaces, Analytics, Open Source Projects, Infrastructure

    Integrated Platform: Interfaces, Data Sources, Analytics, Applications

  • SAP HANA Integrated Platform for Big Data

    Data sources: Log-Data, Hadoop, Events, RDBMS, Social Networks

    Acquisition: Streaming, Data Services, Smart Data Integration and Quality

    Process: Stored Procedures, Geospatial, Federation, Text, XS, AFL / PAL, R, Graph, Calculation Engine, SQL Engine, Query Compilation and Execution

    Store: Column Store, Row Store, Extended Storage

    Visualize: Lumira, InfiniteInsight, SAS, SAP BI, Non-SAP

  • SAP HANA smart data streaming (new: SPS09)

    Receive events from various sources and capture data in HANA (> 1 million events/s)

    Smart capture (transform, filter, aggregate, enrich)

    Streaming nodes run on their own hosts, but are administered by HANA Studio/Cockpit

    Streaming nodes are supported on the same OS platforms as HANA

    Various built-in in/out adapters + Adapter Toolkit for custom adapters

    Retention windows define how many events are kept or for how long

    Apply continuous query operators to one or more input streams to produce a new stream

    Query languages: CCL (SQL-like), CCLScript (script-like)

    Define the dataflow and the continuous queries

    Diagram labels: Streaming node, SAP HANA Platform
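
    The interplay of smart capture, retention windows, and continuous queries can be sketched in a few lines of Python. This is only an illustration of the idea, not CCL and not an SAP API: the names Event and windowed_average, the count-based window of two rows, and the filter threshold are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """Hypothetical sensor event; the field names are assumptions."""
    sensor: str
    value: float


def windowed_average(events: Iterable[Event], keep_rows: int,
                     min_value: float) -> Iterator[float]:
    """Continuous query: filter the input stream, retain the last
    `keep_rows` events, and emit a new stream of running averages."""
    window: Deque[Event] = deque(maxlen=keep_rows)  # count-based retention window
    for event in events:
        if event.value < min_value:  # smart capture: filter before keeping
            continue
        window.append(event)  # the oldest event is evicted automatically
        yield sum(e.value for e in window) / len(window)


readings = [Event("s1", v) for v in (10.0, 1.0, 20.0, 30.0, 40.0)]
averages = list(windowed_average(readings, keep_rows=2, min_value=5.0))
print(averages)
```

    A time-based retention window would evict by event timestamp instead of by row count; the rest of the dataflow would stay the same.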

  • HANA smart data access: Ad-hoc access to remote data sources

    Key Elements

    Virtual tables enable access to remote data just like a local table

    Supports data-location-agnostic development

    Smart query processing, including query decomposition with predicate push-down

    No special syntax to access heterogeneous data sources

    Functional compensation allows customers to use the full power of HANA

    Automatic data type translation maps remote data types to HANA data types

    Diagram labels: Transactions + Analytics; SAP HANA with HANA Tables and Virtual Tables; remote sources DB2, Hadoop, Teradata, MaxDB, IQ, MSSQL, Oracle
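
    To make predicate push-down concrete, here is a minimal Python sketch in which an in-memory SQLite database stands in for a remote source. The VirtualTable class, the orders table, and its rows are hypothetical and exist only for this illustration; they are not the smart data access interface.

```python
import sqlite3

# Stand-in for a remote source (Hive, Oracle, Teradata, ... in the slide).
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
remote.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EMEA", 120.0), (2, "APJ", 80.0), (3, "EMEA", 45.5), (4, "AMER", 300.0)],
)


class VirtualTable:
    """Hypothetical local handle on a remote table (not an SAP API)."""

    def __init__(self, connection, name):
        self.connection = connection
        self.name = name
        self.rows_transferred = 0

    def scan(self, column=None, value=None):
        # Table and column names are trusted constants in this sketch.
        sql = f"SELECT id, region, amount FROM {self.name}"
        params = ()
        if column is not None:
            # Predicate push-down: the filter is evaluated at the remote source.
            sql += f" WHERE {column} = ?"
            params = (value,)
        rows = self.connection.execute(sql, params).fetchall()
        self.rows_transferred += len(rows)
        return rows


pushed = VirtualTable(remote, "orders")
emea_pushed = pushed.scan("region", "EMEA")

naive = VirtualTable(remote, "orders")
emea_local = [row for row in naive.scan() if row[1] == "EMEA"]

print(pushed.rows_transferred, naive.rows_transferred)
```

    Both variants return the same rows, but the pushed-down filter moves only the matching rows across the connection, which is the point of decomposing the query.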

  • Query Federation Architecture

    Diagram labels: Access Methods, Row/Column tables, DB Catalog, Virtual Tables, SQL Processor, Views, Adapter Framework, Virtual Access Layer, Query Federation, Remote Query Execution, Query Optimization, Data Type Conversion, Remote SQL generation, Built-in Adapters, Driver Manager

    Primary Goals

    Maximize performance by reducing data transfer and latency

    Extend SAP HANA query processing capabilities to remote data

    Optimizer Extensions

    The optimizer knows whether certain operations can be shipped to a remote server/data source and is able to consider this in its costing

    The optimizer is able to generate valid plans in cases where HANA-specific features cannot be used
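
    One way to picture the optimizer extensions is a capability check that decides which operations are shipped to the remote source and which are compensated locally. The sketch below is a deliberately simplified assumption: a linear plan, a made-up capability set for a source called "hive", and no cost model. The actual optimizer is cost-based and far more elaborate.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class RemoteSource:
    """Hypothetical description of what a remote source can execute itself."""
    name: str
    capabilities: FrozenSet[str]


def place_operations(operations: List[str],
                     source: RemoteSource) -> Tuple[List[str], List[str]]:
    """Split a linear plan into a remotely executed prefix and a local remainder."""
    remote: List[str] = []
    local: List[str] = []
    for op in operations:
        # Once one operation must run locally, every later operation
        # consumes local results, so it stays local as well.
        if not local and op in source.capabilities:
            remote.append(op)
        else:
            local.append(op)
    return remote, local


hive = RemoteSource("hive", frozenset({"filter", "join", "aggregate"}))
plan = ["filter", "aggregate", "fuzzy_text_search", "aggregate"]
remote_ops, local_ops = place_operations(plan, hive)
print(remote_ops, local_ops)
```

    The plan stays valid even though the source lacks the text search: the unsupported step and everything after it simply run locally on the transferred intermediate result.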

  • SAP HANA Calculation Engine: Complex Analytics on heterogeneous data

    The SAP HANA calculation engine is a common execution runtime that combines data of different domain-specific models and languages in one execution plan

    The HANA modelling environment or the compilers of the different domain languages, such as SQLScript, MDX, WIPE, FOX, and Graph, generate calculation scenarios

    Calculation scenarios are parameterized, directed acyclic graphs of calculation nodes, e.g. Script-nodes, SQL-nodes, Cpp-nodes, R-nodes and custom-nodes

    The calculation engine maps the logical nodes to physical operators, transforms the logical plan into an optimized physical execution plan (heuristic + cost-based) and coordinates the execution
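
    A calculation scenario as a parameterized directed acyclic graph can be sketched with Python's standard graphlib module. The node names, the sample rows, and the min_amount parameter are assumptions for illustration; the sketch only shows nodes being executed after their inputs and omits the optimization step entirely.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_scenario(min_amount):
    """Return a toy scenario: node name -> (input node names, operator)."""
    return {
        "sales": ((), lambda: [("EMEA", 100), ("APJ", 40), ("EMEA", 60)]),
        "filtered": (("sales",),
                     lambda rows: [r for r in rows if r[1] >= min_amount]),
        "by_region": (("filtered",),
                      lambda rows: sorted({region for region, _ in rows})),
        "total": (("filtered",),
                  lambda rows: sum(amount for _, amount in rows)),
    }


def execute(scenario):
    """Run every node after its inputs, following a topological order."""
    graph = {node: set(inputs) for node, (inputs, _) in scenario.items()}
    results = {}
    for node in TopologicalSorter(graph).static_order():
        inputs, operator = scenario[node]
        results[node] = operator(*(results[name] for name in inputs))
    return results


results = execute(build_scenario(min_amount=50))
print(results["total"], results["by_region"])
```

    Changing the parameter re-evaluates the same graph with a different filter, which is the sense in which a scenario is parameterized.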

  • In-Database Predictive Analytics

    R Integration

    Enables the use of the R open source environment (> 3,500 packages) in the context of the HANA in-memory database

    R script is embedded within SQLScript (RLANG procedures)

    AFL

    On-demand library loading framework for registered and supported libraries

    Predictive Analysis and Business Function Library have been released as pre-built AFL content

    The Application Function Modeler provides a graphical editor

    SAS integration

    SAS Embedded Process enables utilization of native SAS predictive models for scoring and selected SAS algorithms inside SAP HANA.

    Diagram labels: SAP HANA Platform, In-Memory Processing Engine, SAS Embedded Process (EP), SAS Algorithms, HANA Calculation Engine, SAS Predictive Models, Application Function Library, SAP HANA Script Server, R-Script, PAL, BFL, UDF, KXEN, SAS, SQL Script, SQL, R-Engine, R-Machine

  • In-Database Geospatial Processing

    Store, process, manipulate, retrieve, and share spatial data

    OGC Compliant

    Unified Modeling Platform

    Optimized in-memory spatial processing

    Esri Support

    3D point support

    Spatial datatypes

    New native content and services from Nokia/HERE

    Native geocoding trigger

    Diagram labels: Real-Time Data, SAP Data, Business Data, Non-SAP Data, Spatial Data, Spatial Data Types, Spatial Functions, Calc Model / Views, Geo-Content, Geo-Services, Columnar Spatial Storage, SAP HANA (OLTP, Analytics, Planning, Predictive, Text, Spatial)

  • HANA Hadoop Integration: You can't have a conversation about Big Data without talking about Hadoop

  • Hadoop explained

    Apache Hadoop is open source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers

    Server Cluster: Designed for massive scale of processors, memory, and local attached storage

    Distributed Filesystem (HDFS): Designed for reliable and fault-tolerant computing; works on file blocks (64 MB), distributed redundantly across the datanodes (Namenode, Datanodes)

    Map-Reduce Framework: Designed for a massively parallel programming model; handles scheduling, dispatching, execution, monitoring, communication (Job Tracker, Task Trackers)
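
    The Map-Reduce programming model can be illustrated with the classic word count, run here in a single Python process. This is a sketch of the model only: the function names are assumptions, each string stands in for one file block, and the scheduling, distribution, and fault tolerance that Hadoop provides are not modelled.

```python
from collections import defaultdict


def map_phase(block):
    """Mapper: emit (word, 1) for every word in one file block."""
    for word in block.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group intermediate values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


blocks = ["Big Data big", "data HANA"]  # each string stands in for one block
intermediate = [pair for block in blocks for pair in map_phase(block)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(word_counts)
```

    In a real cluster the mappers run in parallel on the datanodes that hold the blocks, and only the grouped intermediate pairs travel over the network to the reducers.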

  • Enterprise Readiness with Hadoop 2.0

    GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)

    DATA ACCESS: Batch (Map Reduce), Script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), Others (Spark, ISV engines)

    DATA MANAGEMENT: YARN: Data Operating System; HDFS (Hadoop Distributed File System), nodes 1 to N

    SECURITY: Authentication, Authorization, Accounting, Data Protection; Storage: HDFS, Resources: YARN, Access: Hive, Pipeline: Falcon, Cluster: Knox

    OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)

    Deployment Choice: Linux, Windows, On-Premise, Cloud

  • HANA - Hadoop Integration

    Design Goals

    Flexible options for seamless access to data in a Hadoop cluster

    Leverage Hive to provide SQL-based query processing from HANA

    Provide native HDFS access and MR Job execution from HANA for non-SQL processing in Hadoop

    Co-innovate with major Hadoop distributors

    Diagram labels: Data Services, ESP, Stored Procedures, Geospatial, Federation, SAP HANA, Studio (AFM), KXEN, Lumira, SAS, Columnstore, Rowstore, Text, XS, AFL / PAL, Query Compilation and Execution, SAP BI 4.0, R, Calculation Engine, SQL Engine, Graph, Hive, File Adapter, ODBC, JDBC, SQL UDF, Execute MR Jobs (new: SPS09), Hadoop

  • Option 1: HANA SDA Integration with Hadoop (>=SPS06)

    SAP HANA smart data access provides seamless ad-hoc query processing to Hadoop via Hive

    Hive is an open source data warehouse solution on top of Hadoop with direct access to HDFS

    Technical View:

    Hive provides a mapping between unstructured file formats and a structured table format

    Hive optimizes queries and initiates MR Jobs in Hadoop

    A HANA virtual table is mapped to a Hive table

    HANA CalcView support for virtual tables

    HANA provides local and remote caching capabilities (>=SPS07)

    Diagram labels: SAP HANA App, SQL, SAP HANA (Virtual Table, Federation Layer), ODBC, HIVE (Web Interface, Command Line Interface, ODBC, JDBC, Metastore, Driver (Compiler, Optimizer, Executor)), HIVEQL, MR Jobs, Hadoop
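
    The mapping Hive provides between raw files and a table format is often described as schema-on-read: the file stays as it is, and a schema is applied when it is queried. A minimal Python sketch of that idea follows. The tab-delimited sample data, the column names, and the read_as_table helper are assumptions for illustration and have nothing to do with Hive's own implementation.

```python
import csv
import io

# Raw, tab-delimited file content as it might sit in HDFS (made-up sample).
raw_file = "2014-11-01\tpump-7\t81.5\n2014-11-02\tpump-7\t79.0\n"

# Schema applied at read time: column name plus a conversion function.
schema = [("day", str), ("machine", str), ("temperature", float)]


def read_as_table(text, schema, delimiter="\t"):
    """Yield one dict per line, typed according to the schema."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for fields in reader:
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


rows = list(read_as_table(raw_file, schema))
hot = [row["day"] for row in rows if row["temperature"] > 80.0]
print(rows[0], hot)
```

    Once the file is exposed as typed rows, a query layer can filter and aggregate it like any other table, which is what allows a virtual table to sit on top of it.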

  • Option 2: HANA vUDF Integration with Hadoop (SPS09)

    The HANA virtual User-Defined Function (vUDF) leverages native features in Hadoop (Map/Reduce, HDFS access)

    Map-Reduce jobs make it possible to analyze unstructured binary data in a Hadoop cluster

    MR Job Invocation:

    A HANA vUDF is a new type of user-defined table function and can be used wherever a normal UDF could be used (in any SQL fragment)

    The vUDF is connected to a Hadoop destination (remote source)

    The vUDF defines the mapper and reducer job classes, the input data file, and the job package name, as well as the return (table) parameter

    At the very first vUDF call, the associated job package is pushed and deployed into the Hadoop destination

    The map and reduce Java classes can be created and stored in the HANA development environment

    Diagram labels: SAP HANA App, SQL, SAP HANA (vUDF), HTTP, Hadoop Agent, WebHCat, MR Jobs, WebHDFS (native HDFS access), Hadoop
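
    The deploy-on-first-call behaviour described for the vUDF can be sketched as follows. Everything here is hypothetical: FakeHadoopDestination, VirtualFunction, the package and class names, and the line-counting "job" are stand-ins invented for the example and do not reflect the actual HANA or Hadoop interfaces.

```python
class FakeHadoopDestination:
    """In-memory stand-in for a Hadoop remote source (an assumption)."""

    def __init__(self, files):
        self.files = files
        self.deployed_packages = set()
        self.deploy_count = 0

    def deploy(self, package):
        self.deployed_packages.add(package)
        self.deploy_count += 1

    def run_job(self, package, input_file):
        if package not in self.deployed_packages:
            raise RuntimeError("job package not deployed")
        # Placeholder for a real MR job: count the lines of the input file.
        return [(input_file, len(self.files[input_file].splitlines()))]


class VirtualFunction:
    """Hypothetical table function bound to a destination and a job package."""

    def __init__(self, destination, package, mapper, reducer, input_file):
        self.destination = destination
        self.package = package
        self.mapper = mapper      # names only; not executed in this sketch
        self.reducer = reducer
        self.input_file = input_file

    def __call__(self):
        # Push and deploy the job package on the very first call only.
        if self.package not in self.destination.deployed_packages:
            self.destination.deploy(self.package)
        return self.destination.run_job(self.package, self.input_file)


hadoop = FakeHadoopDestination({"/logs/app.log": "a\nb\nc\n"})
vudf = VirtualFunction(hadoop, "log_jobs.jar", "LogMapper", "LogReducer",
                       "/logs/app.log")
first, second = vudf(), vudf()
print(first, hadoop.deploy_count)
```

    The second call reuses the deployed package, so the deployment cost is paid once while the function result can be consumed like any other table.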

  • Big Data Scenarios

  • Industry-wide opportunities through big data intelligence

    Aerospace & Defense, Automotive, Consumer Products, Financial Services, Healthcare & Life Science, Manufacturing, Public Sector, Retail, Sports & Entertainment, Tech, Utilities

  • Machine Data Insight

    Predictive Maintenance, Location Based Services, Remote Asset Management, Distributed Process Monitoring

    Insight from detailed data about physical assets

    Integrated and connected in real-time

    Predict and act with awareness of condition and environment

    Lower Costs, New and Improved Services, New Business Models

  • Omni-Channel Data Insight

    Unstructured Data Insight & Analytics, Simulation & Prediction, Performance Optimization

    Insight from many data sources and types

    Integrated and connected in real-time with enterprise data

    Predict, forecast, and optimize for greater insights & performance

    Maximize Business & IT Benefits, Optimize Organizational Performance, Increase Sales & Improve Customer Service

  • PdMS Sample Scenario: Machine Health Control Center

    The Machine Health Control Center is the common entry point for warranty managers, service managers, engineers, dealers and fleet managers to get an overview of a population of vehicles or machines.

    It highlights key information such as actual status, emerging issues, warranty claims, current locations, and planned services in one single view. Visualization is based on a map where appropriate, or on a list format. Selection by region, vehicle or machine type, status, customer, dealer etc. helps to focus the user's activities and information needs.

    For each individual vehicle or machine, the user can drill down into the Material Health Fact Sheet, which provides any level of detail requested. Follow-up actions and workflows can be triggered for an individual vehicle or machine or for a group of them.

  • Outlook

  • Planned Innovations

    SPS09

    Develop and execute custom-based M/R Jobs in HANA

      Execute custom map reduce methods from SAP HANA and consume the results from HANA queries via vUDF

      Develop and store MR Java classes in HANA Studio

      Direct HDFS access via WebHDFS without HIVE/SDA or MR Jobs

    Performance

      SDA optimization for connectivity, semi joins & relocation when large amounts of data are queried from SAP HANA and the number of Hadoop nodes is increased

      Concurrent execution of HIVE jobs from SAP HANA

    Enterprise readiness

      Support Hortonworks HDP-2.0, CDH-5.1 Hadoop distributions with SAP HANA

    Future Direction

    Extended Hadoop Integration

      Trigger and consume output of PIG jobs from HANA

      Support HBase as a store for HANA federated queries

      Leverage Hive ACID capabilities (> Hive 0.13)

      Leverage Apache Spark (in-memory SQL engine)

    Performance

      Parallel connections from HANA to Hadoop for loading and queries

      Extend HANA compute engine to Hadoop

    Enterprise readiness

      Use Hadoop as tiered storage

      Integrate Hadoop partner distros into HANA cockpit

      Common authentication and authorization

      Connect HANA admin console to Ambari

    This is the current state of planning and may be changed by SAP at any time.
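
    For the direct HDFS access via WebHDFS mentioned above, it may help to see what a WebHDFS request looks like. WebHDFS is Hadoop's REST interface, with URLs of the form /webhdfs/v1/<path>?op=<operation>. The sketch below only builds such URLs and sends nothing; the host name, the port 50070 (the usual NameNode HTTP port in Hadoop 2.x), and the file path are assumptions for the example.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for the given HDFS path and operation."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical NameNode address and file path.
open_url = webhdfs_url("namenode.example.com", 50070, "/data/logs/app.log", "OPEN")
list_url = webhdfs_url("namenode.example.com", 50070, "/data/logs", "LISTSTATUS")
print(open_url)
print(list_url)
```

    An HTTP GET on the OPEN URL returns the file content (after a redirect to a datanode), and LISTSTATUS returns a JSON directory listing, so no Hive table or MR job is needed for simple file access.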

  • SAP d-code Virtual Hands-on Workshops and SAP d-code Online

    Continue your SAP d-code education after the event!

    SAP d-code Online

    Access replays of keynotes, Demo Jam, SAP d-code live interviews, select lecture sessions, and more!

    Hands-on replays

    http://sapdcode.com/online

    SAP d-code Virtual Hands-on Workshops

    Access hands-on workshops post-event

    Starting January 2015

    Complimentary with your SAP d-code registration

    http://sapdcodehandson.sap.com

  • Further Information

    SAP Education and Certification Opportunities

    www.sap.com/education

    Watch SAP d-code Online

    www.sapcode.com/online

    SAP Public Web

    scn.sap.com

    www.sap.com

    www.sapbigdata.com

    www.sap.com/solution/big-data/software/overview.html

  • Feedback

    Please complete your session evaluation for DMM100 SAP HANA Big Data Overview and Road Map.

    Thanks for attending this SAP TechEd && d-code session.

  • © 2014 SAP SE or an SAP affiliate company. All rights reserved.

    No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.

    SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. Please see http://global12.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.

    Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.

    National product specifications may vary.

    These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

    In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP SE's or its affiliated companies' strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.