TRANSCRIPT
Lenovo Intelligent Insights with SAP Data Hub
running on SUSE CaaS Platform
BOV-1141
Marcel Scherer, Lenovo
March 25th, 2020
Agenda
1. Why SUSE?
2. Influencing Trends in Data
3. What is SAP Data Hub or Lenovo Intelligent Insights?
4. A Reference Architecture for SAP Data Hub
5. Use Cases
Why SUSE?
We Maintain A Vital Partnership
Joint events
Joint publications
Joint solution development
Influencing
Trends In Data
Cloud Computing
Objectives:
Flexibility
Economic reasons
Central data storage
Ease of use
Automation
Service models: IaaS, PaaS, SaaS
Big Data
Extracted business data
Unstructured data that is complex to analyze
Data that can be characterized by "the four V's":
Volume – scale/mass of data
Variety – different forms of data
Velocity – high speed of data streams
Veracity – uncertainty of data
AI – Artificial Intelligence
(Diagram: nested fields – AI contains Machine Learning, which contains Deep Learning; data flows from input through the model to output.)
What Is SAP
Data Hub Or
Lenovo Intelligent
Insights?
Single Point Of Access To All Of Your Data
Lenovo Intelligent Insights
Find More Sophisticated Ways To Use Data
Make data
consumable!
Maximize Value From Distributed Data Assets
Refine – enrich, transform, reuse, and curate data
Discover – explore your data landscape and its interconnections
Govern – secure data assets transparently and in compliance
Orchestrate – manage your data using modular data pipelines
How To Create Your Menu
Data sources: Hadoop, S/4 HANA, Salesforce, IoT/Edge, data warehouse, Excel, plain files, S3 storage, video, social media, image data, mobile devices
Tooling: SAP Data Hub, TensorFlow, Kafka, R, Java, ABAP, Python, SAP HANA
Ease Your Journey Through Digital Transformation!
Lenovo Intelligent Insights connects the company's departments: Research & Development, Production, Logistics, Sales, IT, and Administration/Board.
Gain comprehensive insight. Make company data accessible.
Scalable Container Platform
Intelligent Insights uses a non-monolithic microservices architecture based on containers.
If the workload exceeds the available resources, the cluster can simply be extended.
The software is designed to be portable: it can run on premises as well as on IaaS.
User Interface
A Reference
Architecture For
SAP Data Hub
Defined jointly by
SUSE and Lenovo
Goals
1. Ultimate Scalability
2. Flexibility
3. Resiliency
4. An Enterprise-ready Solution
Components Of The Solution
SAP Data Hub Application
SAP Data Hub Distributed Runtime
Kubernetes → SUSE Container as a Service Platform
Operating System → SUSE Linux Enterprise Server for SAP
Server Hardware for Compute → Lenovo ThinkSystem SR630
Fabric for Networking and Storage → Lenovo ThinkSystem 25GbE Rack Switches
Storage → Lenovo ThinkSystem SR650 + SUSE Enterprise Storage
Optional Hadoop stack:
Hadoop Implementation → Hortonworks Data Platform
Operating System → SLES
Server Hardware → ThinkSystem SR630
SUSE Enterprise Storage
Unlimited Scalability with Self-Managing Technology
• Highly Scalable
• Reduces Cost
• Seamlessly adapts to business
needs
• Interfaces:
• Block: iSCSI, RBD
• Object: S3-API
• File: CephFS, NFS
• Integrates with CaaSP
• High Performance
An Overall Scalable Solution
4x SUSE Enterprise Storage
1x Kubernetes Master
3x Kubernetes Worker
1x Docker Registry
1x Dashboard
(more nodes for HA)
4x HDFS Node (optional)
Redundant networking (10GbE for compute, 25GbE for storage, 1GbE for management)
Dynamically provisioned storage for SAP Data Hub, Docker Registry, HDFS, and S3 object storage.
Compute nodes for the Kubernetes cluster and management; connection to managed systems; compute nodes for the Hadoop big data platform.
Scale: just add an additional node!
Sizing – A Matter Of Dependencies
Sizing depends on CPU, main memory, and disk, driven by three factors: size & velocity of data, complexity of pipelines, and the number of concurrent users.

CPU: SAP recommends a minimum of 64 "CPUs" per cluster for production. One CPU from a Kubernetes point of view corresponds to one CPU core.

Main memory: SAP recommends a minimum of 256 GB main memory per cluster for production purposes.

Disk: 2,100 GB of disk space (persistent volumes) is required for running the OS and software; 5,500 GB of external storage for the checkpoint store; 100 GB for each Kubernetes worker to store Docker images; at least 60 GB of free space on the container registry node.

Size & velocity: size and type of data heavily influence the sizing of the overall solution. The velocity at which new data arrives at the data source via data streams is also crucial for sizing the Kubernetes cluster. Example: steady video streaming has different requirements than analyzing static text data.

Complexity of pipelines: pipeline models can become complex in two dimensions: (1) structures with many entities, and (2) structures including scripts that produce heavy load. To a certain extent, the purpose must be clear in advance to allow proper sizing.

Concurrent users: the number of concurrent users has a crucial influence on the defaults: type of user (admin, developer, data governance, ...), activity, and the type of workload they produce and run (concurrent jobs).
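The disk figures above can be combined into a quick back-of-the-envelope calculation. This is a minimal sketch using only the minimums quoted on this slide; the helper name and structure are illustrative, not part of any SAP sizing tool:

```python
# Minimum figures quoted on the sizing slide.
MIN_CPUS_PER_CLUSTER = 64        # Kubernetes CPUs (1 CPU == 1 core)
MIN_MEMORY_GB = 256              # main memory per cluster
BASE_DISK_GB = 2_100             # persistent volumes for OS and software
CHECKPOINT_STORE_GB = 5_500      # external storage for the checkpoint store
IMAGE_STORE_PER_WORKER_GB = 100  # Docker images on each Kubernetes worker
REGISTRY_FREE_SPACE_GB = 60      # free space on the container registry node

def minimum_disk_gb(workers: int) -> int:
    """Total minimum disk footprint for a cluster with `workers` workers."""
    return (BASE_DISK_GB
            + CHECKPOINT_STORE_GB
            + workers * IMAGE_STORE_PER_WORKER_GB
            + REGISTRY_FREE_SPACE_GB)

# The reference architecture uses 3 Kubernetes workers:
print(minimum_disk_gb(3))  # 2100 + 5500 + 300 + 60 = 7960
```

Real sizing must additionally account for data velocity, pipeline complexity, and concurrent users, as the slide stresses.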
IoT Ingestion & Orchestration
Understand real-world performance
Tackle the challenge of integrating and analyzing vast quantities of raw data and events from disparate semi-structured sources with low-level semantics and no business context
Solve the point-to-point challenge of distributed heterogeneous environments spanning messaging systems, cloud storage, SAP data management solutions, and enterprise apps
Event-driven pipelines scaling to
executions of many pipelines in
parallel, at any time
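The modular, event-driven pipeline idea can be sketched in plain Python. The operators and field names below are invented for illustration; SAP Data Hub wires such operators graphically and runs them in containers, so nothing here is the actual Data Hub API:

```python
# Illustrative sketch of a modular data pipeline built from generators.

def source(events):
    """Ingest raw events (stand-in for an IoT/messaging source)."""
    yield from events

def enrich(stream):
    """Attach minimal business context to low-level sensor readings."""
    for event in stream:
        yield {**event, "alert": event["temp_c"] > 80}

def sink(stream):
    """Deliver processed events (stand-in for a storage/app target)."""
    return list(stream)

raw = [{"sensor": "s1", "temp_c": 75}, {"sensor": "s2", "temp_c": 91}]
result = sink(enrich(source(raw)))
print(result[1]["alert"])  # True: the second reading exceeds the threshold
```

Because each operator only consumes and produces a stream, operators can be recombined freely, which mirrors the "modular data pipelines" point above.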
Data Cataloging and Governance
Understand and secure your data
Crawl through data stores to gather valuable metadata and store it in a centralized information catalog
Profile source data to gain a deeper understanding of the data to create meaningful data pipelines
Move to centralized data access
and control for all orchestration,
data refinement, scheduling,
and monitoring
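Profiling source data for a catalog entry can be sketched as follows. The catalog layout, function name, and sample fields are illustrative assumptions, not the SAP Data Hub catalog format:

```python
# Sketch of metadata profiling: infer per-column types and null counts
# from sample records, producing one catalog entry per dataset.

def profile(dataset_name, records):
    columns = {}
    for record in records:
        for key, value in record.items():
            stats = columns.setdefault(key, {"types": set(), "nulls": 0})
            if value is None:
                stats["nulls"] += 1
            else:
                stats["types"].add(type(value).__name__)
    return {"dataset": dataset_name, "rows": len(records), "columns": columns}

entry = profile("sales", [
    {"id": 1, "region": "EMEA"},
    {"id": 2, "region": None},
])
print(entry["columns"]["region"]["nulls"])  # 1
```

Such per-column statistics are exactly what makes it possible to judge whether a source is fit for a given pipeline before building it.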
Data Science & Machine
Learning
Machine learning and predictive analytics
One unified tool to process machine
learning and advanced analytics
algorithms on any mix of engines,
both SAP (HANA PAL, Leonardo ML
etc.) and non-SAP (Python, R,
Spark, TensorFlow etc.)
On the same tool, handle data ingestion and preparation from any source of any kind, solving point-to-point challenges
Easily infuse machine learning and predictive analytics into any target business process
Data Warehousing
Rapidly integrate and leverage new
data sources
Acquire new data sources with previously siloed data from traditional data warehouses, data marts, enterprise applications, and Big Data stores
Combine all types of sources
including structured and
unstructured data, and enable a
large variety of processing on them
Seamlessly process large data sets
across highly distributed
landscapes and close to the data
source, moving only high value data
SAP Data Hub use case frameworks
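The Data Warehousing scenario, combining structured and unstructured sources, can be sketched like this. The sources and field names are invented for illustration; in Data Hub this linking would be done with pipeline operators:

```python
# Sketch: join structured order records with unstructured ticket text.

structured = [  # e.g. rows from a data mart
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": 80.0},
]

unstructured = [  # e.g. free-text support tickets referencing orders
    "order 2 arrived damaged",
    "order 1 delivered on time",
]

def link_tickets(orders, tickets):
    """Attach matching ticket texts to each order record."""
    combined = []
    for order in orders:
        needle = f"order {order['order_id']}"
        related = [t for t in tickets if needle in t]
        combined.append({**order, "tickets": related})
    return combined

enriched = link_tickets(structured, unstructured)
print(enriched[0]["tickets"])  # ['order 1 delivered on time']
```

The naive substring match stands in for the real enrichment step; the point is that only the small, high-value linked result needs to be moved, as the slide recommends.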
Visit related lectures:
Running SAP Data Hub on Kubernetes with SUSE CaaS Platform ([TUT-1212], 2020/03/26)
by Kevin Klinger, SUSE
Running Machine Learning workloads on Kubernetes/SUSE CaaS Platform with SAP Data Intelligence ([SPO-1471],
2020/03/26)
by Andreas Engel, SAP
Q&A
General Disclaimer
This document is not to be construed as a promise by any participating company to
develop, deliver, or market a product. It is not a commitment to deliver any material,
code, or functionality, and should not be relied upon in making purchasing
decisions. SUSE makes no representations or warranties with respect to the contents of
this document, and specifically disclaims any express or implied warranties of
merchantability or fitness for any particular purpose. The development, release, and
timing of features or functionality described for SUSE products remains at the sole
discretion of SUSE. Further, SUSE reserves the right to revise this document and to
make changes to its content, at any time, without obligation to notify any person or entity
of such revisions or changes. All SUSE marks referenced in this presentation are
trademarks or registered trademarks of SUSE, LLC, Inc. in the United States and other
countries. All third-party trademarks are the property of their respective owners.