TRANSCRIPT
Lenovo Intelligent Insights with SAP Data Hub
running on SUSE CaaS Platform
BOV-1141
Marcel Scherer, Lenovo
March 25th, 2020
Agenda
1. Why SUSE?
2. Influencing Trends in Data
3. What is SAP Data Hub or Lenovo Intelligent Insights?
4. A Reference Architecture for SAP Data Hub
5. Use Cases
Why SUSE?
We Maintain A Vital Partnership
Joint events
Joint publications
Joint solution development
Influencing
Trends In Data
Cloud Computing
Objectives:
Flexibility
Economic reasons
Central data storage
Ease of use
Automation
Service models: IaaS, PaaS, SaaS
Big Data
Extracted business data
Unstructured data that is complex to analyze
Data that can be characterized by "the four V's":
Volume – scale/mass of data
Variety – different forms of data
Velocity – high speed of data streams
Veracity – uncertainty of data
AI – Artificial Intelligence
(Diagram: nested fields – AI contains Machine Learning, which contains Deep Learning; data flows from input through the model to output.)
What Is SAP
Data Hub Or
Lenovo Intelligent
Insights?
Single Point Of Access To All Of Your Data
Lenovo Intelligent Insights
Find More Sophisticated Ways To Use Data
Make data
consumable!
Maximize Value From Distributed Data Assets
Refine – enrich, transform, reuse, and curate data
Discover – explore your data landscape and its interconnections
Govern – secure data assets transparently and in compliance
Orchestrate – manage your data using modular data pipelines
How To Create Your Menu
Data sources: Hadoop, S/4 HANA, Salesforce, IoT/Edge, data warehouse, Excel, plain files, S3 storage, video, social media, image data, mobile devices
Tooling: SAP Data Hub, TensorFlow, Kafka, R, Java, ABAP, Python, SAP HANA
Ease Your Journey Through Digital Transformation!
Lenovo Intelligent Insights connects the company's departments: Research & Development, Production, Logistics, Sales, IT, and Administration/Board.
Gain comprehensive insight. Make company data accessible.
Scalable Container Platform
Intelligent Insights uses a non-monolithic microservices architecture based on containers.
If the workload exceeds the available resources, the cluster can simply be extended.
The software is designed to be portable: it can run on premises as well as on IaaS.
User Interface
A Reference
Architecture For
SAP Data Hub
Defined jointly by
SUSE and Lenovo
Goals
1. Ultimate Scalability
2. Flexibility
3. Resiliency
4. An Enterprise-ready Solution
Components Of The Solution
SAP Data Hub Application
SAP Data Hub Distributed Runtime
Kubernetes → SUSE Container as a Service Platform
Operating System → SUSE Linux Enterprise Server for SAP
Server Hardware for Compute → Lenovo ThinkSystem SR630
Fabric for Networking and Storage → Lenovo ThinkSystem 25GbE Rack Switches
Storage → Lenovo ThinkSystem SR650 + SUSE Enterprise Storage
Optional Hadoop stack:
Hadoop Implementation → Hortonworks Data Platform
Operating System → SLES
Server Hardware → ThinkSystem SR630
SUSE Enterprise Storage
Unlimited Scalability with Self-Managing Technology
• Highly Scalable
• Reduces Cost
• Seamlessly adapts to business
needs
• Interfaces:
• Block: iSCSI, RBD
• Object: S3-API
• File: CephFS, NFS
• Integrates with CaaSP
• High Performance
An Overall Scalable Solution
4x SUSE Enterprise Storage
1x Kubernetes Master
3x Kubernetes Worker
1x Docker Registry
1x Dashboard
(more nodes for HA)
4x HDFS Node (optional)
Redundant networking (10GbE for compute, 25GbE for storage, 1GbE for management)
Dynamically provisioned storage for SAP Data Hub, Docker Registry, HDFS, and S3 object storage.
Compute nodes for the Kubernetes cluster and management; connection to managed systems; compute nodes for the Hadoop big data platform.
Scale: just add an additional node!
Sizing – A Matter Of Dependencies
Sizing depends on CPU, main memory, and disk, driven by three factors: size & velocity of data, complexity of pipelines, and the number of concurrent users.

CPU: SAP recommends a minimum of 64 "CPUs" per cluster for production. One CPU from a Kubernetes point of view corresponds to one CPU core.

Main memory: SAP recommends a minimum of 256 GB main memory per cluster for production purposes.

Disk: 2,100 GB of disk space (persistent volumes) is required for running the OS and software; 5,500 GB of external storage for the checkpoint store; 100 GB for each Kubernetes worker to store Docker images; at least 60 GB of free space on the container registry node.

Size & velocity: size and type of data heavily influence the sizing of the overall solution. The velocity at which new data arrives at the data source via data streams is also crucial for sizing the Kubernetes cluster. Example: steady video streaming has different requirements than analyzing static text data.

Complexity of pipelines: pipeline models can become complex in two dimensions: (1) structures with many entities, and (2) structures including scripts that produce heavy load. To a certain extent, the purpose must be clear in advance to allow proper sizing.

Concurrent users: the number of concurrent users has a crucial influence on the defaults: type of user (admin, developer, data governance, ...), activity, and the type of workload they produce and run (concurrent jobs).
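The disk figures above can be combined into a quick back-of-the-envelope calculation. This is a minimal sketch using only the minimums quoted on this slide; the helper name and structure are illustrative, not part of any SAP sizing tool:

```python
# Minimum figures quoted on the sizing slide.
MIN_CPUS_PER_CLUSTER = 64        # Kubernetes CPUs (1 CPU == 1 core)
MIN_MEMORY_GB = 256              # main memory per cluster
BASE_DISK_GB = 2_100             # persistent volumes for OS and software
CHECKPOINT_STORE_GB = 5_500      # external storage for the checkpoint store
IMAGE_STORE_PER_WORKER_GB = 100  # Docker images on each Kubernetes worker
REGISTRY_FREE_SPACE_GB = 60      # free space on the container registry node

def minimum_disk_gb(workers: int) -> int:
    """Total minimum disk footprint for a cluster with `workers` workers."""
    return (BASE_DISK_GB
            + CHECKPOINT_STORE_GB
            + workers * IMAGE_STORE_PER_WORKER_GB
            + REGISTRY_FREE_SPACE_GB)

# The reference architecture uses 3 Kubernetes workers:
print(minimum_disk_gb(3))  # 2100 + 5500 + 300 + 60 = 7960
```

Real sizing must additionally account for data velocity, pipeline complexity, and concurrent users, as the slide stresses.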
IoT Ingestion & Orchestration
Understand real-world performance
Tackle the challenge of integrating and analyzing vast quantities of raw data and events from disparate semi-structured sources with low-level semantics and no business context
Solve the point-to-point challenge of distributed heterogeneous environments spanning messaging systems, cloud storage, SAP data management solutions, and enterprise apps
Event-driven pipelines scaling to
executions of many pipelines in
parallel, at any time
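The modular, event-driven pipeline idea can be sketched in plain Python. The operators and field names below are invented for illustration; SAP Data Hub wires such operators graphically and runs them in containers, so nothing here is the actual Data Hub API:

```python
# Illustrative sketch of a modular data pipeline built from generators.

def source(events):
    """Ingest raw events (stand-in for an IoT/messaging source)."""
    yield from events

def enrich(stream):
    """Attach minimal business context to low-level sensor readings."""
    for event in stream:
        yield {**event, "alert": event["temp_c"] > 80}

def sink(stream):
    """Deliver processed events (stand-in for a storage/app target)."""
    return list(stream)

raw = [{"sensor": "s1", "temp_c": 75}, {"sensor": "s2", "temp_c": 91}]
result = sink(enrich(source(raw)))
print(result[1]["alert"])  # True: the second reading exceeds the threshold
```

Because each operator only consumes and produces a stream, operators can be recombined freely, which mirrors the "modular data pipelines" point above.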
Data Cataloging and Governance
Understand and secure your data
Crawl through data stores to gather valuable metadata and store it in a centralized information catalog
Profile source data to gain a deeper understanding of the data to create meaningful data pipelines
Move to centralized data access
and control for all orchestration,
data refinement, scheduling,
and monitoring
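Profiling source data for a catalog entry can be sketched as follows. The catalog layout, function name, and sample fields are illustrative assumptions, not the SAP Data Hub catalog format:

```python
# Sketch of metadata profiling: infer per-column types and null counts
# from sample records, producing one catalog entry per dataset.

def profile(dataset_name, records):
    columns = {}
    for record in records:
        for key, value in record.items():
            stats = columns.setdefault(key, {"types": set(), "nulls": 0})
            if value is None:
                stats["nulls"] += 1
            else:
                stats["types"].add(type(value).__name__)
    return {"dataset": dataset_name, "rows": len(records), "columns": columns}

entry = profile("sales", [
    {"id": 1, "region": "EMEA"},
    {"id": 2, "region": None},
])
print(entry["columns"]["region"]["nulls"])  # 1
```

Such per-column statistics are exactly what makes it possible to judge whether a source is fit for a given pipeline before building it.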
Data Science & Machine
Learning
Machine learning and predictive analytics
One unified tool to process machine
learning and advanced analytics
algorithms on any mix of engines,
both SAP (HANA PAL, Leonardo ML
etc.) and non-SAP (Python, R,
Spark, TensorFlow etc.)
On the same tool, handle data ingestion and preparation from any source of any kind, solving point-to-point challenges
Easily infuse machine learning and predictive analytics into any target business process
Data Warehousing
Rapidly integrate and leverage new
data sources
Acquire new data sources with previously siloed data from traditional data warehouses, data marts, enterprise applications, and Big Data stores
Combine all types of sources
including structured and
unstructured data, and enable a
large variety of processing on them
Seamlessly process large data sets
across highly distributed
landscapes and close to the data
source, moving only high value data
SAP Data Hub use case frameworks
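The Data Warehousing scenario, combining structured and unstructured sources, can be sketched like this. The sources and field names are invented for illustration; in Data Hub this linking would be done with pipeline operators:

```python
# Sketch: join structured order records with unstructured ticket text.

structured = [  # e.g. rows from a data mart
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": 80.0},
]

unstructured = [  # e.g. free-text support tickets referencing orders
    "order 2 arrived damaged",
    "order 1 delivered on time",
]

def link_tickets(orders, tickets):
    """Attach matching ticket texts to each order record."""
    combined = []
    for order in orders:
        needle = f"order {order['order_id']}"
        related = [t for t in tickets if needle in t]
        combined.append({**order, "tickets": related})
    return combined

enriched = link_tickets(structured, unstructured)
print(enriched[0]["tickets"])  # ['order 1 delivered on time']
```

The naive substring match stands in for the real enrichment step; the point is that only the small, high-value linked result needs to be moved, as the slide recommends.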
Visit related lectures:
Running SAP Data Hub on Kubernetes with SUSE CaaS Platform ([TUT-1212], 2020/03/26)
by Kevin Klinger, SUSE
Running Machine Learning workloads on Kubernetes/SUSE CaaS Platform with SAP Data Intelligence ([SPO-1471],
2020/03/26)
by Andreas Engel, SAP
Q&A
General Disclaimer
This document is not to be construed as a promise by any participating company to
develop, deliver, or market a product. It is not a commitment to deliver any material,
code, or functionality, and should not be relied upon in making purchasing
decisions. SUSE makes no representations or warranties with respect to the contents of
this document, and specifically disclaims any express or implied warranties of
merchantability or fitness for any particular purpose. The development, release, and
timing of features or functionality described for SUSE products remains at the sole
discretion of SUSE. Further, SUSE reserves the right to revise this document and to
make changes to its content, at any time, without obligation to notify any person or entity
of such revisions or changes. All SUSE marks referenced in this presentation are
trademarks or registered trademarks of SUSE, LLC, Inc. in the United States and other
countries. All third-party trademarks are the property of their respective owners.