Timo Elliott, July 2016
Exploring New Big Data Architectures
© 2016 SAP SE or an SAP affiliate company. All rights reserved. 2Internal
What Is Big Data?
Nah, just kidding…
Have you noticed that “false information” spelled backwards is “false information”?
Did you know that THIS MORNING there is more data in the world than EVER BEFORE?!
Big Data Architectures = Digital Business
By 2018, 40% of enterprise architecture teams will be distinguished as leaders by their primary focus on applying disruptive technologies to drive business innovation.
By 2018, 40% of enterprise architecture teams will be responsible for advancing the organization's digital business strategy.
By 2018, the new economics of connections will drive organizations to increase investments in connected physical assets and systems by 30%.
By 2018, 20% of enterprise architects will use business ecosystem modeling to identify and predict business moments.
By 2017, 20% of EA will be responsible for identifying new business designs that leverage business algorithms.
Source: Gartner, Predicts 2016: Five Key Trends Driving Enterprise Architecture Into the Future
“Modern BI”
DATA
Self-service data preparation
Structured/Unstructured
Internal/External
Batch/Streaming
Integration, blending
Cleansing, augmentation
Agile modeling
BI DB
Columnar
In-memory
Self-service data analysis
Data discovery
Visual exploration
Dashboards/storytelling
Agile Iteration
Optional
Data warehouse
Semantic layers
OLAP Cubes
Data-Driven Approach
Push:
• From IT
• Data-Driven
• Data to Insight
• Technology-Centric
A.S.P.I.R.E.
Value-Driven Approach
Pull:
• From LOB
• Outcome-Driven
• Insight to Data
• Use-Case-Centric
Combination Approach
Push:
• From IT
• Data-Driven
• Data to Insight
• Technology-Centric
Pull:
• From LOB
• Outcome-Driven
• Insight to Data
• Use-Case-Centric
Invest in Big Data Architectures
INFORMATION ANALYTICS
Invest in Self-Service Data Discovery Tools
“Through 2020 spending on self-service visual discovery and data preparation market will grow 2.5x faster than traditional IT-controlled tools for similar functionality”
– IDC, 2015
Invest in Self-Service Data Preparation
SAP Agile Data Preparation
I.e., “Data Blending” — combine, merge, cleanse data
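As a rough illustration of what "combine, merge, cleanse" can mean in practice, here is a minimal plain-Python sketch. The record layouts, field names, and the `blend` helper are hypothetical and are not taken from SAP Agile Data Preparation.

```python
# Hypothetical sample data: a CRM export and web-shop orders (illustrative only).
crm = [
    {"customer_id": " C001 ", "name": "acme corp"},
    {"customer_id": "C002", "name": "Globex "},
]
orders = [
    {"customer_id": "c001", "amount": "120.50"},
    {"customer_id": "C002", "amount": "80"},
    {"customer_id": "C002", "amount": "20"},
]


def cleanse_id(raw):
    """Normalize an ID: trim whitespace and upper-case it."""
    return raw.strip().upper()


def blend(customers, order_rows):
    """Cleanse both sources, then merge order totals onto customers."""
    totals = {}
    for row in order_rows:
        key = cleanse_id(row["customer_id"])
        totals[key] = totals.get(key, 0.0) + float(row["amount"])
    blended = []
    for cust in customers:
        key = cleanse_id(cust["customer_id"])
        blended.append({
            "customer_id": key,
            "name": cust["name"].strip().title(),
            "total_amount": totals.get(key, 0.0),
        })
    return blended


blended_customers = blend(crm, orders)
```

The point of the sketch is only that cleansing (normalizing keys) has to happen before merging, otherwise " C001 " and "c001" would never match.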
SAP BusinessObjects Cloud
You Need Both of These…
Architecture as Platform For the Future: Innovate & Renovate
Source: Hortonworks
A Common Question
“We like SAP ERP (and HANA), we like Hadoop, and your BI tools are a standard. But we don’t understand how it’s all going to fit together. Help!”
What is Hadoop?
It’s Time to Hug The Elephant!
“Classic” Hadoop Use Cases
Semi-structured data loading / processing
• First web data, now IoT / documents / images, etc.
Offload traditional relational DW
• Typically no reduction in existing DW, but new data increasingly tiered
Queryable alternative to tape backups
• E.g., when upgrading to a different ERP system, keep a copy of all old data
Coca-Cola East Japan Architecture
Other Interesting Hadoop Use Cases
Fast scale up / down
• Game apps company: big fan of Teradata, and found it cheaper to run than Hadoop, but when individual games became a hit, they needed to be able to scale up (and down) fast
Avoid “brittle” ETL, push schema creation to the business
• Large investment bank had dozens of different CRM setups, thousands of ETL jobs that kept breaking – kept traditional DW, but added data lake: “it’s all in there – have fun!”
Excel on steroids / exploration
• Big, one-off decisions
• We don’t know what we don’t know
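"Pushing schema creation to the business" is often described as schema-on-read: raw records stay as they arrived, and each consumer applies its own schema at query time. A minimal sketch in plain Python follows; the field names and the `read_with_schema` helper are hypothetical.

```python
import json

# Raw events land in the "lake" untouched; note the inconsistent field names.
raw_lines = [
    '{"cust": "C001", "revenue": "10.5", "source": "crm_a"}',
    '{"customer": "C002", "rev": 7, "source": "crm_b"}',
    '{"customer": "C003", "source": "crm_b"}',
]


def read_with_schema(lines, schema):
    """Apply a reader-defined schema at query time.

    schema maps an output field to (candidate source fields, converter, default).
    """
    for line in lines:
        record = json.loads(line)
        row = {}
        for out_field, (candidates, convert, default) in schema.items():
            value = next((record[c] for c in candidates if c in record), None)
            row[out_field] = convert(value) if value is not None else default
        yield row


# One team's view of the data; another team could define a different schema.
sales_schema = {
    "customer_id": (["cust", "customer"], str, None),
    "revenue": (["revenue", "rev"], float, 0.0),
}

sales_rows = list(read_with_schema(raw_lines, sales_schema))
```

Nothing breaks when a new source adds or renames a field; the reader's schema is adjusted instead of an upstream ETL job.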
Sandboxing / Data Extensions
Not Just a Data Store – A Platform
Far more than a batch-driven data store
• Many still have an out-of-date view – YARN / Spark etc.
• “Data at Rest and Data in Motion”
• But still not for “transactions” any time soon
Still maturing, still a lot of work, but has proved enterprise value
• In particular, overcame biggest security & auditing concerns – Kerberos integration, encryption, tokenization, Apache Ranger…
• Low capital costs to try things out (but don’t underestimate time / training / expertise needed)
Considered the heart of “digital transformation” in some large organizations…
• ...At least by the team implementing Hadoop! (but there’s typically a large “traditional IT” modernization effort going on at the same time)
Centrica (British Gas)
Zurich Insurance
Renault Big Data Architecture
“We just intercept data in motion”
• Two-speed taken to extreme
Quality, Sales and Marketing, Supply Chain, Engineering
Consumers
Open Data, Internet of Things
Producers
Batch (RDBMS, Files)
Messages, Logs
Streaming, Data Flow
NFS Gateway, Sqoop, Spark SQL
Flume, Logstash
Kafka Producers
Kafka Broker (Topics)
Spark Streaming
Elasticsearch
HBase, Hive, HDFS
Spark SQL, Spark RDD
YARN + HDFS
Apache Atlas – Open Data Governance
Data Classification
• Import or define taxonomy business-oriented annotations for data
• Define, annotate, and automate capture of relationships between data sets and underlying elements including source, target, and derivation processes
• Export metadata to third-party systems
Centralized Auditing
• Capture security access information for every application, process, and interaction with data
• Capture the operational information for execution, steps, and activities
Search & Lineage (Browse)
• Pre-defined navigation paths to explore the data classification and audit information
• Text-based search features locate relevant data and audit events across the data lake quickly and accurately
• Browse visualization of data set lineage, allowing users to drill down into operational, security, and provenance-related information
Security & Policy Engine
• Rationalize compliance policy at runtime based on data classification schemes
• Advanced definition of policies for preventing data derivation based on classification (i.e., re-identification)
Apache Atlas
Knowledge Store
Audit Store
Models, Type-System
Policy Rules, Taxonomies
Tag Based Policies
Data Lifecycle Management
Real Time Tag Based Access Control
REST API
Services: Search, Lineage, Exchange
Healthcare: HIPAA, HL7
Financial: SOX, Dodd-Frank
Energy: PPDM
Retail: PCI, PII
Other: CWM
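The idea behind tag-based policies can be sketched in a few lines: data assets carry classification tags, a policy states which tags a role is cleared for, and every decision is written to an audit trail. This is plain Python and does not use the Apache Atlas or Apache Ranger APIs; the assets, tags, roles, and the `can_read` function are hypothetical.

```python
# Classification: tags attached to data assets (hypothetical examples).
asset_tags = {
    "hr.salaries": {"PII", "FINANCIAL"},
    "web.clickstream": {"PUBLIC"},
    "claims.patients": {"PII", "HIPAA"},
}

# Policy: which tags each role is cleared to read.
role_allowed_tags = {
    "analyst": {"PUBLIC", "FINANCIAL"},
    "compliance": {"PUBLIC", "FINANCIAL", "PII", "HIPAA"},
}

# Centralized auditing: every access decision is recorded.
audit_log = []


def can_read(role, asset):
    """Allow access only if the role is cleared for every tag on the asset.

    Unclassified (unknown) assets are denied by default.
    """
    tags = asset_tags.get(asset)
    allowed = tags is not None and tags <= role_allowed_tags.get(role, set())
    audit_log.append((role, asset, "ALLOW" if allowed else "DENY"))
    return allowed
```

Because the policy refers to tags rather than to individual tables, classifying a new data set is enough to bring it under the existing rules.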
Atlas & Ranger
Source: Hortonworks
Apache NiFi – Data in Motion
Apache Flink – Data Stream Analytics
Flink
Historic data
Kafka, RabbitMQ, ...
HDFS, JDBC, ...
ETL, Graphs, Machine Learning, Relational, …
Low latency, windowing, aggregations, ...
Event logs
Real-time data streams
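Windowing and aggregation over a stream can be illustrated without any framework. The sketch below is plain Python, not the Flink API; the event tuples, the 10-second window size, and the `tumbling_window_sum` helper are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical events: (timestamp in seconds, sensor key, value).
events = [
    (1, "sensor-a", 2.0),
    (4, "sensor-b", 1.0),
    (9, "sensor-a", 3.0),
    (12, "sensor-a", 5.0),
    (19, "sensor-b", 4.0),
]


def tumbling_window_sum(stream, window_seconds):
    """Sum values per key in fixed, non-overlapping time windows."""
    sums = defaultdict(float)
    for timestamp, key, value in stream:
        # Each event belongs to exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        sums[(window_start, key)] += value
    return dict(sums)


window_sums = tumbling_window_sum(events, window_seconds=10)
```

A real stream processor additionally has to decide when a window is complete and how to treat late events, which this sketch ignores.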
Zeppelin – Analytic “Notebooks”
Result of All This: Data Complexity For The Foreseeable Future
Data Warehouse
Hybrid Transaction/Analytical Processing
Hadoop, MongoDB, Spark, etc.
Personal Data / BI
• Where does data arrive?
• When does it need to move?
• Where does modeling happen?
• What can users do themselves?
• What governance is required?
Big Data Architectures got complicated
What we would like: a consistent, seamless solution
Data Feeds
An Example “Target Big Data Architecture”
ETL
Ingestion: Extracting data from source systems and making it available for up-stream consumption
Sources: Existing and new data sets from external and internal sources
Big Data Platform (Data Lake): Core technology set enabling very high volume computation and storage for raw data and ready-to-use processed data
Relational: Traditional RDBMS
Performance Clusters: Fit-for-purpose clusters targeting real-time and near-real-time use cases, providing faster storage and access to data
Real-time Streaming: Real-time ingestion of data, enabling event processing and visualization
Data Services & Interface Layers: ETL and APIs that allow data to be extracted from the data platforms and be further analyzed, visualized, or exported
Exploratory Analytics: New and existing applications to support data discovery and advanced analytics
Application Consumption: Dashboards, reporting, and web services to expose the underlying data to external users
Data Management and Governance: Centralized user management for proper authentication and authorization, metadata management
In-Memory/Appliance
EDW
Trad sources
Customer
Mobile value chain
Fixed value chain
Network probes
Machine Logs
Interaction logs
Social media
Others
Event Stream Processing
APIs
Connectors
ODBC
Informatica
BusinessObjects
Customer-facing services
SAS
Others
SAS Visual Analytics
SAS EG/EM
New Analytical Tools
Existing
New
Ready to use (Hadoop)
Raw data (Hadoop)
Black box
Semantic Layer
Splunk
The SAP focus: End-to-end value chain
SAP HANA Platform
SPATIAL PROCESSING
ANALYTICS, TEXT, GRAPH, PREDICTIVE ENGINES
CONSUME
COMPUTE
STORAGE
SOURCE
INGEST
Application Development Environment
Transformations & Cleansing
Smart Data Integration
Smart Data Quality
Stream Processing
Smart Data Streaming
STREAM PROCESSING
Logs, Text, OLTP, Social, Machine, Geo, ERP, Sensor
Store & forward
Mobile applications and BI
Smart Data Access
Virtual Tables
User Defined Functions
Dynamic Tiering
Aged data on disk
In-Memory
Data model & data
Calculation engine
Fast computing
Column Storage
High performance analytics
Series Data Storage
Store time-series data
Reporting & Dashboards
High Performance Applications
Data Exploration & Visualization
Ad hoc & OLAP Analytics
Predictive Analysis
Business Planning & Forecasting
Lumira / BI
Hadoop / NoSQL
MapReduce
YARN
HDFS
The Journey So Far: HANA & Hadoop Integration
HANA & Hadoop Integration
• SQL on Hadoop via Smart Data Access (virtual tables) – Hive (SPS06)
• Remote caching with Hive (SPS07)
• Connectivity to Apache Spark using ODBC
• Execution of MR jobs via HANA (Virtual Functions) and direct access to HDFS (SPS 09)
• Spark SQL adapter via SDA (SPS10)
• Join relocation to Hadoop through Spark RDD
• Unified admin through Ambari integration for Hortonworks
Key Benefits
• Deep integration for storage & processing
• Optimized data access between HANA & Hadoop
• Data tiering to Hadoop for cold storage
Data tiering with SAP HANA
Data Lifecycle Manager (DLM) for Hadoop as a tier
Define a data aging strategy with DLM
Leverage SAP HANA Dynamic Tiering (warm store), Hadoop, or SAP Sybase IQ in SAP HANA native use cases. A tool-based approach models aging rules on tables and displaces ‘aged’ data to HANA extended tables, which reduces the memory footprint of data in SAP HANA.
SAP HANA
Data Lifecycle Manager
HOT-STORE (Column Table)
WARM-STORE (Extended Table)
DATA MOVEMENT
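An aging rule of this kind boils down to a predicate on a date column that decides which tier a row belongs to. The sketch below is plain Python and is not DLM syntax; the 365-day threshold, the `posting_date` column, and the `apply_aging_rule` helper are hypothetical.

```python
from datetime import date, timedelta


def apply_aging_rule(rows, today, max_age_days, date_column="posting_date"):
    """Split rows into a hot set (stays in memory) and an aged set (to be moved)."""
    cutoff = today - timedelta(days=max_age_days)
    hot = [r for r in rows if r[date_column] >= cutoff]
    aged = [r for r in rows if r[date_column] < cutoff]
    return hot, aged


# Hypothetical documents with posting dates.
documents = [
    {"doc": 1, "posting_date": date(2016, 6, 30)},
    {"doc": 2, "posting_date": date(2014, 1, 15)},
    {"doc": 3, "posting_date": date(2015, 12, 1)},
]

hot_rows, aged_rows = apply_aging_rule(
    documents, today=date(2016, 7, 1), max_age_days=365
)
```

In a tiered setup, the aged rows would then be moved to the warm or cold store while remaining queryable.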
SAP HANA Vora: What’s Inside and What Does It Do?
Democratize Data Access
Make Precision Decisions
Simplify Big Data Ownership
SAP HANA Vora is an in-memory query engine that leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop.
Drill Downs on HDFS
Mashup API Enhancements
Compiled Queries
HANA-Spark Adapter
Unified Landscape
Open Programming
Any Hadoop Clusters
SAP Big Data Platform – “Hadoop Inside” Vision
HANA native Big Data
• Dynamic Tiering
• Smart Data Streaming
• NoSQL | Graph | Geo | TimeSeries
HANA & Hadoop
• SDA
• Hive | Spark
• MapReduce | HDFS
• Admin & Monitoring
• User Mgmt / Security
Hadoop Extension
• Vora Engine
• Integrated with HANA and Hadoop
HANA Data Management Platform
Instant Results
SAP HANA In-Memory
0.0 sec
∞
Infinite Storage
Raw Data
Hadoop
Vora
Information Management | Text | Search | Graph | Geospatial | Predictive
Smart Data Streaming
Administration | Monitoring | Operations | User Management | Security
Key Features: Vora SQL Engine
Components
Written From Scratch
Multi Platform
Compressed Columns
Parallel Query Processing
In-Memory Storage
Fast Column Scans
Cache-Efficient Algorithms
Code Generation
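Compressed columns and fast column scans usually go together. One common technique is dictionary encoding, where each distinct value is stored once and the column itself holds small integer codes. The sketch below is a generic plain-Python illustration of that technique, not Vora's actual implementation; the function names and sample data are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code plus a lookup dictionary."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def scan_equals(dictionary, codes, wanted):
    """Return row positions where the column equals `wanted`.

    The value is looked up once; the scan itself compares integer codes only.
    """
    if wanted not in dictionary:
        return []
    target = dictionary.index(wanted)
    return [row for row, code in enumerate(codes) if code == target]


# Hypothetical column with few distinct values, which compresses well.
country_column = ["DE", "FR", "DE", "US", "FR", "DE"]
country_dictionary, country_codes = dictionary_encode(country_column)
matching_rows = scan_equals(country_dictionary, country_codes, "DE")
```

Scanning a dense array of integers is also friendlier to CPU caches than chasing pointers to strings, which is where the "cache-efficient" point comes in.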
SAP HANA Vora Modeler
SQL/OLAP on Big Data
• Hierarchical data storage of contextual data supports structured analysis
• Fast drill-down interaction aids in root-cause analysis
• Familiar OLAP tool enables experienced business analysts to derive useful insights from contextual data
• Support for HDFS, Parquet and ORC formats
• LLVM/Clang – JIT compilation of query plans and execution
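Drill-down over hierarchical data can be sketched as repeated aggregation one level below a chosen path in the hierarchy. This is a plain-Python illustration, not Vora SQL; the region/country/city hierarchy, the figures, and the `drill_down` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical facts: (region, country, city, sales).
facts = [
    ("EMEA", "DE", "Berlin", 10),
    ("EMEA", "DE", "Munich", 5),
    ("EMEA", "FR", "Paris", 7),
    ("APJ", "JP", "Tokyo", 12),
]


def drill_down(rows, path=()):
    """Aggregate sales one level below the given hierarchy path.

    path=() groups by region, path=("EMEA",) groups by country within EMEA,
    and so on. The path must be shorter than the hierarchy (three levels here).
    """
    depth = len(path)
    totals = defaultdict(int)
    for row in rows:
        if row[:depth] == tuple(path):
            totals[row[depth]] += row[-1]
    return dict(totals)


by_region = drill_down(facts)
by_country = drill_down(facts, ("EMEA",))
by_city = drill_down(facts, ("EMEA", "DE"))
```

Each step narrows the filter and moves the grouping one level deeper, which is the interaction pattern used in root-cause analysis.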
Hadoop/NoSQL DATA
SQL-on-Hadoop using Vora
A different context allows access to SAP HANA data from Spark SQL
Creates an in-memory data object, similar to a Spark dataframe
Load data from HDFS; the temp table will be distributed across the Hadoop cluster
SAP Predictive Analytics 3.0
Native Spark Modeling
Standalone or included in SAP HANA
Predictive Factory
Integration with cloud & other apps
DW Directions
Planning and Definition: 2015
Execution and Delivery: 2016-2018
Vision
• Market presence in Data Warehousing with a clear roadmap
• Strong and simplified offering with tight integration
• Convergence into one technology stack addressing BW and SQL-based DW needs
SAP HANA DW
Optional Components
DW Foundation
PowerDesigner
HANA EIM
Business Warehouse
SAP HANA Platform
DW Modeling
DW ETL & DM
Analytics, BI Suite, Predictive
Hadoop
SAP HANA Vora
This is the current state of planning and may be changed by SAP at any time.
SAP HANA DW – Future-proof data management platform
?
The Big Big Picture
Embrace Hadoop as if it were SAP technology
HANA Hadoop
What SAP does best: business process (live!)
Vora
“infrastructure”
Looking Forward to the Future: “Data Refineries”
Nobody believes that a single big data warehouse is THE solution any more
• But they’re not going away any time soon
• “Data warehouses are dead! Long live data warehousing!”
Instead:
Enterprise Information Catalog – transparency
• Search for data: origin, owner, trust level, sensitivity, formats, how to order…
Data Factories – workflows, not just data
• The collective know-how on getting, refining, displaying data
More info from Mike Ferguson, here: http://www.slideshare.net/HadoopSummit/organising-the-data-lake-information-management-in-a-big-data-world
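To make the information catalog idea concrete, here is a minimal plain-Python sketch of catalog entries carrying the attributes listed above (origin, owner, trust level, sensitivity, formats) and a simple search over them. The entries, the 1-to-5 trust scale, and the `search_catalog` helper are hypothetical.

```python
# Hypothetical catalog entries; trust_level is assumed to run from 1 (low) to 5 (high).
catalog = [
    {"name": "customer_master", "origin": "ERP", "owner": "sales-ops",
     "trust_level": 5, "sensitivity": "confidential", "formats": ["table"]},
    {"name": "web_clicks_raw", "origin": "web logs", "owner": "marketing",
     "trust_level": 2, "sensitivity": "internal", "formats": ["json"]},
    {"name": "customer_tweets", "origin": "social media", "owner": "marketing",
     "trust_level": 1, "sensitivity": "public", "formats": ["json"]},
]


def search_catalog(entries, keyword="", min_trust=1, owner=None):
    """Find entries by name keyword, minimum trust level, and optional owner."""
    return [
        entry for entry in entries
        if keyword.lower() in entry["name"].lower()
        and entry["trust_level"] >= min_trust
        and (owner is None or entry["owner"] == owner)
    ]


trusted_customer_data = search_catalog(catalog, keyword="customer", min_trust=3)
marketing_data = search_catalog(catalog, owner="marketing")
```

The value of such a catalog lies less in the search itself than in keeping the metadata current, which is an organizational task rather than a coding one.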
SAP Platform for Digital Transformation
Suite
Applications
S/4HANA
Digital Boardroom
Analytics
C4A
BOBJ
Extensions
Applications
IoT
HANA Cloud Platform
(Micro-) Services
IoT Platform
Identity Management
Business Network
CEC
Platform
HANA Enterprise Computing Platform
any DB
Hadoop
Vora Distributed Computing Platform