New data lake architectures in the hybrid cloud (Les nouvelles architectures data lake dans le cloud hybride) · 2019-01-23

IBM proprietary. Specifications subject to change without notice. © 2019 IBM Corporation

Marceau GABIN, Cloud Platform Sales, +33 624740334
Christophe BURGAUD, Data Architect & Data Scientist, +33-612 360 852

Post on 13-Mar-2020

Contents

1. The different data lake architectures

2. Data lake storage is evolving

3. Cloud Object Storage ready for data & AI

Global presence of IBM in the different strategies

Customisation vs. Standardisation

Typical Data Lake Architecture

[Diagram: local (on-premises) and cloud environments]

Data Lake with Hadoop and Local Storage

[Diagram: management nodes (mgN) and a row of data nodes (daN), with local SATA disks]

mgN : Management Node (Physical or VM)

daN : Data Node (Physical or VM)

Data Lake with Hadoop and Centralized Storage

[Diagram: management nodes (mgN) and a row of compute nodes (cpN), with Spectrum Scale as centralized storage (POSIX with HDFS Transparency)]

mgN : Management Node (Physical or VM)

cpN : Compute Node (Physical or VM)

Data Lake with Hadoop and Soft Storage and COS (S3)

[Diagram: management nodes (mgN) and a row of compute nodes (cpN), with SATA disk and IBM Cloud Object Storage (public or dedicated) accessed over S3]

mgN : Management Node (Physical or VM)

cpN : Compute Node (Physical or VM)

Hybrid Data Lake Architecture with Hadoop PaaS and COS (S3) on IBM Cloud

IBM Cloud Object Storage

Redefines availability, security and economics of data storage.

Always-on Availability

• Tolerates a catastrophic regional outage without downtime or intervention

• Continuous availability architecture

• Some legacy providers place the burden of data management, and the cost of creating and maintaining an out-of-region second copy, on the client

Built-in Security

• Protects against digital and physical breaches

• Provides strong data-at-rest confidentiality by combining encryption and information dispersal

Better Cloud Storage Economics

• More cost-efficient than competitors

• Enterprise-class support, at no additional cost

• We cap costs for workloads with unpredictable data access patterns

Simplicity

• Immediate consistency (vs. eventual consistency) simplifies app development

• Integrated with the rest of IBM Cloud, IBM Bluemix, IBM Watson, IBM Video Services

• No “black boxes”: we share how our technology delivers durability, availability and security
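To make the "information dispersal" idea under Built-in Security more concrete, here is a toy sketch in Python. It is not IBM's algorithm: it swaps in the simplest possible erasure code (two data slices plus one XOR parity slice), and the function names `disperse` and `rebuild` are hypothetical. It illustrates only the availability side, where losing any one of three slices still allows the data to be rebuilt. The encryption half of the claim is not shown.

```python
from typing import List, Optional, Tuple


def disperse(data: bytes) -> Tuple[List[bytes], int]:
    """Split data into 3 slices: two halves and their XOR parity.

    Returns the slices and the original length (needed to strip padding).
    """
    half = (len(data) + 1) // 2
    first = data[:half]
    # Pad the second half so both halves have the same length.
    second = data[half:].ljust(half, b"\0")
    parity = bytes(x ^ y for x, y in zip(first, second))
    return [first, second, parity], len(data)


def rebuild(slices: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the data from the 3 slices, at most one of which may be None."""
    if sum(s is None for s in slices) > 1:
        raise ValueError("at least two of the three slices are required")
    first, second, parity = slices
    if first is None:
        first = bytes(x ^ y for x, y in zip(second, parity))
    elif second is None:
        second = bytes(x ^ y for x, y in zip(first, parity))
    # If only the parity slice is missing, both halves are already present.
    return (first + second)[:length]


slices, size = disperse(b"data lake")
# Simulate the loss of the first slice (for example, one site being offline).
recovered = rebuild([None, slices[1], slices[2]], size)
```

In this toy, each slice could live in a different location. Production systems use wider erasure codes with many more slices.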

Cloud Object Storage is ideal for Spark analytics

Traditional deployment (Spark on HDFS)

– Data not well protected in HDFS

– Need to scale and manage long-lived, tightly coupled compute and storage infrastructure

Deployment with Object Storage (Spark on Cloud Object Storage)

– Data well protected in Cloud Object Storage

– Use infinitely scalable, cloud-managed storage and ephemeral compute

IBM SQL Query Service

AI & Big Data Analytics

Cloud Object Storage is an integral part of IBM Analytics Engine, Watson Studio, SQL Query and other IBM Cloud services, providing self-service data analytics and business intelligence solutions that go well beyond the scalability, security, and cost efficiencies of traditional solutions.

Perform Apache Spark Analytics directly against data stored in Object Storage

IBM Cloud Object Storage offers optimized connectivity to Apache Spark and can be used as a low-cost, scalable persistent storage layer for analytics.
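As a rough illustration of Spark reading directly from object storage, the sketch below configures a session for an S3-compatible endpoint through the Hadoop S3A connector. Several things here are assumptions, not taken from the slides: the use of S3A (rather than another connector), HMAC-style access and secret keys, the presence of the `hadoop-aws` libraries on the Spark classpath, and the placeholder endpoint, bucket and object names. The helper names are hypothetical.

```python
from typing import Dict


def s3a_options(endpoint: str, access_key: str, secret_key: str) -> Dict[str, str]:
    """Spark settings that point the Hadoop S3A connector at an S3-compatible endpoint."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style addressing is commonly needed for non-AWS S3 endpoints.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def object_uri(bucket: str, key: str) -> str:
    """Build the s3a:// URI that Spark uses to address an object or prefix."""
    return f"s3a://{bucket}/{key}"


def build_session(options: Dict[str, str]):
    """Create a SparkSession with the given options (PySpark is imported lazily)."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("cos-analytics")
    for name, value in options.items():
        builder = builder.config(name, value)
    return builder.getOrCreate()


# Hypothetical usage on an ephemeral cluster:
#   spark = build_session(s3a_options("https://s3.example-region.example.com", "KEY", "SECRET"))
#   events = spark.read.parquet(object_uri("my-data-lake", "events/2019/"))
#   events.groupBy("device_id").count().show()
#   spark.stop()  # the data remains in object storage after the cluster is gone
```

Because the data stays in object storage, the compute cluster can be created for a job and discarded afterwards.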

Query data in place

Combining SQL Query with data in IBM Cloud Object Storage creates an active workspace for a range of big data analytics use cases. IBM SQL Query is a serverless, interactive querying service for analyzing data directly in IBM Cloud Object Storage.
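The sketch below shows roughly what "querying in place" can look like: the SQL statement names an object in Cloud Object Storage as its source and another location as its target, so no data is loaded into a database first. The `cos://` URIs, bucket names, column names and the helper `in_place_query` are hypothetical, and the `STORED AS` / `INTO` clauses follow the SQL Query dialect as we understand it, so check the exact syntax against the service documentation. Submitting the statement to the service is not shown.

```python
from typing import Sequence


def in_place_query(source_uri: str, columns: Sequence[str], where: str, target_uri: str) -> str:
    """Compose a SQL statement that reads a Parquet object and writes CSV results.

    Plain string formatting is used for illustration only; do not build
    statements this way from untrusted input.
    """
    select_list = ", ".join(columns)
    return (
        f"SELECT {select_list} "
        f"FROM {source_uri} STORED AS PARQUET "
        f"WHERE {where} "
        f"INTO {target_uri} STORED AS CSV"
    )


statement = in_place_query(
    source_uri="cos://us-geo/my-data-lake/sensors.parquet",  # hypothetical object
    columns=["device_id", "temperature"],
    where="temperature > 30",
    target_uri="cos://us-geo/my-results/",  # hypothetical result location
)
print(statement)
```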

Build and Analyze IoT Pipelines

IBM Cloud Object Storage is well suited to storing massive amounts of IoT data at low cost and allows analytics frameworks to access the data directly. Data pipelines can be easily set up and managed to generate analytics-ready data, which can be analyzed directly by Watson using Spark as a Service.

Move data from HDFS clusters to Cloud Object Storage

Free up space on expensive Hadoop clusters by using IBM Big Replicate to move data efficiently between Hadoop clusters. You can also use IBM COS Distributed Copy (DistCp), an open source tool for migrating large amounts of data from Hadoop to Cloud Object Storage.
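A migration of this kind can be sketched with the standard Apache Hadoop `distcp` command, driven here from Python. This assumes the `hadoop` CLI is installed on the machine running the script and that the target bucket is reachable through an S3-compatible connector such as `s3a://`. The paths, bucket name and helper names are hypothetical.

```python
import subprocess
from typing import List


def distcp_command(source: str, target: str, max_maps: int = 20) -> List[str]:
    """Build a `hadoop distcp` invocation.

    -update copies only files that are missing or differ at the target;
    -m caps the number of parallel map tasks used for the copy.
    """
    return ["hadoop", "distcp", "-update", "-m", str(max_maps), source, target]


def run_distcp(source: str, target: str, max_maps: int = 20) -> None:
    """Run the copy and raise CalledProcessError if the job fails."""
    subprocess.run(distcp_command(source, target, max_maps), check=True)


# Hypothetical usage:
#   run_distcp("hdfs://namenode:8020/data/archive", "s3a://my-data-lake/archive")
command = distcp_command("hdfs://namenode:8020/data/archive", "s3a://my-data-lake/archive")
```

Passing the command as a list rather than a single shell string avoids quoting problems with unusual path names.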

Store data for AI training models

IBM Cloud Object Storage is integrated with IBM Watson Studio to accelerate the machine learning and deep learning workflows required to infuse AI into your business. Build and train AI models, and prepare and analyze data, in a single, integrated environment.

Thank you

Q&A

Disclaimer

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE AUTHORS MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE VERSION OR SOFTLAYER’S OFFERING MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

THE PRICE IS FOR PLANNING PURPOSES ONLY AND IS NOT A FINAL COMMITMENT BY IBM. THIS PRICE IS SUBJECT TO CHANGE FOR REASONS SUCH AS, BUT NOT LIMITED TO, REFINEMENT OF SCOPE AND ASSUMPTIONS AND DEPENDENCIES, AND NEGOTIATION OF TERMS AND CONDITIONS. A FIRM PRICE PROPOSAL CAN BE PREPARED AT THE CUSTOMER'S REQUEST.