CWIN16 - Paris - mRapid

For internal use only

Move to industrialized Big Data: mRapid ingestion framework

Paris, 26/09/2016, Edmond SEGALEN



Copyright © 2016 Capgemini and Sogeti – Internal use only. All rights reserved.

Industrialization of Big Data projects: mRapid | 26 September 2016

Table of Contents

Why mRapid?

What does mRapid do?

Reference architecture

How to: process flows

Roadmap

Contact information


Why mRapid?

Do you have hundreds of databases, a mainframe, and thousands of files (CSV, flat files, JSON, XML, PDF…) to ingest into a data lake?

To accelerate the ingestion of such volumes of internal or external data, Capgemini created a solution named mRapid.


What does mRapid do?

mRapid is Capgemini's metadata-driven ingestion framework for the data lake.

It leverages Capgemini's in-house accelerators as well as Hortonworks DataFlow (HDF) / Apache NiFi for various ingest patterns, such as:

JSON to Avro

XML to Avro

RDBMS to Avro

Kafka/JMS ingest

Web services ingest

Compliance with REST API
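As an illustration of the "JSON to Avro" pattern, here is a minimal sketch of deriving an Avro record schema from a flat JSON record. It uses only the Python standard library and is not mRapid code: the helper names `avro_type` and `infer_avro_schema`, the record name `Customer` and the sample fields are all assumptions made for the example. Nested objects and arrays are deliberately out of scope.

```python
import json

# Hypothetical helpers for illustration only; they are not part of mRapid.

def avro_type(value):
    """Map a decoded JSON scalar to an Avro primitive type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value for this sketch: {value!r}")


def infer_avro_schema(name, record):
    """Build an Avro record schema (as a dict) from one flat JSON object."""
    fields = [{"name": key, "type": avro_type(value)}
              for key, value in record.items()]
    return {"type": "record", "name": name, "fields": fields}


# Assumed sample record.
record = json.loads('{"id": 42, "city": "Paris", "active": true, "score": 3.5}')
schema = infer_avro_schema("Customer", record)
print(json.dumps(schema, indent=2))
```

The resulting dictionary is a plain Avro schema document; writing actual Avro files would additionally require an Avro library, which this sketch does not assume.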

Benefits of mRapid:

Storage options are not limited to Hive: they can be extended to the appropriate big data storage technology, such as HDFS or NoSQL, in addition to Hive.

Leverages efficient storage formats such as Avro, ORC and Parquet.

Leverages compression codecs such as Snappy and LZMA.

Lower time to market and faster onboarding of new source systems.

Better control over SLA parameters (expected duration, due dates).

Supports migration from existing workloads as well as existing warehouses and analytics platforms.

Common and streamlined ingestion utility for various ingestion patterns.

Reconciliation and exception alerting.
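To make "reconciliation and exception alerting" and "control over SLA parameters" more concrete, the following is a minimal sketch of what such a check could look like. It is an assumption about the general idea, not a description of mRapid's implementation: the `JobRun` fields, the `check_run` function and the sample job name are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical operations-metadata record; the field names are assumptions.
@dataclass
class JobRun:
    job_name: str
    source_rows: int              # rows counted at the source
    target_rows: int              # rows landed in the data lake
    duration: timedelta           # actual run time
    expected_duration: timedelta  # SLA parameter


def check_run(run):
    """Return alert messages; an empty list means the run reconciled within SLA."""
    alerts = []
    if run.source_rows != run.target_rows:
        alerts.append(
            f"{run.job_name}: row count mismatch "
            f"(source={run.source_rows}, target={run.target_rows})"
        )
    if run.duration > run.expected_duration:
        alerts.append(
            f"{run.job_name}: SLA breach "
            f"(took {run.duration}, expected {run.expected_duration})"
        )
    return alerts


# Assumed sample run: 5 rows missing and 10 minutes late.
run = JobRun("crm_customers", 1000, 995,
             timedelta(minutes=40), timedelta(minutes=30))
for alert in check_run(run):
    print(alert)
```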


Reference architecture


Data ingestion modules process flow


Operations metadata process flow


Source data structure change management process flow


mRapid roadmap

What have we already done?

Ingestion archetypes currently supported:

Mainframe EBCDIC (fixed width) to HDFS/Hive

SAS dataset to HDFS/Hive

Delimited or fixed-width file to HDFS/Hive

Integration with any industry-standard CDC tool

Seamless integration with Hadoop platforms

As-is transfer from native to Hadoop: JSON, XML, weblogs

Basic integration with HDF/Apache NiFi

Enables the creation of hundreds of ingestion jobs programmatically

Exposing mRapid as a web service
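As an illustration of the "Mainframe EBCDIC (fixed width)" archetype, here is a minimal sketch of decoding one fixed-width EBCDIC record into named fields with the Python standard library. It is not mRapid code: the record layout, the field names and the choice of the `cp037` code page are assumptions, and mainframe specifics such as packed decimals (COMP-3) are not handled.

```python
# Hypothetical record layout: (field name, width in bytes).
LAYOUT = [("customer_id", 6), ("name", 10), ("country", 2)]


def parse_ebcdic_record(raw, layout=LAYOUT, codec="cp037"):
    """Decode one fixed-width EBCDIC record into a dict of stripped strings."""
    expected = sum(width for _, width in layout)
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    # cp037 is a single-byte code page, so character offsets equal byte offsets.
    text = raw.decode(codec)
    row, offset = {}, 0
    for name, width in layout:
        row[name] = text[offset:offset + width].strip()
        offset += width
    return row


# Assumed sample: an 18-byte record encoded to EBCDIC for the demonstration.
sample = "000042Dupont    FR".encode("cp037")
row = parse_ebcdic_record(sample)
print(row)
```

In a metadata-driven setup, a layout such as `LAYOUT` would typically come from a metadata store rather than being hard-coded, which is what makes creating many similar jobs programmatically feasible.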

What are we building now?

Enhanced Audit Logging and Operations Metadata

Real-time source integration

Integration with authentication, authorization tools

Apache Atlas integration

Advanced NiFi flow and orchestration with HDF 2.0

Improved GUI of MetaApp

Column Mapping enhancement

[Diagram: Data Steward → SOAP XML message → mRapid job creation service; Command Centre / external app → SOAP XML message → mRapid job execution service]


Contact information

Manuel Sevilla, I&D Global

Global Head of Big Data, Analytics and MDM

[email protected]


Anne-Laure Thieullent, I&D Global

Big Data Europe Director

[email protected]

Sunil Patil, I&D India

Manager

[email protected]

Edmond SEGALEN, I&D France

Chief Architect

[email protected]

www.capgemini.com

The information contained in this presentation is proprietary and confidential.

It is for Capgemini and Sogeti internal use only. Copyright © 2016 Capgemini and Sogeti. All rights reserved.

Rightshore® is a trademark belonging to Capgemini.

No part of this document may be modified, deleted or expanded by any process or means without prior written permission from Capgemini.

www.sogeti.com

About Capgemini and Sogeti

With more than 180,000 people in over 40 countries, Capgemini is a global leader in consulting, technology and outsourcing services. The Group reported 2015 global revenues of EUR 11.9 billion. Together with its clients, Capgemini creates and delivers business, technology and digital solutions that fit their needs, enabling them to achieve innovation and competitiveness. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience™, and draws on Rightshore®, its worldwide delivery model.

Sogeti is a leading provider of technology and software testing, specializing in Application, Infrastructure and Engineering Services. Sogeti offers cutting-edge solutions around Testing, Business Intelligence & Analytics, Mobile, Cloud and Cyber Security. Sogeti brings together more than 23,000 professionals in 15 countries and has a strong local presence in over 100 locations in Europe, USA and India. Sogeti is a wholly-owned subsidiary of Cap Gemini S.A., listed on the Paris Stock Exchange.