CWIN16 - Paris - mRapid
TRANSCRIPT
For internal use only
Move to industrialized Big Data: mRapid ingestion framework
Paris, 26/09/2016, Edmond SEGALEN
Copyright © 2016 Capgemini and Sogeti – Internal use only. All rights reserved. 2
Industrializing Big Data projects: mRapid | 26 September 2016
Table of Contents
Why mRapid?
What does mRapid do?
Reference architecture
How to: process flows
Roadmap
Contact information
Why mRapid?
Do you have hundreds of databases, a mainframe, and thousands of files (CSV, flat files, JSON, XML, PDF…) to ingest into a data lake?
To accelerate the ingestion of such volumes of internal or external data, Capgemini created a solution named mRapid.
What does mRapid do?
mRapid is Capgemini’s metadata-driven ingestion framework for the data lake.
It leverages Capgemini’s in-house accelerators as well as Hortonworks DataFlow (HDF) / Apache NiFi for various ingest patterns, such as:
JSON to Avro,
XML to Avro,
RDBMS to Avro,
Kafka/JMS ingest,
web services ingest,
compliance with REST APIs.
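As a minimal sketch of what "metadata-driven" can mean in practice, the Python below picks a handler for each source from its declared ingest pattern. The metadata fields (name, pattern, target), the handler functions and the registry are hypothetical illustrations; the presentation does not describe mRapid's actual metadata model or API.

```python
from typing import Callable, Dict

# Hypothetical metadata entries: one dictionary per source to ingest.
SOURCES = [
    {"name": "orders", "pattern": "rdbms_to_avro", "target": "/lake/raw/orders"},
    {"name": "clicks", "pattern": "json_to_avro", "target": "/lake/raw/clicks"},
]

# Registry mapping an ingest pattern to the function that plans it.
HANDLERS: Dict[str, Callable[[dict], str]] = {}


def handler(pattern: str):
    """Register a planning function for one ingest pattern."""
    def register(func):
        HANDLERS[pattern] = func
        return func
    return register


@handler("rdbms_to_avro")
def plan_rdbms(source: dict) -> str:
    return f"extract table {source['name']} -> Avro files in {source['target']}"


@handler("json_to_avro")
def plan_json(source: dict) -> str:
    return f"convert JSON feed {source['name']} -> Avro files in {source['target']}"


def plan_all(sources: list) -> list:
    """Build one plan line per source; an unknown pattern raises KeyError."""
    return [HANDLERS[s["pattern"]](s) for s in sources]


if __name__ == "__main__":
    for line in plan_all(SOURCES):
        print(line)
```

With this shape, onboarding a new source means adding a metadata entry rather than writing a new job, which is the point of driving ingestion from metadata.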
Benefits of mRapid:
Storage options are not limited to Hive: the appropriate big data storage technology can be chosen, such as HDFS or NoSQL in addition to Hive,
efficient storage formats such as Avro, ORC and Parquet,
compression codecs such as Snappy and LZMA,
lower time to market and faster onboarding of new source systems,
better control over SLA parameters (expected duration, due dates),
support for migration from existing workloads as well as existing warehouses and analytics platforms,
a common, streamlined ingestion utility for various ingestion patterns,
reconciliation and exception alerting.
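To make the reconciliation and SLA points concrete, here is a small sketch in Python. It assumes that each run records source and target row counts plus actual and expected durations; the JobRun fields and the alert wording are hypothetical and not taken from mRapid.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """Figures collected for one ingestion run (all fields are hypothetical)."""
    job: str
    source_rows: int
    target_rows: int
    duration_s: float
    expected_duration_s: float


def check_run(run: JobRun) -> list:
    """Return alert messages for a run; an empty list means the run is clean."""
    alerts = []
    # Reconciliation: every row read from the source should land in the target.
    if run.source_rows != run.target_rows:
        alerts.append(
            f"{run.job}: row count mismatch "
            f"(source={run.source_rows}, target={run.target_rows})"
        )
    # SLA: the run should not take longer than its expected duration.
    if run.duration_s > run.expected_duration_s:
        alerts.append(
            f"{run.job}: SLA exceeded "
            f"({run.duration_s:.0f}s > {run.expected_duration_s:.0f}s)"
        )
    return alerts
```

A run such as JobRun("clicks", 1000, 990, 400.0, 300.0) would raise both alerts, while a run with matching counts that finishes within its expected duration returns an empty list.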
Reference architecture
Data ingestion modules process flow
Operations metadata process flow
Source data structure change management process flow
mRapid roadmap
What have we already done?
Ingestion archetypes covered so far:
Mainframe EBCDIC (fixed width) to HDFS/Hive
SAS dataset to HDFS/Hive
Delimited or fixed-width file to HDFS/Hive
Integration with any industry-standard CDC tool
Seamless integration with Hadoop platforms
As-is transfer from native format to Hadoop: JSON, XML, weblogs
Basic integration with HDF/Apache NiFi
Programmatic creation of hundreds of ingestion jobs
mRapid exposed as a web service
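For the mainframe archetype, the sketch below shows one way to decode fixed-width EBCDIC records in Python before loading them into HDFS/Hive. The record layout is hypothetical, and the cp037 code page is an assumption (the right code page depends on the mainframe). Packed-decimal fields are not handled. This is not mRapid's implementation.

```python
# Hypothetical record layout: (field name, width in bytes).
LAYOUT = [("customer_id", 6), ("name", 10), ("country", 2)]
RECORD_LENGTH = sum(width for _, width in LAYOUT)


def decode_record(raw: bytes, codec: str = "cp037") -> dict:
    """Decode one fixed-width EBCDIC record into a dict of stripped strings."""
    if len(raw) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} bytes, got {len(raw)}")
    # cp037 is a single-byte code page, so byte offsets equal character offsets.
    text = raw.decode(codec)
    record, offset = {}, 0
    for name, width in LAYOUT:
        record[name] = text[offset:offset + width].strip()
        offset += width
    return record


def decode_file(data: bytes) -> list:
    """Split a byte buffer into fixed-length records and decode each one."""
    return [
        decode_record(data[i:i + RECORD_LENGTH])
        for i in range(0, len(data), RECORD_LENGTH)
    ]


if __name__ == "__main__":
    # Build a sample EBCDIC record by encoding ASCII text with the same codec.
    sample = "000042Jane Doe  FR".encode("cp037")
    print(decode_file(sample))
```

Keeping the layout as data rather than code is what allows many such jobs to be generated from metadata instead of being written by hand.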
What are we building now?
Enhanced audit logging and operations metadata
Real-time source integration
Integration with authentication and authorization tools
Apache Atlas integration
Advanced NiFi flows and orchestration with HDF 2.0
Improved MetaApp GUI
Column mapping enhancements
[Diagram labels: Data Steward, mRAPID job creation service (SOAP XML message); Command Centre / external app, mRAPID job execution service (SOAP XML message)]
Contact information
Manuel Sevilla, I&D Global, Global Head of Big Data, Analytics and MDM
Anne-Laure Thieullent, I&D Global, Big Data Europe Director
Sunil Patil, I&D India, Manager
Edmond SEGALEN, I&D France, Chief Architect
www.capgemini.com
The information contained in this presentation is proprietary and confidential.
It is for Capgemini and Sogeti internal use only. Copyright © 2016 Capgemini and Sogeti. All rights reserved.
Rightshore® is a trademark belonging to Capgemini.
No part of this document may be modified, deleted or expanded by any process or means without prior written permission from Capgemini.
www.sogeti.com
About Capgemini and Sogeti
With more than 180,000 people in over 40 countries, Capgemini is a
global leader in consulting, technology and outsourcing services. The
Group reported 2015 global revenues of EUR 11.9 billion. Together
with its clients, Capgemini creates and delivers business, technology
and digital solutions that fit their needs, enabling them to achieve
innovation and competitiveness. A deeply multicultural organization,
Capgemini has developed its own way of working, the Collaborative
Business Experience™, and draws on Rightshore®, its worldwide
delivery model.
Sogeti is a leading provider of technology and software testing,
specializing in Application, Infrastructure and Engineering
Services. Sogeti offers cutting-edge solutions around Testing,
Business Intelligence & Analytics, Mobile, Cloud and Cyber
Security. Sogeti brings together more than 23,000 professionals in
15 countries and has a strong local presence in over 100 locations
in Europe, USA and India. Sogeti is a wholly-owned subsidiary of
Cap Gemini S.A., listed on the Paris Stock Exchange.