big data presentation, explanations and use cases in industrial sector

Big Data explanations & use cases in industrial sector September 2015 Nicolas SARRAMAGNA https://fr.linkedin.com/pub/nicolas-sarramagna/19/941/587

Page 1: Big data presentation, explanations and use cases in industrial sector

Big Data

explanations &

use cases in industrial sector

September 2015

Nicolas SARRAMAGNA https://fr.linkedin.com/pub/nicolas-sarramagna/19/941/587

Page 2: Big data presentation, explanations and use cases in industrial sector

CONTENTS

What’s Big Data?

1. Definition, 3 V

2. General use cases

3. Technologies used

4. Market overview

Big Data in the industrial sector

1. What for?

2. Vision

3. Demo: PoC / PoV

Page 3: Big data presentation, explanations and use cases in industrial sector

COMPAGNIE PLASTIC OMNIUM

CONFIDENTIAL

What’s Big Data – 3V

BIG DATA: new data contexts -> the 3 Vs

New business ambitions, new technologies

VOLUME: MASSIFICATION AND AUTOMATION OF DATA EXCHANGES

80% of data was created in the last 12 months

30 billion pieces of content on Facebook each month, 5 billion pages on Flickr, 2 billion videos viewed on YouTube each day

VARIETY: MULTIPLICATION OF SOURCES AND TYPES

Mails, documents, logs (applications, networks, systems), databases, sensor data, open data, social networks, blogs, forums, articles, browsing history, geolocation data, …

Structured data (DB), semi-structured (HTML page, tweet, XML), unstructured (mail content, Excel, PPT, video, audio)

VELOCITY: NEED TO COLLECT AND PROCESS DATA IN REAL TIME

Risk management (fraud, security of the information system, SIEM)

Real-time route optimization

Personalized advertising

Page 4: Big data presentation, explanations and use cases in industrial sector

What’s Big Data – new technologies

BIG DATA: more efficient components, but also higher I/O throughput -> grid architecture

New technological know-how: storage of large volumes of data in a cluster at a lower cost, distributed computing, industrialized data mining, on-demand IT architecture with the cloud

ORIGIN OF BIG DATA

Web indexing and search engines at Google and Yahoo, around 2006

Page 5: Big data presentation, explanations and use cases in industrial sector

What’s Big Data - general use cases IT

COMPLETE THE DATA ARCHITECTURE

Vision of a data lake / enterprise data hub

Bring applications closer to the data instead of duplicating the data for each application

"Deliver" managed data

REDUCE STORAGE AND COMPUTING COSTS

Big Data technologies use commodity hardware and / or the cloud, together with parallel computing

STRONG TECHNICAL CONSTRAINTS

Manage more than 1,000 transactions per second

Collect a flow of more than 1,000 events per second

Compute with more than 10 threads per CPU core

Store data sets of more than 10 TB to be processed

Without Big Data technologies, these constraints require major hardware and software adaptations

Page 6: Big data presentation, explanations and use cases in industrial sector

What’s Big Data - general use cases business

END-USER CENTRIC

Product recommendation

Ad optimization

PROCESS CENTRIC

Detection of unexpected events: fraud, network incidents, predictive maintenance

Path optimization

DIVERSIFICATION OF THE BUSINESS MODEL

Orange: resale of geolocation data

Page 7: Big data presentation, explanations and use cases in industrial sector

What’s Big Data – misconceptions

Misconception: only used for unstructured data -> Reality: used with structured and unstructured data

Misconception: only needed for massive data sets -> Reality: stores and analyses data of all sizes

Misconception: only available from open source -> Reality: all the major vendors are on board

Misconception: replaces my current BI platform -> Reality: it is complementary to our existing BI strategy and investments

Big Data will become essential for Business Intelligence

Page 8: Big data presentation, explanations and use cases in industrial sector

What’s Big Data – BD completes the architecture of the data

Page 9: Big data presentation, explanations and use cases in industrial sector

What’s Big Data – BI opportunities

THE PAST - BI

BIG DATA ANALYTICS

Page 10: Big data presentation, explanations and use cases in industrial sector

What’s Big Data - technologies under the hood - standard Hadoop

HADOOP PLATFORM

Page 11: Big data presentation, explanations and use cases in industrial sector

What’s Big Data - technologies under the hood

COLLECT

Spark, Flume, Sqoop

Inject data into HDFS and NoSQL databases: command line, REST API, Java API, streaming ingestion, bulk ingestion, ingestion from an RDBMS

STORAGE

Cloud, Hadoop -> distributed file system HDFS (large and small data sets)

NoSQL ("not only SQL"): distributed, schema-less databases; CAP theorem; key-value, column, document and graph-oriented databases
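To make the four NoSQL families listed above more concrete, here is a minimal sketch that models one sensor reading in each style with plain Python structures. The record, keys and field names are assumptions made for illustration; real key-value, column, document and graph stores each expose their own APIs.

```python
# One hypothetical sensor reading, modelled four ways with plain Python data.

# Key-value: an opaque value looked up by a single key.
key_value = {"reading:s-42": '{"plant": "plant-A", "temp_c": 71.5}'}

# Column-oriented: row key -> column family -> column -> value.
column = {"s-42": {"meta": {"plant": "plant-A"}, "metrics": {"temp_c": 71.5}}}

# Document: a self-describing, schema-less nested record.
document = {"_id": "s-42", "plant": {"name": "plant-A"}, "metrics": {"temp_c": 71.5}}

# Graph: nodes plus labelled edges between them.
graph = {
    "nodes": {"s-42": {"type": "sensor"}, "plant-A": {"type": "plant"}},
    "edges": [("s-42", "INSTALLED_IN", "plant-A")],
}


def plant_of(sensor_id):
    """Follow the INSTALLED_IN edge from a sensor to its plant (None if absent)."""
    for src, label, dst in graph["edges"]:
        if src == sensor_id and label == "INSTALLED_IN":
            return dst
    return None
```

The same information is present in all four shapes; what changes is which questions are cheap to ask (lookup by key, scan of one column, nested document query, or relationship traversal).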

Page 12: Big data presentation, explanations and use cases in industrial sector

What’s Big Data - technologies under the hood

ANALYSIS

Data Science, Map/Reduce, Spark

Analyse and clean the data

Goal: build a model

Machine Learning: one data set to train the model (67% of the data), one data set to evaluate it (33%)
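The 67% / 33% split described above can be sketched in a few lines of standard-library Python. The function name `train_test_split`, the fixed seed and the stand-in data are assumptions for illustration, not the method used in the presentation.

```python
import random


def train_test_split(rows, train_ratio=0.67, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))                     # stand-in for 100 labelled records
train, test = train_test_split(rows)
print(len(train), len(test))                # 67 33
```

Shuffling before cutting avoids training only on the oldest records when the data is stored in chronological order.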

VISUALIZATION

DataViz: all the visual representation techniques used to explore data

Build indicators that make decisions easier

Provide indicators whatever the size or type of the data

Innovate: give new perspectives to discover new opportunities

Tableau, QlikView, Power Pivot

Read the data through an ODBC connector, a JDBC connector, a REST API or the native connector of the DataViz tool

Page 13: Big data presentation, explanations and use cases in industrial sector

What’s Big Data - technologies under the hood

CONCEPTS OF A BIG DATA ARCHITECTURE

Distributed data and processing: the file system, jobs (Map/Reduce, Spark, …), databases (NoSQL)

Co-location of data and processing: replication, processing strategy in Hadoop

Horizontal elasticity: master / nodes architecture

Shared nothing: each node is independent. When a node breaks down, no data is lost (thanks to replication).

Design for failure: when a node breaks down, the cluster continues to work.
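The Map/Reduce job model mentioned above can be illustrated with the classic word count. This is a single-process sketch in plain Python, not Hadoop code; the function names are assumptions, and on a real cluster the map step would run on the nodes that hold each block of input.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently, which is what makes the step parallelisable.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "Big Data in industry"])
```

Because each map call only sees its own line and each reduce call only sees one key, both phases can be spread across nodes without shared state.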

Page 14: Big data presentation, explanations and use cases in industrial sector

What’s Big Data - technologies under the hood

HDFS: HADOOP DISTRIBUTED FILE SYSTEM

Name node: master of the system. Maintains and manages the blocks present on the data nodes

Data nodes: slaves deployed on each machine that provide the actual storage. They serve read and write requests from the clients
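The division of roles between name node and data nodes can be sketched with a toy model. The class below is hypothetical and only tracks block locations; it assumes a replication factor of 3 (the HDFS default) and a simple round-robin placement, whereas real HDFS placement is rack-aware.

```python
import itertools


class NameNode:
    """Toy name node: only tracks which data nodes hold each block."""

    def __init__(self, data_nodes, replication=3):
        self.data_nodes = list(data_nodes)
        self.replication = replication
        self.block_map = {}                       # block id -> list of node names
        self._next = itertools.cycle(self.data_nodes)

    def add_block(self, block_id):
        # Round-robin placement (an assumption for this sketch).
        self.block_map[block_id] = [next(self._next) for _ in range(self.replication)]

    def locate(self, block_id, failed=()):
        """Return the live data nodes that still hold the block."""
        return [n for n in self.block_map[block_id] if n not in failed]


nn = NameNode(["dn1", "dn2", "dn3", "dn4"])
for block in ["file.csv#0", "file.csv#1"]:
    nn.add_block(block)
```

With three copies of every block, losing `dn1` still leaves two live replicas of each block, which is how the cluster keeps serving reads after a node failure.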

Page 15: Big data presentation, explanations and use cases in industrial sector

What’s Big Data – technologies under the hood - storage costs

USE COMMODITY HARDWARE

In Big Data, the data center is not a collection of servers but a collection of co-located CPUs, RAM and local disks

1 MILLION $ GETS ->

Page 16: Big data presentation, explanations and use cases in industrial sector

What’s Big Data - market overview

COTS DISTRIBUTIONS

Cloudera, no. 1

Hortonworks, no. 2

MapR, no. 3

CLOUD (BASED ON A DISTRIBUTION)

Microsoft - Azure

Amazon - AWS

VENDOR APPLIANCES, HIGHER COSTS

Teradata

Oracle

Page 17: Big data presentation, explanations and use cases in industrial sector

Big Data – positioning of the distributions

CLOUDERA

Vendor licence business model, 5-6 k€ / year / node

Amazon deploys Cloudera

More mature than the other distributions

HORTONWORKS

Free; business model based on support: 15 k€ / year per slot of 4 nodes or per slot of 50 TB

Azure and Amazon deploy Hortonworks

Less mature than Cloudera on security and administration

MAPR

Vendor licence business model

Diverges from standard Hadoop

[Chart: Cloudera, Hortonworks and MapR compared on a 0 to 100 scale]

Between distributions, ratio of 1 to 4
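As a worked example of the price points quoted above, the sketch below computes a yearly cost for a hypothetical 12-node, 100 TB cluster. The cluster size, the function names and the rule that a partly filled slot is billed as a full slot are assumptions; only the 5-6 k€ per node and 15 k€ per slot figures come from the slide.

```python
import math


def cloudera_cost_keur(nodes, per_node_keur=(5, 6)):
    """Yearly licence range in k€, using the 5-6 k€ / node figure."""
    low, high = per_node_keur
    return nodes * low, nodes * high


def hortonworks_cost_keur(nodes=None, storage_tb=None, per_slot_keur=15):
    """Yearly support cost in k€: slots of 4 nodes, or slots of 50 TB."""
    if nodes is not None:
        slots = math.ceil(nodes / 4)        # assumption: partial slots round up
    else:
        slots = math.ceil(storage_tb / 50)
    return slots * per_slot_keur


print(cloudera_cost_keur(12))                 # (60, 72)
print(hortonworks_cost_keur(nodes=12))        # 45
print(hortonworks_cost_keur(storage_tb=100))  # 30
```

Which Hortonworks figure applies depends on whether support is sized by node count or by storage volume, so both are shown.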

Page 18: Big data presentation, explanations and use cases in industrial sector

CONTENTS

What’s Big Data?

1. Definition, 3 V

2. Use cases

3. Technologies under the hood

4. Market overview

Big Data in the industrial sector

1. What for?

2. Vision

3. Demo: PoC / PoV

Page 19: Big data presentation, explanations and use cases in industrial sector

Big Data in Industrial sector – What for ? - use cases IT

BUILD A DATA LAKE

Reduce costs: move cold data out of the data warehouse

Break down the data silos

Store raw data and be able to work (data mining) on all of the data

Open up the data and enrich it with metadata

LOG ANALYSIS AND MONITORING - SIEM

Monitoring of application, network and system logs -> Splunk

PREDICTIVE MAINTENANCE

Monitoring of sensor data, prediction of breakdowns across plants
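Sensor monitoring of the kind described above can start with a very simple rule before any predictive model is trained. The sketch below flags readings that deviate strongly from a sliding window of recent values; the window size, the threshold and the temperature series are assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return the indexes whose value deviates from the recent window
    by more than `threshold` standard deviations."""
    recent = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(i)
        recent.append(value)
    return anomalies


temps = [70.1, 70.3, 69.9, 70.2, 70.0, 70.1, 85.4, 70.2]  # hypothetical readings
print(detect_anomalies(temps))                             # [6]
```

A real predictive-maintenance model would be trained on labelled breakdown history; this rule only detects deviations that have already happened.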

Page 20: Big data presentation, explanations and use cases in industrial sector

Big Data in Industrial sector – What for ? - use cases HR

SKILLS VISION AND MANAGEMENT

Cross-reference information from professional networks (Viadeo, LinkedIn) with internal HR information: build a map of the skills in PO

Build and manage groups of skills, enrich internal HR tools

E-REPUTATION

Follow in real time the data about your brand, your competitors and your customers

Monitor social networks (Twitter, Facebook), press news, financial news, forums, blogs, …

React quickly according to the results if necessary

Page 21: Big data presentation, explanations and use cases in industrial sector

Big Data in Industrial sector – What for ? - use cases Marketing

360° VIEW OF CUSTOMERS, SUPPLIERS, COMPETITORS

Gather as much information as possible about a company: social, legal, financial, competitive position.

Evaluate the risk and the opportunity of working together

VIEW OF THE ROI OF PLANTS

Real-time indicators from plants: investment, number of bumpers and tanks produced

Rank the plants, predict gains

Page 22: Big data presentation, explanations and use cases in industrial sector

Big Data in Industrial sector – Vision & Roadmap

2016: BEGIN TO BUILD A DATA LAKE

Make the data directly available for BI and Data Science, and / or transfer it to a data warehouse

Collect data and manage it (who has access, metadata)

Infrastructure: hybrid with cloud / on-premise / appliance?

2016: CREATE A NEW CROSS-DIVISION SERVICE AROUND THE DATA

DataViz: create reports with the current DataViz tools -> current BI analysts, no change

Data IS: knows its data and can provide metadata to classify it -> current IS staff, no change

Data engineer: uses collection tools, codes jobs, transforms data -> new skills

Data administrator (IT): Big Data architecture integration and monitoring -> new skills

Data analysis & data mining: cross-analyse the data, apply models, design indicators for DataViz -> new skills

2016+: IMPLEMENT OTHER USE CASES

Start small and accelerate

Page 23: Big data presentation, explanations and use cases in industrial sector

Big Data in Industrial sector – Data Lake

DATA LAKE / ENTERPRISE DATA HUB / DATA RESERVOIR

Low-cost storage of heterogeneous data (structured, semi-structured and unstructured)

Raw data storage, but with data enriched and classified by metadata: a data reservoir, not a SWAMP

Used for data exploration, analysis and data mining

Schema on read: formerly ETL, now ELT

Can be used directly for BI (ELT mode)

DATA LAKE AND DATA WAREHOUSE

Complements the sources of the data warehouse

Can store cold data from the data warehouse

Feeds the data warehouse

DATA LAKE VISION

Stores aggregated data, and can store all the data

Data-lake-centric vision: bring applications to the data rather than copying data to the applications
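The "schema on read" idea above (load raw data first, apply structure only when a consumer reads it) can be sketched as follows. The raw events, field names and the `read_with_schema` helper are hypothetical; the point is only that the stored lines stay untouched and each reader chooses its own schema.

```python
import json

# Raw events are stored exactly as received (the "load" step of ELT).
raw_lines = [
    '{"plant": "A", "ts": "2015-09-01T08:00:00", "bumpers": "120"}',
    '{"plant": "B", "ts": "2015-09-01T08:00:00", "bumpers": "95", "note": "shift 1"}',
]

# The schema is applied only when reading: field name -> type converter.
schema = {"plant": str, "bumpers": int}


def read_with_schema(lines, schema):
    """Parse raw JSON lines and project/convert them according to schema."""
    for line in lines:
        record = json.loads(line)
        yield {field: cast(record[field]) for field, cast in schema.items()}


rows = list(read_with_schema(raw_lines, schema))
total = sum(row["bumpers"] for row in rows)   # 215
```

A second consumer could read the same raw lines with a different schema (for example keeping `ts` and `note`) without any reload, which is the practical difference from an ETL pipeline that fixes the structure before storage.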

Page 24: Big data presentation, explanations and use cases in industrial sector

Big Data in Industrial sector – Data Lake - infrastructure

BIG DATA INFRASTRUCTURE

Hybrid with cloud: NO if you want to keep your data in-house (security); it also requires network effort and cloud skills

Appliance: infrastructure, licence, deployment -> higher TCO

On-premise: best compromise between cost, ease of deployment and usage.

CHOICE: ON-PREMISE INFRASTRUCTURE

Go for Cloudera (better administration and security features, ‘real-time’ module: Impala) or Hortonworks

Train your IT staff: development, administration, data mining

Page 25: Big data presentation, explanations and use cases in industrial sector

Big Data in Industrial sector – Proof of Concept – Proof of Value

SUBJECT: E-REPUTATION

GOALS

Put in place e-reputation indicators for your enterprise, competitors, suppliers and customers from various sources: news, social networks

Experiment with Big Data tools

INDICATORS

Who is talking about it? How (positive, negative, neutral)? What is the content? Where in the world? From what source?

Different views of e-reputation: financial, HR, societal, commercial
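The indicators listed above (how, where, from what source) can be sketched as simple aggregations over collected mentions. Everything below is an illustrative assumption: the word lists, the sample mentions and the naive keyword-based polarity, which stands in for a real sentiment model and is not the method used in the demo.

```python
from collections import Counter

POSITIVE = {"great", "reliable", "innovative"}   # assumed word lists
NEGATIVE = {"recall", "delay", "defect"}


def sentiment(text):
    """Very naive polarity: compare counts of positive and negative words."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def indicators(mentions):
    """Aggregate mentions by tone (how), source (from where) and country."""
    return {
        "how": Counter(sentiment(m["text"]) for m in mentions),
        "source": Counter(m["source"] for m in mentions),
        "where": Counter(m["country"] for m in mentions),
    }


mentions = [
    {"text": "Innovative and reliable supplier", "source": "news", "country": "FR"},
    {"text": "Another delay on deliveries", "source": "twitter", "country": "DE"},
    {"text": "Quarterly results published", "source": "news", "country": "FR"},
]
report = indicators(mentions)
```

Each counter maps directly to one of the questions on the slide, so the same structure can feed a DataViz tool once the collection step is in place.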

DEMO