hadoop2, spark - big data paris 2020 cedric carbone.pdf · hadoop2, spark big data, real time,...
![Page 1: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/1.jpg)
Cédric Carbone Twitter : @carbone
Hadoop2, Spark Big Data, real time, machine learning & use cases
![Page 2: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/2.jpg)
Agenda
• Map Reduce
• Hadoop v1 limits
• Hadoop v2 and YARN
• Apache Spark
• Streaming : Spark vs Storm
• Machine Learning : Recommender System
• Use Case : Next Product To Buy
• Q&A
![Page 3: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/3.jpg)
What’s Hadoop
• The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
• Java framework for storing data and running data transformations on large clusters of commodity hardware
• Licensed under the Apache v2 license
• Created from Google's MapReduce, BigTable and Google File System (GFS) papers
![Page 4: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/4.jpg)
HDFS : Distributed Storage
• Distributed, scalable, portable and reliable file system for the Hadoop framework
• Metadata / data separation:
– Name Nodes
– Data Nodes
![Page 5: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/5.jpg)
Map Reduce
• Map() : parses the input and generates 0 to n <key, value> pairs
• Reduce() : sums all values of the same key and generates one <key, value> pair
WordCount Example
• Each map takes a line as input and breaks it into words
– It emits a key/value pair of the word and 1
• Each reducer sums the counts for each word
– It emits a key/value pair of the word and the sum
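The WordCount flow above can be sketched in plain Scala, without Hadoop. This is only an illustration under stated assumptions: the names `mapLine`, `reduceWord` and `wordCount` are hypothetical, and the `groupBy` call stands in for the shuffle step that the framework performs between the map and reduce phases.

```scala
object WordCountSketch {
  // Map phase: one input line in, zero or more (word, 1) pairs out.
  def mapLine(line: String): Seq[(String, Int)] =
    line.split("\\s+").toSeq.filter(_.nonEmpty).map(word => (word, 1))

  // Reduce phase: all counts emitted for one word in, one (word, sum) pair out.
  def reduceWord(word: String, counts: Seq[Int]): (String, Int) =
    (word, counts.sum)

  // Driver standing in for the framework: map every line, group the
  // pairs by key (the "shuffle"), then reduce each group.
  def wordCount(lines: Seq[String]): Map[String, Int] = {
    val pairs = lines.flatMap(mapLine)
    val grouped = pairs.groupBy(_._1)
    grouped.map { case (word, ps) => reduceWord(word, ps.map(_._2)) }
  }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("to be or not to be", "to do"))
    counts.toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

On a real cluster the map and reduce functions run on different nodes; here everything runs in one process, which is enough to show the shape of the two functions.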
![Page 6: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/6.jpg)
Map Reduce
Data Node 1
Data Node 2
![Page 7: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/7.jpg)
Map Reduce
![Page 8: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/8.jpg)
Map Reduce
![Page 9: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/9.jpg)
Map Reduce
![Page 10: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/10.jpg)
Map Reduce
![Page 11: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/11.jpg)
Hadoop MapReduce v1
![Page 12: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/12.jpg)
Hadoop MapReduce v1
![Page 13: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/13.jpg)
Hadoop MapReduce v1
![Page 14: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/14.jpg)
Hadoop MapReduce v1
Not good for low-latency jobs on small datasets
![Page 15: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/15.jpg)
Hadoop MapReduce v1
Good for off-line batch jobs on massive data
![Page 16: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/16.jpg)
Hadoop 1
• Batch ONLY
– High latency jobs
Stack, bottom to top:
• HDFS (Redundant, Reliable Storage)
• MapReduce1 (Cluster Resource Management + Data Processing) : BATCH
• Hive (Query), Pig (Scripting), Cascading (Accelerate Dev.)
![Page 17: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/17.jpg)
Hadoop2 : Big Data Operating System
• Customers want to store ALL DATA in one place and interact with it in MULTIPLE WAYS
– Simultaneously & with predictable levels of service
– Data analysts and real-time applications
Stack, bottom to top:
• HDFS (Redundant, Reliable Storage)
• YARN (Cluster Resource Management)
• MapReduce1 (Data Processing) : BATCH
• Other Data Processing …
![Page 18: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/18.jpg)
Hadoop2 : Big Data Operating System
Stack, bottom to top:
• HDFS (Redundant, Reliable Storage)
• YARN (Cluster Resource Management)
• Engines running on YARN:
– BATCH (MapReduce)
– INTERACTIVE (Tez)
– STREAMING (Storm, Samza, Spark Streaming)
– GRAPH (Giraph, GraphX)
– Machine Learning (Spark MLlib)
– In-Memory (Spark)
– ONLINE (HBase, HOYA)
– OTHER (ElasticSearch)
![Page 19: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/19.jpg)
Stinger.next
![Page 20: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/20.jpg)
Stinger.next
![Page 21: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/21.jpg)
https://spark.apache.org
Apache Spark™ is a fast and general engine for large-scale data processing.
![Page 22: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/22.jpg)
The most active project
[Bar charts: “Patches” and “Lines Added”, comparing MapReduce, Storm, Yarn and Spark]
![Page 23: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/23.jpg)
Spark won the Daytona GraySort contest!
Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
Sort on disk 100TB of data 3x faster than Hadoop MapReduce using 10x fewer machines.
![Page 24: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/24.jpg)
RDD & Operation
Resilient Distributed Datasets (RDDs)
Operations
➜ Transformations (e.g. map, filter, groupBy)
➜ Actions (e.g. count, collect, save)
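The split between transformations and actions is easier to see with a small analogy. The sketch below does not use Spark: it assumes that a plain Scala collection view is a fair stand-in for an RDD, in the sense that chaining `map` and `filter` only records the steps, and work happens when a result is requested. The object name and the `run` helper are hypothetical.

```scala
object LazyPipelineSketch {
  // Runs a small pipeline and returns:
  // (map calls before the action, map calls after the action, action result).
  def run(lines: Seq[String]): (Int, Int, Int) = {
    var evaluations = 0

    // "Transformations": chaining map and filter on a view only records
    // the steps; no element is processed yet.
    val pipeline = lines.view
      .map { line => evaluations += 1; line.toLowerCase }
      .filter(_.contains("spark"))
    val beforeAction = evaluations

    // "Action": asking for a concrete result forces the whole pipeline.
    val matching = pipeline.size
    (beforeAction, evaluations, matching)
  }

  def main(args: Array[String]): Unit = {
    val (before, after, matching) =
      run(Seq("# Apache Spark", "Spark is fast", "Hadoop MapReduce"))
    println(s"map calls before action: $before, after action: $after, matching lines: $matching")
  }
}
```

An RDD adds distribution and fault tolerance on top of this idea, which a local view does not model.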
![Page 25: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/25.jpg)
Spark
scala> val textFile = sc.textFile("README.md")
➜ textFile: spark.RDD[String] = spark.MappedRDD@2ee9b6e3
scala> textFile.count()
➜ res0: Long = 126
scala> textFile.first()
➜ res1: String = # Apache Spark
scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
➜ linesWithSpark: spark.RDD[String] = spark.FilteredRDD@7dd4
scala> textFile.filter(line => line.contains("Spark")).count()
➜ res3: Long = 15
![Page 26: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/26.jpg)
Streaming
Streaming
![Page 27: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/27.jpg)
Storm
![Page 28: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/28.jpg)
Storm
![Page 29: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/29.jpg)
Storm vs Spark
| | Spark Streaming | Storm | Storm Trident |
|---|---|---|---|
| Processing model | Micro batches | Record-at-a-time | Micro batches |
| Throughput | ++++ | ++ | ++++ |
| Latency | Second | Sub-second | Second |
| Reliability models | Exactly once | At least once | Exactly once |
| Embedded Hadoop distro | HDP, CDH, MapR | HDP | HDP |
| Support | Databricks | N/A | N/A |
| Community | ++++ | ++ | ++ |

| | Spark | Storm |
|---|---|---|
| Scope | Batch, Streaming, Graph, ML, SQL | Streaming only |
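The difference between the two processing models in the comparison can be illustrated with a small sketch. It uses neither Spark Streaming nor Storm: the `Event` type, the function names and the one-second interval are assumptions made for the example. Record-at-a-time calls the handler once per event, while micro-batching buckets events by arrival interval and hands each bucket over as a unit.

```scala
object MicroBatchSketch {
  // Hypothetical event type: arrival time in milliseconds plus a payload.
  final case class Event(timeMs: Long, payload: String)

  // Record-at-a-time: the handler runs once per incoming event.
  def recordAtATime(events: Seq[Event])(handle: Event => Unit): Unit =
    events.foreach(handle)

  // Micro-batching: events are bucketed by arrival interval and returned
  // as one batch per interval, in time order.
  def microBatches(events: Seq[Event], intervalMs: Long): Seq[Seq[Event]] =
    events
      .groupBy(_.timeMs / intervalMs)
      .toSeq
      .sortBy(_._1)
      .map(_._2)

  def main(args: Array[String]): Unit = {
    val events = Seq(Event(100, "a"), Event(900, "b"), Event(1200, "c"), Event(2500, "d"))
    recordAtATime(events)(e => println(s"record: ${e.payload}"))
    microBatches(events, intervalMs = 1000).foreach { batch =>
      println(s"batch: ${batch.map(_.payload).mkString(",")}")
    }
  }
}
```

Waiting for an interval to close is one reason a micro-batch model sits at second-level latency, while amortizing work over a batch tends to help throughput.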
![Page 30: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/30.jpg)
Machine Learning Library (MLlib)
![Page 31: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/31.jpg)
Collaborative Filtering
![Page 32: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/32.jpg)
Collaborative Filtering (learning)
![Page 33: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/33.jpg)
Collaborative Filtering (learning)
![Page 34: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/34.jpg)
Collaborative Filtering (learning)
![Page 35: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/35.jpg)
Collaborative Filtering : Let’s use the model
![Page 36: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/36.jpg)
Collaborative Filtering : similar behaviors
![Page 37: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/37.jpg)
Collaborative Filtering Prediction
![Page 38: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/38.jpg)
Netflix Prize (2009)
Netflix is a provider of on-demand Internet streaming media
![Page 39: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/39.jpg)
Input Data
UserID::MovieID::Rating::Timestamp
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
Etc…
2::1357::5::978298709
2::3068::4::978299000
2::1537::4::978299620
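Reading this input format is a small parsing step. The sketch below assumes the field types (integer IDs, a numeric rating, a Unix timestamp in seconds); the `Rating` case class and `parse` function are hypothetical names, and lines that do not match, such as the header, are skipped.

```scala
import scala.util.Try

object RatingParserSketch {
  // Field types are assumptions based on the sample rows.
  final case class Rating(userId: Int, movieId: Int, rating: Double, timestamp: Long)

  // Splits on the "::" separator; returns None for the header line or
  // for any line that does not contain four well-formed fields.
  def parse(line: String): Option[Rating] =
    line.trim.split("::") match {
      case Array(u, m, r, t) =>
        Try(Rating(u.toInt, m.toInt, r.toDouble, t.toLong)).toOption
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "UserID::MovieID::Rating::Timestamp",
      "1::1193::5::978300760",
      "2::1357::5::978298709"
    )
    lines.flatMap(parse).foreach(println)
  }
}
```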
![Page 40: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/40.jpg)
Matrix Factorization
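Matrix factorization for collaborative filtering is commonly described as learning a short vector of latent factors for each user and each movie, so that a predicted rating is the dot product of the two vectors. The sketch below shows only that prediction and ranking step, not the training. The factor values are made up for illustration, and `predict` and `topN` are hypothetical names rather than MLlib API.

```scala
object FactorizationSketch {
  // Predicted rating = dot product of the user's and the movie's factor vectors.
  def predict(userFactors: Vector[Double], movieFactors: Vector[Double]): Double = {
    require(userFactors.length == movieFactors.length, "factor vectors must have the same rank")
    userFactors.zip(movieFactors).map { case (u, m) => u * m }.sum
  }

  // Scores every candidate movie for one user and keeps the n best.
  def topN(userFactors: Vector[Double],
           movies: Map[Int, Vector[Double]],
           n: Int): Seq[(Int, Double)] =
    movies.toSeq
      .map { case (movieId, factors) => (movieId, predict(userFactors, factors)) }
      .sortBy { case (_, score) => -score }
      .take(n)

  def main(args: Array[String]): Unit = {
    // Made-up rank-2 factors; a trained model would supply these.
    val user = Vector(1.0, 0.0)
    val movies = Map(
      858 -> Vector(4.5, 1.0),
      318 -> Vector(4.0, 2.0),
      527 -> Vector(3.0, 0.0)
    )
    topN(user, movies, 2).foreach { case (id, score) => println(s"$id ; $score") }
  }
}
```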
![Page 41: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/41.jpg)
The result
1 ; Lyndon Wilson ; 4.608531808535918 ; 858 ; Godfather, The (1972)
1 ; Lyndon Wilson ; 4.596556961095434 ; 318 ; Shawshank Redemption, The (1994)
1 ; Lyndon Wilson ; 4.575789377957803 ; 527 ; Schindler's List (1993)
1 ; Lyndon Wilson ; 4.549694932928024 ; 593 ; Silence of the Lambs, The (1991)
1 ; Lyndon Wilson ; 4.46311974037361 ; 919 ; Wizard of Oz, The (1939)
2 ; Benjamin Harrison ; 4.99545499047152 ; 318 ; Shawshank Redemption, The (1994)
2 ; Benjamin Harrison ; 4.94255532354725 ; 356 ; Forrest Gump (1994)
2 ; Benjamin Harrison ; 4.80168679606128 ; 527 ; Schindler's List (1993)
2 ; Benjamin Harrison ; 4.7874247577586795 ; 1097 ; E.T. the Extra-Terrestrial (1982)
2 ; Benjamin Harrison ; 4.7635998147872325 ; 110 ; Braveheart (1995)
3 ; Richard Hoover ; 4.962687467351026 ; 110 ; Braveheart (1995)
3 ; Richard Hoover ; 4.8316542374095315 ; 318 ; Shawshank Redemption, The (1994)
3 ; Richard Hoover ; 4.7307103243995385 ; 356 ; Forrest Gump (19
![Page 42: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/42.jpg)
Real Time Big Data Use Case Next Gen Data Marketing Platform
Next Product To Buy
![Page 43: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/43.jpg)
Ready for Omni-channel? Traditional marketing
Current approach cannot keep up…
• 200m people on Do Not Call list
• 99.9% of online banners are never clicked.
• 44% of direct marketing is never opened.
• 86% of TV viewers skip commercials
• Buyers complete 60% of their research before reaching out to vendors.
Source: “2013 Definitive Guide to Social Marketing” - Marketo.
![Page 44: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/44.jpg)
Statement
• 2000 : Multi Channel
• 2010 : Cross Channel
• 2013 : Omni Channel
• 2015 : Consumer Graph
![Page 45: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/45.jpg)
Next Product to Buy in Action
Open data
Premium data
1
![Page 46: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/46.jpg)
Next Product to Buy in Action
ERP
CRM Loyalty
Brand data
Open data
Premium data
1
![Page 47: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/47.jpg)
Next Product to Buy in Action
CRM Loyalty
ERP Brand data
Open data
Premium data
2
![Page 48: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/48.jpg)
Next Product to Buy in Action
CRM Loyalty
ERP Brand data
Open data
Premium data
3
![Page 49: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/49.jpg)
Next Product to Buy in Action
CRM Loyalty
ERP Brand data
Open data
Premium data
4
![Page 50: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/50.jpg)
Next Product to Buy in Action
CRM Loyalty
ERP Brand data
Open data
Premium data
4
![Page 51: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/51.jpg)
Next Product to Buy in Action
CRM Loyalty
ERP Brand data
Open data
Premium data
4
![Page 52: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/52.jpg)
Next Product to Buy in Action
CRM Loyalty
ERP Brand data
Open data
Premium data
5
![Page 53: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/53.jpg)
[Diagram labels: Brand, Premium, Open, Social, Influans; Sales + Social Interactions + Graph; Fine Tune, Engage, OnBoard, Suggest]
![Page 54: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/54.jpg)
Real Time Big Data Use Case Next Gen Data Marketing Platform
Next Product To Buy
➜ Right Person
➜ Right Product
➜ Right Price
➜ Right Time
➜ Right Channel
![Page 55: Hadoop2, Spark - Big Data Paris 2020 Cedric CARBONE.pdf · Hadoop2, Spark Big Data, real time, machine learning & use cases . Agenda •Map Reduce ... engine for large-scale data](https://reader035.vdocuments.net/reader035/viewer/2022062505/5ed3911da1895f794116ab63/html5/thumbnails/55.jpg)
Cédric Carbone
@carbone
www.hugfrance.fr
@hugfrance
Questions?
We graph consumers