Future of Data meetup: boontadata
@benjguin
boontadata
Benjamin Guinebertière, Technical Evangelist, Microsoft France, @benjguin
Introduction
Agenda
- Introduction
- Big Data Architectures
- Stream Processing Challenges
- boontadata
- Conclusion
Big Data Architectures
Big Data Processing Engines (diagram: data stream, data lake)
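Lambda architecture (diagram): events are written to an events log, then follow two paths:
- hot path: near real time processing into a DB (SQL/noSQL)
- cold path: storage, then batch processing into storage or a DB (SQL/noSQL)
A query & merge step combines the results of both paths.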
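The query & merge step of a Lambda architecture can be illustrated with a minimal sketch. It assumes both paths produce per-object sums held in plain dictionaries (the names `batch_view`, `realtime_view` and `query_and_merge` are hypothetical), with the batch view covering events up to a cutoff and the real-time view covering events after it; the values reuse object A's sums from the example used later in this deck.

```python
def query_and_merge(batch_view: dict[str, float], realtime_view: dict[str, float]) -> dict[str, float]:
    """Merge per-key sums from the cold path (batch) and the hot path (real time).

    Hypothetical views: batch_view covers events up to a cutoff time,
    realtime_view covers events received after that cutoff.
    """
    merged = dict(batch_view)
    for key, value in realtime_view.items():
        merged[key] = merged.get(key, 0) + value
    return merged

batch_view = {"A": 185}     # illustrative: events up to 10:00:05, already processed in batch
realtime_view = {"A": 140}  # illustrative: events after 10:00:05, not yet covered by batch
print(query_and_merge(batch_view, realtime_view))  # {'A': 325}
```

Adding the two views works here only because sums are additive; an average (like object B's) could not be merged this way unless each path also kept a count.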
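Non-lambda architecture (diagram): events are written to an events log, then follow two paths:
- hot path: near real time processing into a DB (SQL/noSQL)
- cold path: batch processing
The diagram shows a query step, but no merge step.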
Kappa Architecture
Source: https://www.oreilly.com/ideas/questioning-the-lambda-architecture, by Jay Kreps, July 2, 2014
Example services (in Azure)
http://dev.microsoft.fr/data
Stream Processing Challenges
Events: example
Time       Object A   Object B
10:00:00   100        2000
10:00:03              1800
10:00:04   85
10:00:07   40         2500
10:00:08   100
10:00:09              3000
Aggregations (5-second time windows, labeled by end time)
Time window   Object A (sum)   Object B (average)
10:00:05      185              1900
10:00:10      140              2750
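The aggregation table can be reproduced from the example events with a short sketch. It assumes 5-second tumbling windows covering [end − 5 s, end) and labeled by their end time, which matches the table (10:00:00 to 10:00:04 fall in the 10:00:05 window). The helper names are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 5  # assumed tumbling window length

# Example events: (event time, object, value), as in the table above.
EVENTS = [
    ("10:00:00", "A", 100), ("10:00:00", "B", 2000),
    ("10:00:03", "B", 1800), ("10:00:04", "A", 85),
    ("10:00:07", "A", 40), ("10:00:07", "B", 2500),
    ("10:00:08", "A", 100), ("10:00:09", "B", 3000),
]

def to_seconds(hms: str) -> int:
    h, m, s = map(int, hms.split(":"))
    return h * 3600 + m * 60 + s

def to_hms(seconds: int) -> str:
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"

def aggregate(events):
    """Per window: (sum of object A, average of object B), keyed by window end time."""
    values = defaultdict(lambda: defaultdict(list))  # window end -> object -> values
    for t, obj, value in events:
        end = (to_seconds(t) // WINDOW_SECONDS + 1) * WINDOW_SECONDS
        values[to_hms(end)][obj].append(value)
    result = {}
    for end, per_object in sorted(values.items()):
        b = per_object["B"]
        result[end] = (sum(per_object["A"]), sum(b) / len(b) if b else None)
    return result

print(aggregate(EVENTS))
# {'10:00:05': (185, 1900.0), '10:00:10': (140, 2750.0)}
```

This only holds when every event arrives exactly once and on time; the following slides show what breaks that assumption.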
A distributed system (diagram: object A and object B send events through several brokers to a processing engine)
Best Case (timeline from 10:00:00 to 10:00:12; each event is shown as object,message id,value)
- object A: A,m1,100; A,m4,85; A,m5,40; A,m7,100
- object B: B,m2,2000; B,m3,1800; B,m6,2500; B,m8,3000
Each event reaches the processing engine exactly once, in order, and on time.
duplicate events (same timeline and events as the best case)
Additional duplicate deliveries: B,m2,2000; A,m1,100; A,m5,40; B,m8,3000 (twice).
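One way to cope with duplicates is to deduplicate on an event identifier before aggregating. This sketch assumes that the (object, message id) pair uniquely identifies an event; the function names are hypothetical and the arrival order of the duplicates is illustrative.

```python
# Deliveries from the "duplicate events" slide: (object, message id, value).
# The arrival order is illustrative; only the set of deliveries comes from the slide.
deliveries = [
    ("A", "m1", 100), ("B", "m2", 2000), ("B", "m3", 1800), ("A", "m4", 85),
    ("A", "m5", 40), ("B", "m6", 2500), ("A", "m7", 100), ("B", "m8", 3000),
    ("B", "m2", 2000), ("A", "m1", 100), ("A", "m5", 40),
    ("B", "m8", 3000), ("B", "m8", 3000),
]

def deduplicate(events):
    """Keep the first delivery of each (object, message id) pair.

    Assumes the message id identifies an event. The `seen` set grows without
    bound, so a real engine would need to expire it (for example per window).
    """
    seen = set()
    unique = []
    for obj, msg_id, value in events:
        if (obj, msg_id) not in seen:
            seen.add((obj, msg_id))
            unique.append((obj, msg_id, value))
    return unique

def sum_for(events, obj):
    """Sum of the values received for one object."""
    return sum(value for o, _, value in events if o == obj)

print(sum_for(deliveries, "A"))               # 465: m1 and m5 are counted twice
print(sum_for(deduplicate(deliveries), "A"))  # 325, i.e. 185 + 140
```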
out-of-order events (same timeline and events as the best case, but the events do not reach the processing engine in the order they were emitted)
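Out-of-order arrival is where windowing by processing time and by event time diverge. The sketch below uses object A only, with hypothetical arrival times in which m4 (emitted at 10:00:04) is received at 10:00:08, after m5; arrival time stands in for processing time. Times are seconds after 10:00:00 and window keys are window end times.

```python
from collections import defaultdict

WINDOW = 5  # tumbling window length in seconds

# Object A events, listed in arrival order: (message id, event time, arrival time, value).
# The arrival times are hypothetical: m4 is emitted at 10:00:04 but received at 10:00:08.
events_a = [
    ("m1", 0, 0, 100),
    ("m5", 7, 7, 40),
    ("m4", 4, 8, 85),
    ("m7", 8, 8, 100),
]

def window_sums(events, use_event_time):
    """Sum values per 5-second tumbling window, keyed by window end."""
    sums = defaultdict(int)
    for _msg, event_time, arrival_time, value in events:
        t = event_time if use_event_time else arrival_time
        sums[(t // WINDOW + 1) * WINDOW] += value
    return dict(sorted(sums.items()))

print(window_sums(events_a, use_event_time=False))  # {5: 100, 10: 225}
print(window_sums(events_a, use_event_time=True))   # {5: 185, 10: 140}
```

Only the event-time version matches the expected sums (185 and 140), because it assigns m4 to the window in which it was emitted rather than the one in which it was received.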
late events (same timeline and events as the best case, but some events reach the processing engine late, after the end of the time window they belong to)
Watermark (figures from the Apache Flink documentation)
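A watermark lets an event-time engine decide when a window is complete and which events count as late. The following is a simplified sketch, not Flink's API: it assumes a watermark equal to the largest event time seen minus a hypothetical `max_out_of_orderness` bound, emits a window once the watermark reaches its end, drops events whose window was already emitted, and flushes open windows at end of input. It uses object A's events with m4 (event time 10:00:04) arriving last.

```python
from collections import defaultdict

WINDOW = 5  # tumbling window length in seconds

def windowed_sums(events, max_out_of_orderness):
    """Event-time tumbling-window sums driven by a simple watermark.

    events: (event_time, value) pairs in arrival order, event_time in seconds
    after 10:00:00. Returns (emitted sums keyed by window end, dropped late events).
    """
    open_windows = defaultdict(int)  # window end -> running sum
    emitted, dropped = {}, []
    watermark = float("-inf")
    for event_time, value in events:
        end = (event_time // WINDOW + 1) * WINDOW
        if end <= watermark:
            dropped.append((event_time, value))  # its window was already emitted: late
            continue
        open_windows[end] += value
        watermark = max(watermark, event_time - max_out_of_orderness)
        for w in sorted(open_windows):
            if w <= watermark:
                emitted[w] = open_windows.pop(w)  # window complete according to watermark
    # End of input: flush the windows that are still open.
    for w in sorted(open_windows):
        emitted[w] = open_windows[w]
    return emitted, dropped

# Object A; m4 (event time 4) arrives after m5 and m7.
arrivals = [(0, 100), (7, 40), (8, 100), (4, 85)]
print(windowed_sums(arrivals, max_out_of_orderness=2))  # ({5: 100, 10: 140}, [(4, 85)])
print(windowed_sums(arrivals, max_out_of_orderness=4))  # ({5: 185, 10: 140}, [])
```

The bound is a trade-off: a larger value tolerates more disorder and yields the expected sums, but windows are emitted later.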
boontadata
Storm
Flink
Samza
Spark Streaming
…
boontadata (diagram): an IoT simulator, a broker, streaming engines #1, #2, #..., a NoSQL database, and a compare step that compares the results
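The compare step can be pictured as a diff between the aggregates expected from the simulated events and the ones a streaming engine produced. This is a hypothetical sketch, not boontadata's own compare script: it assumes both results are dictionaries keyed by (window end, object), and the "actual" values are invented to mimic an engine that windows object A by processing time.

```python
def compare(expected, actual, tolerance=1e-6):
    """List (key, expected, actual) for every key where the two results differ.

    Keys are (window end, object) pairs; a key missing on either side is a mismatch.
    """
    mismatches = []
    for key in sorted(set(expected) | set(actual)):
        e, a = expected.get(key), actual.get(key)
        if e is None or a is None or abs(e - a) > tolerance:
            mismatches.append((key, e, a))
    return mismatches

# Expected aggregates, taken from the example table (A: sum, B: average).
expected = {
    ("10:00:05", "A"): 185, ("10:00:05", "B"): 1900,
    ("10:00:10", "A"): 140, ("10:00:10", "B"): 2750,
}
# Hypothetical output of an engine that windows object A by processing time.
actual = {
    ("10:00:05", "A"): 100, ("10:00:05", "B"): 1900,
    ("10:00:10", "A"): 225, ("10:00:10", "B"): 2750,
}
print(compare(expected, actual))
# [(('10:00:05', 'A'), 185, 100), (('10:00:10', 'A'), 140, 225)]
```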
boontadata-streams (diagram): IoT simulator (Python), Kafka broker, streaming engines (Apache Spark Streaming, Apache Flink, Apache Kafka Streams, …), Cassandra, compare (Python)
boontadata-paas (diagram): IoT simulator (Python), Azure IoT Hub, streaming engines (Azure Stream Analytics, HDInsight Storm, HDInsight Spark Streaming, …), DocumentDB, compare (Python)
Implement other engines: Apache Samza, Apex, …
http://boontadata.io
let’s run it
inject
Spark Streaming – processing time
Flink – processing time
Flink – event time
Azure Stream Analytics (pyclient container)
Run other tests (other seed, several objects, …)
Conclusion
• Many streaming engines exist, with subtle differences => do your own tests
• Contribute: http://boontadata.io
Benjamin Guinebertière, Technical Evangelist, Microsoft France (Azure, data insights, machine learning), @benjguin | http://3-4.fr
© 2014 Microsoft Corporation. All rights reserved.