Future of Data Meetup: boontadata

boontadata, Benjamin Guinebertière, Technical Evangelist, Microsoft France, @benjguin

Upload: abdelkrim-hadjidj

Post on 20-Mar-2017

TRANSCRIPT

Page 1: Future of Data Meetup : Boontadata

boontadata

Benjamin Guinebertière
Technical Evangelist, Microsoft France
@benjguin

Page 2: Future of Data Meetup : Boontadata

Introduction

Page 3: Future of Data Meetup : Boontadata

Agenda
- Introduction
- Big Data Architectures
- Stream Processing Challenges
- boontadata
- Conclusion

Page 4: Future of Data Meetup : Boontadata

Big Data Architectures

Page 5: Future of Data Meetup : Boontadata

Big Data Processing Engines

data stream

data lake

Page 6: Future of Data Meetup : Boontadata

Lambda architecture

[Diagram labels: Events; events log; hot path (near real time, DB SQL/NoSQL); cold path (storage, batch, storage / DB SQL/NoSQL); query & merge]
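
As a rough illustration of the "query & merge" step, the sketch below combines a batch view from the cold path with a near-real-time view from the hot path. The view layout (dicts keyed by time window), the sample values, and the rule that batch results win where both exist are assumptions made for illustration; the slide does not specify them.

```python
def merge_views(batch_view, realtime_view):
    """Serve batch results where available, fall back to the hot path elsewhere."""
    merged = dict(realtime_view)  # start from the hot-path results
    merged.update(batch_view)     # batch results overwrite overlapping windows
    return merged


# Hypothetical outputs of the two paths, keyed by window end time.
batch_view = {"10:00:05": 185}
realtime_view = {"10:00:05": 180, "10:00:10": 140}

merged = merge_views(batch_view, realtime_view)
```

The need to maintain and reconcile two code paths like this is what the non-lambda variants on the following slides try to avoid.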

Page 7: Future of Data Meetup : Boontadata

Non lambda Architecture

[Diagram labels: Events; events log; hot path (near real time); DB SQL/NoSQL; cold path (batch); query]

Page 8: Future of Data Meetup : Boontadata

Kappa Architecture

Source: https://www.oreilly.com/ideas/questioning-the-lambda-architecture (Jay Kreps, July 2, 2014)

Page 9: Future of Data Meetup : Boontadata

Examples of services (in Azure)

http://dev.microsoft.fr/data

Page 10: Future of Data Meetup : Boontadata

Stream Processing Challenges

Page 11: Future of Data Meetup : Boontadata

Events: Example

Time       Object A   Object B
10:00:00   100        2000
10:00:03              1800
10:00:04   85
10:00:07   40         2500
10:00:08   100
10:00:09              3000

Aggregations:

Time Window   Object A (sum)   Object B (average)
10:00:05      185              1900
10:00:10      140              2750
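
The aggregation on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming 5-second tumbling windows labelled by their end time, a sum for object A and an average for object B; the names `window_end` and `aggregate` are illustrative and not taken from boontadata.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=5)

# (event time, object, value), copied from the example table.
EVENTS = [
    ("10:00:00", "A", 100), ("10:00:00", "B", 2000),
    ("10:00:03", "B", 1800),
    ("10:00:04", "A", 85),
    ("10:00:07", "A", 40), ("10:00:07", "B", 2500),
    ("10:00:08", "A", 100),
    ("10:00:09", "B", 3000),
]


def window_end(ts):
    """Return the end time of the 5-second window that contains ts."""
    t = datetime.strptime(ts, "%H:%M:%S")
    midnight = t.replace(hour=0, minute=0, second=0)
    index = (t - midnight) // WINDOW  # number of whole windows since midnight
    return (midnight + (index + 1) * WINDOW).strftime("%H:%M:%S")


def aggregate(events):
    """Sum object A and average object B per window."""
    buckets = defaultdict(list)
    for ts, obj, value in events:
        buckets[(window_end(ts), obj)].append(value)
    return {
        key: sum(values) if key[1] == "A" else sum(values) / len(values)
        for key, values in buckets.items()
    }


result = aggregate(EVENTS)
```

With the events above, `result` holds the same four values as the aggregation table.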

Page 12: Future of Data Meetup : Boontadata

A distributed system

[Diagram labels: object A; object B; broker (x3); processing engine]

Page 13: Future of Data Meetup : Boontadata

Best Case

[Timeline from 10:00:00 to 10:00:12. Object A events: A,m1,100; A,m4,85; A,m5,40; A,m7,100. Object B events: B,m2,2000; B,m3,1800; B,m6,2500; B,m8,3000.]

Page 14: Future of Data Meetup : Boontadata

Duplicate events

[Timeline from 10:00:00 to 10:00:12. Object A events: A,m1,100; A,m4,85; A,m5,40; A,m7,100. Object B events: B,m2,2000; B,m3,1800; B,m6,2500; B,m8,3000. Duplicated deliveries: B,m2,2000; A,m1,100; A,m5,40; B,m8,3000 (twice).]
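
One common way to cope with duplicates is to drop any message whose identifier has already been seen. The sketch below assumes each event is an `(object, message id, value)` tuple as on the slide and that message ids are unique per message; the arrival order and the unbounded `seen` set are simplifications for illustration, and this is not necessarily how any of the engines discussed here deduplicate.

```python
def deduplicate(events):
    """Yield each message once, keyed on its message id."""
    seen = set()
    for obj, msg_id, value in events:
        if msg_id in seen:
            continue  # duplicate delivery: ignore it
        seen.add(msg_id)
        yield obj, msg_id, value


# Assumed arrival order; the duplicated messages are those shown on the slide.
ARRIVALS = [
    ("A", "m1", 100), ("A", "m1", 100),
    ("B", "m2", 2000), ("B", "m2", 2000),
    ("B", "m3", 1800),
    ("A", "m4", 85),
    ("A", "m5", 40), ("A", "m5", 40),
    ("B", "m6", 2500),
    ("A", "m7", 100),
    ("B", "m8", 3000), ("B", "m8", 3000), ("B", "m8", 3000),
]

naive_sum_a = sum(v for obj, _, v in ARRIVALS if obj == "A")
dedup_sum_a = sum(v for obj, _, v in deduplicate(ARRIVALS) if obj == "A")
```

Here `naive_sum_a` counts the repeated deliveries, while `dedup_sum_a` matches the total of the best case.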

Page 15: Future of Data Meetup : Boontadata

Out of order events

[Timeline from 10:00:00 to 10:00:12 with events arriving out of order. Object A events: A,m1,100; A,m4,85; A,m5,40; A,m7,100. Object B events: B,m2,2000; B,m3,1800; B,m6,2500; B,m8,3000.]
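
If every event carries its own timestamp and windows are assigned from that timestamp, the arrival order no longer affects the result. The sketch below illustrates this under assumptions: event times are given as seconds after 10:00:00 (taken from the earlier example table), windows are 5 seconds wide, and `sum_per_window` is an illustrative helper rather than part of any engine.

```python
import random
from collections import defaultdict

# (event time in seconds after 10:00:00, object, message id, value)
EVENTS = [
    (0, "A", "m1", 100), (0, "B", "m2", 2000),
    (3, "B", "m3", 1800), (4, "A", "m4", 85),
    (7, "A", "m5", 40), (7, "B", "m6", 2500),
    (8, "A", "m7", 100), (9, "B", "m8", 3000),
]


def sum_per_window(events, size=5):
    """Sum values per (window start, object), bucketing on the event's own time."""
    totals = defaultdict(int)
    for event_second, obj, _msg_id, value in events:
        window_start = event_second - event_second % size
        totals[(window_start, obj)] += value
    return dict(totals)


shuffled = EVENTS[:]
random.Random(42).shuffle(shuffled)  # simulate an arbitrary arrival order

in_order = sum_per_window(EVENTS)
out_of_order = sum_per_window(shuffled)
```

Both calls return the same totals. What this sketch leaves out is deciding when a window is complete, which is the harder part in a real stream.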

Page 16: Future of Data Meetup : Boontadata

Late events

[Timeline from 10:00:00 to 10:00:12 with some events arriving late. Object A events: A,m1,100; A,m4,85; A,m5,40; A,m7,100. Object B events: B,m2,2000; B,m3,1800; B,m6,2500; B,m8,3000.]

Page 17: Future of Data Meetup : Boontadata

Watermark

from Apache Flink documentation

Page 18: Future of Data Meetup : Boontadata

Watermark

from Apache Flink documentation
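
The watermark idea can be sketched in plain Python without any engine. The code below is a simplified model, not Flink's API: it assumes the watermark trails the largest event time seen so far by a fixed `max_delay`, closes a window once the watermark reaches the window end, and sets aside events that arrive for an already closed window. The function name, the parameters and the arrival orders are all illustrative.

```python
def process(arrivals, size=5, max_delay=3):
    """arrivals: (event_second, value) pairs in arrival order.

    Returns (closed window sums, still-open window sums, late events).
    """
    open_windows = {}  # window start -> running sum
    closed = {}
    late = []
    watermark = float("-inf")
    for event_second, value in arrivals:
        start = event_second - event_second % size
        if start + size <= watermark:
            late.append((event_second, value))  # its window is already closed
        else:
            open_windows[start] = open_windows.get(start, 0) + value
        # The watermark only moves forward.
        watermark = max(watermark, event_second - max_delay)
        for s in [s for s in open_windows if s + size <= watermark]:
            closed[s] = open_windows.pop(s)
    return closed, open_windows, late


# Object A events from the example; 4 -> 85 arrives out of order but in time.
tolerated = process([(0, 100), (7, 40), (4, 85), (8, 100)])
# Same events, but 4 -> 85 arrives after the watermark passed its window.
too_late = process([(0, 100), (7, 40), (8, 100), (4, 85)])
```

In the first run the first window closes with the full sum of 185. In the second it closes at 100 and the event with value 85 ends up in the late list. Raising `max_delay` tolerates more disorder at the price of later results.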

Page 19: Future of Data Meetup : Boontadata

boontadata

Page 20: Future of Data Meetup : Boontadata

Storm

Flink

Samza

Spark Streaming

Page 21: Future of Data Meetup : Boontadata

IoT simulator

broker

noSQL Database

Streaming engine #1

Streaming engine #2

Streaming engine #...

compare

boontadata
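
The "compare" box can be pictured as a diff between the aggregates the simulator expects and the aggregates a streaming engine wrote to the database. The sketch below is a guess at the idea only: the dict layout, the sample values and the `compare` function are assumptions for illustration and do not reproduce boontadata's actual compare code.

```python
def compare(expected, actual):
    """Return the windows where an engine disagrees with the simulator."""
    diffs = {}
    for key in sorted(set(expected) | set(actual)):
        exp, act = expected.get(key), actual.get(key)
        if exp != act:
            diffs[key] = (exp, act)  # (expected, what the engine produced)
    return diffs


# Hypothetical aggregates keyed by (window end, object).
expected = {("10:00:05", "A"): 185, ("10:00:10", "A"): 140}
engine_1 = {("10:00:05", "A"): 185, ("10:00:10", "A"): 140}
engine_2 = {("10:00:05", "A"): 285, ("10:00:10", "A"): 140}  # counted a duplicate

diff_1 = compare(expected, engine_1)
diff_2 = compare(expected, engine_2)
```

An empty result means the engine matched the simulator for every window.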

Page 22: Future of Data Meetup : Boontadata

IoT simulator (python)

Kafka broker

Cassandra

Apache Spark Streaming

Apache Flink

Apache Kafka Streams,…

compare (python)

boontadata-streams

Page 23: Future of Data Meetup : Boontadata

IoT simulator (python)

IOT Hub

DocumentDB

Azure Stream Analytics

HDInsight Storm

HDInsight Spark Streaming,

compare (python)

boontadata-paas

Page 24: Future of Data Meetup : Boontadata

Implement other engines: Apache Samza, Apex, …

Page 25: Future of Data Meetup : Boontadata

http://boontadata.io

Page 26: Future of Data Meetup : Boontadata

let’s run it

Page 27: Future of Data Meetup : Boontadata

inject

Page 28: Future of Data Meetup : Boontadata

Spark Streaming – processing time

Page 29: Future of Data Meetup : Boontadata

Flink – processing time

Page 30: Future of Data Meetup : Boontadata

Flink – event time
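
The difference between the processing-time and event-time runs can be shown on a tiny example. This is a minimal sketch with made-up arrival delays: each tuple holds an event time, an arrival time (both in seconds after 10:00:00) and a value, and `window_sums` is an illustrative helper, not code from the demo.

```python
# (event_second, arrival_second, value): the event at second 4 arrives at second 6.
EVENTS = [
    (0, 0, 100),
    (4, 6, 85),
    (7, 7, 40),
    (8, 9, 100),
]


def window_sums(events, time_index, size=5):
    """Sum values per window start, using the timestamp at position time_index."""
    sums = {}
    for event in events:
        t = event[time_index]
        start = t - t % size
        sums[start] = sums.get(start, 0) + event[2]
    return sums


by_event_time = window_sums(EVENTS, time_index=0)
by_processing_time = window_sums(EVENTS, time_index=1)
```

Windowing on event time gives 185 and 140, as in the example table. Windowing on arrival time moves the delayed value of 85 into the second window, giving 100 and 225.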

Page 31: Future of Data Meetup : Boontadata

Azure Stream Analytics

pyclient container

Page 32: Future of Data Meetup : Boontadata

Run other tests (other seed, several objects, …)
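
Seeding the generator is what makes such test runs repeatable. The sketch below shows the general idea with a hypothetical `simulate` function: a seeded random generator produces events for several objects, resends some of them and shuffles the arrival order. The parameters and the event layout are assumptions and do not reflect the actual boontadata simulator.

```python
import random


def simulate(seed, objects=("A", "B"), count=8, duplicate_rate=0.25):
    """Generate a repeatable list of (object, message id, value) arrivals."""
    rng = random.Random(seed)  # same seed -> same test scenario
    events = []
    for i in range(count):
        event = (rng.choice(objects), f"m{i + 1}", rng.randint(1, 100))
        events.append(event)
        if rng.random() < duplicate_rate:
            events.append(event)  # resend the same message id
    rng.shuffle(events)  # arrival order differs from send order
    return events


run_1 = simulate(seed=7)
run_2 = simulate(seed=7)
```

Because both runs use the same seed they produce the same arrivals, so every engine can be fed exactly the same scenario.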

Page 33: Future of Data Meetup : Boontadata

compare the code

http://boontadata.io

Page 34: Future of Data Meetup : Boontadata

Conclusion

Page 35: Future of Data Meetup : Boontadata

Conclusion

• A number of streaming engines with subtle differences => do your own tests

• Contribute: http://boontadata.io

Page 37: Future of Data Meetup : Boontadata

Benjamin Guinebertière, Technical Evangelist, Microsoft France
Azure, data insights, machine learning
@benjguin | http://3-4.fr

Page 38: Future of Data Meetup : Boontadata

© 2014 Microsoft Corporation. All rights reserved.