Apache Flink® and IoT: How Stateful Event-Time Processing Enables Accurate Analytics

Aljoscha Krettek (@aljoscha)

ApacheCon North America, May 2017

What I’d Like to Talk About

§ IoT and event-time stream processing

§ Stateful stream processing

§ Streaming architecture and Flink

data Artisans

§ Original creators of Apache Flink®

§ Providers of the dA Platform, a supported Flink distribution

IoT and Event-Time Stream Processing

Example Event Sources

A Simple Definition

IoT use cases from the system’s perspective:

A large number of (distributed) things continuously generating a large amount of data.

IoT: Some Insights

§ Data is continuously produced → Stream processing

§ Events have a timestamp → Event-time based processing

§ Data/events can arrive with huge delays and out of order

§ Most analyses happen on time windows

What Is Event-Time Processing

[Figure: the Star Wars films ordered once by processing time (release year) and once by event time (order within the story).]

Processing time: Episode IV (1977), Episode V (1980), Episode VI (1983), Episode I (1999), Episode II (2002), Episode III (2005), Episode VII (2015)

Event time: Episode I, Episode II, Episode III, Episode IV, Episode V, Episode VI, Episode VII

What Is Event-Time Processing

[Figure: a message queue of events, each labeled with its event timestamp, arriving out of order along a processing-time axis that runs from 1 to 14.]

What’s The Problem?

[Figure: the same out-of-order events along a processing-time axis from 1 to 14, grouped once into processing-time windows and once into event-time windows. The two groupings contain different events.]

Mismatch between event time and processing time.
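
To make the mismatch concrete, here is a minimal sketch in plain Python (not Flink code). The events, their timestamps, and the window size of 5 are invented for illustration; the helper `tumbling_windows` is hypothetical. The same five events are grouped twice, once by arrival time and once by the timestamp they carry.

```python
from collections import defaultdict

def tumbling_windows(events, time_of, size):
    """Group event values into fixed-size windows keyed by window start."""
    windows = defaultdict(list)
    for event in events:
        start = (time_of(event) // size) * size
        windows[start].append(event["value"])
    return dict(windows)

# Hypothetical readings: event_time is when the device measured,
# arrival_time is when the stream processor saw the record.
events = [
    {"value": "a", "event_time": 1, "arrival_time": 1},
    {"value": "b", "event_time": 7, "arrival_time": 2},
    {"value": "c", "event_time": 3, "arrival_time": 6},   # delayed
    {"value": "d", "event_time": 6, "arrival_time": 7},
    {"value": "e", "event_time": 2, "arrival_time": 11},  # very late
]

by_processing_time = tumbling_windows(events, lambda e: e["arrival_time"], 5)
by_event_time = tumbling_windows(events, lambda e: e["event_time"], 5)

print(by_processing_time)  # {0: ['a', 'b'], 5: ['c', 'd'], 10: ['e']}
print(by_event_time)       # {0: ['a', 'c', 'e'], 5: ['b', 'd']}
```

Only the event-time grouping puts each reading into the window in which it was actually measured; the processing-time grouping depends on how long each record spent in transit.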

Sources of Time Mismatch

§ Big Mismatch
• Network disconnects
• Slow network

§ Small Mismatch
• The nature of distributed systems
• Differing system clock time

Small Event-Time Mismatch

Robust Stream Processing with Apache Flink®: A Simple Walkthrough

http://data-artisans.com/robust-stream-processing-flink-walkthrough/

Recap: Event-Time

§ IoT use cases need event-time processing

§ Even a small mismatch between event time and processing time will lead to wrong results

(Stateful) Stream Processing

Stream Processing

[Diagram: a single box labeled Computation.]

Computations on never-ending “streams” of data records (“events”)

Distributed Stream Processing

[Diagram: three boxes, each labeled Computation.]

Computation spread across many machines

Stateful Stream Processing

[Diagram: a box labeled Computation together with a box labeled State.]

State is usually partitioned by some key in the data
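
As a rough illustration of key-partitioned state, the following plain-Python sketch keeps a running average per sensor. The partition count, the field names `sensor` and `reading`, and the helpers `partition_for` and `process` are assumptions made for this example; it does not use Flink's APIs.

```python
from collections import defaultdict
import zlib

NUM_PARTITIONS = 3  # hypothetical number of parallel workers

def partition_for(key):
    """Stable hash so the same key always lands on the same partition."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

# One independent state store per partition: key -> running count and total.
state = [
    defaultdict(lambda: {"count": 0, "total": 0.0})
    for _ in range(NUM_PARTITIONS)
]

def process(event):
    """Update the state for the event's key and return its running average."""
    key = event["sensor"]
    entry = state[partition_for(key)][key]
    entry["count"] += 1
    entry["total"] += event["reading"]
    return entry["total"] / entry["count"]

events = [
    {"sensor": "s1", "reading": 10.0},
    {"sensor": "s2", "reading": 4.0},
    {"sensor": "s1", "reading": 20.0},
]
averages = [process(e) for e in events]
print(averages)  # [10.0, 4.0, 15.0]
```

Because every event for a given key is routed to the same partition, each worker can update its share of the state without coordinating with the others.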

Stateful Stream Processing II

§ Result depends on the history of the stream

§ A stateful stream processor should give you the tools to manage state
• Recover, roll back, version upgrade, etc.

[Diagram: an event log, three boxes labeled app state, and a query service.]

Recap: Stateful Streams

§ Continuous processing of data that is continuously generated

§ I.e., pretty much all “big” data

§ It’s all about state and time

§ Flink does all of that

Operational Issues

Operational Questions

§ What happens in case of failures?

§ What if I need to update my code/Flink?

§ Can I re-process my data?

§ How can I execute my programs?

Failure Handling

§ JobManager High-Availability using ZooKeeper

§ Periodic checkpoints of state to persistent storage (HDFS, S3, …)

§ In case of failure: rollback to previous consistent state
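
The checkpoint-and-rollback idea can be sketched in a few lines of plain Python. This is a toy model, not Flink's checkpointing mechanism: it assumes a replayable input (a list indexed by offset), keeps the snapshot in memory where a real system would write to persistent storage, and the function `run` with its parameters is hypothetical.

```python
import copy

def run(events, checkpoint_every, fail_at=None):
    """Count events per key, snapshotting state and input offset periodically."""
    state, offset = {}, 0
    # In-memory stand-in for a snapshot on persistent storage.
    checkpoint = (copy.deepcopy(state), offset)
    failed = False
    while offset < len(events):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            # Simulated crash: drop the live state, restore the last snapshot,
            # and replay the input from the checkpointed offset.
            state, offset = copy.deepcopy(checkpoint[0]), checkpoint[1]
            continue
        key = events[offset]
        state[key] = state.get(key, 0) + 1
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (copy.deepcopy(state), offset)
    return state

events = ["a", "b", "a", "c", "a", "b"]
without_failure = run(events, checkpoint_every=2)
with_failure = run(events, checkpoint_every=2, fail_at=3)
print(without_failure == with_failure)  # True
```

Storing the input offset together with the state is what keeps the replay consistent: the event processed after the last checkpoint is counted again only because its earlier effect was rolled back with the state.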

Savepoints

§ A persistent snapshot of all state

§ When starting an application, state can be initialized from a savepoint

§ Between taking a savepoint and restoring from it, we can update the Flink version or user code
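
A savepoint-style upgrade might look roughly like this in plain Python. The JSON file, the two versions of the counting function, and the device ids are all invented for illustration; Flink's own savepoint format and tooling are not shown here.

```python
import json
import os
import tempfile

def count_v1(state, event):
    """Version 1 of the user code: count events per device id."""
    state[event] = state.get(event, 0) + 1

def count_v2(state, event):
    """Version 2: same state layout, but device ids are lower-cased first."""
    key = event.lower()
    state[key] = state.get(key, 0) + 1

def take_savepoint(state, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)

def restore_savepoint(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "savepoint.json")

# Run the old version, then stop it after taking a savepoint.
state = {}
for event in ["dev1", "dev2", "dev1"]:
    count_v1(state, event)
take_savepoint(state, path)

# Start the upgraded version from the savepoint instead of from empty state.
state = restore_savepoint(path)
for event in ["DEV1", "dev3"]:
    count_v2(state, event)

print(state)  # {'dev1': 3, 'dev2': 1, 'dev3': 1}
```

The upgrade works here because both versions agree on the state layout; changing the layout would need an explicit migration step.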

Closing

TL;DR

§ Stateful stream processing is nice 😎

§ IoT use cases require proper time management

§ Apache Flink is a stateful stream processor with plenty of nifty features

Thank you!

@aljoscha @ApacheFlink @dataArtisans

Backup Slides

Event-Time Processing

What Is Event-Time Streaming

§ Events have timestamps

§ Processing depends on timestamps

§ An event-time stream processor should give you the tools to reason about time
• Handle streams that are out of order

[Diagram: events t1 to t4 arrive out of order; your code, backed by state, produces results for t1-t2 and t3-t4.]
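
One way to handle an out-of-order stream is sketched below in plain Python. The slide does not say which mechanism is used, so this is an assumption: the sketch buffers events per window and closes a window once a watermark (here, the largest timestamp seen minus a fixed `max_delay`) has passed the window's end. The function name and its parameters are hypothetical.

```python
from collections import defaultdict

def event_time_windows(timestamps, size, max_delay):
    """Return (window_start, sorted timestamps) pairs in emission order.

    A window [start, start + size) counts as complete once the watermark,
    defined here as the largest timestamp seen minus max_delay, reaches its end.
    """
    open_windows = defaultdict(list)  # the operator's state
    emitted = []
    watermark = float("-inf")
    for ts in timestamps:
        start = (ts // size) * size
        if start + size <= watermark:
            continue  # too late: this event's window was already emitted
        open_windows[start].append(ts)
        watermark = max(watermark, ts - max_delay)
        # sorted() materializes the list first, so popping below is safe.
        for ws in sorted(w for w in open_windows if w + size <= watermark):
            emitted.append((ws, sorted(open_windows.pop(ws))))
    # End of input: flush the windows that are still open.
    for ws in sorted(open_windows):
        emitted.append((ws, sorted(open_windows[ws])))
    return emitted

result = event_time_windows([3, 1, 2, 4, 7, 5, 9, 12], size=4, max_delay=2)
print(result)  # [(0, [1, 2, 3]), (4, [4, 5, 7]), (8, [9]), (12, [12])]
```

The choice of `max_delay` is a trade-off: a larger value tolerates more disorder but delays results, while a smaller value emits sooner and drops more late events.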

History of Flink

A Brief History of Flink

§ January ‘10: Project Stratosphere (Flink precursor)

§ April ‘14: Flink project incubation

§ December ‘14: Top Level Project

§ March ‘16: Release 1.0

§ Releases along the way: v0.5, v0.6, v0.7, v0.8, v0.9, v0.10

The academia gap: Reading/writing papers, teaching, worrying about thesis

Realizing this might be interesting to people beyond academia (even more so, actually)