adventures in timespace - how apache flink handles time and windows
TRANSCRIPT
![Page 2: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/2.jpg)
Adventures in Timespace
![Page 3: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/3.jpg)
3
Why Windows*?
*not Microsoft Windows…
![Page 4: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/4.jpg)
4
That’s why…
![Page 5: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/5.jpg)
5
StreamingBatch
![Page 6: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/6.jpg)
6
In Streaming:Arriving data never stops!
![Page 7: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/7.jpg)
7
Solution:Put elements into buckets,these are called windows
![Page 8: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/8.jpg)
8
Window (5 min)Count
#Hashtags
Just saw #Trump on #CNN, super cool. :D
Trump: 2394Cheese: 12984
Money: 42
![Page 9: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/9.jpg)
9
What I didn’t mention• tweets have a
timestamp, their event time
• tweets from across the globe arrive with delay
=> tweets with different timestamps arrive out-of-order
![Page 10: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/10.jpg)
Window (5 min)Count
#Hashtags
12:34 (13.10.2015):Just saw #Trump on #CNN, super cool. :D
Trump: 2394Cheese: 12984
Money: 42
These arrive with 3 minutes slack
Form windows based on processing time of the machine.
Processing Time != Event Time
10
![Page 11: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/11.jpg)
11
Why do people use this?• easy to implement• low latency• this is what systems
give you (Spark Streaming, Apex, Samza, Storm)*
*not Google Cloud Dataflow
![Page 12: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/12.jpg)
12
Lets look at a morecomplex example.
![Page 13: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/13.jpg)
13
Window (5 min)Correlate Tweets
and News
something...
These still have 3 min slack.
These have 8 min slack.
12:33 (13.10.2015):Donald Trump speaks at Cheese conference.Processing Time != Event
Time
![Page 14: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/14.jpg)
Processing Time != Event Time
=> Mismatch in thetimespace continuum
![Page 15: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/15.jpg)
15
Use cases• out-of-order elements• sources with delay• recovery/fault-tolerance• “catching up” with a
stream
Who does it?• Google Cloud
Dataflow• Apache Flink
![Page 16: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/16.jpg)
16
How can we do this?
![Page 17: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/17.jpg)
17
We need aGlobal Clockthat runs onevent time instead of processing time.
![Page 18: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/18.jpg)
18
This is a source
This is our window operator
1
0
0
0 0
1
2
1
2
1
1
This is the current event-time time
2
2
2
2
2
This is a watermark.
![Page 19: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/19.jpg)
19
Now, show me the API!
![Page 20: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/20.jpg)
20
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(ProcessingTime);
DataStream<Tweet> text = env.addSource(new TwitterSrc());
DataStream<Tuple2<String, Integer>> counts = text .flatMap(new ExtractHashtags()) .keyBy(“name”) .timeWindow(Time.of(5, MINUTES) .apply(new HashtagCounter());
Processing Time
![Page 21: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/21.jpg)
21
Event TimeStreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(EventTime);
DataStream<Tweet> text = env.addSource(new TwitterSrc());text = text.assignTimestamps(new MyTimestampExtractor());
DataStream<Tuple2<String, Integer>> counts = text .flatMap(new ExtractHashtags()) .keyBy(“name”) .timeWindow(Time.of(5, MINUTES) .apply(new HashtagCounter());
![Page 22: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/22.jpg)
22
TL;DL*• stream data is infinite• windows are helpful• event-time != processing
time• watermarks to the rescue• Flink can do it
*too long, didn’t listen
![Page 23: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/23.jpg)
flink.apache.org@ApacheFlink
![Page 24: Adventures in Timespace - How Apache Flink Handles Time and Windows](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a9761a28ab28388b4577/html5/thumbnails/24.jpg)
32-35
24-27
20-23
8-110-3
4-7
24Tumbling Windows of 4 Seconds
123412
4
59
9 0
20
20
22212326323321
26
353642