© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Dr. Steffen Hausmann
Sr. Solutions Architect, Amazon Web Services
Deep Dive into Concepts and Tools for Analyzing Streaming Data
Data originates in real-time
Photo by mountainamoeba
https://www.flickr.com/photos/mountainamoeba/2527300028/
Analytics is done in batches
Photo by PracticalHacks
https://www.flickr.com/photos/29225844@N05/2828724211
Insights are Perishable
Photo by Lucas Cobb
https://www.flickr.com/photos/cobblucas/4780005097/
Analyzing Streaming Data on AWS
Challenges of Stream Processing
Photo by FollowYour Nose
https://www.flickr.com/photos/laprimadonna/3294467673
Comparing Streams and Relations
Relation: R ⊆ Id × Color
Stream: S ⊆ Id × Color × Time
[Figure: stream tuples laid out along a time axis up to "now"]
Querying Streams and Relations
Relation: fixed data and ad-hoc queries
Stream: fixed queries and continuously ingested data
Challenges of Querying Infinite Streams
SELECT * FROM S WHERE color = 'black'
SELECT * FROM S JOIN S’
SELECT color, COUNT(1) FROM S GROUP BY color
... NOT EXISTS (SELECT * FROM S WHERE color = 'red')
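To see why the first query is harmless and the others are not, the following plain-Java sketch (not tied to Kinesis Analytics or Flink; the `Event` record and the sample data are illustrative assumptions) evaluates a filter and a grouped count over events in arrival order. The filter can emit each match immediately and never has to revise it, whereas the count for a color changes with every new event, so an unbounded stream never yields a final answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: a finite list stands in for the (infinite) stream S.
class ContinuousQueries {
    record Event(String id, String color, long time) {}

    // Monotonic: SELECT * FROM S WHERE color = 'black'.
    // Each match can be emitted as soon as it arrives and is never retracted.
    static List<String> filterBlack(List<Event> events) {
        List<String> emitted = new ArrayList<>();
        for (Event e : events) {
            if (e.color().equals("black")) emitted.add(e.id());
        }
        return emitted;
    }

    // Non-monotonic: SELECT color, COUNT(1) FROM S GROUP BY color.
    // Every arrival revises an earlier answer, so no result is ever final.
    static List<Map<String, Long>> runningCounts(List<Event> events) {
        Map<String, Long> counts = new TreeMap<>();
        List<Map<String, Long>> answers = new ArrayList<>();
        for (Event e : events) {
            counts.merge(e.color(), 1L, Long::sum);
            answers.add(new TreeMap<>(counts)); // snapshot of the current answer
        }
        return answers;
    }

    public static void main(String[] args) {
        List<Event> s = List.of(new Event("a", "black", 1),
            new Event("b", "red", 3), new Event("c", "black", 5));
        System.out.println(filterBlack(s));   // [a, c]
        System.out.println(runningCounts(s)); // [{black=1}, {black=1, red=1}, {black=2, red=1}]
    }
}
```

The join and NOT EXISTS queries run into the same wall: the join must keep every past event in case a future one matches, and NOT EXISTS can only be confirmed once the stream ends.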
Analyzing Streaming Data on AWS
Amazon Kinesis Analytics
• Runs standard SQL queries on top of streaming data
• Fully managed and scales automatically
• You pay only for the resources your queries consume
Apache Flink
• Open-source stream processing framework
• Included in Amazon Elastic MapReduce (EMR)
• Flexible APIs in Java and Scala, plus SQL and CEP support
Evaluating Queries over Streams
Photo by Brad Greenlee
https://www.flickr.com/photos/bgreenlee/91309374/
Evaluating Non-monotonic Operators: Tumbling Windows
SELECT STREAM color, COUNT(1)
FROM ...
GROUP BY STEP(rowtime BY INTERVAL '10' SECOND), color;
[Figure: events t1, t3, t5, t6, t9 grouped into consecutive 10-second windows]
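A tumbling window assigns every event to exactly one fixed-size, non-overlapping interval; `STEP(rowtime BY INTERVAL '10' SECOND)` rounds a timestamp down to the start of its 10-second interval. The plain-Java sketch below mimics that grouping, assuming whole-second timestamps and using a string key `windowStart/color` for brevity. It computes the counts over a finite list rather than emitting each window when it closes, as a streaming engine would.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of SELECT STREAM color, COUNT(1) ... GROUP BY STEP(rowtime BY INTERVAL '10' SECOND), color
// over a finite list of events; timestamps are assumed to be whole seconds.
class TumblingWindowCount {
    record Event(String color, long timeSec) {}

    // Like STEP: round the timestamp down to the start of its window.
    static long windowStart(long timeSec, long sizeSec) {
        return Math.floorDiv(timeSec, sizeSec) * sizeSec;
    }

    // Key "windowStart/color" -> count; each event belongs to exactly one window.
    static Map<String, Long> count(List<Event> events, long sizeSec) {
        Map<String, Long> result = new TreeMap<>();
        for (Event e : events) {
            result.merge(windowStart(e.timeSec(), sizeSec) + "/" + e.color(), 1L, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(new Event("red", 1), new Event("blue", 3),
            new Event("red", 5), new Event("red", 12), new Event("blue", 19));
        System.out.println(count(events, 10)); // {0/blue=1, 0/red=2, 10/blue=1, 10/red=1}
    }
}
```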
Evaluating Non-monotonic Operators: Sliding Windows
SELECT STREAM color, COUNT(1) OVER w
FROM ...
WINDOW w AS (PARTITION BY color RANGE INTERVAL '10' SECOND PRECEDING);
[Figure: events t1, t3, t5, t6, t9 on a time axis]
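Unlike a tumbling window, a sliding window produces one result per input row, covering the rows of the same color from the preceding 10 seconds. The plain-Java sketch below assumes events arrive in timestamp order with whole-second timestamps; it keeps one queue of recent timestamps per color and evicts entries that fall out of range.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a per-row sliding count over the preceding `rangeSec` seconds, per color.
// Assumes events arrive in timestamp order (seconds).
class SlidingWindowCount {
    record Event(String color, long timeSec) {}

    static List<Long> count(List<Event> events, long rangeSec) {
        Map<String, Deque<Long>> perColor = new HashMap<>(); // recent timestamps per color
        List<Long> out = new ArrayList<>();
        for (Event e : events) {
            Deque<Long> window = perColor.computeIfAbsent(e.color(), c -> new ArrayDeque<>());
            window.addLast(e.timeSec());
            // Drop rows older than (current time - range); the current row always stays.
            while (window.peekFirst() < e.timeSec() - rangeSec) window.pollFirst();
            out.add((long) window.size());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(new Event("red", 1), new Event("red", 3),
            new Event("blue", 5), new Event("red", 9), new Event("red", 12));
        System.out.println(count(events, 10)); // [1, 2, 1, 3, 3]
    }
}
```

At t = 12 the red event from t = 1 has left the range, so the count stays at 3 even though a fourth red event arrived.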
Evaluating Non-monotonic Operators: Session Windows
[Figure: events t1, t3, t5, t6, t8, t9 split into sessions separated by the session gap]
stream.keyBy(<key selector>)
    .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
    .<windowed transformation>(<window function>);
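Session windows have no fixed length: a session for a key stays open as long as events keep arriving within the session gap. The sketch below groups the timestamps of a single key in plain Java; it assumes all timestamps (in minutes) are available up front and starts a new session whenever two consecutive events are more than `gap` apart. Flink's own merging rules for `EventTimeSessionWindows` may treat the exact boundary case differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of session windows for one key (timestamps assumed to be minutes).
class SessionWindowSketch {
    // start/end are the timestamps of the first and last event in the session.
    record Session(long start, long end, int count) {}

    static List<Session> sessions(List<Long> timestamps, long gap) {
        List<Long> sorted = new ArrayList<>(timestamps);
        Collections.sort(sorted);
        List<Session> out = new ArrayList<>();
        if (sorted.isEmpty()) return out;
        long start = sorted.get(0), last = start;
        int count = 1;
        for (int i = 1; i < sorted.size(); i++) {
            long t = sorted.get(i);
            if (t - last > gap) { // inactivity longer than the gap closes the session
                out.add(new Session(start, last, count));
                start = t;
                count = 0;
            }
            last = t;
            count++;
        }
        out.add(new Session(start, last, count));
        return out;
    }

    public static void main(String[] args) {
        // 1, 3, 5, 6 form one session; 30 and 35 form another (gap = 10).
        System.out.println(sessions(List.of(1L, 3L, 5L, 6L, 30L, 35L), 10));
    }
}
```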
Evaluating Unbounded Queries
Unbounded join:
SELECT STREAM *
FROM S AS s JOIN S’ AS t
ON s.color = t.color
Join restricted to a 10-second window:
SELECT STREAM *
FROM S OVER w AS s JOIN S’ OVER w AS t
ON s.color = t.color
WINDOW w AS (RANGE INTERVAL '10' SECOND PRECEDING);
[Figure: stream S with events t1, t3, t5, t6, t9 and stream S’ with events t2, t4, t7, t8]
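An unbounded join has to remember every past event of both streams, because any future event might match one of them. Restricting the join to a 10-second window bounds that state. The plain-Java sketch below is a symmetric nested-loop join over bounded buffers (an illustration, not how any particular engine implements the query). It assumes the events of S and S’ arrive merged in timestamp order, matches on color within `rangeSec`, and evicts buffered events once they are too old to match. The event IDs and times are made-up sample data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of a time-bounded join S JOIN S' ON color, assuming timestamp-ordered input.
class WindowedJoin {
    // fromLeft = true for events of S, false for events of S'.
    record Event(boolean fromLeft, String id, String color, long timeSec) {}

    static void evict(Deque<Event> buffer, long cutoff) {
        while (!buffer.isEmpty() && buffer.peekFirst().timeSec() < cutoff) buffer.pollFirst();
    }

    static List<String> join(List<Event> merged, long rangeSec) {
        Deque<Event> left = new ArrayDeque<>(), right = new ArrayDeque<>();
        List<String> out = new ArrayList<>();
        for (Event e : merged) {
            long cutoff = e.timeSec() - rangeSec;
            evict(left, cutoff);   // state stays bounded by the window
            evict(right, cutoff);
            Deque<Event> other = e.fromLeft() ? right : left;
            for (Event o : other) {
                if (o.color().equals(e.color())) {
                    out.add(e.fromLeft() ? e.id() + "=" + o.id() : o.id() + "=" + e.id());
                }
            }
            (e.fromLeft() ? left : right).addLast(e);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> merged = List.of(
            new Event(true, "s1", "red", 1), new Event(false, "p1", "red", 2),
            new Event(true, "s2", "blue", 3), new Event(false, "p2", "blue", 4),
            new Event(true, "s3", "red", 5), new Event(false, "p3", "red", 18),
            new Event(true, "s4", "red", 20));
        System.out.println(join(merged, 10)); // [s1=p1, s2=p2, s3=p1, s4=p3]
    }
}
```

When `p3` arrives at time 18, all earlier red events of S are older than 8 and have already been evicted, so it only joins with `s4`.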
Different Time Semantics
Maintaining Order of Events
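[Figure: events t1, t3, t7, t8 on an event-time axis and on a processing-time axis; the event created at t7 arrives out of order, after t8, at processing time t11]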
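Maintaining Order of Events: Using processing-time-based windows
[Figure: events t1, t3, t8, t7 on the processing-time axis, with t7 arriving at t11; result tables with columns "processing time" and "count" for processing-time windows 0 and 10]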
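With processing-time windows, an event is counted in the window in which it happens to arrive, not the one in which it was created. The sketch below assumes each event carries both its creation (event) time and its arrival (processing) time, and buckets the four events from the figure both ways: the delayed t7 event lands in processing-time window 10 but in event-time window 0.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: the same events counted in 10-unit windows by processing time vs. event time.
class ProcessingVsEventTime {
    record Event(long eventTime, long arrivalTime) {}

    static Map<Long, Long> countBy(List<Event> events, boolean useEventTime, long size) {
        Map<Long, Long> counts = new TreeMap<>(); // window start -> count
        for (Event e : events) {
            long t = useEventTime ? e.eventTime() : e.arrivalTime();
            counts.merge(Math.floorDiv(t, size) * size, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The event created at t7 is delayed and only arrives at t11.
        List<Event> events = List.of(new Event(1, 1), new Event(3, 3),
            new Event(8, 8), new Event(7, 11));
        System.out.println("processing time: " + countBy(events, false, 10)); // {0=3, 10=1}
        System.out.println("event time:      " + countBy(events, true, 10));  // {0=4}
    }
}
```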
Maintaining Order of Events: Using multiple time-windows
SELECT STREAM
STEP(rowtime BY INTERVAL '10' SECOND) AS processing_time,
STEP(event_time BY INTERVAL '10' SECOND) AS event_time,
color,
COUNT(1)
FROM ...
GROUP BY STEP(rowtime BY INTERVAL '10' SECOND),
STEP(event_time BY INTERVAL '10' SECOND),
color;
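Grouping by both a processing-time and an event-time window turns a late event into an extra result row for an older event-time window instead of miscounting it. The plain-Java sketch below mirrors that query but drops the `color` column for brevity (an assumption of the sketch). It also shows how a hypothetical downstream consumer could add the partial rows back together per event-time window.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of GROUP BY processing-time window, event-time window (color omitted).
class MultiTimeWindows {
    record Event(long eventTime, long arrivalTime) {}

    // Key "processingWindow,eventWindow" -> count.
    static Map<String, Long> count(List<Event> events, long size) {
        Map<String, Long> rows = new TreeMap<>();
        for (Event e : events) {
            long proc = Math.floorDiv(e.arrivalTime(), size) * size;
            long evt = Math.floorDiv(e.eventTime(), size) * size;
            rows.merge(proc + "," + evt, 1L, Long::sum);
        }
        return rows;
    }

    // Downstream: sum the partial rows for each event-time window.
    static Map<Long, Long> totalsByEventTime(Map<String, Long> rows) {
        Map<Long, Long> totals = new TreeMap<>();
        rows.forEach((key, n) -> totals.merge(Long.parseLong(key.split(",")[1]), n, Long::sum));
        return totals;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(new Event(1, 1), new Event(3, 3),
            new Event(8, 8), new Event(7, 11), new Event(12, 12));
        Map<String, Long> rows = count(events, 10);
        System.out.println(rows);                    // {0,0=3, 10,0=1, 10,10=1}
        System.out.println(totalsByEventTime(rows)); // {0=4, 10=1}
    }
}
```

The row `10,0` carries the late t7 event; nothing emitted earlier has to be retracted, but a consumer that wants correct per-window totals must be prepared to combine rows.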
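Maintaining Order of Events: Using multiple time-windows
[Figure: events t1, t3, t8, t7 on the processing-time axis, with t7 arriving at t11; result tables with columns "processing time", "event time" and "count", holding row (0, 0) for processing-time window 0 and rows (10, 0) and (10, 10) for processing-time window 10]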
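Maintaining Order of Events: Using event time and watermarks
[Figure: events t1, t3, t8, t7 followed by watermarks 10 and 20 on the processing-time axis, with t7 arriving at t11; result tables with columns "event time" and "count" for event-time windows 0 and 10]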
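A watermark carrying time W asserts that no further events with a timestamp below W are expected, so an event-time window can be emitted once the watermark passes its end. The sketch below assumes a single input in which events and watermarks are interleaved (modelled with hypothetical `Event` and `Watermark` records) and 10-unit tumbling windows. Because t7 arrives before watermark 10, it is still counted in window 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: event-time tumbling windows emitted when a watermark passes their end.
// Convention assumed here: watermark W means "no more events with timestamp < W".
class WatermarkWindows {
    interface Element {}
    record Event(long time) implements Element {}
    record Watermark(long time) implements Element {}

    static List<String> run(List<Element> input, long size) {
        TreeMap<Long, Long> open = new TreeMap<>(); // window start -> count, not yet emitted
        List<String> emitted = new ArrayList<>();
        for (Element el : input) {
            if (el instanceof Event e) {
                open.merge(Math.floorDiv(e.time(), size) * size, 1L, Long::sum);
            } else if (el instanceof Watermark w) {
                // Every window [start, start + size) with start + size <= W is complete.
                while (!open.isEmpty() && open.firstKey() + size <= w.time()) {
                    Map.Entry<Long, Long> window = open.pollFirstEntry();
                    emitted.add(window.getKey() + "=" + window.getValue());
                }
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // t1, t3, t8 and the out-of-order t7 all precede watermark 10.
        List<Element> input = List.of(new Event(1), new Event(3), new Event(8),
            new Event(7), new Watermark(10), new Event(12), new Watermark(20));
        System.out.println(run(input, 10)); // [0=4, 10=1]
    }
}
```

The sketch does not handle events that arrive after their window has already been emitted; that is the job of allowed lateness.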
Adding Watermarks to a Stream
- Periodic watermarks
- Assuming ascending timestamps
- Punctuated watermarks
stream.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<MyEvent>() {
    @Override
    public long extractAscendingTimestamp(MyEvent element) {
        return element.getCreationTime();
    }
});
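Besides the ascending-timestamp extractor above, a common periodic strategy tolerates a bounded amount of disorder: it tracks the largest timestamp seen so far and, whenever asked, emits that value minus a fixed maximum delay. The plain-Java class below is a sketch of that idea; the class and method names are hypothetical and not Flink's API. (Newer Flink releases express both strategies through `WatermarkStrategy.forMonotonousTimestamps()` and `WatermarkStrategy.forBoundedOutOfOrderness(Duration)`, which supersede the deprecated `AscendingTimestampExtractor`.)

```java
// Sketch of a periodic watermark generator for bounded out-of-orderness (hypothetical class,
// not Flink's API). Watermark W is read as "no more events with a timestamp below W".
class BoundedDelayWatermarks {
    private final long maxDelay;            // assumed upper bound on how late an event can be
    private long maxSeen = Long.MIN_VALUE;  // largest timestamp observed so far

    BoundedDelayWatermarks(long maxDelay) {
        this.maxDelay = maxDelay;
    }

    // Called for every event; only the maximum timestamp is remembered.
    void onEvent(long timestamp) {
        maxSeen = Math.max(maxSeen, timestamp);
    }

    // Called periodically, e.g. from a timer. With maxDelay = 0 this is the ascending case.
    long currentWatermark() {
        return maxSeen == Long.MIN_VALUE ? Long.MIN_VALUE : maxSeen - maxDelay;
    }

    public static void main(String[] args) {
        BoundedDelayWatermarks gen = new BoundedDelayWatermarks(5);
        for (long t : new long[] {1, 3, 8, 7}) gen.onEvent(t);
        System.out.println(gen.currentWatermark()); // 3: the out-of-order t7 is not late
    }
}
```

A punctuated generator would instead emit a watermark only when a particular event (for example, one carrying an end-of-batch marker) passes through.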
Different Processing Semantics
Photo by Dominic Alves
https://www.flickr.com/photos/dominicspics/6854063597/
Consuming Data from a Stream
[Figure: a consumer reads records from the stream and writes results to an output sink]
Different Processing Semantics: At-most-once Semantics
[Figure: consumer, output sink, and offset store; the consumer and the offset store both advance from pos 561 to pos 1105]
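With at-most-once semantics the consumer records its position in the offset store before the output is written, so a failure between the two steps skips that record on restart. The sketch below simulates this with an in-memory list as the stream, a field as the offset store, and an exception as the crash; all of these are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an at-most-once consumer: commit the offset, then write the output.
class AtMostOnceConsumer {
    int committedOffset = 0;                     // stands in for the offset store
    final List<String> sink = new ArrayList<>(); // stands in for the output sink

    // crashAt = position at which a failure is simulated (-1 for none).
    void consume(List<String> stream, int crashAt) {
        for (int pos = committedOffset; pos < stream.size(); pos++) {
            committedOffset = pos + 1;           // 1. commit the offset first
            if (pos == crashAt) throw new IllegalStateException("crash at " + pos);
            sink.add(stream.get(pos));           // 2. then write the result
        }
    }

    public static void main(String[] args) {
        AtMostOnceConsumer c = new AtMostOnceConsumer();
        List<String> stream = List.of("a", "b", "c", "d");
        try {
            c.consume(stream, 1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());  // crash at 1
        }
        c.consume(stream, -1);                   // restart from the committed offset
        System.out.println(c.sink);              // [a, c, d]: "b" is lost
    }
}
```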
Different Processing Semantics: At-least-once Semantics
[Figure: consumer, output sink, and offset store; the consumer has reached pos 561 while the offset store still holds pos 0]
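At-least-once semantics reverses the order: the consumer writes the output first and commits the offset afterwards. A crash between the two steps makes the consumer re-read a record it has already written, producing a duplicate instead of a loss. The minimal sketch below uses an in-memory list as the stream, a field as the offset store, and an exception as the simulated crash; all of these are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an at-least-once consumer: write the output, then commit the offset.
class AtLeastOnceConsumer {
    int committedOffset = 0;                     // stands in for the offset store
    final List<String> sink = new ArrayList<>(); // stands in for the output sink

    // crashAt = position at which a failure is simulated (-1 for none).
    void consume(List<String> stream, int crashAt) {
        for (int pos = committedOffset; pos < stream.size(); pos++) {
            sink.add(stream.get(pos));           // 1. write the result first
            if (pos == crashAt) throw new IllegalStateException("crash at " + pos);
            committedOffset = pos + 1;           // 2. then commit the offset
        }
    }

    public static void main(String[] args) {
        AtLeastOnceConsumer c = new AtLeastOnceConsumer();
        List<String> stream = List.of("a", "b", "c", "d");
        try {
            c.consume(stream, 1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());  // crash at 1
        }
        c.consume(stream, -1);                   // restart from the committed offset
        System.out.println(c.sink);              // [a, b, b, c, d]: "b" is duplicated
    }
}
```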
Different Processing Semantics: Exactly-once Semantics
Message Deduplication
• At-least-once event delivery plus message deduplication
• Keep a transaction log of processed messages
• On failure, replay events and remove duplicated events for every operator
Distributed Snapshots
• State for each operator is periodically checkpointed
• On failure, rewind each operator to the previous consistent state
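The message-deduplication approach can be sketched as a sink that keeps a log of processed message IDs and ignores replays. The sketch assumes that every message carries a unique ID and that recording the ID and writing the output happen atomically; the in-memory `HashSet` is a stand-in for a durable transaction log.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of exactly-once results from at-least-once delivery plus deduplication.
class DeduplicatingSink {
    record Message(String id, String payload) {}

    private final Set<String> processedIds = new HashSet<>(); // the "transaction log"
    final List<String> output = new ArrayList<>();

    // Returns true if the message was applied, false if it was a replayed duplicate.
    boolean write(Message m) {
        if (!processedIds.add(m.id())) return false;
        output.add(m.payload());
        return true;
    }

    public static void main(String[] args) {
        DeduplicatingSink sink = new DeduplicatingSink();
        // At-least-once delivery: "m2" is delivered again after a simulated failure.
        List<Message> delivered = List.of(new Message("m1", "a"), new Message("m2", "b"),
            new Message("m2", "b"), new Message("m3", "c"));
        for (Message m : delivered) sink.write(m);
        System.out.println(sink.output); // [a, b, c]
    }
}
```

As written, the log grows without bound; a practical system would need a way to expire IDs that can no longer be replayed.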
Go Build!
Please complete the session
survey in the summit mobile app.
Thank you!
Watermarks and Allowed Lateness
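[Figure: events t3, t1, t8, t4 and a later event t5 arriving along the processing-time axis]
stream.keyBy(<key selector>)
    .window(<window assigner>)
    .allowedLateness(<time>)
    .sideOutputLateData(lateOutputTag)
    .<windowed transformation>(<window function>);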
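The plain-Java sketch below illustrates the idea behind `allowedLateness` and `sideOutputLateData` without using Flink's API: a tumbling event-time window fires when the watermark passes its end, is kept for an extra `allowedLateness` so that late events can update the result, and is then purged, after which further events for it go to a side output. Window size, lateness, sample timestamps, and the exact boundary conditions are assumptions of this sketch, not Flink's precise semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of event-time tumbling windows with allowed lateness and a side output.
// Convention assumed here: watermark W means "no more events with timestamp < W".
class AllowedLatenessWindows {
    private final long size;
    private final long allowedLateness;
    private final TreeMap<Long, Long> state = new TreeMap<>(); // window start -> count
    final List<String> results = new ArrayList<>();            // emitted window results
    final List<Long> lateSideOutput = new ArrayList<>();       // events dropped as too late
    private long watermark = Long.MIN_VALUE;

    AllowedLatenessWindows(long size, long allowedLateness) {
        this.size = size;
        this.allowedLateness = allowedLateness;
    }

    void onEvent(long t) {
        long start = Math.floorDiv(t, size) * size;
        long end = start + size;
        if (end + allowedLateness <= watermark) { // window state already purged
            lateSideOutput.add(t);
            return;
        }
        long count = state.merge(start, 1L, Long::sum);
        if (end <= watermark) {                   // window already fired: emit an update
            results.add(start + "=" + count + " (late update)");
        }
    }

    void onWatermark(long w) {
        // Fire every window whose end lies in (previous watermark, w].
        for (Map.Entry<Long, Long> win : state.entrySet()) {
            long end = win.getKey() + size;
            if (end > watermark && end <= w) results.add(win.getKey() + "=" + win.getValue());
        }
        watermark = w;
        // Purge windows that can no longer accept late events: end + allowedLateness <= w.
        state.headMap(w - size - allowedLateness, true).clear();
    }

    public static void main(String[] args) {
        AllowedLatenessWindows op = new AllowedLatenessWindows(10, 5);
        for (long t : new long[] {3, 1, 8}) op.onEvent(t);
        op.onWatermark(10); // fires window [0, 10)
        op.onEvent(4);      // late, but within the allowed lateness
        op.onWatermark(16); // window [0, 10) is purged
        op.onEvent(5);      // too late: goes to the side output
        System.out.println(op.results);        // [0=3, 0=4 (late update)]
        System.out.println(op.lateSideOutput); // [5]
    }
}
```

Allowed lateness trades memory for accuracy: the longer a window's state is kept, the more late events can still correct its result.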