Modern Stream Processing with Apache Flink®
Till Rohrmann
GOTO Berlin 2017
2
3
Original creators of Apache Flink®
dA Platform 2: Open Source Apache Flink + dA Application Manager
What changes faster? Data or Query?
4
Data changes slowly compared to fast-changing queries:
ad-hoc queries, data exploration, ML training and (hyper)parameter tuning
Batch Processing Use Case
Data changes fast, application logic is long-lived:
continuous applications, data pipelines, standing queries, anomaly detection, ML evaluation, …
Stream Processing Use Case
Batch Processing
5
Stream Processing
6
7
Apache Flink in a Nutshell
Apache Flink in a Nutshell
8
Diagram labels: Queries, Applications, Devices, etc.; Database, Stream, File / Object Storage; Historic Data, Streams, Application
Stateful computations over streams, real-time and historic:
fast, scalable, fault-tolerant, in-memory, event time, large state, exactly-once
9
The Core Building Blocks
▪ Event Streams: real-time and hindsight
▪ State: complex business logic
▪ (Event) Time: consistency with out-of-order data and late data
▪ Snapshots: forking / versioning / time-travel
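To make "consistency with out-of-order data and late data" concrete, here is a minimal, Flink-independent sketch of a bounded-out-of-orderness watermark. It assumes events arrive at most `maxDelay` ms out of order and computes the watermark per event rather than periodically; `WatermarkSketch`, `Event`, and `classify` are hypothetical names. Flink ships similar watermark generators (for example `BoundedOutOfOrdernessTimestampExtractor` in 1.x), but this is not their implementation.

```scala
object WatermarkSketch {
  // Hypothetical event type: a key plus its event-time timestamp in ms.
  final case class Event(key: String, timestamp: Long)

  // Splits events (in arrival order) into on-time and late ones, assuming
  // events arrive at most `maxDelay` ms out of order. The watermark is
  // "max timestamp seen so far minus maxDelay": a promise that no events at
  // or before it are still expected, so anything at or below it is late.
  def classify(events: Seq[Event], maxDelay: Long): (Vector[Event], Vector[Event]) = {
    var maxSeen: Option[Long] = None
    val onTime = Vector.newBuilder[Event]
    val late = Vector.newBuilder[Event]
    for (e <- events) {
      val watermark = maxSeen.map(_ - maxDelay) // None until the first event
      if (watermark.exists(e.timestamp <= _)) late += e else onTime += e
      maxSeen = Some(maxSeen.fold(e.timestamp)(math.max(_, e.timestamp)))
    }
    (onTime.result(), late.result())
  }
}
```

With `maxDelay = 1000`, an event at 2500 arriving after one at 3000 is still on time, while an event at 1500 arriving at that point is late. A real pipeline would route late events to a side output or allowed-lateness handling instead of just collecting them.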
Powerful Abstractions
10
▪ Process Function (events, state, time): Stateful Event-Driven Applications
▪ DataStream API (streams, windows): Stream- & Batch Data Processing
▪ Stream SQL / Tables (dynamic tables): High-level Analytics API
val stats = stream
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .reduce((a, b) => a.add(b))
def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {
  // work with event and state
  (event, state.value) match { … }
  out.collect(…)   // emit events
  state.update(…)  // modify state
  // schedule a timer callback
  ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
}
Layered abstractions to navigate simple to complex use cases
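The `keyBy` / `timeWindow` pipeline on this slide groups readings per sensor into 5-second tumbling windows and combines each group. The sketch below reproduces that result with plain Scala collections so it runs without Flink; `Reading`, its `add` method, and `windowedSums` are hypothetical, and it computes over a finite input in one go, whereas Flink aggregates incrementally and emits each window when event time passes its end.

```scala
object WindowSketch {
  // Hypothetical sensor reading: id, event-time timestamp (ms), value.
  final case class Reading(sensor: String, timestamp: Long, value: Double) {
    // Plays the role of `a.add(b)` on the slide: combine two readings of one sensor.
    def add(other: Reading): Reading =
      copy(timestamp = math.max(timestamp, other.timestamp), value = value + other.value)
  }

  // Start of the tumbling window [start, start + size) that contains `ts`.
  def windowStart(ts: Long, size: Long): Long = ts - Math.floorMod(ts, size)

  // Key by sensor, assign each reading to its tumbling window, and reduce each
  // (sensor, windowStart) group with `add`, like keyBy/timeWindow/reduce.
  def windowedSums(readings: Seq[Reading], sizeMs: Long): Map[(String, Long), Double] =
    readings
      .groupBy(r => (r.sensor, windowStart(r.timestamp, sizeMs)))
      .map { case (key, rs) => key -> rs.reduce((a, b) => a.add(b)).value }
}
```

For example, readings of `s1` at 1000 ms and 4000 ms fall into the window starting at 0, while a reading at 6000 ms lands in the window starting at 5000.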
Hardened at scale
11
AthenaX Streaming SQL Platform Service
Streaming Platform as a Service
Fraud detection
Streaming Analytics Platform
100s of jobs, 1000s of nodes, TBs of state
metrics, analytics, real-time ML
Streaming SQL as a platform
12
Distributed application infrastructure
Good old centralized architecture
13
The big mean central database
$$$
The grumpy DBA
Application Application Application Application
Modern distributed app. architecture
14
Diagram: several Applications, a Sensor, and APIs
The limit to what you can do is how sophisticated your computations over the stream can be!
It boils down to: how well do you handle state and time?
15
A Flink-flavored approach
Event Sourcing + Memory Image
16
Event log: persists events (temporarily)
event / command
Process
main memory
update local variables/structures
periodically snapshot the memory
Event Sourcing + Memory Image
17
Recovery: Restore snapshot and replay events since snapshot
Event log: persists events (temporarily)
Process
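The event sourcing + memory image pattern from these two slides (apply each event to in-memory state, snapshot periodically, recover by restoring the last snapshot and replaying the events after it) can be sketched in a few lines. This is a minimal illustration assuming an in-memory `IndexedSeq` as the event log and a hypothetical account-balance domain; `MemoryImageSketch`, `Event`, `Snapshot`, `run`, and `recover` are not Flink APIs.

```scala
object MemoryImageSketch {
  // Hypothetical event: a signed amount applied to an account.
  final case class Event(account: String, amount: Long)

  // A snapshot is the memory image plus the log offset it reflects.
  final case class Snapshot(state: Map[String, Long], offset: Int)

  // Apply one event to the in-memory state ("update local variables/structures").
  def applyEvent(state: Map[String, Long], e: Event): Map[String, Long] =
    state.updated(e.account, state.getOrElse(e.account, 0L) + e.amount)

  // Process the log from a snapshot onwards, snapshotting every `interval` events.
  def run(log: IndexedSeq[Event], from: Snapshot, interval: Int): (Map[String, Long], Vector[Snapshot]) = {
    require(interval > 0)
    var state = from.state
    val snapshots = Vector.newBuilder[Snapshot]
    for (offset <- from.offset until log.size) {
      state = applyEvent(state, log(offset))
      if ((offset + 1) % interval == 0) snapshots += Snapshot(state, offset + 1)
    }
    (state, snapshots.result())
  }

  // Recovery: restore the latest snapshot and replay only the events after it.
  def recover(log: IndexedSeq[Event], latest: Snapshot): Map[String, Long] =
    log.drop(latest.offset).foldLeft(latest.state)(applyEvent)
}
```

Because the snapshot stores the log offset alongside the state, recovery only needs the log to retain events since the last snapshot, which is why the log can persist events "temporarily".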
Distributed Memory Image
18
A distributed application has many memory images; their snapshots are all consistent with each other.
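One way to keep many memory images consistent is to mark the snapshot point in the stream itself. Flink does this with checkpoint barriers (inspired by the Chandy-Lamport algorithm, with alignment when an operator has several inputs). The sketch below is a heavily simplified, single-process illustration assuming one ordered log, hash routing by key, and one barrier broadcast to every partition; `BarrierSnapshotSketch` and its members are hypothetical.

```scala
object BarrierSnapshotSketch {
  sealed trait Record
  final case class Data(key: String, amount: Long) extends Record
  case object Barrier extends Record

  // Route the log to per-partition queues by key hash; a Barrier is
  // broadcast to every partition at its position in the log.
  def route(log: Seq[Record], parallelism: Int): Vector[Vector[Record]] =
    log.foldLeft(Vector.fill(parallelism)(Vector.empty[Record])) {
      case (queues, Barrier) => queues.map(_ :+ Barrier)
      case (queues, d: Data) =>
        val p = Math.floorMod(d.key.hashCode, parallelism)
        queues.updated(p, queues(p) :+ d)
    }

  // Each partition processes its own queue at its own pace and snapshots its
  // local state when it reaches the barrier.
  def snapshotOf(queue: Vector[Record]): Map[String, Long] =
    queue.takeWhile(_ != Barrier).foldLeft(Map.empty[String, Long]) {
      case (state, Data(k, a)) => state.updated(k, state.getOrElse(k, 0L) + a)
      case (state, Barrier)    => state
    }

  // Union of the per-partition snapshots (keys are disjoint across partitions).
  def globalSnapshot(log: Seq[Record], parallelism: Int): Map[String, Long] =
    route(log, parallelism).map(snapshotOf).foldLeft(Map.empty[String, Long])(_ ++ _)
}
```

Whatever the parallelism, the combined snapshot equals the state of processing exactly the records before the barrier, which is what "consistent together" means here.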
Stateful Event & Stream Processing
19
Scalable embedded state
Access at memory speed & scales with parallel operators
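Embedded state scales with parallel operators because each key's state lives in exactly one operator instance. Flink assigns keys to a fixed number of key groups (bounded by the max parallelism) and gives each instance a contiguous range of key groups, so rescaling moves whole key groups. The sketch below assumes plain `hashCode` instead of the murmur hash Flink applies, and uses 128 only as an example max parallelism; `KeyGroupSketch` and its functions are hypothetical.

```scala
object KeyGroupSketch {
  // Assumption: plain hashCode; Flink additionally applies a murmur hash.
  def keyGroupOf(key: String, maxParallelism: Int): Int =
    Math.floorMod(key.hashCode, maxParallelism)

  // Instances own contiguous ranges of key groups.
  def instanceOf(keyGroup: Int, maxParallelism: Int, parallelism: Int): Int =
    keyGroup * parallelism / maxParallelism

  // Split keyed state (key -> value) across `parallelism` operator instances.
  def distribute(state: Map[String, Long], maxParallelism: Int, parallelism: Int): Vector[Map[String, Long]] = {
    require(parallelism > 0 && parallelism <= maxParallelism)
    val byInstance = state.groupBy { case (k, _) =>
      instanceOf(keyGroupOf(k, maxParallelism), maxParallelism, parallelism)
    }
    Vector.tabulate(parallelism)(i => byInstance.getOrElse(i, Map.empty[String, Long]))
  }
}
```

Going from 2 to 4 instances never splits a key group, so each instance can restore its share of a snapshot by reading only the key-group ranges it now owns.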
Compute, State, and Storage
20
Classic tiered architecture:
▪ compute layer
▪ database layer: application state + backup
Streaming architecture:
▪ compute + application state
▪ stream storage and snapshot storage (backup)
Performance
21
Classic tiered architecture: synchronous reads/writes across the tier boundary
Streaming architecture: all modifications are local; asynchronous writes of large blobs
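The streaming side of this comparison (local modifications, asynchronous blob writes) can be illustrated with an immutable state value handed to a background upload while processing continues. This is a minimal sketch, not how Flink's state backends implement asynchronous snapshots; `BlobStore` is a hypothetical in-memory stand-in for snapshot storage such as S3 or HDFS, and the other names are hypothetical too.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object AsyncSnapshotSketch {
  // Hypothetical blob store: a thread-safe in-memory map standing in for
  // remote snapshot storage.
  final class BlobStore {
    private val blobs = TrieMap.empty[Long, String]
    def put(id: Long, blob: String): Unit = blobs.update(id, blob)
    def get(id: Long): Option[String] = blobs.get(id)
  }

  // Serialize the whole state into one "large blob" (a sorted k=v string here).
  def serialize(state: Map[String, Long]): String =
    state.toSeq.sorted.map { case (k, v) => s"$k=$v" }.mkString(",")

  // Start an asynchronous write of the state as it is now. Because Map is
  // immutable, local updates made after this call cannot leak into the blob.
  def snapshotAsync(id: Long, state: Map[String, Long], store: BlobStore)(
      implicit ec: ExecutionContext): Future[Unit] =
    Future(store.put(id, serialize(state)))

  // Block until an upload finishes (only needed for demos and tests).
  def awaitUpload(upload: Future[Unit]): Unit = Await.result(upload, 5.seconds)
}
```

Processing never waits on the storage tier: it keeps updating its local state while the upload runs, and only the snapshot write crosses the network.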
Consistency
22
Classic tiered architecture: distributed transactions; at scale, typically at-most-once / at-least-once
Streaming architecture: exactly-once per state
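Why "exactly-once per state" follows from snapshotting state and input position together: after a failure, the input is rewound to the checkpointed offset, so the state must be rewound too. The sketch below assumes a trivial counting workload and contrasts restoring state from the checkpoint with keeping a post-crash value, as an external store updated in place would. `ExactlyOnceSketch`, `Checkpoint`, and `runWithFailure` are hypothetical; the guarantee shown covers state only, not external side effects.

```scala
object ExactlyOnceSketch {
  // State and input position captured together in one checkpoint.
  final case class Checkpoint(state: Long, offset: Int)

  // Hypothetical per-event update: count the events.
  private def process(state: Long, events: Seq[Int]): Long = state + events.size

  // Checkpoint after `checkpointAt` events, crash after `crashAt` events, then
  // rewind the input to the checkpoint offset and finish the log. With
  // `rollbackState = true` the state is restored from the checkpoint too;
  // otherwise it keeps its post-crash value and replayed events count twice.
  def runWithFailure(log: IndexedSeq[Int], checkpointAt: Int, crashAt: Int, rollbackState: Boolean): Long = {
    require(0 <= checkpointAt && checkpointAt <= crashAt && crashAt <= log.size)
    val checkpoint = Checkpoint(process(0L, log.take(checkpointAt)), checkpointAt)
    val stateAtCrash = process(checkpoint.state, log.slice(checkpointAt, crashAt))
    val restored = if (rollbackState) checkpoint.state else stateAtCrash
    process(restored, log.drop(checkpoint.offset)) // replay after the checkpoint
  }
}
```

With 10 events, a checkpoint after 4, and a crash after 7, restoring the checkpointed state yields 10, while keeping the post-crash state yields 13: the 3 events between checkpoint and crash are counted twice (at-least-once).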
Scaling a Service
23
Classic tiered architecture: provision compute; separately provision additional database capacity
Streaming architecture: provision compute and state together
Rolling out a new Service
24
Classic tiered architecture: provision a new database (or add capacity to an existing one)
Streaming architecture: provision compute and state together; the new state simply occupies some additional backup space
What users built on checkpoints…
▪ Upgrades and Rollbacks
▪ Cross-Datacenter Failover
▪ State Archiving
▪ Application Migration
▪ Spot Instance Region Chasing
▪ A/B testing
▪ …
25
26
What is the next wave of stream processing applications?
What changes faster? Data or Query?
27
Data changes slowly compared to fast-changing queries:
ad-hoc queries, data exploration, ML training and tuning
Batch Processing Use Case
Data changes fast, application logic is long-lived:
continuous applications, data pipelines, standing queries, anomaly detection, ML evaluation, …
Stream Processing Use Case
Analytics & Business Logic
28
Diagram: a Stream Source emits events into a Stream Processor that produces analytics and metrics, while the Business Logic runs separately in an Application with its own Database
Blending Analytics & Business Logic
29
Diagram: a Stream Source feeds a Streaming Application running on the Stream Processor, which reacts to events, RPCs, (Actor) Messages, …
AthenaX by Uber
30
31
Can one build an entire sophisticated web application (say, a social network) on a stream processor?
(Yes, we can!™)
32
Social network implemented using event sourcing and CQRS (Command Query Responsibility Segregation) on Kafka/Flink/Elasticsearch/Redis
More: https://data-artisans.com/blog/drivetribe-cqrs-apache-flink
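The linked post describes the actual DriveTribe design; the sketch below only illustrates the general event sourcing + CQRS split it relies on. The command side validates commands against a small write model and appends events to a log; the query side projects the same events into a read model. All names (`CreatePost`, `LikePost`, the like-count projection) are hypothetical, and an in-memory `Vector` stands in for Kafka, with a `Map` standing in for a store like Elasticsearch or Redis.

```scala
object CqrsSketch {
  // Commands: requests that may be rejected.
  sealed trait Command
  final case class CreatePost(user: String, postId: String) extends Command
  final case class LikePost(user: String, postId: String) extends Command

  // Events: accepted facts appended to the log.
  sealed trait Event
  final case class PostCreated(user: String, postId: String) extends Event
  final case class PostLiked(user: String, postId: String) extends Event

  // Write model: just enough state to validate commands.
  final case class WriteModel(posts: Set[String], likes: Set[(String, String)])

  def handle(model: WriteModel, cmd: Command): Option[Event] = cmd match {
    case CreatePost(u, p) if !model.posts.contains(p) => Some(PostCreated(u, p))
    case LikePost(u, p) if model.posts.contains(p) && !model.likes.contains((u, p)) => Some(PostLiked(u, p))
    case _ => None // reject: duplicate post, unknown post, or repeated like
  }

  def evolve(model: WriteModel, e: Event): WriteModel = e match {
    case PostCreated(_, p) => model.copy(posts = model.posts + p)
    case PostLiked(u, p)   => model.copy(likes = model.likes + ((u, p)))
  }

  // Query side: like counts per post, projected from the event log.
  def project(counts: Map[String, Int], e: Event): Map[String, Int] = e match {
    case PostCreated(_, p) => counts.updated(p, 0)
    case PostLiked(_, p)   => counts.updated(p, counts.getOrElse(p, 0) + 1)
  }

  // Run commands through the write side, then build the read model from the log.
  def run(cmds: Seq[Command]): (Vector[Event], Map[String, Int]) = {
    val (_, log) = cmds.foldLeft((WriteModel(Set.empty, Set.empty), Vector.empty[Event])) {
      case ((model, events), cmd) =>
        handle(model, cmd) match {
          case Some(e) => (evolve(model, e), events :+ e)
          case None    => (model, events)
        }
    }
    (log, log.foldLeft(Map.empty[String, Int])(project))
  }
}
```

Since the read model is derived purely from the log, it can be rebuilt or reshaped at any time by replaying events, which is a key reason to pair CQRS with a stream processor.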
33
The next wave of stream processing applications…
… is all types of stateful applications that react to data and time!
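"React to data and time" is what the Process Function slide's `registerEventTimeTimer(event.timestamp + 500)` enables. The following Flink-independent sketch simulates that pattern for an inactivity alert: each event updates keyed state and registers a timer, and when the watermark passes a timer it fires and alerts if no newer event arrived for that key. `TimerSketch`, `Element`, `Watermark`, and `inactivityAlerts` are hypothetical; timers are deduplicated per (time, key) and fired in timestamp order, similar in spirit to Flink's timer service.

```scala
object TimerSketch {
  sealed trait Input
  final case class Element(key: String, timestamp: Long) extends Input
  final case class Watermark(time: Long) extends Input

  // Emits (key, timerTime) for each key that saw no newer event within `timeout` ms.
  def inactivityAlerts(inputs: Seq[Input], timeout: Long): Vector[(String, Long)] = {
    var lastSeen = Map.empty[String, Long] // keyed state: latest timestamp per key
    var timers = Set.empty[(Long, String)] // event-time timers, deduplicated per (time, key)
    val alerts = Vector.newBuilder[(String, Long)]
    inputs.foreach {
      case Element(k, ts) =>
        // "processElement": update state and schedule a timer callback
        lastSeen = lastSeen.updated(k, math.max(ts, lastSeen.getOrElse(k, Long.MinValue)))
        timers += ((ts + timeout, k))
      case Watermark(wm) =>
        // Event time advanced: fire all timers at or before the watermark, in order.
        val (due, pending) = timers.partition { case (t, _) => t <= wm }
        timers = pending
        due.toVector.sorted.foreach { case (t, k) =>
          // "onTimer": alert only if this timer belongs to the latest event for k
          if (lastSeen.get(k).contains(t - timeout)) alerts += ((k, t))
        }
    }
    alerts.result()
  }
}
```

Key `a` receives a second event at 1300, so its first timer (1500) fires silently; both `b` (timer 1700) and `a` (timer 1800) alert once the watermark reaches 2000.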
34
Stateful stream applications
versioning, upgrading, rollback, duplicating, migrating, …
Continuous applications
The dA Platform Architecture
35
dA Platform 2
▪ Apache Flink: Stateful stream processing
▪ Kubernetes: Container platform
▪ dA Application Manager: Application lifecycle management
▪ Logging, Metrics, CI/CD
Streams from Kafka, Kinesis, S3, HDFS, Databases, ...
Real-time Analytics, Anomaly- & Fraud Detection, Real-time Data Integration, Reactive Microservices (and more)
Versioned Applications, not Jobs/Jars
36
Diagram: a Stream Processing Application moves from Version 1 to Version 2 to Version 3 through upgrades, where each version is code plus an application snapshot; a fork / duplicate starts a New Application with its own versions (Version 2a, Version 3a).
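Treating an application as a series of (code, snapshot) versions makes upgrade and fork the same basic move: start some code from an existing snapshot. The sketch below illustrates that idea with plain values; it is not the dA Application Manager's API, and `Version`, `Logic`, `upgrade`, and `fork` are hypothetical names over a toy word-count state.

```scala
object VersionedAppSketch {
  // Hypothetical "code": a pure update function over keyed counter state.
  type Logic = (Map[String, Long], String) => Map[String, Long]

  // A version pairs code with the application snapshot it currently holds.
  final case class Version(name: String, logic: Logic, snapshot: Map[String, Long])

  // Run a version over new input and take a fresh snapshot.
  def runAndSnapshot(v: Version, input: Seq[String]): Version =
    v.copy(snapshot = input.foldLeft(v.snapshot)(v.logic))

  // Upgrade: new code resumes from the previous version's snapshot.
  def upgrade(from: Version, name: String, logic: Logic): Version =
    Version(name, logic, from.snapshot)

  // Fork / duplicate: an independent application starting from the same
  // snapshot; immutability keeps the original version unaffected.
  def fork(from: Version, name: String): Version = from.copy(name = name)
}
```

Rollback falls out of the same model: redeploying an earlier version's code together with one of its snapshots.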
Deployments, not Flink Clusters
37
Testing / QA Kubernetes Cluster: Threat Metrics App. Testing, Fraud Detection App. Testing
Production Kubernetes Cluster: Activity Monitor Application, Fraud Detection Application
Hooks for CI/CD pipelines
38
Diagram: a push (update) triggers the CI Service, which calls the dA Application Manager's upgrade API; the manager then performs a stateful upgrade in the Kubernetes Cluster from Application Version 1 to Application Version 2.
39
Thank you! @stsffap @ApacheFlink @dataArtisans
We are hiring!
data-artisans.com/careers
40