![Page 1: The Stream Processor as a Database - Francisco...§Streaming applications are often not bound by the stream processor itself. Cross system interactionis frequently biggest bottleneck](https://reader034.vdocuments.net/reader034/viewer/2022043021/5f3d687b915b1f2edf4c92fb/html5/thumbnails/1.jpg)
The Stream Processor as a Database

Ufuk Celebi (@iamuce)
The (Classic) Use Case: Real-time Counts and Aggregates
(Real-)Time Series Statistics

Stream of events → real-time statistics
The Architecture

collect → message queue → analyze → serve & store
The Flink Job

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09

case class Impressions(id: String, impressions: Long)
val events: DataStream[Event] = env.addSource(new FlinkKafkaConsumer09(…)) // Kafka source; `Event` and `env` are defined elsewhere in the program
val impressions: DataStream[Impressions] = events.filter(evt => evt.isImpression).map(evt => Impressions(evt.id, evt.numImpressions)) // keep impression events only
val counts: DataStream[Impressions] = impressions.keyBy("id").timeWindow(Time.hours(1)).sum("impressions") // hourly impression count per id
```
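To see what the job computes without running a Flink cluster, here is a toy, plain-Scala model of the keyed hourly window sum over a bounded batch of events. It is not Flink's API: it assumes each event carries an epoch-millisecond timestamp and that windows are tumbling and aligned to the epoch; `ImpressionEvent`, `windowStart` and `hourlyCounts` are hypothetical names.

```scala
// Toy model (no Flink dependency) of keyBy("id").timeWindow(Time.hours(1)).sum("impressions").
// Assumption: every event has an epoch-millisecond timestamp.
case class ImpressionEvent(id: String, timestampMs: Long, impressions: Long)

val HourMs: Long = 60L * 60 * 1000

// Start of the tumbling window that contains `ts` (windows aligned to the epoch).
def windowStart(ts: Long, sizeMs: Long): Long = ts - Math.floorMod(ts, sizeMs)

// One sum per (id, window start): this is the per-key state the window operator keeps.
def hourlyCounts(events: Seq[ImpressionEvent]): Map[(String, Long), Long] =
  events
    .groupBy(e => (e.id, windowStart(e.timestampMs, HourMs)))
    .map { case (key, es) => key -> es.map(_.impressions).sum }
```

Events for the same id land in different results when they fall into different hours, which is exactly the state that later becomes queryable.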
Dataflow of the job, shown with two parallel instances: Kafka Source → filter() → map() → keyBy() → window()/sum() (holding the keyed State) → Sink
Putting it all together

Periodically (every second), flush new aggregates to Redis.
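A minimal sketch of the "flush new aggregates" pattern, assuming a hypothetical `KeyValueStore` trait in place of a real Redis client: aggregates are updated on every event, and each flush writes only the keys that changed since the previous flush. In a real job `flush()` would be driven by a one-second timer; here it is called by hand.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the external key/value store (Redis on the slide).
trait KeyValueStore { def put(key: String, value: Long): Unit }

// In-memory store that also counts how many writes it received.
class InMemoryStore extends KeyValueStore {
  val data = mutable.Map.empty[String, Long]
  var writes = 0
  def put(key: String, value: Long): Unit = { data(key) = value; writes += 1 }
}

// Keeps running aggregates and remembers which keys changed since the last flush.
class FlushingAggregator(store: KeyValueStore) {
  private val counts = mutable.Map.empty[String, Long]
  private val dirty = mutable.Set.empty[String]

  def add(id: String, n: Long): Unit = {
    counts(id) = counts.getOrElse(id, 0L) + n
    dirty += id
  }

  // Writes every changed key to the store and returns the number of writes issued.
  def flush(): Int = {
    val changed = dirty.toList
    changed.foreach(id => store.put(id, counts(id)))
    dirty.clear()
    changed.size
  }
}
```

Each flush still costs one write per changed key, so the traffic to the external store grows with the number of keys updated per interval.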
The Bottleneck

Writes to the key/value store take too long.
Queryable State

With queryable state, writes to the external key/value store become optional, and happen only at the end of windows.
Queryable State: Application View

The application talks to a query service: real-time results for current time windows come from the stream processor's queryable state, while older results for past time windows come from the database.
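The routing decision in this application view can be sketched as follows. `QueryService`, `WindowKey` and the two lookup functions are hypothetical stand-ins (the live lookup plays the role of a queryable-state query, the other one the database), and the sketch assumes that any window starting at or after the current window's start is still in progress.

```scala
// Hypothetical query service: in-progress windows are answered from the stream
// processor's state, completed windows from the database.
case class WindowKey(id: String, windowStart: Long)

class QueryService(
    liveState: WindowKey => Option[Long],  // stands in for a queryable-state lookup
    database: WindowKey => Option[Long],   // stands in for the external database
    currentWindowStart: () => Long) {

  def count(key: WindowKey): Option[Long] =
    if (key.windowStart >= currentWindowStart()) liveState(key)
    else database(key)
}
```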
Queryable State Enablers

- Flink has state as a first-class citizen
- State is fault tolerant (exactly-once semantics)
- State is partitioned (sharded) together with the operators that create/update it
- State is continuous (not mini-batched)
- State is scalable
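The partitioning bullet can be illustrated with a simplified model of Flink-style key groups: each key is hashed into one of `maxParallelism` key groups, and contiguous ranges of key groups are assigned to operator instances, so exactly one instance owns the state of a given key. This is a sketch under assumptions: Flink additionally applies a murmur hash to `key.hashCode`, which is omitted here, and the value 128 in the usage is only an example.

```scala
// Simplified key-group assignment (assumption: plain hashCode instead of Flink's murmur hash).
def keyGroupFor(key: Any, maxParallelism: Int): Int =
  Math.floorMod(key.hashCode, maxParallelism)

// Contiguous ranges of key groups map to operator instances 0 until parallelism.
def operatorIndexFor(keyGroup: Int, maxParallelism: Int, parallelism: Int): Int =
  keyGroup * parallelism / maxParallelism

// The instance whose local state holds this key.
def instanceFor(key: Any, maxParallelism: Int, parallelism: Int): Int =
  operatorIndexFor(keyGroupFor(key, maxParallelism), maxParallelism, parallelism)
```

With 128 key groups and parallelism 2, groups 0 to 63 go to instance 0 and groups 64 to 127 to instance 1; rescaling moves whole key groups rather than individual keys.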
State in Flink

Pipeline: Source/filter()/map() → window()/sum(), with the window state kept in a local state index (e.g., RocksDB).

- Events are persistent and ordered (per partition/key) in the message queue (e.g., Apache Kafka).
- Events flow without replication or synchronous writes.
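Because the log keeps events durable and ordered, state does not need synchronous replication: a snapshot plus the log offset it corresponds to is enough to rebuild state after a failure. A toy model of that recovery, where a `Vector` stands in for one Kafka partition and `Checkpoint`, `applyEvent` and `recover` are hypothetical names:

```scala
// Toy recovery model: a Vector stands in for one durable, ordered Kafka partition.
// A checkpoint stores the state together with the log offset it covers.
case class Checkpoint(offset: Int, counts: Map[String, Long])

// Per-event state update: count occurrences of each id.
def applyEvent(counts: Map[String, Long], id: String): Map[String, Long] =
  counts.updated(id, counts.getOrElse(id, 0L) + 1L)

// Restore the snapshot, then replay only the events after its offset.
def recover(log: IndexedSeq[String], cp: Checkpoint): Map[String, Long] =
  log.drop(cp.offset).foldLeft(cp.counts)(applyEvent)
```

Replaying from the recorded offset means every event is reflected in the restored state exactly once.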
State in Flink: Trigger Checkpoint

A checkpoint is triggered, and a checkpoint barrier is injected into the stream at Source/filter()/map(); the barrier then flows downstream to window()/sum().
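A minimal sketch of why in-band barriers give consistent snapshots, for a single-input operator only (barrier alignment across multiple inputs is not modeled): the state captured when `Barrier(c)` arrives reflects exactly the records that preceded it in the stream. The element types and `CountingOperator` are hypothetical.

```scala
// Records and checkpoint barriers travel through the same stream.
sealed trait StreamElement
case class Record(key: String, n: Long) extends StreamElement
case class Barrier(checkpointId: Long) extends StreamElement

// Single-input operator that sums per key and snapshots its state on each barrier.
class CountingOperator {
  private var counts = Map.empty[String, Long]
  private var snapshots = Map.empty[Long, Map[String, Long]]

  def process(e: StreamElement): Unit = e match {
    case Record(key, n)        => counts = counts.updated(key, counts.getOrElse(key, 0L) + n)
    case Barrier(checkpointId) => snapshots = snapshots.updated(checkpointId, counts)
  }

  def current: Map[String, Long] = counts
  def snapshot(checkpointId: Long): Option[Map[String, Long]] = snapshots.get(checkpointId)
}
```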
State in Flink: Take State Snapshot

When the barrier reaches window()/sum(), the operator takes a state snapshot by triggering state copy-on-write.
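Copy-on-write is what keeps the snapshot step cheap. The toy class below hands out the current table as the snapshot in constant time and copies the table only on the first write after a snapshot. `CowState` is a hypothetical name, and a real state backend would avoid copying the whole table; this only illustrates the principle.

```scala
import scala.collection.mutable

// Toy copy-on-write state table.
class CowState[K, V] {
  private var table = mutable.HashMap.empty[K, V]
  private var shared = false // true while a snapshot still references `table`

  def get(k: K): Option[V] = table.get(k)

  // First write after a snapshot copies the table, so the snapshot stays unchanged.
  def put(k: K, v: V): Unit = {
    if (shared) { table = table.clone(); shared = false }
    table.update(k, v)
  }

  // O(1): returns the current table and marks it shared. Callers must not modify it.
  def snapshot(): collection.Map[K, V] = { shared = true; table }
}
```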
State in Flink: Persist State Snapshots

Snapshots are durably persisted asynchronously, while the processing pipeline continues.
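The asynchronous step can be sketched with Scala futures: an immutable snapshot is handed to a background thread, and the pipeline keeps updating its live state without waiting. `CheckpointStorage` and `AsyncSnapshotter` are hypothetical; the in-memory map stands in for durable storage such as a distributed file system.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical durable checkpoint storage, modeled as a thread-safe in-memory map.
class CheckpointStorage {
  private val files = TrieMap.empty[Long, Map[String, Long]]
  def write(checkpointId: Long, snapshot: Map[String, Long]): Unit = files.update(checkpointId, snapshot)
  def read(checkpointId: Long): Option[Map[String, Long]] = files.get(checkpointId)
}

// Persists snapshots on the given execution context and returns immediately.
class AsyncSnapshotter(storage: CheckpointStorage)(implicit ec: ExecutionContext) {
  def persist(checkpointId: Long, snapshot: Map[String, Long]): Future[Long] =
    Future {
      storage.write(checkpointId, snapshot)
      checkpointId // the future completes with the id once the snapshot is stored
    }
}

val storage = new CheckpointStorage
val snapshotter = new AsyncSnapshotter(storage)(ExecutionContext.global)

var state = Map("a" -> 1L)
val ack = snapshotter.persist(1L, state) // snapshot handed off
state = state.updated("a", 2L)           // processing continues without waiting
```

Because the snapshot is an immutable value, later updates to `state` cannot leak into the checkpoint, whichever thread runs first.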
Queryable State: Implementation

The JobManager holds the ExecutionGraph and a State Location Server; it deploys window()/sum() tasks to the TaskManagers and receives their status. On each TaskManager, the local state of window()/sum() is registered with a State Registry. A Query Client sends queries of the form /job/state-name/key:

1. Get the location of the "key-partition" of "job" from the State Location Server.
2. The State Location Server looks up the location.
3. The State Location Server responds with the location.
4. Query the State Registry on that TaskManager for state-name and key.
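An in-process sketch of the four steps, under loudly simplified assumptions: key partitions are computed as `hashCode` modulo parallelism, and `StateRegistry`, `StateLocationServer` and `QueryClient` mirror the names on the diagram but are hypothetical classes, not Flink's client API.

```scala
// Hypothetical model of the query path; not Flink's actual client API.
case class Location(taskManager: String)

// One per TaskManager: operators register their local state under a state name.
class StateRegistry {
  private var states = Map.empty[String, Map[String, Long]]
  def register(name: String, state: Map[String, Long]): Unit = { states = states.updated(name, state) }
  def lookup(name: String, key: String): Option[Long] = states.get(name).flatMap(_.get(key))
}

// On the JobManager: knows which TaskManager owns each key partition of a job.
class StateLocationServer(parallelism: Int, owners: Map[(String, Int), Location]) {
  def locate(job: String, key: String): Option[Location] =
    owners.get((job, Math.floorMod(key.hashCode, parallelism))) // (2) look up location
}

class QueryClient(server: StateLocationServer, taskManagers: Map[String, StateRegistry]) {
  // Query: /job/state-name/key
  def query(job: String, stateName: String, key: String): Option[Long] =
    for {
      loc      <- server.locate(job, key)         // (1) ask for the location, (3) receive it
      registry <- taskManagers.get(loc.taskManager)
      value    <- registry.lookup(stateName, key) // (4) query state name and key
    } yield value
}
```

Only step (4) touches the state itself; the location lookup is metadata that a client could cache between queries.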
Queryable State Performance
Conclusion
Takeaways

- Streaming applications are often not bound by the stream processor itself. Cross-system interaction is frequently the biggest bottleneck.
- Queryable state mitigates a big bottleneck: communication with external key/value stores to publish real-time results.
- Apache Flink's sophisticated support for state makes this possible.
Takeaways: Performance of Queryable State

- Data persistence is fast with logs
  - Append-only, and streaming replication
- Computed state is fast with local data structures and no synchronous replication
- Flink's checkpoint method makes computed state persistent with low overhead
Questions?

- Email: [email protected]
- Twitter: @iamuce
- Code/Demo: https://github.com/dataArtisans/flink-queryable_state_demo
Appendix
Flink Runtime + APIs

Layers shown: Table API & Stream SQL, DataStream API and ProcessFunction API, on top of the Runtime (Distributed Streaming Dataflow). Building blocks: streams, time, state.
Apache Flink Architecture Review