The Past, Present, and Future of Apache Flink
Yang Kete (Luni)
Senior Technical Expert, Alibaba
The Past
It All Started in 2014
2009 - 2014 | 2014
• A PhD student project at Technische Universität Berlin
• A batch processing engine built on a streaming runtime
• Flink 0.6.0 released in August 2014
Flink 0.7
Runtime: Distributed Streaming Dataflow
DataStream API: Stream Processing
DataSet API: Batch Processing
Released in December 2014; first release with official DataStream support
Flink 0.9
[Diagram labels: Source Offset, Computation State, Sink, Periodic Snapshots]
Released in June 2015; first release with built-in State support
Global Checkpoint
[Diagram: a data stream running from new data to old data, split by Checkpoint Barrier N and Checkpoint Barrier N-1 into records that are part of Checkpoint N+1, part of Checkpoint N, and part of Checkpoint N-1]
• Throughput and latency are no longer a tradeoff
• Exactly-once semantics are supported with low performance overhead
Based on the Chandy-Lamport algorithm
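The barrier mechanism on this slide can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Flink's implementation: the class `SummingOperator`, the `(BARRIER, id)` tuple encoding, and the single in-flight checkpoint are all hypothetical simplifications. An operator with several input channels holds back records from any channel whose barrier has already arrived, snapshots its state once every channel has delivered the barrier, and then releases the held-back records.

```python
from collections import deque

BARRIER = "barrier"  # hypothetical marker; a barrier is the tuple (BARRIER, checkpoint_id)


class SummingOperator:
    """Multi-input operator that keeps a running sum and aligns checkpoint barriers.

    Assumption: at most one checkpoint is in flight at a time.
    """

    def __init__(self, num_channels):
        self.num_channels = num_channels
        self.total = 0           # operator state
        self.blocked = set()     # channels whose barrier has already arrived
        self.buffered = deque()  # records held back during alignment
        self.snapshots = {}      # checkpoint id -> state captured for that checkpoint

    def on_event(self, channel, event):
        if isinstance(event, tuple) and event[0] == BARRIER:
            self._on_barrier(channel, event[1])
        elif channel in self.blocked:
            # This record belongs to the next checkpoint: hold it back.
            self.buffered.append(event)
        else:
            self.total += event

    def _on_barrier(self, channel, checkpoint_id):
        self.blocked.add(channel)
        if len(self.blocked) == self.num_channels:
            # Barriers from all channels are aligned: snapshot, then resume.
            self.snapshots[checkpoint_id] = self.total
            self.blocked.clear()
            while self.buffered:
                self.total += self.buffered.popleft()


op = SummingOperator(num_channels=2)
op.on_event(0, 1)
op.on_event(0, (BARRIER, 1))
op.on_event(0, 10)            # arrives after the barrier on channel 0, so it is held back
op.on_event(1, 2)             # arrives before the barrier on channel 1, so it is counted
op.on_event(1, (BARRIER, 1))
print(op.snapshots, op.total)
```

The snapshot contains only the effect of records that preceded the barrier on every channel, which is what makes the per-operator snapshots add up to a consistent global checkpoint.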
Cornerstones of Flink 1.0
Checkpoint: distributed consistent snapshots based on the Chandy-Lamport algorithm, providing consistency semantics.
State: a rich State API, including ValueState, ListState, MapState, and BroadcastState.
Time: event-time computation with a Watermark mechanism that handles out-of-order data and tolerates late data.
Window: tumbling, sliding, and session windows out of the box, plus flexible custom windows.
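How event time, watermarks, and windows interact can be sketched in a few lines. The following is a simplified model, not Flink's API: the class name `TumblingEventTimeWindow`, the rule that the watermark trails the largest timestamp seen by a fixed delay, and the rule that a window fires once its end is at or below the watermark are all assumptions made for illustration.

```python
from collections import defaultdict


class TumblingEventTimeWindow:
    """Sums values per tumbling event-time window, driven by a simple watermark."""

    def __init__(self, size, max_out_of_orderness):
        self.size = size
        self.delay = max_out_of_orderness
        self.watermark = float("-inf")
        self.windows = defaultdict(int)  # window start -> running sum
        self.fired = []                  # (window start, sum) in firing order
        self.dropped = []                # records that arrived after their window fired

    def on_record(self, timestamp, value):
        start = timestamp - timestamp % self.size
        if start + self.size <= self.watermark:
            # The window has already fired: treat the record as too late.
            self.dropped.append((timestamp, value))
        else:
            self.windows[start] += value
        # Assumed watermark rule: largest timestamp seen minus a fixed delay.
        self.watermark = max(self.watermark, timestamp - self.delay)
        ready = sorted(s for s in self.windows if s + self.size <= self.watermark)
        for s in ready:
            self.fired.append((s, self.windows.pop(s)))


w = TumblingEventTimeWindow(size=10, max_out_of_orderness=2)
for ts, value in [(1, 1), (7, 2), (5, 3), (11, 4), (13, 5), (8, 9)]:
    w.on_record(ts, value)
print(w.fired, w.dropped)
```

The record with timestamp 5 arrives out of order but is still counted, because the watermark has not yet passed the end of its window. The record with timestamp 8 arrives after that window has fired and is dropped in this sketch.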
In 2015, Alibaba began using Flink and has contributed to the community ever since
Reworked Distributed Architecture
[Diagram components: Client, Dispatcher, Job Manager, Resource Manager, Task Manager (two instances), Cluster Manager (YARN RM, K8S RM)]
1. Submit job
2. Start job
3. Request slots
4. Allocate Container
5. Start Task Manager
6. Schedule Task
Incremental Checkpoint
[Figure labels: time axis, full state, incremental state, incremental snapshot]
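The idea behind an incremental checkpoint is that each snapshot persists only what changed since the previous one, rather than the full state. A minimal sketch, assuming a plain in-memory key-value store (the names `IncrementalStateStore`, `TOMBSTONE`, and `restore` are hypothetical; a production state backend would track changes at the level of its storage files rather than individual keys):

```python
TOMBSTONE = object()  # hypothetical marker for a key deleted since the last snapshot


class IncrementalStateStore:
    """Key-value state that records which keys changed since the last snapshot."""

    def __init__(self):
        self.state = {}
        self.changed = set()

    def put(self, key, value):
        self.state[key] = value
        self.changed.add(key)

    def delete(self, key):
        self.state.pop(key, None)
        self.changed.add(key)

    def snapshot(self):
        """Return only the entries modified since the previous snapshot."""
        delta = {k: self.state.get(k, TOMBSTONE) for k in self.changed}
        self.changed = set()
        return delta


def restore(deltas):
    """Rebuild the full state by replaying the deltas in checkpoint order."""
    state = {}
    for delta in deltas:
        for key, value in delta.items():
            if value is TOMBSTONE:
                state.pop(key, None)
            else:
                state[key] = value
    return state


store = IncrementalStateStore()
store.put("a", 1)
store.put("b", 2)
first = store.snapshot()   # the first snapshot contains everything written so far
store.put("a", 5)
store.delete("b")
store.put("c", 3)
second = store.snapshot()  # only the three touched keys
print(restore([first, second]))
```

The trade-off is that restoring needs the chain of deltas back to a full snapshot, so in practice old deltas have to be compacted or merged from time to time.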
Credit-based Flow Control
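Credit-based flow control means a sender may transmit only as much data as the receiver has announced it can buffer. The sketch below is a toy model with hypothetical `Sender` and `Receiver` classes and one credit per record; it is meant to show the principle, not Flink's network stack.

```python
from collections import deque


class Receiver:
    """Owns a fixed number of buffers; each free buffer corresponds to one credit."""

    def __init__(self, num_buffers):
        self.free_buffers = num_buffers
        self.queue = deque()

    def receive(self, record):
        assert self.free_buffers > 0, "sender exceeded its credit"
        self.free_buffers -= 1
        self.queue.append(record)

    def consume(self):
        """Process one queued record and return the credit that frees up."""
        self.queue.popleft()
        self.free_buffers += 1
        return 1


class Sender:
    """Sends only while it holds credit; everything else waits in a local backlog."""

    def __init__(self, credit):
        self.credit = credit
        self.backlog = deque()

    def emit(self, record, receiver):
        self.backlog.append(record)
        self._flush(receiver)

    def add_credit(self, amount, receiver):
        self.credit += amount
        self._flush(receiver)

    def _flush(self, receiver):
        while self.credit > 0 and self.backlog:
            receiver.receive(self.backlog.popleft())
            self.credit -= 1


receiver = Receiver(num_buffers=2)
sender = Sender(credit=2)            # initial credit equals the receiver's buffers
for record in ["a", "b", "c"]:
    sender.emit(record, receiver)    # "c" stays in the sender's backlog
sender.add_credit(receiver.consume(), receiver)  # consuming "a" frees a buffer for "c"
print(list(receiver.queue), list(sender.backlog))
```

Because backpressure is applied at the sender, a slow receiver never has unbuffered data pushed at it, and the waiting data stays out of the shared network channel.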
Streaming SQL

Input table USER_SCORES:
| Name  | Score | Time  |
|-------|-------|-------|
| Julie | 7     | 12:01 |
| Frank | 3     | 12:03 |
| Julie | 1     | 12:03 |
| Frank | 2     | 12:06 |
| Julie | 4     | 12:07 |

Result over [-inf, 12:01):
| Name  | Score | Time  |
|-------|-------|-------|
| (no rows)             |

Result over [12:01, 12:04):
| Name  | Score | Time  |
|-------|-------|-------|
| Julie | 8     | 12:03 |
| Frank | 3     | 12:03 |

Result over [12:04, now):
| Name  | Score | Time  |
|-------|-------|-------|
| Julie | 12    | 12:07 |
| Frank | 5     | 12:06 |

Stream Mode:
12:01> SELECT Name, SUM(Score), MAX(Time) FROM USER_SCORES GROUP BY Name;
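The result tables on this slide can be reproduced by evaluating the same grouped aggregation over the rows that have arrived up to a given time. The snippet below is a plain-Python illustration of that idea, not Flink SQL; the function name `result_as_of` is hypothetical, and times are compared as zero-padded `HH:MM` strings.

```python
ROWS = [
    ("Julie", 7, "12:01"),
    ("Frank", 3, "12:03"),
    ("Julie", 1, "12:03"),
    ("Frank", 2, "12:06"),
    ("Julie", 4, "12:07"),
]


def result_as_of(rows, as_of):
    """SELECT Name, SUM(Score), MAX(Time) ... GROUP BY Name over rows seen by `as_of`."""
    result = {}
    for name, score, time in rows:
        if time <= as_of:  # zero-padded HH:MM strings sort chronologically
            total, latest = result.get(name, (0, time))
            result[name] = (total + score, max(latest, time))
    return result


for as_of in ["12:00", "12:03", "12:07"]:
    print(as_of, result_as_of(ROWS, as_of))
```

A streaming engine does not recompute the query from scratch as this function does. It keeps the per-name sum and maximum as state and updates them as each row arrives, emitting the changed result rows.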
Flink at Alibaba
Cluster size: more than 10,000 machines
State data: petabytes
Events processed: 10 trillion per day
Peak throughput: 1.7 billion events per second
Flink's Past
[Diagram: spectrum from offline to real-time]
Batch Processing
Continuous Processing & Streaming Analytics
Event-driven Applications
Areas checked on the slide: ✔
The Present
Architecture Changes in Flink 1.9
[Diagram: two architecture stacks. Both can be deployed as Local (Single JVM), Cluster (Standalone, YARN), or Cloud (GCE, EC2), and both are built on the Runtime (Distributed Streaming Dataflow)]
Stack with separate APIs: DataStream API (Stream Processing), DataSet API (Batch Processing), and Table API & SQL (Relational), which is shown twice
Stack with a shared operator layer: DAG & StreamOperator, Query Processor, DataStream (Physical)
Unified Operator Abstraction
Pull-based operator | Push-based operator
Operators can define the order in which they read their inputs
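One case where a custom read order matters is a hash join, which must consume its entire build side before probing. The sketch below models an operator that tells a toy runtime loop which input it wants next. All names here (`HashJoinOperator`, `next_input`, `end_input`, `run`) are hypothetical and chosen for illustration; they are not Flink's operator interface.

```python
from collections import defaultdict


class HashJoinOperator:
    """Two-input operator that reads the build side fully before the probe side."""

    BUILD, PROBE = 0, 1

    def __init__(self):
        self.table = defaultdict(list)  # join key -> build-side values
        self.build_finished = False
        self.output = []

    def next_input(self):
        """Tell the runtime which input should be read next."""
        return self.PROBE if self.build_finished else self.BUILD

    def process(self, input_id, record):
        key, value = record
        if input_id == self.BUILD:
            self.table[key].append(value)
        else:
            for build_value in self.table.get(key, []):
                self.output.append((key, build_value, value))

    def end_input(self, input_id):
        if input_id == self.BUILD:
            self.build_finished = True


def run(operator, inputs):
    """Toy runtime loop: push records to the operator in the order it asks for."""
    iterators = [iter(source) for source in inputs]
    finished = [False, False]
    while not all(finished):
        selected = operator.next_input()
        if finished[selected]:
            selected = 1 - selected  # the requested input is exhausted; read the other
        try:
            record = next(iterators[selected])
        except StopIteration:
            finished[selected] = True
            operator.end_input(selected)
            continue
        operator.process(selected, record)
    return operator.output


build_side = [(1, "a"), (2, "b")]
probe_side = [(1, "x"), (3, "y"), (1, "z")]
print(run(HashJoinOperator(), [build_side, probe_side]))
```

Records are still pushed into `process`, as in a streaming operator, but the operator controls the read order, which is the behavior a batch-style join needs.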
New in Table API & SQL 1.9
A brand-new SQL type system
Initial DDL support
Table API enhancements
Unified Catalog API
Blink Planner
What's New in Blink Planner
Binary data structures
Richer built-in functions
Minibatch aggregation
Multiple techniques for mitigating hot spots (data skew)
Dimension table join support
TopN and efficient streaming deduplication
Complete batch processing support
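Minibatch aggregation, one of the items above, trades a little latency for fewer state accesses: records are pre-aggregated in memory, and state is touched once per key per batch instead of once per record. A minimal sketch, assuming a size-based trigger and a dictionary standing in for keyed state (the class `MiniBatchSum` and its counter `state_writes` are hypothetical):

```python
class MiniBatchSum:
    """Per-key sum that buffers records and updates state once per key per batch."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = {}       # key -> partial sum within the current batch
        self.buffered = 0      # number of records in the current batch
        self.state = {}        # key -> total; stands in for keyed state
        self.state_writes = 0  # counts state updates, to show the saving

    def on_record(self, key, value):
        self.buffer[key] = self.buffer.get(key, 0) + value
        self.buffered += 1
        if self.buffered >= self.batch_size:
            self.flush()

    def flush(self):
        for key, partial in self.buffer.items():
            self.state[key] = self.state.get(key, 0) + partial
            self.state_writes += 1
        self.buffer = {}
        self.buffered = 0


agg = MiniBatchSum(batch_size=4)
for key, value in [("a", 1), ("a", 2), ("b", 3), ("a", 4)]:
    agg.on_record(key, value)
print(agg.state, agg.state_writes)  # four records, but only two state updates
```

The saving grows with key skew: the more records share a key within a batch, the more state accesses are collapsed into one, which is also why the technique helps with hot keys.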
Batch Failure Recovery (1)
Batch Failure Recovery (2)
Batch Failure Recovery (3)
Batch Failure Recovery (4)
Batch Failure Recovery (5)
Pluggable Shuffle Manager
Ecosystem
Flink + Hive, Flink + Zeppelin
Chinese Community
Flink's Present
[Diagram: spectrum from offline to real-time]
Batch Processing
Continuous Processing & Streaming Analytics
Event-driven Applications
Areas checked on the slide: ✔✔
The Future
Micro Services
[Diagram: service instances Order (O_0, O_1), Inventory (I_0, I_1, I_2), Payment (P_0, P_1, P_2), Shipping (S_0, S_1)]
Flow-Control
Async Call
Auto Scale
State Management
Event Driven
Flink's Future
[Diagram: spectrum from offline to real-time]
Batch Processing
Continuous Processing & Streaming Analytics
Event-driven Applications
Areas checked on the slide: ✔✔✔
Thank you!