Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
![Page 1: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/1.jpg)
Joining Infinity — Windowless Stream Processing with Flink
Sanjar Akhmedov, Software Engineer, ResearchGate
![Page 2: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/2.jpg)
It started when two researchers discovered first-hand that collaborating with a friend or colleague on the other side of the world was no easy task.
ResearchGate is a social network for scientists.
![Page 3: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/3.jpg)
Connect the world of science. Make research open to all.
![Page 4: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/4.jpg)
Structured system
We have changed, and are continuing to change, how scientific knowledge is shared and discovered.
![Page 5: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/5.jpg)
11,000,000+ Members
110,000,000+ Publications
1,300,000,000+ Citations
![Page 6: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/6.jpg)
Feature: Research Timeline
![Page 7: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/7.jpg)
Feature: Research Timeline
![Page 8: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/8.jpg)
Diverse data sources
Proxy
Frontend
Services
memcache MongoDB Solr PostgreSQL
Infinispan HBase MongoDB Solr
![Page 9: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/9.jpg)
Big data pipeline
Stages: Change data capture, Import, Hadoop cluster, Export
![Page 10: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/10.jpg)
Data Model
Account 1 : * Author (Claim)
Publication 1 : * Author (Authorship)
![Page 11: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/11.jpg)
Hypothetical SQL
Publication 1 : * Author (Authorship)
Account 1 : * Author (Claim)

    CREATE TABLE publications (
        id SERIAL PRIMARY KEY,
        author_ids INTEGER[]
    );

    CREATE TABLE accounts (
        id SERIAL PRIMARY KEY,
        claimed_author_ids INTEGER[]
    );

    CREATE MATERIALIZED VIEW account_publications
    REFRESH FAST ON COMMIT
    AS
    SELECT
        accounts.id AS account_id,
        publications.id AS publication_id
    FROM accounts
    JOIN publications
        ON ANY (accounts.claimed_author_ids) = ANY (publications.author_ids);
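The join condition in the hypothetical view matches an account with a publication whenever their author ID arrays overlap. A minimal plain-Java sketch of that batch semantics follows; the class name, method name and sample IDs are assumptions for illustration, and it recomputes the full result instead of maintaining it incrementally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AccountPublicationsView {
    // Returns "accountId:publicationId" for every pair that shares at least one author ID.
    static List<String> join(Map<Integer, List<Integer>> claimedAuthorIdsByAccount,
                             Map<Integer, List<Integer>> authorIdsByPublication) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> account : claimedAuthorIdsByAccount.entrySet()) {
            for (Map.Entry<Integer, List<Integer>> publication : authorIdsByPublication.entrySet()) {
                // Collections.disjoint is true when the two lists share no element.
                if (!Collections.disjoint(account.getValue(), publication.getValue())) {
                    rows.add(account.getKey() + ":" + publication.getKey());
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> accounts = new LinkedHashMap<>();
        accounts.put(10, List.of(1));         // account 10 claimed author 1
        accounts.put(11, List.of(2, 3));      // account 11 claimed authors 2 and 3
        Map<Integer, List<Integer>> publications = new LinkedHashMap<>();
        publications.put(100, List.of(1, 2)); // publication 100 lists authors 1 and 2
        System.out.println(join(accounts, publications)); // prints [10:100, 11:100]
    }
}
```

The nested loop compares every account with every publication, which is only workable for a toy dataset held in memory on one machine.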
![Page 12: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/12.jpg)
Challenges
• Data sources are distributed across different DBs
• Dataset doesn’t fit in memory on a single machine
• Join process must be fault tolerant
• Deploy changes fast
• Up-to-date join result in near real-time
• Join result must be accurate
![Page 13: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/13.jpg)
Change data capture (CDC)
User → (Request) → Microservice → (Write) → DB
Sync targets: Cache, Solr/ES, HBase/HDFS
![Page 14: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/14.jpg)
Change data capture (CDC)
User → (Request) → Microservice → (Write) → DB
DB → (Extract) → Log
Log entries, in order: (K2, 1), (K1, 4), (K1, Ø), …, (KN, 42), where Ø marks a delete
Log → (Sync) → Cache, HBase/HDFS, Solr/ES
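A minimal plain-Java sketch of how a sync target could replay the log shown on the CDC slides: entries are applied in order, and a null value stands in for the Ø delete marker. The class and helper names are assumptions for illustration, not part of any CDC tool.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ChangeLogReplay {
    // Builds one log entry; a null value represents the delete marker.
    static Map.Entry<String, Integer> change(String key, Integer value) {
        return new SimpleEntry<>(key, value);
    }

    // Applies the entries in log order and returns the resulting key/value view.
    static Map<String, Integer> replay(List<Map.Entry<String, Integer>> log) {
        Map<String, Integer> view = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : log) {
            if (entry.getValue() == null) {
                view.remove(entry.getKey());               // delete
            } else {
                view.put(entry.getKey(), entry.getValue()); // insert or update
            }
        }
        return view;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> log = List.of(
                change("K2", 1), change("K1", 4), change("K1", null), change("KN", 42));
        System.out.println(replay(log)); // prints {K2=1, KN=42}
    }
}
```

Because every target replays the same ordered log, each of them ends up with the same view, including the removal of K1.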
![Page 19: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/19.jpg)
Join two CDC streams into one
NoSQL1 → Kafka → Flink Streaming Join
SQL → Kafka → Flink Streaming Join
Flink Streaming Join → Kafka → NoSQL2
![Page 20: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/20.jpg)
Flink job topology
Accounts Stream + Publications Stream → Join (CoFlatMap) → Account Publications
Join state is keyed by author: Author 1, Author 2, …, Author N
![Page 21: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/21.jpg)
Prototype implementation

    DataStream<Account> accounts = kafkaTopic("accounts");
    DataStream<Publication> publications = kafkaTopic("publications");
    DataStream<AccountPublication> result = accounts.connect(publications)
        // Key both inputs by author ID so matching elements share the same state.
        .keyBy("claimedAuthorId", "publicationAuthorId")
        .flatMap(new RichCoFlatMapFunction<Account, Publication, AccountPublication>() {
            // Last account ID and last publication ID seen for the current author key.
            transient ValueState<String> authorAccount;
            transient ValueState<String> authorPublication;

            public void open(Configuration parameters) throws Exception {
                authorAccount = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("authorAccount", String.class, null));
                authorPublication = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("authorPublication", String.class, null));
            }

            // Account side: remember the account, emit once a publication is known.
            public void flatMap1(Account account, Collector<AccountPublication> out) throws Exception {
                authorAccount.update(account.id);
                if (authorPublication.value() != null) {
                    out.collect(new AccountPublication(authorAccount.value(), authorPublication.value()));
                }
            }

            // Publication side: remember the publication, emit once an account is known.
            public void flatMap2(Publication publication, Collector<AccountPublication> out) throws Exception {
                authorPublication.update(publication.id);
                if (authorAccount.value() != null) {
                    out.collect(new AccountPublication(authorAccount.value(), authorPublication.value()));
                }
            }
        });
![Page 28: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/28.jpg)
Example dataflow
Accounts stream (account, claimed author): (Alice, 2), (Bob, 1)
Publications stream (publication, author): (Paper1, 1)
Join state, keyed by author, after each element:
• (Alice, 2) arrives: Author 2 holds account Alice; nothing is emitted
• (Bob, 1) arrives: Author 1 holds account Bob; nothing is emitted
• (Paper1, 1) arrives: Author 1 holds account Bob and publication Paper1
Account Publications output: K1 → (Bob, Paper1)
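The example dataflow can be replayed without a cluster. The following plain-Java sketch mimics the prototype's two per-author state slots with ordinary maps; it is a simulation under that assumption and does not use the Flink API, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KeyedJoinSimulation {
    // Per-author state, standing in for the two ValueState fields of the prototype.
    private final Map<Integer, String> accountByAuthor = new HashMap<>();
    private final Map<Integer, String> publicationByAuthor = new HashMap<>();
    // Emitted join results, in emission order.
    final List<String> output = new ArrayList<>();

    // Mirrors flatMap1: remember the account, then emit if a publication is known.
    void onAccount(String accountId, int claimedAuthorId) {
        accountByAuthor.put(claimedAuthorId, accountId);
        emitIfComplete(claimedAuthorId);
    }

    // Mirrors flatMap2: remember the publication, then emit if an account is known.
    void onPublication(String publicationId, int authorId) {
        publicationByAuthor.put(authorId, publicationId);
        emitIfComplete(authorId);
    }

    private void emitIfComplete(int authorId) {
        String account = accountByAuthor.get(authorId);
        String publication = publicationByAuthor.get(authorId);
        if (account != null && publication != null) {
            output.add("(" + account + ", " + publication + ")");
        }
    }

    public static void main(String[] args) {
        KeyedJoinSimulation join = new KeyedJoinSimulation();
        join.onAccount("Alice", 2);
        join.onAccount("Bob", 1);
        join.onPublication("Paper1", 1);
        System.out.println(join.output); // prints [(Bob, Paper1)]
    }
}
```

Nothing is emitted for Alice because no publication has arrived under author 2.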
![Page 39: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/39.jpg)
Challenges
• ✔ Data sources are distributed across different DBs
• ✔ Dataset doesn’t fit in memory on a single machine
• ✔ Join process must be fault tolerant
• ✔ Deploy changes fast
• ✔ Up-to-date join result in near real-time
• ? Join result must be accurate
![Page 40: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/40.jpg)
Paper1 gets deleted
Accounts stream: (Alice, 2), (Bob, 1)
Publications stream: (Paper1, 1), then (Paper1, Ø)
Join state before the delete: Author 1 holds Bob and Paper1; Author 2 holds Alice
Account Publications so far: K1 → (Bob, Paper1)
• (Paper1, Ø) arrives: the delete carries no value (?), so the previous value is needed
• A "Diff with Previous State" step is added to the topology
• The join then emits K1 → Ø
• Need K1 here, e.g. K1 = 𝒇(Bob, Paper1)
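A minimal plain-Java sketch of the "diff with previous state" idea: before storing a new value, the join compares it with the previous one and emits a delete under the same natural ID as the earlier result. It simplifies the slides in one respect, stated as an assumption: the delete is handed the author key directly, whereas a real delete element would need its previous value to find that key. Names and the ID format are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DiffingJoin {
    private final Map<Integer, String> accountByAuthor = new HashMap<>();
    private final Map<Integer, String> publicationByAuthor = new HashMap<>();
    // Emitted commands, e.g. "UPSERT Bob/Paper1 = (Bob, Paper1)" or "DELETE Bob/Paper1".
    final List<String> output = new ArrayList<>();

    // Natural ID of a join result, playing the role of K1 = f(Bob, Paper1).
    static String naturalId(String account, String publication) {
        return account + "/" + publication;
    }

    // A null accountId or publicationId means the element was deleted.
    void onAccount(String accountId, int authorId) {
        update(authorId, accountId, publicationByAuthor.get(authorId));
    }

    void onPublication(String publicationId, int authorId) {
        update(authorId, accountByAuthor.get(authorId), publicationId);
    }

    private void update(int authorId, String newAccount, String newPublication) {
        String oldAccount = accountByAuthor.get(authorId);
        String oldPublication = publicationByAuthor.get(authorId);
        String oldId = (oldAccount != null && oldPublication != null)
                ? naturalId(oldAccount, oldPublication) : null;
        String newId = (newAccount != null && newPublication != null)
                ? naturalId(newAccount, newPublication) : null;
        store(accountByAuthor, authorId, newAccount);
        store(publicationByAuthor, authorId, newPublication);
        // Diff with previous state: retract the old result, then emit the new one.
        if (oldId != null && !oldId.equals(newId)) {
            output.add("DELETE " + oldId);
        }
        if (newId != null && !newId.equals(oldId)) {
            output.add("UPSERT " + newId + " = (" + newAccount + ", " + newPublication + ")");
        }
    }

    private static void store(Map<Integer, String> state, int key, String value) {
        if (value == null) state.remove(key); else state.put(key, value);
    }

    public static void main(String[] args) {
        DiffingJoin join = new DiffingJoin();
        join.onAccount("Bob", 1);
        join.onPublication("Paper1", 1);
        join.onPublication(null, 1); // Paper1 gets deleted
        System.out.println(join.output);
    }
}
```

The delete can only be addressed to the earlier result because its ID is recomputed from the previous state, which is why the key has to be a function of the joined values.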
![Page 47: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/47.jpg)
Paper1 gets updated
Accounts stream: (Alice, 2), (Bob, 1)
Publications stream: (Paper1, 1), then (Paper1, 2)
• Before the update: Author 1 holds Bob and Paper1; Author 2 holds Alice; output K1 → (Bob, Paper1)
• (Paper1, 2) arrives: Author 2 now holds Alice and Paper1, and (Alice, Paper1) is emitted
• Author 1 still holds Bob and Paper1, and K1 → (Bob, Paper1) is still in the output
• Open question: which key (??) does (Alice, Paper1) get?
![Page 52: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/52.jpg)
Alice claims Paper1 via different authorAccount Publications
K1 (Bob, Paper1)
K2 (Alice, Paper1)
Accounts
Alice 2
Bob 1
Publications
Paper1 1
Paper1 (1, 2)
Accounts Stream
Join
AccountPublications
PublicationsStream
Author 1
Bob Paper1
Author 2
Alice Paper1
Author N
…
![Page 53: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/53.jpg)
Alice claims Paper1 via different authorAccount Publications
K1 (Bob, Paper1)
K2 (Alice, Paper1)
Accounts
Alice 2
Bob 1
Bob Ø
Publications
Paper1 1
Paper1 (1, 2)
Accounts Stream
Join
AccountPublications
PublicationsStream
Author 1
Bob Paper1
Author 2
Alice Paper1
Author N
…
![Page 54: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/54.jpg)
Alice claims Paper1 via different authorAccount Publications
K1 (Bob, Paper1)
K2 (Alice, Paper1)
Accounts
Alice 2
Bob 1
Bob Ø
Publications
Paper1 1
Paper1 (1, 2)
Accounts Stream
Join
AccountPublications
PublicationsStream
Author 1
Bob Paper1
Author 2
Alice Paper1
Author N
…
![Page 55: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/55.jpg)
Alice claims Paper1 via different authorAccount Publications
K1 (Bob, Paper1)
K2 (Alice, Paper1)
K1 Ø
Accounts
Alice 2
Bob 1
Bob Ø
Publications
Paper1 1
Paper1 (1, 2)
Accounts Stream
Join
AccountPublications
PublicationsStream
Author 1
Ø Paper1
Author 2
Alice Paper1
Author N
…
![Page 56: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/56.jpg)
Alice claims Paper1 via different authorAccount Publications
K1 (Bob, Paper1)
K2 (Alice, Paper1)
K1 Ø
Accounts
Alice 2
Bob 1
Bob Ø
Publications
Paper1 1
Paper1 (1, 2)
Accounts Stream
Join
AccountPublications
PublicationsStream
Author 1
Ø Paper1
Author 2
Alice Paper1
Author N
…
![Page 57: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/57.jpg)
Alice claims Paper1 via different authorAccount Publications
K1 (Bob, Paper1)
K2 (Alice, Paper1)
K1 Ø
Accounts
Alice 2
Bob 1
Bob Ø
Alice 1
Publications
Paper1 1
Paper1 (1, 2)
Accounts Stream
Join
AccountPublications
PublicationsStream
Author 1
Ø Paper1
Author 2
Alice Paper1
Author N
…
![Page 58: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/58.jpg)
Alice claims Paper1 via different authorAccount Publications
K1 (Bob, Paper1)
K2 (Alice, Paper1)
K1 Ø
Accounts
Alice 2
Bob 1
Bob Ø
Alice 1
Publications
Paper1 1
Paper1 (1, 2)
Accounts Stream
Join
AccountPublications
PublicationsStream
Author 1
Ø Paper1
Author 2
Alice Paper1
Author N
…
2. (Alice, 1)
![Page 59: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/59.jpg)
Alice claims Paper1 via different authorAccount Publications
K1 (Bob, Paper1)
K2 (Alice, Paper1)
K1 Ø
Accounts
Alice 2
Bob 1
Bob Ø
Alice 1
Publications
Paper1 1
Paper1 (1, 2)
Accounts Stream
Join
AccountPublications
PublicationsStream
Author 1
Alice Paper1
Author 2
Ø Paper1
Author N
…
2. (Alice, 1)
![Page 60: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/60.jpg)
Alice claims Paper1 via different author
Account Publications: K1 (Bob, Paper1); K2 (Alice, Paper1); K1 Ø
Accounts: Alice 2; Bob 1; Bob Ø; Alice 1
Publications: Paper1 1; Paper1 (1, 2)
Accounts Stream, Publications Stream → Join → Account Publications
Join: Author 1 (Alice, Paper1); Author 2 (Ø, Paper1); Author N …
2. (Alice, 1)
(Alice, Paper1)
![Page 61: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/61.jpg)
Alice claims Paper1 via different author
Account Publications: K1 (Bob, Paper1); K2 (Alice, Paper1); K1 Ø; K2 (Alice, Paper1); K2 Ø
Accounts: Alice 2; Bob 1; Bob Ø; Alice 1
Publications: Paper1 1; Paper1 (1, 2)
Accounts Stream, Publications Stream → Join → Account Publications
Join: Author 1 (Alice, Paper1); Author 2 (Ø, Paper1); Author N …
2. (Alice, 1)
(Alice, Paper1)
![Page 62: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/62.jpg)
Alice claims Paper1 via different author
Account Publications: K1 (Bob, Paper1); K2 (Alice, Paper1); K1 Ø; K3 (Alice, Paper1); K2 Ø
Accounts: Alice 2; Bob 1; Bob Ø; Alice 1
Publications: Paper1 1; Paper1 (1, 2)
Accounts Stream, Publications Stream → Join → Account Publications
Join: Author 1 (Alice, Paper1); Author 2 (Ø, Paper1); Author N …
2. (Alice, 1)
(Alice, Paper1)
Pick correct natural IDs, e.g. K3 = 𝒇(Alice, Author1, Paper1)
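One way to read K3 = f(Alice, Author1, Paper1): the key of a joined row is derived deterministically from the identifiers of every input that produced it, so the row for the new claim gets a different key from the row for the old claim, and the delete of the old row cannot wipe out the new one. The sketch below is plain Python, not Flink API; the function name `natural_id`, the length-prefixed encoding, and the use of SHA-1 are assumptions made for illustration, not something the slides specify.

```python
import hashlib


def natural_id(account: str, author_slot: str, publication: str) -> str:
    """Derive a deterministic key for a joined row from its input identifiers.

    Hypothetical helper: the slides only say K3 = f(Alice, Author1, Paper1).
    """
    # Length-prefix each part so ("ab", "c") and ("a", "bc") encode differently.
    payload = "".join(f"{len(part)}:{part}" for part in (account, author_slot, publication))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Same account and paper, different author slot: two distinct keys.
k_author1 = natural_id("Alice", "Author1", "Paper1")
k_author2 = natural_id("Alice", "Author2", "Paper1")
```

Because the key depends only on the inputs, a later delete or update for the same (account, author, paper) combination recomputes the same key and reaches the same row.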
![Page 63: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/63.jpg)
How to solve deletes and updates
• Keep previous element state to update previous join result
• Stream elements are not domain entities but commands such as delete or upsert
• Joined stream must have natural IDs to propagate deletes and updates
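The first two points can be sketched as a small "diff" step that remembers the last value seen per entity and turns each incoming snapshot into delete and upsert commands. This is a plain-Python simulation under stated assumptions, not the speaker's implementation and not Flink API: the names `Command` and `Diff` are hypothetical, and an account is modelled as (name, author id) with `None` standing in for Ø.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Command:
    kind: str      # "upsert" or "delete"
    join_key: int  # author id the element joins on
    entity: str    # account name


class Diff:
    """Remembers the last value per entity and turns snapshots into commands."""

    def __init__(self) -> None:
        self._previous: Dict[str, int] = {}

    def process(self, entity: str, author_id: Optional[int]) -> List[Command]:
        commands: List[Command] = []
        previous = self._previous.get(entity)
        if previous == author_id:
            return commands  # nothing changed (also covers deleting an unknown entity)
        if previous is not None:
            # Retract whatever the previous value contributed to the join.
            commands.append(Command("delete", previous, entity))
        if author_id is None:
            del self._previous[entity]
        else:
            self._previous[entity] = author_id
            commands.append(Command("upsert", author_id, entity))
        return commands


# Replay the Accounts elements from the slides: Alice 2, Bob 1, Bob Ø, Alice 1.
diff = Diff()
emitted: List[Command] = []
for name, author in [("Alice", 2), ("Bob", 1), ("Bob", None), ("Alice", 1)]:
    emitted.extend(diff.process(name, author))
```

The last input, Alice 1, yields two commands: a delete keyed by author 2 followed by an upsert keyed by author 1, which is what lets the downstream join retract the old result.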
![Page 64: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/64.jpg)
Generic join graph
Accounts Stream → Diff (Alice, Bob, …) → Join
Publications Stream → Diff (Paper1, PaperN, …) → Join
Join (Author1, AuthorM, …) → Account Publications
![Page 65: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/65.jpg)
Generic join graph
Operate on commands
Accounts Stream → Diff (Alice, Bob, …) → Join
Publications Stream → Diff (Paper1, PaperN, …) → Join
Join (Author1, AuthorM, …) → Account Publications
![Page 66: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/66.jpg)
Memory requirements
Accounts Stream → Diff (Alice, Bob, …) → Join
Publications Stream → Diff (Paper1, PaperN, …) → Join
Join (Author1, AuthorM, …) → Account Publications
Diff (Accounts): full copy of Accounts stream
Diff (Publications): full copy of Publications stream
Join: full copy of Accounts stream on left side, full copy of Publications stream on right side
![Page 67: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/67.jpg)
Network load
Accounts Stream → Diff (Alice, Bob, …) → Join
Publications Stream → Diff (Paper1, PaperN, …) → Join
Join (Author1, AuthorM, …) → Account Publications
Reshuffle, Reshuffle (each over the network)
![Page 68: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/68.jpg)
Resource considerations
• In addition to handling Kafka traffic, we need to reshuffle all data twice over the network
• We need to keep two full copies of each joined stream in memory
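As rough arithmetic, the two points above translate into "state is about twice the input size" and "network traffic is about the Kafka rate plus two reshuffles". The helper below is a back-of-the-envelope sketch only: `estimate_resources` is a hypothetical name, the input numbers are made-up placeholders rather than ResearchGate figures, and the estimate ignores serialization and state backend overhead.

```python
from typing import Dict


def estimate_resources(accounts_bytes: float,
                       publications_bytes: float,
                       kafka_bytes_per_s: float) -> Dict[str, float]:
    """Rough estimate: two copies of each stream in state, data reshuffled twice."""
    # One copy in each Diff operator plus one copy on the matching side of the Join.
    state_bytes = 2 * (accounts_bytes + publications_bytes)
    # All data crosses the network twice on top of the Kafka traffic itself.
    reshuffle_bytes_per_s = 2 * kafka_bytes_per_s
    return {
        "state_bytes": state_bytes,
        "reshuffle_bytes_per_s": reshuffle_bytes_per_s,
        "network_bytes_per_s": kafka_bytes_per_s + reshuffle_bytes_per_s,
    }


# Placeholder inputs in arbitrary units, purely to show the arithmetic.
example = estimate_resources(accounts_bytes=10, publications_bytes=40, kafka_bytes_per_s=5.0)
```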
![Page 69: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/69.jpg)
Questions
We are hiring - www.researchgate.net/careers
![Page 70: Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac3d261a28ab145e8b672f/html5/thumbnails/70.jpg)