Apache Kafka: “a system optimized for writing”
Bernhard Hopfenmüller
26 August 2018
whoami
Bernhard Hopfenmüller
IT Consultant @ ATIX AG
IRC: Fobhep
github.com/Fobhep
#atix #froscon2018
whoarewe
The Linux & Open Source Company
Unterschleißheim near Munich
over 15 years
datacenter automation, Linux
Consulting, Engineering, Support, Training
Kafka
Quora.com: What is the relation between Kafka, the writer, and Apache Kafka, the distributed messaging system?
Jay Kreps: I thought that since Kafka was a system optimized for writing using a writer’s name would make sense. I had taken a lot of lit classes in college and liked Franz Kafka. Plus the name sounded cool for an open source project.
- developed by LinkedIn, open source since 2011
- Confluent founded in 2014
Messaging systems
Why do we need a messaging system?
- Challenge 1: System B is not available (no caching)
- Challenge 2: System A sends too much (DoS)
- Challenge 3: System B crashes while processing
Source[1]
Queues vs Topics
Supermarket vs Television
Supermarket (queue): wait until it’s your turn
Each message is received once! Source[1]
Television (topic): choose what you want to receive
Messages are received any number of times! Source[1]
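The difference between the two models can be sketched in a few lines of Python. This is an illustration only, not how any broker is implemented; the class and method names (`Queue`, `Topic`, `send`, `publish`, `receive`) are made up for this example.

```python
from collections import deque


class Queue:
    """Supermarket checkout: every message is handed to exactly one receiver."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # The message is removed, so no other receiver can get it again.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Television: every subscriber can see every message."""

    def __init__(self):
        self._messages = []
        self._positions = {}  # subscriber name -> index of the next message

    def publish(self, message):
        self._messages.append(message)

    def receive(self, subscriber):
        position = self._positions.get(subscriber, 0)
        if position >= len(self._messages):
            return None
        self._positions[subscriber] = position + 1
        # The message stays in the list for all other subscribers.
        return self._messages[position]
```

With the queue, a second `receive()` for the same message returns nothing; with the topic, two different subscribers each get the same message.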
Kafka: basic structure
Source[2]
Topics I
- core component of Kafka
- is filled by producers
- consists of one or more partitions

[Figure: a topic with three partitions (partition 0, partition 1, partition 2). Each partition is a sequence of messages numbered from 0 (old) upward (new); the producer appends new data at the new end.]
Topics II
- producer can choose the partition
- each partition has a running offset
- a message is identified by its offset

[Figure: a topic with three partitions (partition 0, partition 1, partition 2). Each partition is a sequence of messages numbered from 0 (old) upward (new); the producer appends new data at the new end.]
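A minimal sketch of partitions and offsets, assuming an in-memory list per partition. The class `PartitionedTopic` and its methods are hypothetical, and `zlib.crc32` is only a stand-in for whatever hash function a real client uses to map keys to partitions.

```python
import zlib


class PartitionedTopic:
    """A topic as a list of append-only partitions (in memory, for illustration)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, value, key=None, partition=None):
        """Append a message and return (partition, offset)."""
        if partition is None:
            if key is None:
                # Simplification: keyless messages all go to partition 0 here.
                partition = 0
            else:
                # Same key -> same partition (crc32 is a stand-in hash).
                partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        # The offset is simply the position within the partition.
        return partition, len(log) - 1

    def read(self, partition, offset):
        """A message is identified by its partition and offset."""
        return self.partitions[partition][offset]
```

Offsets are only meaningful within one partition: two messages in different partitions can both have offset 0.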
Topics III
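[Figure: a partition is a directory on disk. Its messages (offsets 0 to 15) are split into segments starting at offsets 0, 4, 8 and 12; each segment consists of a *.log file and a *.index file.]

- messages are stored physically!
- key-value principle
- clean-up policies (time/size retention, log compaction)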
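Because each segment starts at a known offset, finding the file that holds a given message is a binary search over the segment start offsets. The following is a sketch under the assumption that segment files are named after their first offset, zero-padded to 20 digits; the function name `segment_file_for` is made up for this example.

```python
import bisect


def segment_file_for(offset, base_offsets):
    """Return the .log file name of the segment that contains `offset`.

    `base_offsets` holds the first offset of every segment, sorted ascending.
    """
    # Find the last segment whose base offset is <= the requested offset.
    index = bisect.bisect_right(base_offsets, offset) - 1
    if index < 0:
        raise ValueError(f"offset {offset} is older than the first segment")
    return f"{base_offsets[index]:020d}.log"
```

With the segments from the figure (`[0, 4, 8, 12]`), offset 9 resolves to the segment that starts at offset 8. The lookup does not check whether the offset has been written yet.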
Topics IV
[Figure: a log with offsets 0 to 11, each message carrying a key and a value. With time/size retention, the oldest messages are deleted once a certain time or size is exceeded.]

- clean-up policies:
  - default: retention time (delete old data after x days)
  - retention size (delete old data once the stored data exceeds x)
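Both retention policies can be sketched as a filter over the log. This is a simplification: it drops individual messages, whereas a real broker removes whole segment files. The function `apply_retention` and its parameters are hypothetical.

```python
def apply_retention(log, now, max_age_seconds=None, max_bytes=None):
    """Return the messages that survive time- and size-based retention.

    `log` is a list of (timestamp, payload) tuples, oldest first.
    """
    kept = list(log)
    if max_age_seconds is not None:
        # Retention time: drop everything older than the allowed age.
        kept = [(ts, data) for ts, data in kept if now - ts <= max_age_seconds]
    if max_bytes is not None:
        # Retention size: drop the oldest messages until the rest fits.
        while kept and sum(len(data) for _, data in kept) > max_bytes:
            kept.pop(0)
    return kept
```

In both cases data disappears from the old end of the log, regardless of whether any consumer has read it.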
Topics V
[Figure: log compaction on a log with offsets 0 to 11, each message carrying a key and a value. For every key, only the most recent value is kept.]

- clean-up policies:
  - default: retention time (delete old data after x days)
  - retention size (delete old data once the stored data exceeds x)
  - log compaction (replace the old value for a key with the new one)
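Log compaction can be sketched as "keep the last record per key". The sketch below ignores details of a real broker (compaction running in the background, delete markers) and uses a made-up function name, `compact`.

```python
def compact(log):
    """Keep only the latest record per key, preserving offsets and order.

    `log` is a list of (offset, key, value) tuples in offset order.
    """
    latest_offset = {}
    for offset, key, _ in log:
        # Later records overwrite earlier ones for the same key.
        latest_offset[key] = offset
    return [record for record in log if latest_offset[record[1]] == record[0]]
```

Note that the surviving records keep their original offsets, so the compacted log has gaps in its offset sequence.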
Topic consumption
- topics are pulled by the consumers! (no DoS)
- any existing data can be pulled
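A minimal sketch of pull-based consumption, assuming the log is a plain Python list. The class `PullConsumer` and its methods `poll` and `seek` are named for illustration and are not a client library API.

```python
class PullConsumer:
    """Reads a log at its own pace and keeps track of its own offset."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self, max_records):
        """Fetch at most `max_records` messages starting at the current offset."""
        batch = self.log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        # Any message that still exists in the log can be read again.
        self.offset = offset
```

However fast the producer appends, the consumer never receives more than it asks for per `poll`, which is why a fast sender cannot overwhelm it.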
Consumer Groups
- 1 message is read 1 time in a group
- 1 consumer can read x partitions
- 1 partition can be read by only 1 consumer of a group
- parallelism allows high throughput
- never use more consumers than partitions (the extra consumers stay idle)
Source[1]
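The rules above follow from how partitions are handed out within a group. The sketch below uses a simple round-robin assignment; real clients offer several assignment strategies, and the function name `assign_partitions` is made up for this example.

```python
def assign_partitions(partitions, consumers):
    """Distribute partitions round-robin over the consumers of one group.

    Every partition goes to exactly one consumer; a consumer may get several.
    """
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        consumer = consumers[index % len(consumers)]
        assignment[consumer].append(partition)
    return assignment
```

With three partitions and four consumers, the fourth consumer ends up with an empty list, which is why adding more consumers than partitions does not increase throughput.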
Replication
- implemented on partition level
- replica = redundant copy of a partition
- replicas are distributed over the cluster
- the replicas of a partition have 1 leader and n followers
- producer writes to the leader!
- data is copied from the leader to the followers
Source[3]
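A toy model of one replicated partition, to make the leader/follower roles concrete. Everything here is an assumption for illustration: the class `ReplicatedPartition`, the broker names, and the rule that the first follower takes over when the leader fails.

```python
class ReplicatedPartition:
    """One partition with a leader replica and n follower replicas."""

    def __init__(self, brokers):
        self.leader = brokers[0]
        self.followers = list(brokers[1:])
        self.logs = {broker: [] for broker in brokers}

    def produce(self, message):
        # Producers only ever write to the leader.
        self.logs[self.leader].append(message)

    def replicate(self):
        # Bring every follower up to date with the leader's log.
        for follower in self.followers:
            missing = self.logs[self.leader][len(self.logs[follower]):]
            self.logs[follower].extend(missing)

    def fail_leader(self):
        # Simplification: the first follower becomes the new leader.
        failed = self.leader
        self.leader = self.followers.pop(0)
        del self.logs[failed]
```

Messages that were replicated before the leader failed are still available from the new leader; messages that were not replicated yet are lost in this simple model.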
Did somebody hear my message?
Producer decides when a message counts as successfully sent.
Configuration possibilities:
- as soon as it is sent
- as soon as it is received by the first broker
- as soon as the desired number of replicas exist
Kafka features exactly-once semantics!
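The three possibilities correspond to the values `0`, `1` and `all` of the producer setting `acks`. The decision itself can be sketched as follows; the function `is_acknowledged` and its parameters are hypothetical and only model the rule, they are not part of any client library.

```python
def is_acknowledged(acks, leader_has_it, replicas_with_copy, required_replicas):
    """Decide whether the producer may treat a message as successfully sent.

    acks="0":   as soon as it is sent
    acks="1":   as soon as the leader (first broker) has received it
    acks="all": as soon as the desired number of replicas hold a copy
    """
    if acks == "0":
        return True
    if acks == "1":
        return leader_has_it
    if acks == "all":
        return leader_has_it and replicas_with_copy >= required_replicas
    raise ValueError(f"unknown acks setting: {acks!r}")
```

The trade-off is latency against durability: `"0"` never waits but can lose messages silently, while `"all"` waits for the replicas before reporting success.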
Broker and ZooKeeper
- Brokers are stateless!
- Which broker is alive?
- Broker communication?
- → ZooKeeper!
Source[4]
ZooKeeper
- distributed, hierarchical file system
- management of znodes
- HA via ensemble (= ZooKeeper cluster)
- ZooKeeper leader != Kafka leader
Source[4]
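How a hierarchy of znodes can answer "which broker is alive?" is sketched below. The idea is that each broker creates a znode tied to its session, and the znode disappears when the session ends. The class `ZNodeTree` is a toy in-memory model, not the ZooKeeper API; the path `/brokers/ids/<id>` follows the layout Kafka uses with ZooKeeper, and the host and session names are invented.

```python
class ZNodeTree:
    """Toy model of a hierarchical znode namespace with ephemeral nodes."""

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session or None)

    def create(self, path, data, session=None):
        # A znode with a session is ephemeral; without one it is persistent.
        self.nodes[path] = (data, session)

    def children(self, path):
        """List the names of the direct children of `path`."""
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):] for p in self.nodes
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def close_session(self, session):
        # Ephemeral znodes vanish when the session that created them ends.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}
```

Listing the children of `/brokers/ids` then yields exactly the brokers whose sessions are still open.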
Use Cases
- Messaging (as an alternative to ActiveMQ or RabbitMQ)
- Website Activity Tracking
- Metrics
- Log Aggregation
- Stream Processing
  - Apache Storm and Apache Samza
- Commit Log
Who likes Kafka?
- Apple Inc.
- Cisco Systems
- Cloudflare
- eBay
- Netflix (Monitoring!)
- The New York Times (Kafka as data storage! Super awesome blog post)
- PayPal
- Spotify
- Twitter
- Uber (Kafka = Backbone!!!)
Summary
streams...
- ... publish, subscribe
- ... persistent, failsafe
- ... real-time processing
- combining streams with DFS
Kafka moves complexity to the application
Sources
[1] https://www.informatik-aktuell.de/betrieb/verfuegbarkeit/apache-kafka-eine-schluesselplattform-fuer-hochskalierbare-systeme.html
[2] https://thecattlecrew.net/2017/09/28/apache-kafka-im-detail-teil-1/ and https://thecattlecrew.net/2017/09/28/apache-kafka-im-detail-teil-2/
[3] https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/
[4] https://www.infoq.com/articles/apache-kafka