apache kafka - martin podval
![Page 1: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/1.jpg)
Apache Kafka
@MartinPodval, hpsv.cz
![Page 2: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/2.jpg)
What is Apache Kafka?
● Messaging system
● Distributed
● Persistent and replicable
● Very fast - low latency - and scalable
● Simple but highly configurable
● By LinkedIn, open sourced under apache.org
![Page 3: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/3.jpg)
Data Streaming
New kind of data ...
● User or application data (events) streams
● Monitoring - App, System
● App Logging
● High volume
![Page 4: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/4.jpg)
Data Streaming Cont’d
… you want to process
● Using various components
● Into a target form
● Map, reduce, shuffle
● Real time or batch
![Page 5: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/5.jpg)
HP Service Virtualization Use Cases
Processing of client message streams
Real-time performance modeling
Log aggregation
![Page 6: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/6.jpg)
How To Solve It?
Producers and Consumers
● Distributed
● Decoupled
● Configurable
● Dynamic
![Page 7: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/7.jpg)
Kafka Cluster
Brokers
● = Instances, Nodes
● Topics
● Partitions
● Replicas
ZooKeeper (ZK)
● Coordination
![Page 8: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/8.jpg)
Kafka Topics
Commit Log
● Immutable
● Ordered
● Sequential Offset
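The commit log idea can be sketched in a few lines. This is only an in-memory illustration, not Kafka code: the `CommitLog` class and its `append` and `read` methods are hypothetical names chosen for this example.

```python
class CommitLog:
    """Append-only log: records are never modified, only appended."""

    def __init__(self):
        self._records = []

    def append(self, message):
        """Append a message and return the sequential offset assigned to it."""
        self._records.append(message)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records (offset, message) pairs starting at offset."""
        end = min(offset + max_records, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]


log = CommitLog()
first = log.append(b"a")
second = log.append(b"b")
third = log.append(b"c")

# Reading is by offset, so any reader can start anywhere and re-read freely.
batch = log.read(1)
```

Because records are only ever appended, an offset identifies a message for as long as the log keeps it, which is what lets readers track their own position.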
![Page 9: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/9.jpg)
Kafka Topics Cont’d
Partitioned
Independently:
● Stored
● Produced
● Consumed
⇒ Scalable
Replicated
● On partition basis
● Different brokers
⇒ Fault Tolerant
![Page 10: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/10.jpg)
What Can I Do?
producer.write(topic_id, message);
consumer.read(topic_id, offset);
![Page 11: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/11.jpg)
I Want To Produce
● java/scala client
● address of one or more brokers
● choose a topic where to produce
● highly configurable and tunable:
  ○ partitioner
  ○ number of acks (0 = async, no wait; 1 = leader only; -1 = all in-sync replicas)
  ○ batching, buffer size, timeouts, retries, ...
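What a partitioner does can be shown with a minimal sketch. The function name `choose_partition` is hypothetical, and CRC32 is only a stand-in hash for illustration; the real clients use their own hash functions and allow custom partitioners.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition; equal keys always map to the same one."""
    return zlib.crc32(key) % num_partitions


def group_by_partition(messages, num_partitions):
    """Group (key, value) pairs by target partition, as a batching producer might."""
    batches = {p: [] for p in range(num_partitions)}
    for key, value in messages:
        batches[choose_partition(key, num_partitions)].append(value)
    return batches


messages = [(b"user-1", "login"), (b"user-2", "login"), (b"user-1", "logout")]
batches = group_by_partition(messages, num_partitions=4)
```

Since both `user-1` events land in the same partition, their relative order is preserved there, which is the usual reason to partition by key.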
![Page 12: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/12.jpg)
I Want To Consume
High Level API
● Groups abstraction
  ○ To All, To One
  ○ To Some
● Stream API
● Stores positions to support fault tolerance
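The groups abstraction can be illustrated with a small sketch. It assumes a simple round-robin assignment and hypothetical names (`assign_partitions`, `c1`, `g1`); the assignment strategy of a real client may differ.

```python
def assign_partitions(partitions, consumers):
    """Spread the partitions of a topic over the consumers of ONE group."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# "To One": both consumers share a group, so each message goes to one of them.
queue_like = assign_partitions(partitions, ["c1", "c2"])

# "To All": every consumer has its own group, so each sees all partitions.
broadcast = {
    group: assign_partitions(partitions, [consumer])
    for group, consumer in [("g1", "c1"), ("g2", "c2")]
}
```

"To Some" is the mix of the two: several groups, some of which contain more than one consumer.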
![Page 13: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/13.jpg)
I Want To Consume Cont’d
Low Level
● Java/scala client
● Find the leader for a topic partition
● Calculate an offset
● Fetch messages
  ○ Re-consume if needed
![Page 14: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/14.jpg)
I Want To Consume Cont’d
Delivery Semantics:
● At most once
● At least once
● Exactly once
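The difference between the first two semantics comes down to whether the consumer stores its position before or after processing a message. The simulation below is a sketch under that assumption; `consume` and `crash_at` are hypothetical names and no real client is involved.

```python
def consume(messages, committed, commit_first, crash_at=None):
    """Process messages starting at the committed offset.

    commit_first=True  -> store the position before processing (at most once)
    commit_first=False -> process before storing the position (at least once)
    crash_at is the offset at which the consumer dies between the two steps.
    Returns (processed messages, committed offset).
    """
    processed = []
    offset = committed
    while offset < len(messages):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                return processed, committed
            processed.append(messages[offset])
        else:
            processed.append(messages[offset])
            if offset == crash_at:
                return processed, committed
            committed = offset + 1
        offset += 1
    return processed, committed


messages = ["m0", "m1", "m2"]

# At most once: the crash at offset 1 loses m1 after restart.
before, position = consume(messages, 0, commit_first=True, crash_at=1)
after, _ = consume(messages, position, commit_first=True)
at_most_once = before + after

# At least once: the crash at offset 1 makes m1 appear twice after restart.
before, position = consume(messages, 0, commit_first=False, crash_at=1)
after, _ = consume(messages, position, commit_first=False)
at_least_once = before + after
```

Exactly once needs more than ordering the two steps, for example storing the position together with the processing result, so it is not covered by this sketch.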
![Page 15: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/15.jpg)
Kafka Internals - Disks
Avoid:
● GC
● Random disk access
![Page 16: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/16.jpg)
Kafka Internals - Disks Cont’d
Disks are fast ...
… when properly used
● sequential access - read ahead, write behind
● rely on operating system
  ○ avoid heap, materialization and GC
● it’s more like file copy over network
It’s easy … with immutable topics
![Page 17: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/17.jpg)
Kafka Internals - Replication
“In Sync” Replicas
● Replication factor on partition basis
● One leader + 0..n replicas
● Replicas are consumers
  ○ “In Sync” if they are not “too far” behind a leader
  ○ Batch sync
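The "not too far behind" rule can be sketched as a simple lag check. This assumes the lag is measured in messages and uses hypothetical names (`in_sync_replicas`, `max_lag`, broker ids `b2` and `b3`); the actual criterion is a broker setting and may be defined differently.

```python
def in_sync_replicas(leader_end_offset, follower_offsets, max_lag):
    """Return the followers whose lag behind the leader is within max_lag."""
    return sorted(
        replica
        for replica, offset in follower_offsets.items()
        if leader_end_offset - offset <= max_lag
    )


# The leader has written up to offset 100; b2 is close behind, b3 is not.
isr = in_sync_replicas(100, {"b2": 98, "b3": 40}, max_lag=10)
```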
![Page 18: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/18.jpg)
Kafka Internals - Replication Cont’d
Tunable Trade-Offs
● Producer’s write method:
  ○ Not blocked, async
  ○ Waits for master ACK
  ○ Waits for all in-sync replicas
● Consumer pulls only committed messages
● Server’s minimum in-sync replicas
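How these settings interact can be sketched as follows. It assumes that a message counts as committed once every in-sync replica (leader included) has it, and that the minimum in-sync replica count is enforced only for writes that wait for all in-sync replicas. All function names are hypothetical.

```python
def high_watermark(isr_end_offsets):
    """Committed boundary: the smallest log end offset among in-sync replicas."""
    return min(isr_end_offsets.values())


def visible_to_consumers(log, isr_end_offsets):
    """Consumers only see messages below the committed boundary."""
    return log[:high_watermark(isr_end_offsets)]


def accepts_write(acks, isr_size, min_in_sync):
    """With acks = -1 (all), refuse the write when too few replicas are in sync."""
    if acks == -1:
        return isr_size >= min_in_sync
    return True


log = ["m0", "m1", "m2", "m3"]
end_offsets = {"leader": 4, "f1": 3, "f2": 2}  # the leader counts as in sync
visible = visible_to_consumers(log, end_offsets)
```

The trade-off is latency against durability: waiting for fewer acknowledgements returns faster, but a message acknowledged only by the leader can be lost if the leader fails before the replicas catch up.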
![Page 19: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/19.jpg)
Performance
“Incredible”
Scales with:
● clients count, message size
● number of replicas, partitions or topics
Depends on network and disk throughput
![Page 20: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/20.jpg)
Performance Cont’d
Our testing
● 3 nodes, master + 2 replicas
● 500 000 msg/s (100-byte messages)
● 400 Mbit/s - 1.2 Gbit/s network throughput
● end-to-end latency 2-3 ms
@see http://bit.ly/1FsIR9a
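As a quick sanity check, the throughput figures are consistent with the message rate. The sketch below counts payload bytes only and ignores protocol overhead; reading the upper figure as the payload multiplied by three copies (master + 2 replicas) is an assumption, not something the slides state.

```python
msgs_per_sec = 500_000
msg_bytes = 100
copies = 3  # assumption: master + 2 replicas each carry the payload once

# Payload rate of a single copy, in Mbit/s.
payload_mbit = msgs_per_sec * msg_bytes * 8 / 1_000_000

# Combined rate of all copies, in Gbit/s.
replicated_gbit = payload_mbit * copies / 1000
```

Under these assumptions the single-copy rate matches the lower bound of the reported range and the three-copy rate matches the upper bound.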
![Page 21: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/21.jpg)
Ease of Use
● No installation, just run a java/scala program
● Streams in files & dirs
● Transparent zookeeper
● Ecosystem
![Page 22: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/22.jpg)
Cons
● Beta version
● Dependency on Zookeeper
● The way it is written in Scala
● No easy way to remove messages
![Page 23: Apache Kafka - Martin Podval](https://reader033.vdocuments.net/reader033/viewer/2022042507/55a939d41a28ab430a8b4855/html5/thumbnails/23.jpg)
Questions?