2015-04-15 | apache kafka (vienna scala user group)
TRANSCRIPT
Dominik Gruber, @the_dom Scala Vienna User Group – April 15, 2015
Apache Kafka
Dominik Gruber • @the_dom • Apache Kafka
Apache Kafka
• Originally developed by LinkedIn
• Open Sourced in 2011
• Written in Scala
• Clients for every popular language
• Version 0.8.2.1
• http://kafka.apache.org
Users
• Everyone, really …
• LinkedIn, Yahoo!, Twitter, Netflix, Square, Spotify, Pinterest, Uber, Goldman Sachs, Tumblr, PayPal, Box, Airbnb, Mozilla, Cisco, Foursquare, …
• https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
Apache Kafka
“A high throughput distributed messaging system.”
Apache Kafka
“Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.”
Apache Kafka
“Kafka is a distributed, partitioned, replicated commit log service.”
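To make the "commit log" idea concrete, here is a minimal in-memory sketch in Python. It is not Kafka's implementation or client API; the class `CommitLog` and its methods are hypothetical names chosen for illustration. It shows only the core idea: records are appended in order, each record gets a sequential offset, and every reader keeps track of its own position.

```python
class CommitLog:
    """Toy append-only log; hypothetical, not Kafka's API."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


log = CommitLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

# Two independent readers keep their own offsets; reading never removes data.
offset_a, offset_b = 0, 2
batch_a = log.read(offset_a)
batch_b = log.read(offset_b)
```

Because reading does not consume records, a slow reader and a fast reader can share the same log without affecting each other.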
Claims
• Fast
• Scalable
• Durable
• Distributed by Design
Claims
• Fast
  • A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.
• Scalable
• Durable
• Distributed by Design
Claims
• Fast
• Scalable
  • Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine (…)
• Durable
• Distributed by Design
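A rough sketch of how keyed partitioning can spread one stream over several partitions. This is a simplified illustration, not Kafka's actual partitioner: the function `choose_partition`, the CRC32 hash, and the partition count of 4 are all assumptions made for the example.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (assumed scheme)."""
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 4  # assumed partition count for the example
partitions = {p: [] for p in range(NUM_PARTITIONS)}

messages = [(b"user-1", "click"), (b"user-2", "view"), (b"user-1", "buy")]
for key, value in messages:
    partitions[choose_partition(key, NUM_PARTITIONS)].append(value)
```

Messages with the same key always land in the same partition, so their relative order is preserved there, while different partitions can live on different machines.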
Claims
• Fast
• Scalable
• Durable
  • Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
• Distributed by Design
Claims
• Fast
• Scalable
• Durable
• Distributed by Design
  • Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
Design
http://kafka.apache.org/documentation.html
Design
“The performance of linear writes on a JBOD configuration with six 7200rpm SATA RAID-5
array is about 600MB/sec but the performance of random writes is only about 100k/sec—a
difference of over 6000X.”
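The "over 6000X" figure can be reproduced from the two quoted numbers. The short calculation below assumes that "100k/sec" means 100 KB/sec and that 1 MB = 1024 KB; both are readings of the quote, not something the slide states.

```python
linear_kb_per_sec = 600 * 1024  # 600 MB/sec, assuming 1 MB = 1024 KB
random_kb_per_sec = 100         # "100k/sec" read as 100 KB/sec (assumption)

ratio = linear_kb_per_sec / random_kb_per_sec
print(f"linear writes are about {ratio:.0f}x faster")  # prints 6144x
```

With decimal units (1 MB = 1000 KB) the ratio would be exactly 6000, so the binary reading is the one that matches "over 6000X".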
Use Cases
“We designed Kafka to be able to act as a unified platform for handling all the real-time data feeds a large company
might have.”
Use Cases
• Messaging
• Website Activity Tracking
• Metrics
• Log Aggregation
• Stream Processing
• Event Sourcing
• Commit Log
Demo
Q & A
Further reading
• http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
• http://blog.confluent.io/2015/04/07/hands-free-kafka-replication-a-lesson-in-operational-simplicity
• http://www.slideshare.net/wangxia5/netflix-kafka
• https://metamarkets.com/2015/simplicity-stability-and-transparency-how-samza-makes-data-integration-a-breeze