2015-04-15 | apache kafka (vienna scala user group)

Dominik Gruber, @the_dom Scala Vienna User Group – April 15, 2015 Apache Kafka

Upload: dominik-gruber

Post on 16-Jul-2015

TRANSCRIPT

Page 1: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Dominik Gruber, @the_dom Scala Vienna User Group – April 15, 2015

Apache Kafka

Page 2: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Dominik Gruber • @the_dom • Apache Kafka

Apache Kafka

• Originally developed by LinkedIn

• Open-sourced in 2011

• Written in Scala

• Clients for every popular language

• Current version: 0.8.2.1

• http://kafka.apache.org

Page 3: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Users

• Everyone, really …

• LinkedIn, Yahoo!, Twitter, Netflix, Square, Spotify, Pinterest, Uber, Goldman Sachs, Tumblr, PayPal, Box, Airbnb, Mozilla, Cisco, Foursquare, …

• https://cwiki.apache.org/confluence/display/KAFKA/Powered+By

Page 4: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Apache Kafka

“A high throughput distributed messaging system.”

Page 5: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Apache Kafka

“Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.”

Page 6: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Apache Kafka

“Kafka is a distributed, partitioned, replicated commit log service.”
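The "commit log" idea in these definitions can be illustrated with a minimal in-memory sketch. It is not Kafka's implementation: the `CommitLog` class and the subscriber names `billing` and `analytics` are hypothetical, chosen only to show that records are appended at the end, addressed by offset, and left in place when read.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Append-only sequence of records, each identified by its offset."""
    records: list = field(default_factory=list)

    def append(self, record: bytes) -> int:
        # Records are only ever added at the end; the offset is the position.
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        # Reading never removes anything, so many readers can share one log.
        return self.records[offset:offset + max_records]


log = CommitLog()
for payload in (b"signup", b"login", b"purchase"):
    log.append(payload)

# Two hypothetical subscribers track their own positions independently.
positions = {"billing": 0, "analytics": 2}
billing_batch = log.read(positions["billing"])
analytics_batch = log.read(positions["analytics"])
```

Because each subscriber only stores an offset, one can re-read old records or lag behind without affecting the other.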

Page 7: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Claims

• Fast

• Scalable

• Durable

• Distributed by Design

Page 8: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Claims

• Fast

  • A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.

• Scalable

• Durable

• Distributed by Design

Page 9: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Claims

• Fast

• Scalable

  • Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine (…)

• Durable

• Distributed by Design
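The partitioning claim on this slide can be sketched with key-based routing: each message key is hashed to pick one of several partitions, so a stream can be spread over machines while messages with the same key stay together and in order. The helper names `choose_partition` and `spread`, the use of CRC32, and the sample events are assumptions of this sketch, not Kafka's client API.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    # CRC32 is used only because it is stable across runs; it is an
    # assumption of this sketch, not necessarily the hash Kafka's clients use.
    return zlib.crc32(key) % num_partitions


def spread(messages, num_partitions):
    # Each partition is an independent ordered list that could live on
    # a different machine.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in messages:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return partitions


events = [(b"user-1", "click"), (b"user-2", "view"), (b"user-1", "buy")]
partitions = spread(events, num_partitions=3)
```

Ordering is preserved only within a partition, which is why the choice of key matters for consumers that need per-entity ordering.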

Page 10: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Claims

• Fast

• Scalable

• Durable

  • Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.

• Distributed by Design
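The durability claim combines two ideas: records are written to disk, and more than one copy exists. The toy sketch below shows both with plain files. The length-prefixed framing, the `fsync` on every append, the file names, and the write-to-every-copy loop are simplifications assumed for illustration; they are not Kafka's on-disk format or its replication protocol.

```python
import os
import struct
import tempfile


def append_record(path: str, payload: bytes) -> None:
    # Length-prefixed framing: 4-byte big-endian size, then the payload.
    with open(path, "ab") as f:
        f.write(struct.pack(">I", len(payload)) + payload)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the bytes to disk


def read_all(path: str) -> list:
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            (size,) = struct.unpack(">I", header)
            records.append(f.read(size))
    return records


log_dir = tempfile.mkdtemp()
replica_paths = [os.path.join(log_dir, f"replica-{i}.log") for i in range(3)]

for payload in (b"order-created", b"order-paid"):
    for path in replica_paths:  # naive "replication": write every copy
        append_record(path, payload)

os.remove(replica_paths[0])  # simulate losing one copy
survivors = [read_all(p) for p in replica_paths[1:]]
```

Losing one file leaves the remaining copies readable, which is the property the slide describes at cluster scale.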

Page 11: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Claims

• Fast

• Scalable

• Durable

• Distributed by Design

  • Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.

Page 12: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Design

http://kafka.apache.org/documentation.html

Page 13: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Design

http://kafka.apache.org/documentation.html

Page 14: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Design

“The performance of linear writes on a JBOD configuration with six 7200rpm SATA RAID-5 array is about 600MB/sec but the performance of random writes is only about 100k/sec, a difference of over 6000X.”
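The ratio in the quote can be checked with a few lines of arithmetic. This assumes "100k/sec" means roughly 100 kilobytes per second; the quote does not spell out the unit, so that reading and the variable names are assumptions.

```python
# Figures taken from the quote; decimal units (1 MB = 1000 KB) are an assumption.
sequential_kb_per_sec = 600 * 1000   # about 600 MB/sec of linear writes
random_kb_per_sec = 100              # about 100 KB/sec of random writes

ratio = sequential_kb_per_sec / random_kb_per_sec

# The same comparison with binary units (1 MB = 1024 KB).
ratio_binary = (600 * 1024) / random_kb_per_sec
```

Either way the gap is about 6000 times, which is the motivation for a design built around sequential appends to a log.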

Page 15: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Design

http://kafka.apache.org/documentation.html

Page 16: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Use Cases

“We designed Kafka to be able to act as a unified platform for handling all the real-time data feeds a large company might have.”

Page 17: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Use Cases

• Messaging

• Website Activity Tracking

• Metrics

• Log Aggregation

• Stream Processing

• Event Sourcing

• Commit Log

Page 18: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Demo

Page 19: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Q & A

Page 20: 2015-04-15 | Apache Kafka (Vienna Scala User Group)

Further reading

• http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

• http://blog.confluent.io/2015/04/07/hands-free-kafka-replication-a-lesson-in-operational-simplicity

• http://www.slideshare.net/wangxia5/netflix-kafka

• https://metamarkets.com/2015/simplicity-stability-and-transparency-how-samza-makes-data-integration-a-breeze