An Introduction to Apache Kafka
By Amir Sedighi (@amirsedighi)
Data Solutions Engineer at DatisPars
Nov 2014
References
● http://kafka.apache.org/documentation.html
● http://www.slideshare.net/charmalloc/current-and-future-of-apache-kafka
● http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/
At first, data pipelining looks easy!
● It often starts with one data pipeline from a producer to a consumer.
It also looks pretty wise to reuse things!
● Reusing the pipeline for new producers.
We may handle some situations!
● Reusing added producers for new consumers.
But we can't go far!
● Eventually the solution becomes the problem!
The additional requirements make things complicated!
● With later developments, it gets even worse!
How to avoid this mess?
Decoupling Data-Pipelines
Message Delivery Semantics
● At most once
– Messages may be lost but are never redelivered.
● At least once
– Messages are never lost but may be redelivered.
● Exactly once
– This is what people actually want.
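The difference between the first two guarantees comes down to whether a consumer saves its position before or after it processes a message. The following is a toy, in-memory simulation rather than Kafka client code; `consume`, `commit_first`, and `crash_at` are hypothetical names introduced only for this illustration.

```python
def consume(log, commit_first, crash_at):
    """Simulate a consumer that crashes once, restarts, and finishes the log.

    log          -- list of messages (stand-in for one partition)
    commit_first -- True: save position before processing (at most once)
                    False: save position after processing (at least once)
    crash_at     -- index of the message during which the single crash occurs
    """
    committed = 0      # position that survives a crash
    processed = []     # side effects visible to the outside world
    crashed = False

    while committed < len(log):
        pos = committed            # (re)start from the last saved position
        while pos < len(log):
            if commit_first:
                committed = pos + 1
                if pos == crash_at and not crashed:
                    crashed = True
                    break          # crash after saving, before processing
                processed.append(log[pos])
            else:
                processed.append(log[pos])
                if pos == crash_at and not crashed:
                    crashed = True
                    break          # crash after processing, before saving
                committed = pos + 1
            pos += 1
    return processed


log = ["a", "b", "c"]
print(consume(log, commit_first=True, crash_at=1))   # ['a', 'c']
print(consume(log, commit_first=False, crash_at=1))  # ['a', 'b', 'b', 'c']
```

Saving the position first loses "b"; saving it afterwards delivers "b" twice. Exactly-once behaviour needs something extra, such as making the processing step idempotent.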
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.
Apache Kafka
● Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.
– Kafka is super fast.
– Kafka is scalable.
– Kafka is durable.
– Kafka is distributed by design.
● A single Kafka broker (server) can handle hundreds of megabytes of reads and writes per second from thousands of clients.
● Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime.
● Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
● Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
Kafka at LinkedIn
Kafka is a distributed, partitioned, replicated commit log service.
Main Components
● Topic
● Producer
● Consumer
● Broker
Topic
● Kafka maintains feeds of messages in categories called topics.
● Topics are the highest level of abstraction that Kafka provides.
Producer
● We'll call processes that publish messages to a Kafka topic producers.
Consumer
● We'll call processes that subscribe to topics and process the feed of published messages consumers.
– Hadoop Consumer
Broker
● Kafka runs as a cluster of one or more servers, each of which is called a broker.
Topics
● A topic is a category or feed name to which messages are published.
● The Kafka cluster maintains a partitioned log for each topic.
Partition
● A partition is an ordered, immutable sequence of messages that is continually appended to: a commit log.
● The messages in the partitions are each assigned a sequential id number called the offset.
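To make the offset idea concrete, here is a minimal sketch of a partition as an append-only list whose index serves as the offset. The `Partition` class and its `append` and `read` methods are hypothetical and exist only for this illustration; a real broker stores the log on disk.

```python
class Partition:
    """Toy in-memory stand-in for one partition: an append-only list."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append a message and return the offset it was assigned."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=10):
        """Return (offset, message) pairs starting at `offset`."""
        chunk = self._messages[offset:offset + max_messages]
        return list(enumerate(chunk, start=offset))


p = Partition()
for text in ("m0", "m1", "m2"):
    p.append(text)

print(p.read(1))  # [(1, 'm1'), (2, 'm2')]
```

Because existing entries are never modified, a reader only needs to remember one number, its offset, to resume or to re-read older messages.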
Again Topic and Partition
Log Compaction
Producer
● The producer is responsible for choosing which message to assign to which partition within the topic.
– Round-Robin
– Load-Balanced
– Key-Based (Semantic-Oriented)
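Two of these strategies can be sketched in a few lines. This is an illustration only, not the partitioner shipped with any Kafka client: the function names are hypothetical, and CRC32 is used as the hash simply because it is stable across runs.

```python
import itertools
import zlib


def round_robin_partitioner(num_partitions):
    """Return a function that cycles through partition ids 0..n-1."""
    counter = itertools.cycle(range(num_partitions))
    return lambda key=None: next(counter)


def key_based_partitioner(num_partitions):
    """Return a function that maps equal keys to the same partition."""
    def choose(key):
        # Illustrative hash choice (assumption): any stable hash would do.
        return zlib.crc32(key.encode("utf-8")) % num_partitions
    return choose


rr = round_robin_partitioner(3)
print([rr() for _ in range(5)])  # [0, 1, 2, 0, 1]

by_key = key_based_partitioner(4)
print(by_key("user-42") == by_key("user-42"))  # True
```

Round-robin spreads load evenly, while key-based assignment keeps all messages for one key in a single partition, which preserves their relative order.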
What does a Kafka cluster look like?
How does Kafka replicate a topic's partitions across the cluster?
Logical Consumers
What if we put jobs (processors) across the flow?
Run ZooKeeper
● bin/zookeeper-server-start.sh config/zookeeper.properties
Run kafka-server
● bin/kafka-server-start.sh config/server.properties
Create Topic
● bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
> Created topic "test".
List all Topics
● bin/kafka-topics.sh --list --zookeeper localhost:2181
Send Some Messages with the Producer
● bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Hello DatisPars Guys!
How is it going with you?
Start a Consumer
● bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Producing ...
Consuming
Use Cases
● Messaging
– Kafka is comparable to traditional messaging systems such as ActiveMQ and RabbitMQ.
  ● Kafka provides customizable latency.
  ● Kafka has better throughput.
  ● Kafka is highly fault-tolerant.
● Log Aggregation
– Many people use Kafka as a replacement for a log aggregation solution.
– Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing.
– In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication, and much lower end-to-end latency.
  ● Lower latency
  ● Easier support
● Stream Processing
– Storm and Samza are popular frameworks for stream processing. They both use Kafka.
● Event Sourcing
– Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.
● Commit Log
– Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data.
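The event-sourcing and commit-log ideas share one mechanism: state is whatever you get by replaying the log in order. A minimal sketch follows, with an ordinary Python list standing in for the Kafka topic; the `replay` function and the account events are made up for illustration.

```python
def replay(events, state=None):
    """Rebuild account balances by applying events in log order."""
    state = dict(state or {})
    for account, delta in events:
        state[account] = state.get(account, 0) + delta
    return state


log = [("alice", 100), ("bob", 50), ("alice", -30)]

full = replay(log)
print(full)  # {'alice': 70, 'bob': 50}

# A node that had only seen the first event catches up by replaying the rest.
stale = replay(log[:1])
caught_up = replay(log[1:], stale)
print(caught_up == full)  # True
```

The catch-up step is the re-syncing behaviour described above: a failed node resumes from the last position it applied instead of copying the whole state from a peer.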
Message Format
/**
 * A message. The format of an N byte message is the following:
 *
 * If magic byte is 0
 *
 * 1. 1 byte "magic" identifier to allow format changes
 * 2. 4 byte CRC32 of the payload
 * 3. N - 5 byte payload
 *
 * If magic byte is 1
 *
 * 1. 1 byte "magic" identifier to allow format changes
 * 2. 1 byte "attributes" identifier to allow annotations on the message independent of the version (e.g. compression enabled, type of codec used)
 * 3. 4 byte CRC32 of the payload
 * 4. N - 6 byte payload
 */
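The byte layout described in this comment can be reproduced with Python's `struct` and `zlib` modules. This is a sketch of the layout only, not a wire-compatible Kafka implementation: the `encode` and `decode` helpers are hypothetical, and big-endian byte order is an assumption made here.

```python
import struct
import zlib


def encode(payload, magic=0, attributes=0):
    """Serialize a payload: magic, optional attributes, CRC32, payload."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    if magic == 0:
        header = struct.pack(">BI", magic, crc)               # 1 + 4 bytes
    elif magic == 1:
        header = struct.pack(">BBI", magic, attributes, crc)  # 1 + 1 + 4 bytes
    else:
        raise ValueError("unknown magic byte: %r" % magic)
    return header + payload


def decode(message):
    """Return (magic, attributes, payload); raise if the CRC does not match."""
    magic = message[0]
    if magic == 0:
        (crc,) = struct.unpack_from(">I", message, 1)
        attributes = 0
        payload = message[5:]
    elif magic == 1:
        attributes, crc = struct.unpack_from(">BI", message, 1)
        payload = message[6:]
    else:
        raise ValueError("unknown magic byte: %r" % magic)
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch: payload is corrupted")
    return magic, attributes, payload


print(len(encode(b"hello")))                            # 10
print(len(encode(b"hello", magic=1)))                   # 11
print(decode(encode(b"hello", magic=1, attributes=2)))  # (1, 2, b'hello')
```

The lengths match the comment: a 5-byte payload yields N = 10 with magic 0 (N - 5 payload bytes) and N = 11 with magic 1 (N - 6 payload bytes).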
Questions?