Reducing Microservice Complexity with Kafka and Reactive Streams
TRANSCRIPT

Jim Riecken, Senior Software Developer
@jimriecken - [email protected]
Agenda
• Monolith to Microservices + Complexity
• Asynchronous Messaging
• Kafka
• Reactive Streams + Akka Streams
Anti-Agenda
• Details on how to set up a Kafka cluster
• In-depth tutorial on Akka Streams
Monolith to Microservices
[Charts: efficiency over time for a monolith (M) vs. a set of small services (S1-S5)]
• Small
• Scalable
• Independent
• Easy to create
• Clear ownership
Network Calls
• Latency
• Failure

Reliability
99.9% × 99.9% × 99.9% × 99.9% → ~99.5% end-to-end
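The reliability figure above is just multiplication of per-hop availabilities; a quick sketch of that arithmetic (the function name and hop count are illustrative, not from the slides):

```scala
// Each synchronous network hop multiplies in its own availability, so a
// chain of individually reliable services is less reliable end-to-end.
def chainedReliability(perCall: Double, hops: Int): Double =
  math.pow(perCall, hops)

// Five 99.9%-reliable calls in a row:
println(f"${chainedReliability(0.999, 5)}%.4f")
```

This is why each extra synchronous dependency in a request path quietly erodes the reliability of the whole system.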
Coordination
• Between services
• Between teams
Asynchronous Messaging
[Diagram: synchronous point-to-point service calls vs. asynchronous communication through a message bus]
Why?
• Decoupling
• Pub/Sub
• Less coordination
• Additional consumers are easy
• Helps scale the organization
Messaging Requirements
• Well-defined delivery semantics
• High throughput
• Highly available
• Durable
• Scalable
• Back-pressure
Kafka
What is Kafka?
• Distributed, partitioned, replicated commit log service
• Pub/Sub messaging functionality
• Created by LinkedIn, now an Apache open-source project
Topics + Partitions
[Diagram: producers append new messages to the tail of each partition (P0-P2) of a topic on the Kafka brokers; consumers read them back in order by offset]
Producers
• Send messages to topics
• Responsible for choosing which partition to send to
  • Round-robin
  • Consistent hashing based on a message key
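The key-based strategy can be sketched in plain Scala (a rough stand-in; Kafka's real default partitioner uses murmur2 hashing, but the idea is the same - equal keys always land on the same partition, which preserves per-key ordering):

```scala
// Hypothetical sketch of key-based partition choice: hash the key, then
// map it into the partition range. Same key => same partition.
def choosePartition(key: String, numPartitions: Int): Int = {
  val h = key.hashCode % numPartitions
  if (h < 0) h + numPartitions else h // keep the result non-negative
}
```

Because all messages for a given key (say, a user ID) go to one partition, a single consumer sees that key's messages in the order they were produced.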
Consumers
• Pull messages from topics
• Track their own offset in each partition
[Diagram: two consumer groups (Group 1, Group 2) each reading partitions P0-P2 of a topic at their own offsets]
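A toy model of "consumers track their own offset" (all names here are hypothetical; a real broker persists committed offsets durably - this only shows the bookkeeping that lets each group read at its own pace):

```scala
// The broker just holds an ordered log per partition; each consumer group
// remembers how far into that log it has read.
class PartitionLog(entries: Vector[String]) {
  private val offsets =
    scala.collection.mutable.Map[String, Int]().withDefaultValue(0)

  // Return everything the group hasn't seen yet, then "commit" the new offset.
  def poll(group: String): Seq[String] = {
    val from = offsets(group)
    offsets(group) = entries.length
    entries.drop(from)
  }
}

val log = new PartitionLog(Vector("a", "b", "c"))
log.poll("group-1") // Seq("a", "b", "c")
log.poll("group-1") // Seq() - group-1 is caught up
log.poll("group-2") // Seq("a", "b", "c") - group-2 has its own offset
```

Since offsets belong to the consumer, adding a new consumer group never disturbs existing ones - it simply starts reading the same log independently.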
How does Kafka meet the requirements?
Kafka is Fast
• Hundreds of MB/s of reads/writes from thousands of concurrent clients
• LinkedIn (2015)
  • 800 billion messages per day (18 million/s peak)
  • 175 TB of data produced per day
  • > 1000 servers in 60 clusters
Kafka is Resilient
• Brokers
  • All data is persisted to disk
  • Partitions replicated to other nodes
• Consumers
  • Start where they left off
• Producers
  • Can retry - at-least-once messaging
Kafka is Scalable
• Capacity can be added at runtime with zero downtime
  • More servers => more disk space
• Topics can be larger than any single node could hold
• Additional partitions can be added for more parallelism
Kafka Helps with Back-Pressure
• Large storage capacity
  • Topic retention is a consumer SLA
• Almost impossible for a fast producer to overload a slow consumer
• Allows real-time as well as batch consumption
Message Data Format

Messages
• Array[Byte]
• Serialization? JSON?
• Protocol Buffers
  • Binary - fast
  • IDL - code generation
  • Message evolution
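Since Kafka only ever sees Array[Byte], producer and consumer must agree on an encoding. A minimal stand-in codec (hypothetical - a real setup would plug in JSON or Protocol Buffers serializers, which also handle message evolution as fields are added):

```scala
// The broker is agnostic to content: it stores and ships raw bytes.
// Producer and consumer share the encoding contract, here plain UTF-8.
def serialize(s: String): Array[Byte]   = s.getBytes("UTF-8")
def deserialize(b: Array[Byte]): String = new String(b, "UTF-8")

deserialize(serialize("hello")) // round-trips back to "hello"
```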
Processing Data with Reactive Streams
Reactive Streams
• Standard for async stream processing with non-blocking back-pressure
  • Subscriber signals demand to publisher
  • Publisher sends no more than demand
• Low-level
  • Mainly meant for library authors
Publisher[T]
  subscribe(s: Subscriber[-T])

Subscriber[T]
  onSubscribe(s: Subscription)
  onNext(t: T)
  onComplete()
  onError(t: Throwable)

Subscription
  request(n: Long)
  cancel()
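The core of that contract - the subscriber signals demand via request(n), and the publisher never emits more than was requested - can be modeled in a few lines of plain Scala (stand-in types, not the real org.reactivestreams interfaces):

```scala
// Stand-in for the Reactive Streams Subscription: the subscriber's handle
// for signaling demand or cancelling.
trait Subscription {
  def request(n: Long): Unit
  def cancel(): Unit
}

// A publisher of the integers 1..upTo that respects demand.
class RangePublisher(upTo: Int) {
  def subscribe(onNext: Int => Unit, onComplete: () => Unit): Subscription =
    new Subscription {
      private var next = 1
      private var cancelled = false
      def request(n: Long): Unit = {
        var demand = n
        // Emit at most `n` elements - never more than was requested.
        while (demand > 0 && next <= upTo && !cancelled) {
          onNext(next); next += 1; demand -= 1
        }
        if (next > upTo && !cancelled) onComplete()
      }
      def cancel(): Unit = cancelled = true
    }
}

val received = scala.collection.mutable.Buffer[Int]()
val sub = new RangePublisher(100).subscribe(received += _, () => ())
sub.request(3) // publisher emits exactly 3 elements, then waits for more demand
```

The real specification adds rules about asynchrony, error signaling, and thread safety, which is why it is aimed at library authors rather than application code.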
Processing Data with Akka Streams
Akka Streams
• Library on top of Akka Actors and Reactive Streams
• Process sequences of elements using bounded buffer space
• Strongly typed
Concepts
• Source
• Sink
• Flow
• FanOut / FanIn
• Runnable Graph
• Composition
Materialization
• Turning on the tap
  • Create actors
  • Open files/sockets/other resources
• Materialized values
  • Source: Actor, Promise, Subscriber
  • Sink: Actor, Future, Producer
Reactive Kafka
• https://github.com/akka/reactive-kafka
• Akka Streams wrapper around the Kafka API
  • Consumer Source
  • Producer Sink
Producer
• Sink - sends messages to a Kafka topic
• Flow - sends messages to a Kafka topic and emits the result downstream
• When the stream completes or fails, the connection to Kafka is automatically closed
Consumer
• Source - pulls messages from Kafka topics
• Offset management
• Back-pressure
• Materialization
  • Object that can stop the consumer (and complete the stream)
Simple Producer Example

implicit val system = ActorSystem("producer-test")
implicit val materializer = ActorMaterializer()

val producerSettings = ProducerSettings(
  system, new ByteArraySerializer, new StringSerializer
).withBootstrapServers("localhost:9092")

Source(1 to 100)
  .map(i => s"Message $i")
  .map(m => new ProducerRecord[Array[Byte], String]("lower", m))
  .to(Producer.plainSink(producerSettings))
  .run()
Simple Consumer Example

implicit val system = ActorSystem("consumer-test")
implicit val materializer = ActorMaterializer()

val consumerSettings = ConsumerSettings(
  system, new ByteArrayDeserializer, new StringDeserializer, Set("lower")
).withBootstrapServers("localhost:9092").withGroupId("test-group")

val control =
  Consumer.atMostOnceSource(consumerSettings.withClientId("client1"))
    .map(record => record.value)
    .to(Sink.foreach(v => println(v)))
    .run()

// Later: stop the consumer and complete the stream
control.stop()
Combined Example

val control =
  Consumer.committableSource(consumerSettings.withClientId("client1"))
    .map { msg =>
      val upper = msg.value.toUpperCase
      Producer.Message(
        new ProducerRecord[Array[Byte], String]("upper", upper),
        msg.committableOffset)
    }
    .to(Producer.commitableSink(producerSettings))
    .run()

control.stop()
Demo
Wrap-Up
• Microservices have many advantages, but can introduce failure and complexity.
• Asynchronous messaging can help reduce this complexity, and Kafka is a great option.
• Akka Streams makes reliably processing data from Kafka with back-pressure easy.
Thank you! Questions?
Jim Riecken
@jimriecken - [email protected]