Samza at LinkedIn: Taking Stream Processing to the Next Level
TRANSCRIPT
Apache Samza
Apache Samza: Taking stream processing to the next level
Martin Kleppmann (@martinkl)
Martin Kleppmann
Hacker, designer, inventor, entrepreneur
Co-founded two startups; Rapportive → LinkedIn
Committer on Avro & Samza (Apache)
Writing a book on data-intensive apps (O'Reilly)
martinkl.com | @martinkl
Apache Kafka
Apache Samza
Apache Kafka + Apache Samza
Credit: Jason Walsh on Flickr (https://www.flickr.com/photos/22882695@N00/2477241427/)
Credit: Lucas Richarz on Flickr (https://www.flickr.com/photos/22057420@N00/5690285549/)
Things we would like to do (better)
Provide timely, relevant updates to your newsfeed
Update search results with new information as it appears
Real-time analysis of logs and metrics
Tools?
- REST: synchronous response; closely coupled
- Kafka & Samza: response latency of milliseconds to minutes; loosely coupled
- Batch: response latency of hours to days; loosely coupled
[Diagram: Service 1 and Service 2 publish events/messages to Kafka; Analytics, Cache maintenance, and Notifications each subscribe]
Publish / subscribe: event / message = "something happened"
- Tracking: user x clicked y at time z
- Data change: key x, old value y, set to new value z
- Logging: service x threw exception y in request z
- Metrics: machine x had free memory y at time z
- Many independent consumers
- High throughput (millions of msgs/sec)
- Fairly low latency (a few ms)
Kafka at LinkedIn:
- 350+ Kafka brokers
- 8,000+ topics
- 140,000+ partitions
- 278 billion messages/day
- 49 TB/day in, 176 TB/day out
Peak load:
- 4.4 million messages per second
- 6 Gbit/s inbound, 21 Gbit/s outbound
Samza API: processing messages

public interface StreamTask {
  void process(IncomingMessageEnvelope envelope,  // getKey(), getMsg()
               MessageCollector collector,        // sendMsg(topic, key, value)
               TaskCoordinator coordinator);      // commit(), shutdown()
}
Also provides interfaces for windowing tasks that are called at regular time intervals, and methods for initialization, configuration, etc. Checkpointing is handled behind the scenes.
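The process() loop above can be sketched with plain Java stand-ins. This is a minimal sketch, assuming simplified versions of Samza's interfaces (the real org.apache.samza types also carry topic, partition, and offset information); the "clicks" topic and event format are made up for illustration:

```java
// Self-contained sketch of the StreamTask shape from the slide.
// The three interfaces are simplified stand-ins, not Samza's real API.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StreamTaskSketch {
    interface IncomingMessageEnvelope { Object getKey(); Object getMessage(); }
    interface MessageCollector { void send(String topic, Object key, Object message); }
    interface TaskCoordinator { void commit(); void shutdown(); }

    interface StreamTask {
        void process(IncomingMessageEnvelope envelope,
                     MessageCollector collector,
                     TaskCoordinator coordinator);
    }

    // Example task: forward only click events to a (hypothetical) "clicks" topic.
    static class ClickFilterTask implements StreamTask {
        public void process(IncomingMessageEnvelope envelope,
                            MessageCollector collector,
                            TaskCoordinator coordinator) {
            String msg = (String) envelope.getMessage();
            if (msg.startsWith("click:")) {
                collector.send("clicks", envelope.getKey(), msg);
            }
        }
    }

    static List<String> runExample() {
        List<String> delivered = new ArrayList<>();
        MessageCollector collector = (topic, key, msg) -> delivered.add(topic + "/" + msg);
        TaskCoordinator coordinator = new TaskCoordinator() {
            public void commit() {}
            public void shutdown() {}
        };
        StreamTask task = new ClickFilterTask();
        for (String m : Arrays.asList("click:ad42", "view:page7", "click:ad43")) {
            task.process(new IncomingMessageEnvelope() {
                public Object getKey() { return "user1"; }
                public Object getMessage() { return m; }
            }, collector, coordinator);
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(runExample());  // [clicks/click:ad42, clicks/click:ad43]
    }
}
```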
Familiar ideas from MapReduce/Pig/Cascading:
- Filter: records matching a condition
- Map: record → func(record)
- Join: two or more datasets by key
- Group: records with the same value in a field
- Aggregate: records within the same group
- Pipe: job 1's output → job 2's input
MapReduce assumes a fixed dataset. Can we adapt this to unbounded streams?
Operations on streams:
- Filter: records matching a condition → easy
- Map: record → func(record) → easy
- Join: two or more datasets by key → within a time window; need a buffer
- Group: records with the same value in a field → within a time window; need a buffer
- Aggregate: records within the same group → ok, but when do you emit the result?
- Pipe: job 1's output → job 2's input → ok, but what about faults?
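The "when do you emit the result?" question for grouping and aggregation is typically answered with a time window. A minimal sketch (not Samza's API): counting events per key inside tumbling windows and emitting whenever a window closes; the window size and event encoding are illustrative:

```java
// Sketch: group/aggregate an unbounded stream using tumbling time windows.
// Events are {timestampMs, key} pairs; counts are emitted when a window closes.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowedCount {
    static List<String> run(List<long[]> events /* {timestampMs, key} */, long windowMs) {
        List<String> emitted = new ArrayList<>();
        Map<Long, Long> counts = new HashMap<>();  // key -> count, current window only
        long windowStart = -1;
        for (long[] e : events) {
            long ts = e[0], key = e[1];
            if (windowStart < 0) windowStart = (ts / windowMs) * windowMs;
            // Window closed: emit buffered counts (sorted by key), start fresh.
            while (ts >= windowStart + windowMs) {
                for (Map.Entry<Long, Long> c : new TreeMap<>(counts).entrySet())
                    emitted.add("window@" + windowStart + " key=" + c.getKey()
                                + " count=" + c.getValue());
                counts.clear();
                windowStart += windowMs;
            }
            counts.merge(key, 1L, Long::sum);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<long[]> events = Arrays.asList(
            new long[]{1000, 7}, new long[]{4000, 7}, new long[]{9000, 3},
            new long[]{12000, 7});  // the last event falls in the next window
        System.out.println(run(events, 10000));
        // [window@0 key=3 count=1, window@0 key=7 count=2]
    }
}
```

Note that the current window's counts stay buffered until the next window begins; that buffer is exactly the state the following slides discuss.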
Stateful stream processing(join, group, aggregate)
Joining streams requires state:
- User goes to lunch → click arrives long after the impression
- Queue backlog → click arrives before the impression
- Window join; join and aggregate → click-through rate
[Diagram: ad impressions and ad clicks streams joined through a key-value store]
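The join pictured above can be sketched as follows, with a plain map standing in for the key-value store; the event fields and window size are made-up illustrations, not Samza's API:

```java
// Sketch of a window join: impressions are buffered in a key-value store
// (a HashMap standing in for the local store), and each click joins against
// the buffered impression for the same ad id, if it arrived within the window.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ImpressionClickJoin {
    static List<String> run(List<String[]> events /* {type, adId, tsMs} */, long windowMs) {
        Map<String, Long> impressions = new HashMap<>();  // adId -> impression timestamp
        List<String> joined = new ArrayList<>();
        for (String[] e : events) {
            String type = e[0], adId = e[1];
            long ts = Long.parseLong(e[2]);
            if (type.equals("impression")) {
                impressions.put(adId, ts);           // buffer for later clicks
            } else {                                  // click
                Long impTs = impressions.remove(adId);
                if (impTs != null && ts - impTs <= windowMs) {
                    joined.add("clickthrough:" + adId);
                }
                // else: click before impression, or outside the window -> no join
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> events = Arrays.asList(
            new String[]{"impression", "ad1", "0"},
            new String[]{"impression", "ad2", "1000"},
            new String[]{"click", "ad1", "5000"},        // within window -> joins
            new String[]{"click", "ad2", "999999"});     // too late -> dropped
        System.out.println(run(events, 60000));  // [clickthrough:ad1]
    }
}
```

Counting joined clicks against total impressions then gives the click-through rate mentioned on the slide.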
Remote state or local state?
- Remote (e.g. Cassandra, MongoDB, …): each Samza job partition processes 100-500k msg/sec/node, but a remote store handles only 1-5k queries/sec??
- Local: each Samza job partition (partition 0, partition 1, …) has its own embedded LevelDB/RocksDB store.
Another example: newsfeed & following
- User 138 followed user 582
- User 463 followed user 536
- User 582 posted: "I'm at Berlin Buzzwords and it rocks"
- User 507 unfollowed user 115
- User 536 posted: "Nice weather today, going for a walk"
- User 981 followed user 575
Expected output: inbox (newsfeed) for each user
Newsfeed & following: fan out messages to followers
[Diagram: follow/unfollow events maintain follower lists (e.g. 582 => [138, 721, …]); posted messages fan out to followers. "User 138 followed user 582", then "User 582 posted: I'm at Berlin Buzzwords and it rocks" → notify user 138: {User 582 posted: "I'm at Berlin Buzzwords and it rocks"} → delivered messages, push notifications etc.]
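A sketch of that fan-out logic, with an in-memory map standing in for the local follower-list store; the event encoding is simplified from the slide:

```java
// Sketch: follow/unfollow events maintain per-user follower lists (local
// state); each post is fanned out to the poster's current followers.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class NewsfeedFanout {
    // Events: {"follow", follower, followee}, {"unfollow", follower, followee},
    //         {"post", author, text}
    static List<String> run(List<String[]> events) {
        Map<String, Set<String>> followers = new HashMap<>();  // user -> followers
        List<String> deliveries = new ArrayList<>();
        for (String[] e : events) {
            switch (e[0]) {
                case "follow":
                    followers.computeIfAbsent(e[2], u -> new TreeSet<>()).add(e[1]);
                    break;
                case "unfollow":
                    followers.getOrDefault(e[2], new TreeSet<>()).remove(e[1]);
                    break;
                case "post":
                    for (String f : followers.getOrDefault(e[1], new TreeSet<>()))
                        deliveries.add("notify " + f + ": user " + e[1]
                                       + " posted: " + e[2]);
                    break;
            }
        }
        return deliveries;
    }

    public static void main(String[] args) {
        List<String[]> events = Arrays.asList(
            new String[]{"follow", "138", "582"},
            new String[]{"follow", "463", "536"},
            new String[]{"post", "582", "I'm at Berlin Buzzwords and it rocks"});
        System.out.println(run(events));
        // [notify 138: user 582 posted: I'm at Berlin Buzzwords and it rocks]
    }
}
```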
Local state: bring computation and data together in one place
Fault tolerance
[Diagram: the YARN ResourceManager schedules Samza containers on Machine 1 and Machine 2, each under a YARN NodeManager; each container runs several tasks, consuming from and producing to Kafka. BOOM: Machine 2 fails, taking its containers and their tasks with it.]
[Diagram: YARN starts replacement Samza containers on Machine 3; the tasks from the failed machine resume there, still consuming from and producing to Kafka.]
Fault-tolerant local state
[Diagram: each Samza job partition's local LevelDB/RocksDB store replicates its writes to a durable changelog]
Samza's fault-tolerant local state:
- Embedded key-value store: very fast
- Machine dies → local key-value store is lost
Solution: replicate all writes to Kafka!
- Machine dies → restart on another machine
- Restore the key-value store from the changelog
- Changelog compaction happens in the background (Kafka 0.8.1)
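The changelog idea can be sketched like this; a plain list stands in for the compacted Kafka topic, and compaction is shown inline rather than as a background process:

```java
// Sketch: every write to the local store is also appended to a changelog.
// After a crash, the store is rebuilt by replaying the changelog; compaction
// keeps only the latest value per key, so replay stays bounded.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChangelogStore {
    final Map<String, String> store = new HashMap<>();  // embedded key-value store
    final List<String[]> changelog;                     // stand-in for the Kafka topic

    ChangelogStore(List<String[]> changelog) { this.changelog = changelog; }

    void put(String key, String value) {
        store.put(key, value);
        changelog.add(new String[]{key, value});        // replicate the write
    }

    // Rebuild the store on another machine by replaying the changelog.
    static ChangelogStore restore(List<String[]> changelog) {
        ChangelogStore fresh = new ChangelogStore(changelog);
        for (String[] entry : changelog) fresh.store.put(entry[0], entry[1]);
        return fresh;
    }

    // Compaction: keep only the last write per key.
    static List<String[]> compact(List<String[]> changelog) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] entry : changelog) {
            latest.remove(entry[0]);                    // re-insert to keep write order
            latest.put(entry[0], entry[1]);
        }
        List<String[]> compacted = new ArrayList<>();
        for (Map.Entry<String, String> e : latest.entrySet())
            compacted.add(new String[]{e.getKey(), e.getValue()});
        return compacted;
    }

    public static void main(String[] args) {
        List<String[]> changelog = new ArrayList<>();
        ChangelogStore s = new ChangelogStore(changelog);
        s.put("user582", "2 followers");
        s.put("user582", "3 followers");
        s.put("user536", "1 follower");
        // Machine dies; the local store is lost. Restore from the changelog:
        ChangelogStore restored = restore(compact(changelog));
        System.out.println(restored.store.get("user582"));  // 3 followers
    }
}
```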
When things go slow
Cascades of jobs
[Diagram: Job 1 produces Streams A and B, which feed Jobs 2-5; the jobs are owned by different teams (Team X, Team Y, Team Z)]
What to do when a consumer goes slow?
- Backpressure → other jobs grind to a halt → no thanks
- Queue up in memory → run out of memory → no thanks
- Queue up, spill to disk → oh wait, Kafka does this anyway!
- Drop data → no thanks
[Diagram: Job 1's output, Job 2's output, and Job 3's output are themselves streams, alongside Streams A and B, feeding Jobs 2-5]
Samza always writes job output to Kafka.
MapReduce always writes job output to HDFS.
Every job output is a named stream:
- Open: anyone can consume it
- Robust: if a consumer goes slow, nobody else is affected
- Durable: tolerates machine failure
- Debuggable: just look at it
- Scalable: clean interface between teams
- Clean: loose coupling between jobs
Recap:
- Problem: need to buffer job output for downstream consumers → Solution: write it to Kafka!
- Problem: need to make the local state store fault-tolerant → Solution: write it to Kafka!
- Problem: need to checkpoint job state for recovery → Solution: write it to Kafka!
Apache Kafka: kafka.apache.org
Apache Samza: samza.incubator.apache.org
Hello Samza (try Samza in 5 mins)
Thank you!
Samza:
- Getting started: samza.incubator.apache.org
- Underlying thinking: bit.ly/jay_on_logs
- Start contributing: bit.ly/samza_newbie_issues
Me:
- Twitter: @martinkl
- Blog: martinkl.com