Extending the Yahoo Streaming Benchmark
![Page 2: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/2.jpg)
Who am I?
• Director of Applications Engineering at data Artisans
• Previously working on streaming computation at Twitter, Gnip and Boulder Imaging
• Involved in various kinds of stream processing for about a decade
• High-speed video, social media streaming, general frameworks for stream processing
![Page 3: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/3.jpg)
Overview
• Yahoo! performed a benchmark comparing Apache Flink, Storm and Spark
• The benchmark never actually pushed Flink to its throughput limits but stopped at Storm's limits
• I knew Flink was capable of much more, so I repeated the benchmarks myself
• I wrote a follow-up blog post explaining my findings and will summarize them here
![Page 4: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/4.jpg)
Yahoo! Benchmark
• Count ad impressions grouped by campaign
• Compute aggregates over a 10 second window
• Emit current value of window aggregates to Redis every second for query
• Map ads to campaigns using Redis as well
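The benchmark logic can be sketched in a few lines of plain Python. This is only an illustration of the filter, project, join and window steps, not the Storm or Flink code from the talk. The field names (`ad_id`, `event_type`, `event_time`), the `"view"` event type and the in-memory `AD_TO_CAMPAIGN` dict standing in for the Redis lookup are all assumptions made for the example.

```python
from collections import Counter

WINDOW_MS = 10_000  # 10 second window, as stated on the slide

# Hypothetical stand-in for the ad -> campaign mapping that the benchmark keeps in Redis.
AD_TO_CAMPAIGN = {
    "ad-1": "campaign-A",
    "ad-2": "campaign-A",
    "ad-3": "campaign-B",
}


def count_by_campaign(events):
    """Return {(campaign, window_start_ms): impression_count}."""
    counts = Counter()
    for event in events:
        # Filter: keep only impressions (assumed to be event_type == "view").
        if event["event_type"] != "view":
            continue
        # Project: keep only the fields needed downstream.
        ad_id, ts = event["ad_id"], event["event_time"]
        # Join: map the ad to its campaign; skip ads with no known campaign.
        campaign = AD_TO_CAMPAIGN.get(ad_id)
        if campaign is None:
            continue
        # Window: assign the event to a 10 second tumbling window by its start time.
        window_start = ts - ts % WINDOW_MS
        counts[(campaign, window_start)] += 1
    return dict(counts)


sample_events = [
    {"ad_id": "ad-1", "event_type": "view", "event_time": 1_000},
    {"ad_id": "ad-2", "event_type": "view", "event_time": 9_999},
    {"ad_id": "ad-3", "event_type": "click", "event_time": 5_000},
    {"ad_id": "ad-3", "event_type": "view", "event_time": 12_000},
    {"ad_id": "ad-1", "event_type": "view", "event_time": 10_000},
]

if __name__ == "__main__":
    for key, value in sorted(count_by_campaign(sample_events).items()):
        print(key, value)
```

A real job would also write the running counts to Redis once per second. That step is left out here to keep the sketch free of external dependencies.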
![Page 5: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/5.jpg)
Any questions so far?
![Page 6: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/6.jpg)
Storm Code
![Page 7: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/7.jpg)
Flink Code
![Page 8: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/8.jpg)
Hardware Specs
• 10 Kafka brokers with 2 partitions each
• 10 compute nodes (Flink / Storm)
• Each machine has 1 Xeon [email protected] CPU
• 4 cores w/ hyperthreading
• 32 GB RAM (only 8GB allocated to JVMs)
• 10 GigE Ethernet between compute nodes
• 1 GigE Ethernet between Kafka cluster and compute nodes
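The link speeds above put a rough ceiling on how many messages can reach the compute nodes per second. The back-of-the-envelope sketch below shows the arithmetic. The message size of 250 bytes and the idea that each of the 10 brokers has its own 1 GigE link are assumptions for illustration, not figures from the talk, and protocol overhead is ignored.

```python
def max_msgs_per_sec(link_gbps, n_links, msg_bytes):
    """Upper bound on message rate for n_links links of link_gbps each.

    Ignores protocol overhead, so real throughput will be lower.
    """
    bytes_per_sec = link_gbps * 1e9 / 8 * n_links
    return bytes_per_sec / msg_bytes


# Hypothetical inputs: 10 links, 250-byte messages.
ASSUMED_LINKS = 10
ASSUMED_MSG_BYTES = 250

if __name__ == "__main__":
    for gbps in (1, 10):
        limit = max_msgs_per_sec(gbps, ASSUMED_LINKS, ASSUMED_MSG_BYTES)
        print(f"{gbps} GigE: at most {limit:,.0f} msgs/sec")
```

Under these assumptions, moving from 1 GigE to 10 GigE raises the ceiling tenfold, which is why the link between Kafka and the compute nodes matters for a job that is not CPU bound.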
![Page 9: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/9.jpg)
Logical Deployment
[Diagram: Data Generator → Kafka → Stream Processor (Source → Filter → Project → Join → Window → Sink) → Redis; Redis also appears next to the Join stage]
![Page 10: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/10.jpg)
Apache Storm Deployment
[Diagram: Flink Data Generator, Kafka (three boxes) and Apache Storm running Source, Filter, Project, Join, Window and Sink as separate operators, with a Shuffle; Redis appears twice. Legend: 10 GigE link, 1 GigE link]
![Page 11: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/11.jpg)
[Diagram: Flink Data Generator, Kafka (three boxes), and Source, Filter, Project, Join, Window and Sink as separate operators, with a Shuffle; Redis appears twice. Legend: 10 GigE link, 1 GigE link]
![Page 12: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/12.jpg)
[Diagram: same deployment, with Source and Filter chained into one operator: Source / Filter, Project, Join, Window, Sink. Legend: 10 GigE link, 1 GigE link]
![Page 13: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/13.jpg)
[Diagram: same deployment, with Project added to the chain: Source / Filter / Project, Join, Window, Sink. Legend: 10 GigE link, 1 GigE link]
![Page 14: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/14.jpg)
[Diagram: same deployment, with Join added to the chain: Source / Filter / Project / Join, Window, Sink. Legend: 10 GigE link, 1 GigE link]
![Page 15: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/15.jpg)
[Diagram: same deployment, now with two operator chains: Source / Filter / Project / Join and Window / Sink, plus the Shuffle. Legend: 10 GigE link, 1 GigE link]
![Page 16: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/16.jpg)
[Diagram: same deployment with the two operator chains, Source / Filter / Project / Join and Window / Sink, plus the Shuffle. Legend: 10 GigE link, 1 GigE link]
![Page 17: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/17.jpg)
Apache Flink Deployment
[Diagram: Flink Data Generator, Kafka (three boxes) and Apache Flink running two operator chains, Source / Filter / Project / Join and Window / Sink, with a Shuffle; Redis appears twice. Legend: 10 GigE link, 1 GigE link]
![Page 18: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/18.jpg)
Processing Guarantees: Apples and Oranges

| Apache Storm | Apache Flink |
| --- | --- |
| At least once semantics | Exactly once semantics |
| Double counting after failures | No double counting |
| Lost state after failures | No state loss |
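The difference between the two guarantees can be seen in a toy simulation. The sketch below is not how Storm or Flink is implemented. It only models a counter that crashes once and then replays input from the last checkpointed offset. The function name `run_with_one_failure` and its parameters are hypothetical. When the count is restored together with the offset, the result is exact. When only the offset is rewound, the replayed events are counted twice.

```python
def run_with_one_failure(events, fail_at, checkpoint_every, restore_state):
    """Count events, crashing once just before processing offset `fail_at`.

    restore_state=True:  on recovery, roll back both offset and count to the
                         last checkpoint (models exactly once counting).
    restore_state=False: on recovery, rewind only the offset and keep the
                         count (models at least once with double counting).
    """
    count = 0
    offset = 0
    checkpoint = (0, 0)  # (offset, count) captured together
    failed = False
    while offset < len(events):
        if offset == fail_at and not failed:
            # Simulated crash: resume reading from the checkpointed offset.
            failed = True
            ckpt_offset, ckpt_count = checkpoint
            offset = ckpt_offset
            if restore_state:
                count = ckpt_count
            continue
        count += 1
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, count)
    return count


events = list(range(10))
exactly_once = run_with_one_failure(events, fail_at=7, checkpoint_every=5, restore_state=True)
at_least_once = run_with_one_failure(events, fail_at=7, checkpoint_every=5, restore_state=False)

if __name__ == "__main__":
    print("exactly once:", exactly_once)
    print("at least once:", at_least_once)
```

With 10 events, a checkpoint every 5 events and a crash before event 7, the first run counts 10 and the second counts 12, because events 5 and 6 are processed twice.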
![Page 19: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/19.jpg)
Benchmark: Baseline
[Bar chart: throughput in msgs/sec for Storm and Flink; axis from 0 to 3,750,000]
![Page 20: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/20.jpg)
Bottleneck Analysis: Apache Storm
[Diagram: Flink Data Generator, Kafka (three boxes) and Apache Storm running Source, Filter, Project, Join, Window and Sink as separate operators, with a Shuffle; Redis appears twice. Legend: 10 GigE link, 1 GigE link]
![Page 21: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/21.jpg)
Bottleneck Analysis: Apache Storm
[Diagram: same Storm deployment, with the bottleneck labeled: CPU]
![Page 22: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/22.jpg)
Bottleneck Analysis: Apache Flink
[Diagram: Flink Data Generator, Kafka (three boxes) and Apache Flink running two operator chains, Source / Filter / Project / Join and Window / Sink, with a Shuffle; Redis appears twice. Legend: 10 GigE link, 1 GigE link]
![Page 23: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/23.jpg)
Bottleneck Analysis: Apache Flink
[Diagram: same Flink deployment, with the bottleneck labeled: Network]
![Page 24: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/24.jpg)
Eliminate the Bottleneck
[Diagram: Flink deployment with Flink Data Generator, Kafka (three boxes), and the operator chains Source / Filter / Project / Join and Window / Sink, with a Shuffle; Redis appears twice. Legend: 10 GigE link, 1 GigE link]
![Page 25: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/25.jpg)
Eliminate the Bottleneck
[Diagram: same Flink deployment with the Kafka boxes removed; Flink Data Generator, operator chains Source / Filter / Project / Join and Window / Sink, Shuffle, Redis. Legend: 10 GigE link, 1 GigE link]
![Page 26: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/26.jpg)
Eliminate the Bottleneck
[Diagram: Apache Flink with Data Generator, operator chains Source / Filter / Project / Join and Window / Sink, Shuffle, Redis; no Kafka. Legend: 10 GigE link, 1 GigE link]
![Page 27: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/27.jpg)
Apache Flink Deployment: Round 2
[Diagram: Apache Flink with Data Generator, operator chains Source / Filter / Project / Join and Window / Sink, Shuffle, Redis; no Kafka. Legend: 10 GigE link, 1 GigE link]
![Page 28: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/28.jpg)
Benchmark: Baseline
[Bar chart: throughput in msgs/sec for Storm and Flink; axis from 0 to 3,750,000]
![Page 29: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/29.jpg)
Benchmark: Round 2, 10 GigE end-to-end
[Bar chart: throughput in msgs/sec for Storm, Flink and Flink (10 GigE); axis from 0 to 16,000,000]
![Page 30: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/30.jpg)
Results
• Apache Flink achieved 15 million messages / sec on Yahoo! benchmark
• Much stronger processing guarantees: Exactly once
• 80x higher than what was reported in the original Yahoo! benchmark on similar hardware
![Page 31: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/31.jpg)
Questions?
![Page 32: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/32.jpg)
Storm Compatibility
• Lots of companies already have applications written using the Storm API
• Flink provides a Storm compatibility layer
• Run your Storm jobs on Flink with a one line code change
• Flink also allows you to reuse your existing Storm spout and bolt code from a Flink job
• Give it a try!
![Page 33: Extending the Yahoo Streaming Benchmark](https://reader034.vdocuments.net/reader034/viewer/2022052705/58f9a92e760da3da068b6be5/html5/thumbnails/33.jpg)
Thanks!