samza memory capacity_2015_ieee_big_data_data_quality_workshop
TRANSCRIPT
![Page 1: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/1.jpg)
A Memory Capacity Model for High Performing Data-filtering Applications in Samza Framework
Tao Feng, Zhenyun Zhuang, Yi Pan, Haricharan Ramachandra
LinkedIn Corp
![Page 2: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/2.jpg)
Agenda
• Introduction
• Memory capacity model
• Evaluation
• Summary
![Page 3: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/3.jpg)
INTRODUCTION
![Page 4: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/4.jpg)
What Is Samza
[Diagram: a Container holding Task 1, Task 2, and Task 3, with an Input Stream, an Output Stream, a Changelog Stream, a Local state store, and a Checkpoint]
![Page 5: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/5.jpg)
Samza-based Data Filtering Systems
• Two main scenarios
  • Data Filtering By Rules
  • Data Filtering By Joining Streams
![Page 6: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/6.jpg)
MEMORY CAPACITY MODEL
![Page 7: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/7.jpg)
Motivation
• We need an accurate resource predictive model for better capacity planning
• We could have more containers within a single node
  • Higher density without SLA violation
  • Lower business cost
![Page 8: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/8.jpg)
Memory Capacity Model
• L = T * P * E * (B + Bk + Bm)
  • L: live data set size
  • T: number of input topics
  • P: number of partitions per topic
  • E: number of unique entries per partition
  • B: bytes per treemap entry
  • Bk: bytes of key serialization
  • Bm: bytes of value message serialization
• Required heap size: 1H = 2 * L
• Details of the proof can be found in our paper
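As a rough illustration of the formula on this slide, here is a minimal Python sketch. The function and parameter names are hypothetical and only mirror the slide's symbols; the input values in the usage lines are made-up placeholders, not measurements from the talk.

```python
def live_data_set_bytes(topics, partitions_per_topic, entries_per_partition,
                        entry_bytes, key_bytes, value_bytes):
    """L = T * P * E * (B + Bk + Bm), in bytes."""
    per_entry = entry_bytes + key_bytes + value_bytes
    return topics * partitions_per_topic * entries_per_partition * per_entry


def required_heap_bytes(live_bytes):
    """1H = 2 * L, the heap size given by the model."""
    return 2 * live_bytes


# Placeholder inputs (assumed for illustration only).
L = live_data_set_bytes(2, 4, 1_000_000, 40, 16, 100)
H = required_heap_bytes(L)
print(f"L = {L / 1e9:.2f} GB, 1H = {H / 1e9:.2f} GB")
```

Every input except B depends on the job's topics and message format, so the estimate has to be recomputed whenever the key or value serialization changes.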
![Page 9: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/9.jpg)
EVALUATION
![Page 10: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/10.jpg)
Test Setup
[Diagram: Kafka Clusters (brokers 0, 1 … N) and a Container on the Test System]
• Test system config
  • 24 cores
  • 1 Gbps NIC
  • 45 GB memory
• JVM options:
  • UseG1GC
  • G1HeapRegionSize = 4M
![Page 11: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/11.jpg)
Evaluation Methodology
• First, we derive the heap size from the model and call it 1H
  • e.g. with T = 1, P = 8, E = 5 million, B = 40 bytes, Bk = 24 bytes, Bm = 24 bytes: 1H = 2 * L = 2 * T * P * E * (B + Bk + Bm) ≈ 7 GB
• Second, we compare Samza job throughput and system performance metrics (GC time, CPU time) of the 1H case with the 2H and 3H cases
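The arithmetic behind the 7 GB figure can be checked with a short Python sketch. It assumes that 2H and 3H mean two and three times the 1H heap size. The `jvm_opts` helper is hypothetical: the slides name the G1 options but do not show the authors' actual command line, so the option string is only one plausible way to pass them.

```python
import math

# Values from the slide's example.
T, P, E = 1, 8, 5_000_000
B, Bk, Bm = 40, 24, 24            # bytes

L = T * P * E * (B + Bk + Bm)     # live data set size in bytes
H1 = 2 * L                        # 1H, the heap size given by the model


def jvm_opts(heap_bytes):
    """Build a hypothetical JVM option string for a given heap size."""
    heap_mib = math.ceil(heap_bytes / 2**20)   # the 'm' suffix of -Xmx means MiB
    return f"-Xmx{heap_mib}m -XX:+UseG1GC -XX:G1HeapRegionSize=4m"


# Assumption: 2H and 3H are 2x and 3x the 1H heap.
for label, factor in (("1H", 1), ("2H", 2), ("3H", 3)):
    heap = factor * H1
    print(f"{label}: {heap / 1e9:.2f} GB -> {jvm_opts(heap)}")
```

The live data set comes to 3.52 GB and 1H to 7.04 GB in decimal units, which the slide rounds to 7G.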
![Page 12: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/12.jpg)
Performance Results
![Page 13: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/13.jpg)
Performance Results (cont.)
![Page 14: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/14.jpg)
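Performance Results (cont.)

| GC type | Metric | 1H | 2H | 3H |
|---|---|---|---|---|
| Young GC of G1 | Count | 88 | 29 | 32 |
| Young GC of G1 | Total time (ms) | 9850 | 5063 | 6144 |
| Mixed GC of G1 | Count | 24 | 0 | 0 |
| Mixed GC of G1 | Total time (ms) | 70166 | 0 | 0 |
| Total | Count | 112 | 29 | 31 |
| Total | Total time (ms) | 80117 | 5063 | 6144 |

• No full GC involved in the 1H case
• As expected, CPU time and GC time are higher in the 1H case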
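One way to read the GC numbers is to look at the average time per collection. The Python sketch below derives it from the young and mixed GC rows on this slide. The averages are a derived view that is not part of the slides, and the names `gc_stats` and `avg_ms` are hypothetical.

```python
# GC counts and total times (ms) copied from the slide's young and mixed GC rows.
gc_stats = {
    "young": {"1H": (88, 9850), "2H": (29, 5063), "3H": (32, 6144)},
    "mixed": {"1H": (24, 70166), "2H": (0, 0), "3H": (0, 0)},
}


def avg_ms(count, total_ms):
    """Average time per collection, or None when no collection ran."""
    return total_ms / count if count else None


averages = {
    gc_type: {case: avg_ms(*pair) for case, pair in cases.items()}
    for gc_type, cases in gc_stats.items()
}

for gc_type, cases in averages.items():
    for case, avg in cases.items():
        shown = "n/a" if avg is None else f"{avg:.0f} ms"
        print(f"{gc_type:5s} {case}: {shown}")
```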
![Page 15: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/15.jpg)
Summary
• The model accurately predicts Samza memory usage and lets Samza jobs run at the predicted heap size without significant SLA violation
• With accurate memory estimation, it allows 2x denser Samza container deployments within the same node
![Page 16: Samza memory capacity_2015_ieee_big_data_data_quality_workshop](https://reader034.vdocuments.net/reader034/viewer/2022042907/5877ce6a1a28ab39588b73bb/html5/thumbnails/16.jpg)
Q & A