A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks
Xiaoyi Lu, Md. Wasi-ur-Rahman, Nusrat Islam, and Dhabaleswar K. (DK) Panda
Network-Based Computing Laboratory
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
WBDB 2013
Outline
• Introduction and Motivation
• Problem Statement
• Design Considerations
• Micro-benchmark Suite
• Performance Evaluation
• Conclusion & Future Work
Big Data Technology
• Apache Hadoop is one of the most popular Big Data technologies
– Provides a framework for large-scale, distributed data storage and processing
• An open-source implementation of the MapReduce programming model
• The Hadoop Distributed File System (HDFS) is the underlying file system of Hadoop MapReduce and of the Hadoop database, HBase
• Hadoop Core: common functionalities, e.g., Remote Procedure Call (RPC)
[Figure: Hadoop Framework, comprising HDFS, MapReduce, HBase, and Core (RPC, ..)]
Adoption of Hadoop RPC
• Hadoop RPC is increasingly used by data-center middleware such as MapReduce, HDFS, and HBase because of its simplicity, productivity, and high performance. Uses include:
– Metadata exchange
– Managing compute nodes and tracking system status
– Efficient data management operations: get block info, create blocks, etc.
– Database operations: put, get, etc.
[Figure: MapReduce & HDFS (Map/Reduce, HDFS Name Node, HDFS Clients, HDFS Data Nodes) and HBase (HBase Clients, HRegion Servers, Data Nodes) deployments connected by High Performance Networks, with HDD/SSD storage]
Common Protocols using OpenFabrics
[Figure: Protocol stacks from the application interface (Sockets or Verbs) down to adapter and switch: 1/10/40 GigE (kernel TCP/IP, Ethernet driver, Ethernet adapter and switch); IPoIB (kernel TCP/IP, InfiniBand adapter and switch); 10/40 GigE-TOE (TCP/IP hardware offload, Ethernet adapter and switch); SDP (RDMA, InfiniBand adapter and switch); iWARP (user-space, iWARP adapter, Ethernet switch); RoCE (user-space RDMA, RoCE adapter, Ethernet switch); RSockets (user-space, InfiniBand adapter and switch); IB Verbs (user-space RDMA, InfiniBand adapter and switch)]
Can Big Data Processing Systems be Designed with High-Performance Networks and Protocols?
[Figure: Current Design: Application over Sockets over a 1/10 GigE network. Enhanced Designs: Application over Accelerated Sockets over Verbs / Hardware Offload on 10 GigE or InfiniBand. Our Approach: Application over the OSU Design over the Verbs Interface on 10 GigE or InfiniBand.]
• Sockets were not designed for high performance
– Stream semantics often mismatch the needs of upper layers (Memcached, HBase, Hadoop)
– Zero-copy is not available for non-blocking sockets
Hadoop RPC over InfiniBand
[Figure: Default design: Applications, Hadoop RPC, Java Socket Interface, 1/10 GigE or IPoIB network. OSU design: Applications, Hadoop RPC, Java Native Interface (JNI), IB Verbs, InfiniBand. The switch between the two paths is labeled rpc.ib.enabled.]
Our design enables high-performance RDMA communication while still supporting the traditional socket interface.
Xiaoyi Lu, Nusrat Islam, Md. Wasi-ur-Rahman, Jithin Jose, Hari Subramoni, Hao Wang, Dhabaleswar K. (DK) Panda. "High-Performance Design of Hadoop RPC with RDMA over InfiniBand." To be presented at the 42nd International Conference on Parallel Processing (ICPP 2013), Lyon, France, October 2013.
Hadoop RPC over IB: Gain in Latency and Throughput
• Hadoop RPC over IB PingPong latency
– 1 byte: 39 us; 4 KB: 52 us
– 42%-49% and 46%-50% improvements compared with default Hadoop RPC on 10 GigE and IPoIB (32Gbps), respectively
• Hadoop RPC over IB throughput
– 512 bytes & 48 clients: 135.22 Kops/sec
– 82% and 64% improvements compared with the peak performance of default Hadoop RPC on 10 GigE and IPoIB (32Gbps), respectively
[Figure: Two panels. Latency (us) vs. Payload Size (Byte), 1 to 4096; Throughput (Kops/Sec) vs. Number of Clients, 8 to 64. Series: RPC-10GigE, RPC-IPoIB(32Gbps), RPCoIB(32Gbps).]
Available in Hadoop-RDMA Software
• High-Performance Design of Hadoop over RDMA-enabled Interconnects
– High-performance design with native InfiniBand support at the verbs level for HDFS, MapReduce, and RPC components
– Easily configurable for both native InfiniBand and traditional sockets-based support (Ethernet and InfiniBand with IPoIB)
– Current release: 0.9.0, based on Apache Hadoop 0.20.2 and compliant with Apache Hadoop 0.20.2 APIs and applications
– Tested with Mellanox InfiniBand adapters (DDR, QDR and FDR), various multi-core platforms, and different file systems with disks and SSDs
– http://hadoop-rdma.cse.ohio-state.edu
Requirements of Hadoop RPC Benchmarks
• To achieve optimal performance, Hadoop RPC needs to be tuned based on cluster and workload characteristics
• A micro-benchmark suite that evaluates Hadoop RPC performance metrics under different configurations is important for tuning Hadoop RPC and understanding its behavior
• For Hadoop developers, such a micro-benchmark suite helps evaluate and optimize the performance of new designs
Problem Statement
• Can we design and implement a simple, standardized benchmark suite that lets all users and developers in the Big Data community evaluate, understand, and optimize Hadoop RPC performance over a range of networks/protocols?
• What will the performance of Hadoop RPC be when evaluated with this benchmark suite on high-performance networks?
Design Considerations
• The performance of RPC systems is usually measured in terms of latency and throughput
• The performance of Hadoop RPC is determined by:
– Network configuration: faster interconnects and/or protocols can improve Hadoop RPC performance
– Controllable parameters at the RPC-engine level and the benchmark level: number of handlers, number of clients, etc.
– Data types: serialization and deserialization costs differ across the data types used in the RPC system (BytesWritable, Text, etc.)
– CPU utilization: a tradeoff between RPC subsystem performance and overall system performance
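To make the data-type consideration concrete, the Python sketch below approximates how two Writable types frame a payload, based on our reading of Hadoop's `BytesWritable.write` (a 4-byte big-endian length, then the raw bytes) and `Text.write` (a variable-length integer in the `WritableUtils.writeVInt` format, then the UTF-8 bytes). It models only the payload encoding, not the RPC call header or connection framing; the Python function names are hypothetical, and the exact layout should be checked against the Hadoop source for the version in use.

```python
import struct


def write_vint(n: int) -> bytes:
    """Sketch of Hadoop's variable-length integer encoding (WritableUtils.writeVLong)."""
    if -112 <= n <= 127:
        # Small values fit in a single signed byte.
        return struct.pack(">b", n)
    first = -112
    if n < 0:
        n ^= -1  # one's complement makes n non-negative
        first = -120
    tmp = n
    while tmp != 0:  # count how many bytes the magnitude needs
        tmp >>= 8
        first -= 1
    # The first byte encodes both the sign and the number of following bytes.
    out = bytearray(struct.pack(">b", first))
    nbytes = -(first + 120) if first < -120 else -(first + 112)
    for idx in range(nbytes, 0, -1):  # magnitude bytes, most significant first
        out.append((n >> ((idx - 1) * 8)) & 0xFF)
    return bytes(out)


def serialize_bytes_writable(payload: bytes) -> bytes:
    """BytesWritable layout: 4-byte big-endian length, then the raw bytes."""
    return struct.pack(">i", len(payload)) + payload


def serialize_text(value: str) -> bytes:
    """Text layout: vint length of the UTF-8 bytes, then the bytes themselves."""
    data = value.encode("utf-8")
    return write_vint(len(data)) + data


if __name__ == "__main__":
    for size in (1, 128, 4096):
        bw = len(serialize_bytes_writable(b"x" * size))
        tx = len(serialize_text("x" * size))
        print(f"{size:>5} B payload: BytesWritable={bw} B, Text={tx} B")
```

Framing overhead is only part of the per-type cost; encoding and decoding work (for example, UTF-8 handling for Text) and buffer copies also matter, and this sketch does not model them.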
Micro-benchmark Suite
• Two different micro-benchmarks:
– Latency: single server, single client
– Throughput: single server, multiple clients
• A script framework for job launching and resource monitoring
• Calculates statistics such as min, max, and average
• Latency benchmark parameters: Network Address, Port, Data Type, Min Msg Size, Max Msg Size, No. of Iterations, Handlers, Verbose (lat_client supports 7 of these, lat_server supports 4)
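The latency benchmark follows the classic ping-pong pattern: one client sends a message, waits for the reply, and records the round-trip time for each payload size. As a rough illustration of that loop and of the min/max/average statistics, the Python sketch below times length-prefixed echo round trips over a plain TCP socket on the loopback interface. It does not use Hadoop RPC; the function names, the 4-byte length framing, and the doubling message-size sweep are assumptions made for illustration.

```python
import socket
import statistics
import struct
import threading
import time


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP is a byte stream, so recv may return less."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def _echo_server(listener: socket.socket) -> None:
    """Serve one client: echo every length-prefixed message back unchanged."""
    conn, _ = listener.accept()
    with conn:
        while True:
            try:
                header = _recv_exact(conn, 4)
                (length,) = struct.unpack(">I", header)
                body = _recv_exact(conn, length)
            except ConnectionError:
                return  # the client finished and closed the socket
            conn.sendall(header + body)


def ping_pong_latency(min_size: int = 1, max_size: int = 4096, iterations: int = 100):
    """Return {payload size: (min, max, avg) round-trip latency in microseconds}."""
    if min_size < 1 or max_size < min_size or iterations < 1:
        raise ValueError("need 1 <= min_size <= max_size and iterations >= 1")
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: pick a free port
    port = listener.getsockname()[1]
    server = threading.Thread(target=_echo_server, args=(listener,), daemon=True)
    server.start()
    results = {}
    with socket.create_connection(("127.0.0.1", port)) as client:
        # Disable Nagle's algorithm so small messages are sent immediately.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        size = min_size
        while size <= max_size:
            msg = struct.pack(">I", size) + b"x" * size
            samples = []
            for _ in range(iterations):
                start = time.perf_counter()
                client.sendall(msg)
                _recv_exact(client, len(msg))
                samples.append((time.perf_counter() - start) * 1e6)
            results[size] = (min(samples), max(samples), statistics.mean(samples))
            size *= 2
    server.join(timeout=1.0)  # give the server thread a moment to see the close
    listener.close()
    return results


if __name__ == "__main__":
    for size, (lo, hi, avg) in ping_pong_latency().items():
        print(f"{size:>5} B  min={lo:.1f} us  max={hi:.1f} us  avg={avg:.1f} us")
```

A real benchmark would usually also run warm-up iterations that are excluded from the statistics; the sketch omits them for brevity.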
• Throughput benchmark parameters: Network Address, Port, Data Type, Min Msg Size, Max Msg Size, No. of Iterations, No. of Clients, Handlers, Verbose (thr_client supports 7 of these, thr_server supports 6)
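The throughput benchmark instead runs one server against many concurrent clients and reports aggregate operations per second. The Python sketch below loosely mirrors the call queue and handler threads of Hadoop's RPC server: per-connection reader threads enqueue requests, a fixed pool of handler threads dequeues them and sends echo replies, and synchronous clients each keep one call outstanding. Everything here (names, echo semantics, framing) is an illustrative assumption rather than the suite's code, and Python's global interpreter lock makes the absolute numbers meaningless; only the structure and the Kops/sec calculation carry over.

```python
import queue
import socket
import struct
import threading
import time


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def _reader(conn: socket.socket, calls: queue.Queue) -> None:
    """Per-connection reader: parse requests and put them on the shared call queue."""
    try:
        while True:
            header = _recv_exact(conn, 4)
            (length,) = struct.unpack(">I", header)
            calls.put((conn, header + _recv_exact(conn, length)))
    except ConnectionError:
        conn.close()  # the client has finished


def _handler(calls: queue.Queue) -> None:
    """Handler thread: take queued calls and send the response (an echo here)."""
    while True:
        conn, msg = calls.get()
        conn.sendall(msg)


def _client(port, msg, iterations, barrier, counts, idx) -> None:
    """Synchronous client: one outstanding call at a time."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        barrier.wait()  # all clients start issuing calls together
        for _ in range(iterations):
            sock.sendall(msg)
            _recv_exact(sock, len(msg))
        counts[idx] = iterations


def rpc_throughput(num_clients: int = 8, handlers: int = 4,
                   payload: int = 512, iterations: int = 1000) -> float:
    """Return aggregate throughput in Kops/sec across all clients."""
    listener = socket.create_server(("127.0.0.1", 0), backlog=num_clients)
    port = listener.getsockname()[1]
    calls = queue.Queue()
    for _ in range(handlers):
        threading.Thread(target=_handler, args=(calls,), daemon=True).start()

    def accept_loop() -> None:
        for _ in range(num_clients):
            conn, _addr = listener.accept()
            threading.Thread(target=_reader, args=(conn, calls), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    msg = struct.pack(">I", payload) + b"x" * payload
    barrier = threading.Barrier(num_clients + 1)
    counts = [0] * num_clients
    clients = [threading.Thread(target=_client,
                                args=(port, msg, iterations, barrier, counts, i))
               for i in range(num_clients)]
    for t in clients:
        t.start()
    barrier.wait()  # release the clients and start the clock
    start = time.perf_counter()
    for t in clients:
        t.join()
    elapsed = time.perf_counter() - start
    listener.close()
    return sum(counts) / elapsed / 1000.0


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(f"{n} clients: {rpc_throughput(num_clients=n):.2f} Kops/sec")
```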
Experimental Setup
• Hardware
– Intel Westmere cluster: 8 nodes
– Each node has 8 processor cores on 2 Intel Xeon 2.67 GHz quad-core CPUs and 24 GB main memory
– Network: 1GigE, 10GigE, and IPoIB (32Gbps)
• Software
– Enterprise Linux Server release 6.1 (Santiago) at kernel version 2.6.32-131 with OpenFabrics version 1.5.3
– Hadoop 0.20.2 and Sun Java SDK 1.7
RPC Latency for BytesWritable
• RPC latency decreases when the underlying interconnect is changed from 1 GigE to IPoIB or 10 GigE.
• With the 10 GigE interconnect, we observe better latency than with IPoIB for small payload sizes. For large payload sizes, IPoIB performs better than 10 GigE.
– IPoIB achieves a 27% gain over 10 GigE for a 64 MB payload, whereas it is 0.66% slower than 10 GigE for a 4 KB payload.
[Figure: RPC latency for BytesWritable. Small Messages panel: Latency (us) vs. Payload Size (Byte). Large Messages panel: Latency (ms) vs. Payload Size (Byte), 128K to 64M. Series: 1GigE, 10GigE, IPoIB(32Gbps).]
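The slides do not state the formula behind percentages such as a "27% gain" or being "0.66% slower". A common convention, assumed here, is the relative difference against the baseline network, with the sign flipped for lower-is-better metrics such as latency so that a positive value always means an improvement. The helper name below is hypothetical, and the example inputs are illustrative, not measurements from these experiments.

```python
def relative_gain(baseline: float, candidate: float, lower_is_better: bool) -> float:
    """Percent improvement of candidate over baseline (negative means worse)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    # For latency a smaller value is better; for throughput a larger one is.
    diff = baseline - candidate if lower_is_better else candidate - baseline
    return 100.0 * diff / baseline


if __name__ == "__main__":
    # Illustrative numbers only.
    print(relative_gain(100.0, 73.0, lower_is_better=True))   # latency
    print(relative_gain(40.0, 50.0, lower_is_better=False))   # throughput
```

Under this convention, an IPoIB latency that is 0.66% higher than the 10 GigE latency yields a value of -0.66.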
RPC Latency for Text
• RPC latency with the Text data type shows similar performance characteristics.
[Figure: RPC latency for Text. Small Messages panel: Latency (us) vs. Payload Size (Byte), 1 to 4096. Large Messages panel: Latency (ms) vs. Payload Size (Byte), 128K to 64M. Series: 1GigE, 10GigE, IPoIB(32Gbps).]
RPC Throughput for BytesWritable
• IPoIB outperforms 10 GigE as the payload size increases.
• At 4 KB, the improvement reaches 26% with seven handler threads. For small payload sizes, 10 GigE outperforms IPoIB by an average margin of 5-6%.
[Figure: RPC throughput for BytesWritable with 7 RPC Server Handlers and with 16 RPC Server Handlers: Throughput (Kops/Sec) vs. Payload Size (byte), 1 to 4096. Series: 1GigE, 10GigE, IPoIB(32Gbps).]
RPC Throughput for BytesWritable
• With the payload size fixed at 4 KB, we observe the trend across different handler numbers and networks
– IPoIB outperforms 10 GigE by 48%, 5%, 45%, and 47% with 1, 4, 16, and 32 handlers, respectively.
• The suite can also monitor resource utilization; this is enabled through a parameter in the script framework.
[Figure: Throughput Comparison for 4 KB payload size: Throughput (Kops/Sec) vs. Handler Number (1, 4, 16, 32). CPU utilization for the experiment with 4 handlers: CPU Utilization (%) vs. Sampling Point (0 to 216). Series: 1GigE, 10GigE, IPoIB(32Gbps).]
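The slides do not describe how the script framework samples CPU usage. One common approach on Linux, sketched here in Python as an assumption rather than the suite's actual mechanism, is to read the aggregate `cpu` line of `/proc/stat` at fixed intervals and compute the busy fraction between consecutive snapshots. The function names are hypothetical.

```python
import time


def read_cpu_times(path: str = "/proc/stat") -> tuple:
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    with open(path) as f:
        fields = f.readline().split()  # ['cpu', user, nice, system, idle, iowait, ...]
    values = [int(v) for v in fields[1:]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    total = sum(values[:8])  # user..steal; guest time is already counted in user
    return total - idle, total


def utilization(prev: tuple, curr: tuple) -> float:
    """Percent of CPU time spent busy between two (busy, total) snapshots."""
    busy = curr[0] - prev[0]
    total = curr[1] - prev[1]
    return 100.0 * busy / total if total > 0 else 0.0


def sample_cpu(interval: float = 1.0, samples: int = 5) -> list:
    """Linux-only: collect `samples` utilization readings `interval` seconds apart."""
    readings = []
    prev = read_cpu_times()
    for _ in range(samples):
        time.sleep(interval)
        curr = read_cpu_times()
        readings.append(utilization(prev, curr))
        prev = curr
    return readings


if __name__ == "__main__":
    for i, pct in enumerate(sample_cpu()):
        print(f"sample {i}: {pct:.1f}% busy")
```

Each reading corresponds to one "sampling point" in a chart like the one above; running the sampler on the server node alongside a throughput run gives a per-network utilization trace.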
Conclusion and Future Work
• Design and implement a micro-benchmark suite to evaluate the performance of standalone Hadoop RPC
• Provide standard micro-benchmarks to measure the latency and throughput of Hadoop RPC with different data types
• Illustrate the performance of Hadoop RPC using our benchmarks over different networks/protocols (1GigE/10GigE/IPoIB)
• Will extend the benchmark suite to help users compare the performance of Hadoop Writable RPC, Avro, Thrift, and Protocol Buffers
• Will be made available to the Big Data community via an open-source release
Thank You!
{luxi, rahmanmd, islamn, panda}@cse.ohio-state.edu
Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/
MVAPICH Web Page: http://mvapich.cse.ohio-state.edu/
Hadoop-RDMA Web Page: http://hadoop-rdma.cse.ohio-state.edu/