
Xiaoyi Lu, Md. Wasi-ur-Rahman, Nusrat Islam, and Dhabaleswar K. (DK) Panda

Network-Based Computing Laboratory

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA

A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks

WBDB 2013

Outline

•  Introduction and Motivation
•  Problem Statement
•  Design Considerations
•  Micro-benchmark Suite
•  Performance Evaluation
•  Conclusion & Future Work

Big Data Technology

•  Apache Hadoop is one of the most popular Big Data technologies
–  Provides a framework for large-scale, distributed data storage and processing
•  An open-source implementation of the MapReduce programming model
•  Hadoop Distributed File System (HDFS) is the underlying file system of Hadoop MapReduce and the Hadoop database, HBase
•  Hadoop Core – common functionalities, e.g. Remote Procedure Call (RPC)

[Figure: the Hadoop framework – MapReduce and HBase running over HDFS, all on top of Core (RPC, ..)]

Adoption of Hadoop RPC

•  Hadoop RPC is increasingly being used with data-center middleware such as MapReduce, HDFS, and HBase because of its simplicity, productivity, and high performance
–  Metadata exchange
–  Managing compute nodes and tracking system status
–  Efficient data management operations: get block info, create blocks, etc.
–  Database operations: put, get, etc.

[Figure: two deployments connected over high-performance networks – Map/Reduce & HDFS (HDFS clients, HDFS name node, HDFS data nodes with HDD/SSD) and HBase (HBase clients, HRegion servers, data nodes with HDD/SSD)]

Common Protocols using OpenFabrics

[Figure: application-to-switch protocol stacks, grouped by application interface –
•  Sockets, kernel space: TCP/IP over an Ethernet driver with a 1/10/40 GigE adapter and Ethernet switch; TCP/IP with hardware offload on a 10/40 GigE-TOE adapter; IPoIB over an InfiniBand adapter and switch
•  Sockets, user space: SDP and RSockets, each over an InfiniBand adapter and switch
•  Verbs, user space: RDMA over iWARP (iWARP adapter, Ethernet switch), RoCE (RoCE adapter, Ethernet switch), and native IB verbs (InfiniBand adapter and switch)]

Can Big Data Processing Systems be Designed with High-Performance Networks and Protocols?

•  Current design: Application → Sockets → 1/10 GigE network
•  Enhanced designs: Application → Accelerated Sockets (Verbs / hardware offload) → 10 GigE or InfiniBand
•  Our approach: Application → OSU design (Verbs interface) → 10 GigE or InfiniBand

•  Sockets not designed for high performance
–  Stream semantics often mismatch for upper layers (Memcached, HBase, Hadoop)
–  Zero-copy not available for non-blocking sockets

Hadoop RPC over InfiniBand

•  Default: Applications → Hadoop RPC → Java Socket Interface → 1/10 GigE, IPoIB network
•  OSU design: Applications → Hadoop RPC → Java Native Interface (JNI) → IB Verbs → InfiniBand

Enables high-performance RDMA communication while supporting the traditional socket interface; the design is selected via the rpc.ib.enabled parameter.

Xiaoyi Lu, Nusrat Islam, Md. Wasi-ur-Rahman, Jithin Jose, Hari Subramoni, Hao Wang, Dhabaleswar K. (DK) Panda. "High-Performance Design of Hadoop RPC with RDMA over InfiniBand." To be presented at the 42nd International Conference on Parallel Processing (ICPP 2013), Lyon, France, October 2013.
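The deck names only the property rpc.ib.enabled for switching transports. As a hedged sketch of how that toggle would appear as a standard Hadoop configuration entry (file placement, default value, and description text are assumptions, not confirmed by the slides; only the property name comes from the deck):

```xml
<!-- Illustrative sketch: only the property name rpc.ib.enabled is from the
     slides; the value semantics and description are assumed. -->
<property>
  <name>rpc.ib.enabled</name>
  <value>true</value>
  <description>Use the verbs-level RDMA transport for Hadoop RPC;
  set to false to fall back to the default Java socket transport.</description>
</property>
```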

Hadoop RPC over IB: Gain in Latency and Throughput

•  Hadoop RPC over IB ping-pong latency
–  1 byte: 39 us; 4 KB: 52 us
–  42%-49% and 46%-50% improvements compared with the performance of default Hadoop RPC on 10 GigE and IPoIB (32 Gbps), respectively
•  Hadoop RPC over IB throughput
–  512 bytes & 48 clients: 135.22 Kops/sec
–  82% and 64% improvements compared with the peak performance of default Hadoop RPC on 10 GigE and IPoIB (32 Gbps), respectively

[Figure: latency (us) vs. payload size (1 B–4 KB) and throughput (Kops/sec) vs. number of clients (8–64), for RPC-10GigE, RPC-IPoIB(32Gbps), and RPCoIB(32Gbps)]

Available in the Hadoop-RDMA Software

•  High-performance design of Hadoop over RDMA-enabled interconnects
–  High-performance design with native InfiniBand support at the verbs level for the HDFS, MapReduce, and RPC components
–  Easily configurable for both native InfiniBand and the traditional sockets-based support (Ethernet and InfiniBand with IPoIB)
–  Current release: 0.9.0
   •  Based on Apache Hadoop 0.20.2
   •  Compliant with Apache Hadoop 0.20.2 APIs and applications
   •  Tested with
      –  Mellanox InfiniBand adapters (DDR, QDR, and FDR)
      –  Various multi-core platforms
      –  Different file systems with disks and SSDs
–  http://hadoop-rdma.cse.ohio-state.edu

Requirements of Hadoop RPC Benchmarks

•  To achieve optimal performance, Hadoop RPC needs to be tuned based on cluster and workload characteristics
•  A micro-benchmark suite that evaluates Hadoop RPC performance metrics in different configurations is important for tuning and understanding
•  For Hadoop developers, this kind of micro-benchmark suite is helpful for evaluating and optimizing the performance of new designs


Problem Statement

•  Can we design and implement a simple, standardized benchmark suite that lets all users and developers in the Big Data community evaluate, understand, and optimize Hadoop RPC performance over a range of networks/protocols?
•  What will be the performance of Hadoop RPC when evaluated using this benchmark suite on high-performance networks?


Design Considerations

•  The performance of RPC systems is usually measured by the metrics of latency and throughput
•  The performance of Hadoop RPC is determined by:
–  Factors related to network configuration: faster interconnects and/or protocols can enhance Hadoop RPC performance
–  Controllable parameters at the RPC-engine and benchmark level: handler/client number, etc.
–  Data types: serialization and deserialization issues of different data types in the RPC system; BytesWritable, Text, etc.
–  CPU utilization: trade-off between RPC subsystem performance and whole-system performance
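The data-type consideration above can be made concrete with a small self-contained sketch. Python is used purely for illustration (the real suite exercises Hadoop's Java Writable types); it mimics BytesWritable-style and Text-style length-prefixed serialization and times a serialize/deserialize round trip. All function names here are illustrative, not part of the actual suite.

```python
import struct
import time

def serialize_bytes(payload: bytes) -> bytes:
    # BytesWritable-style: 4-byte big-endian length prefix + raw bytes
    return struct.pack(">I", len(payload)) + payload

def serialize_text(payload: str) -> bytes:
    # Text-style: UTF-8 encode, then length-prefix (real Hadoop Text uses a
    # variable-length int; a fixed 4-byte prefix keeps this sketch simple)
    data = payload.encode("utf-8")
    return struct.pack(">I", len(data)) + data

def deserialize(buf: bytes) -> bytes:
    # Recover the payload from a length-prefixed buffer
    (length,) = struct.unpack(">I", buf[:4])
    return buf[4:4 + length]

def time_roundtrips(serialize, payload, iterations=10000):
    # Average serialize+deserialize time per iteration, in seconds
    start = time.perf_counter()
    for _ in range(iterations):
        deserialize(serialize(payload))
    return (time.perf_counter() - start) / iterations

if __name__ == "__main__":
    print(f"bytes 4KB: {time_roundtrips(serialize_bytes, b'x' * 4096) * 1e6:.2f} us")
    print(f"text  4KB: {time_roundtrips(serialize_text, 'x' * 4096) * 1e6:.2f} us")
```

Running both paths over the same payload sizes makes the relative (de)serialization cost of the two data types directly visible, which is the effect the benchmark's Data Type parameter isolates.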


Micro-benchmark Suite

•  Two different micro-benchmarks:
–  Latency: single server, single client
–  Throughput: single server, multiple clients
•  A script framework for job launching and resource monitoring
•  Calculates statistics like min, max, average

Latency benchmark parameters:

Component  | Network Address | Port | Data Type | Min Msg Size | Max Msg Size | No. of Iterations | Handlers | Verbose
lat_client |        √        |  √   |     √     |      √       |      √       |         √         |          |    √
lat_server |                 |  √   |     √     |              |              |                   |    √     |    √

Throughput benchmark parameters:

Component  | Network Address | Port | Data Type | Min Msg Size | Max Msg Size | No. of Iterations | No. of Clients | Handlers | Verbose
thr_client |        √        |  √   |     √     |      √       |      √       |         √         |                |          |    √
thr_server |                 |  √   |     √     |              |              |         √         |       √        |    √     |    √
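The latency benchmark's structure (single server, single client, ping-pong exchanges, min/max/average statistics) can be sketched as follows. This is an illustrative stand-in using a plain TCP echo server, not the actual lat_client/lat_server code, which runs over Hadoop RPC:

```python
import socket
import struct
import threading
import time

def run_server(srv):
    # lat_server stand-in: accept one client, echo each length-prefixed message
    conn, _ = srv.accept()
    while True:
        hdr = conn.recv(4)
        if not hdr:
            break
        (n,) = struct.unpack(">I", hdr)
        data = b""
        while len(data) < n:
            data += conn.recv(n - len(data))
        conn.sendall(hdr + data)
    conn.close()

def lat_client(port, msg_size=1024, iterations=100):
    # lat_client stand-in: ping-pong a fixed-size payload, timing each iteration
    sock = socket.create_connection(("127.0.0.1", port))
    msg = struct.pack(">I", msg_size) + b"x" * msg_size
    samples = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        sock.sendall(msg)
        got = b""
        while len(got) < msg_size + 4:
            got += sock.recv(65536)
        samples.append((time.perf_counter() - t0) * 1e6)  # microseconds
    sock.close()
    return {"min": min(samples), "max": max(samples),
            "avg": sum(samples) / len(samples)}

if __name__ == "__main__":
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))  # ephemeral port
    srv.listen(1)
    threading.Thread(target=run_server, args=(srv,), daemon=True).start()
    stats = lat_client(srv.getsockname()[1], msg_size=1024, iterations=200)
    print({k: round(v, 1) for k, v in stats.items()})
```

Sweeping msg_size from a minimum to a maximum message size, as the Min/Max Msg Size parameters in the table suggest, produces a latency-vs-payload curve like the ones in the evaluation slides.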


Experimental Setup

•  Hardware
–  Intel Westmere cluster
   •  8 nodes
   •  Each node has 8 processor cores on 2 Intel Xeon 2.67 GHz quad-core CPUs and 24 GB main memory
   •  Network: 1 GigE, 10 GigE, and IPoIB (32 Gbps)
•  Software
–  Enterprise Linux Server release 6.1 (Santiago), kernel version 2.6.32-131, with OpenFabrics version 1.5.3
–  Hadoop 0.20.2 and Sun Java SDK 1.7

RPC Latency for BytesWritable

•  RPC latency decreases when the underlying interconnect is changed from 1 GigE to IPoIB or 10 GigE
•  With the 10 GigE interconnect, we observe better latency than IPoIB for small payload sizes; for large payload sizes, IPoIB performs better than 10 GigE
–  IPoIB achieves a 27% gain over 10 GigE for a 64 MB payload size, whereas it performs worse than 10 GigE by 0.66% for a 4 KB payload size

[Figure: small-message latency (us) and large-message latency (ms) vs. payload size (128 KB–64 MB for large messages), for 1GigE, 10GigE, and IPoIB(32Gbps)]

RPC Latency for Text

•  Similar performance characteristics for RPC latency with the Text data type

[Figure: small-message latency (us) vs. payload size (1 B–4 KB) and large-message latency vs. payload size (128 KB–64 MB), for 1GigE, 10GigE, and IPoIB(32Gbps)]

RPC Throughput for BytesWritable

•  IPoIB performs better than 10 GigE as the payload size is increased
•  At 4 KB, the improvement goes up to 26% for seven handler threads; for small payload sizes, 10 GigE performs better than IPoIB by an average margin of 5-6%

[Figure: throughput (Kops/sec) vs. payload size (1 B–4 KB) with 7 and 16 RPC server handlers, for 1GigE, 10GigE, and IPoIB(32Gbps)]
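Aggregate throughput in these plots is operations completed per second summed over all clients. A minimal illustration of that measurement pattern follows, with a plain TCP echo server standing in for the RPC server and one server-side thread per connection playing the role of a handler; all names are illustrative, not the suite's thr_client/thr_server code:

```python
import socket
import struct
import threading
import time

def echo_handler(conn):
    # One server-side thread per connection (cf. RPC handler threads)
    while True:
        hdr = conn.recv(4)
        if not hdr:
            break
        (n,) = struct.unpack(">I", hdr)
        data = b""
        while len(data) < n:
            data += conn.recv(n - len(data))
        conn.sendall(hdr + data)
    conn.close()

def start_server():
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(64)
    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:  # server socket closed
                return
            threading.Thread(target=echo_handler, args=(conn,), daemon=True).start()
    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, srv.getsockname()[1]

def client(port, msg_size, ops, done):
    sock = socket.create_connection(("127.0.0.1", port))
    msg = struct.pack(">I", msg_size) + b"x" * msg_size
    for _ in range(ops):
        sock.sendall(msg)
        got = b""
        while len(got) < msg_size + 4:
            got += sock.recv(65536)
    sock.close()
    done.append(ops)

def measure_throughput(num_clients=4, msg_size=512, ops=500):
    # Aggregate ops completed by all clients divided by wall-clock time
    srv, port = start_server()
    done, threads = [], []
    t0 = time.perf_counter()
    for _ in range(num_clients):
        t = threading.Thread(target=client, args=(port, msg_size, ops, done))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - t0
    srv.close()
    return sum(done) / elapsed / 1000.0  # Kops/sec

if __name__ == "__main__":
    print(f"{measure_throughput():.1f} Kops/sec")
```

Varying num_clients and the number of handler threads, as the slides do, exposes the saturation behavior the throughput plots show.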

RPC Throughput for BytesWritable

•  Keep the payload size fixed at 4 KB and observe the trend with different handler numbers and different networks
–  IPoIB performs better than 10 GigE by 48%, 5%, 45%, and 47% for 1, 4, 16, and 32 handlers, respectively
•  The suite is easily used to monitor resource utilization: enable a parameter in the script framework

[Figure: throughput comparison (Kops/sec) vs. handler number (1, 4, 16, 32) for a 4 KB payload, and CPU utilization (%) over sampling points for the experiment with 4 handlers, for 1GigE, 10GigE, and IPoIB(32Gbps)]
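Resource monitoring of the kind shown in the CPU-utilization plot can be done by sampling the aggregate "cpu" line of /proc/stat, which is presumably close to what the script framework does on Linux; the deck does not specify the mechanism, and the function names here are illustrative:

```python
import time

def parse_cpu_line(line):
    # /proc/stat "cpu" line fields: user nice system idle iowait irq softirq ...
    fields = [int(x) for x in line.split()[1:]]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
    return sum(fields), idle

def cpu_utilization(sample1, sample2):
    # Utilization between two samples = 100 * (1 - delta(idle) / delta(total))
    total1, idle1 = parse_cpu_line(sample1)
    total2, idle2 = parse_cpu_line(sample2)
    dt, di = total2 - total1, idle2 - idle1
    return 100.0 * (1.0 - di / dt)

def sample_proc_stat():
    # Linux-only: read the aggregate "cpu" line
    with open("/proc/stat") as f:
        return f.readline()

if __name__ == "__main__":
    s1 = sample_proc_stat()
    time.sleep(0.5)
    s2 = sample_proc_stat()
    print(f"CPU utilization: {cpu_utilization(s1, s2):.1f}%")
```

Sampling in a loop while the benchmark runs yields the per-sampling-point utilization series plotted on the slide.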


Conclusion and Future Work

•  Designed and implemented a micro-benchmark suite to evaluate the performance of standalone Hadoop RPC
•  Provides standard micro-benchmarks to measure the latency and throughput of Hadoop RPC with different data types
•  Illustrated the performance of Hadoop RPC using our benchmarks over different networks/protocols (1GigE/10GigE/IPoIB)
•  Will extend our benchmark suite to help users make performance comparisons among Hadoop Writable RPC, Avro, Thrift, and Protocol Buffers
•  Will be made available to the Big Data community via an open-source release

Thank You!

{luxi, rahmanmd, islamn, panda}@cse.ohio-state.edu

Network-Based Computing Laboratory
http://nowlab.cse.ohio-state.edu/

MVAPICH Web Page
http://mvapich.cse.ohio-state.edu/

Hadoop-RDMA Web Page
http://hadoop-rdma.cse.ohio-state.edu/
