big data benchmarking with rdma solutions

14
© 2013 Mellanox Technologies 1 Big Data Benchmarking with RDMA solutions Oracle Open World 2013

Upload: mellanox-technologies

Post on 12-May-2015

701 views

Category:

Technology


6 download

DESCRIPTION

Presented by Eyal Gutkind of Mellanox in the Mellanox theater, during Oracle OpenWorld 2013

TRANSCRIPT

Page 1: Big Data Benchmarking with RDMA solutions

© 2013 Mellanox Technologies 1

Big Data Benchmarking with RDMA solutions

Oracle Open World 2013

Page 2: Big Data Benchmarking with RDMA solutions

© 2013 Mellanox Technologies 2

Leading Supplier of End-to-End Interconnect Solutions

Host/Fabric SoftwareICs Switches/GatewaysAdapter Cards Cables

Comprehensive End-to-End InfiniBand and Ethernet Portfolio

Virtual Protocol Interconnect

StorageFront / Back-EndServer / Compute Switch / Gateway

56G IB & FCoIB 56G InfiniBand

10/40/56GbE & FCoE 10/40/56GbE

Fibre Channel

Virtual Protocol Interconnect

Page 3: Big Data Benchmarking with RDMA solutions

© 2013 Mellanox Technologies 3

A scalable fault-tolerant distributed system for data storage and processing

Hadoop has two main systems• Hadoop Distributed File System: self-healing high-bandwidth clustered storage. • MapReduce: distributed fault-tolerant resource management and scheduling coupled with a scalable data

programming abstraction.

Key values • Flexibility – Store any data, Run any analysis. • Scalability – Start at 1TB/3-nodes grow to petabytes/1000s of nodes. • Economics – Cost per TB at a fraction of traditional options.

Hadoop Framework

HDFS™(Hadoop Distributed File System)

Map Reduce HBase

DISK DISK DISK DISK DISK DISK

Hive Pig

Map Reduce

HDFS™(Hadoop Distributed File System)

Page 4: Big Data Benchmarking with RDMA solutions

© 2013 Mellanox Technologies 4

Three Areas for Accelerations

Data Analytics• Explore inefficiencies in existing analytics frameworks and systems• Accelerate data processing to deliver faster results

Storage• Explore ways to refine dominant file system• Take advantage for direct attached disk to accelerate data access

Distributed Storage• Leverage popular distributed storage systems with Big Data applications• Use existing systems for usage with Big Data frameworks

Page 5: Big Data Benchmarking with RDMA solutions

© 2013 Mellanox Technologies 5

~88% CPU Utilization

I/O Offload Frees Up CPU for Application Processing U

ser

Sp

ace

Sy

stem

Sp

ace

~53% CPU Utilization

~47% CPU Overhead/Idle

~12% CPU Overhead/Idle

Without RDMA With RDMA and Offload

Use

r S

pa

ceS

yst

em S

pa

ce

Page 6: Big Data Benchmarking with RDMA solutions

© 2013 Mellanox Technologies 6

Plug-in architecture • Open-source, latest GA version 3.1 (6/10/2013)• Google code repository at: https://code.google.com/p/uda-plugin/

Accelerates Map Reduce Jobs• Accelerated merge sort

Efficient Shuffle Provider • Data transfer over RDMA • Supports InfiniBand and Ethernet

Supported Hadoop Distributions• Apache 3.0 – In the main trunk!• Apache 2.0.3 – In the main trunk!• Apache Hadoop 1.0.x ; 1.1.x• Cloudera Distribution Hadoop 3 &4• Hortonworks HDP 1.x• GPHD 1.2

Supported Hardware• ConnectX®-3 VPI• SwitchX-2 based systems

Unstructured Data Accelerator - UDA

HDFS™(Hadoop Distributed File System)

Map Reduce HBase

DISK DISK DISK DISK DISK DISK

Hive Pig

Map Reduce

Page 7: Big Data Benchmarking with RDMA solutions

© 2013 Mellanox Technologies 7

Double Map Reduce Performance with UDA

*TeraSort is a popular benchmark used to measure the performance of Hadoop cluster

~50% Disk Access CPU Efficiency 2.5X

**1TB Data Set, 20x dual X5670 (Westmere) Machines, 10x HDD Base; Vanilla GPHD1.2; UDA GPHD1.2+UDA

2X Faster Job Completion! Increase the Value of Data!

54%

Page 8: Big Data Benchmarking with RDMA solutions

© 2013 Mellanox Technologies 8

HDFS is the Hadoop File System• The underlying File system for HBase and other NoSQL Data Bases

More Drives, Higher Throughput is Needed

SSDs Solutions Must use Higher Throughput• Bounded by 1GbE and 10GbE

HDFS Acceleration; Joint Project With Ohio State University

HDFS™(Hadoop Distributed File System)

Map Reduce HBase

DISK DISK DISK DISK DISK DISK

Hive Pig

Page 9: Big Data Benchmarking with RDMA solutions

© 2013 Mellanox Technologies 9

SSDs Become De-Facto standard in HDFS deployment• Read capability is a critical factor for application performance

E-DFSIO, Part of Intel’s HiBench test suite, profiles aggregated throughput on the cluster• 1GbE network impede any performance benefit from SSD deployment

Unlocking the Power SSDs In Hadoop Environment

E-DFSIO, Showing the Power of SSD @ HDFS

Page 10: Big Data Benchmarking with RDMA solutions

© 2013 Mellanox Technologies 10

OrangeFS as Hadoop Storage Solution

Page 11: Big Data Benchmarking with RDMA solutions

© 2013 Mellanox Technologies 11

Mellanox VPI Card• MCX354A-FCBT

Mellanox Edge Switches• MSX10xx; MSX60xx

Cloudera Certified – CDH3 and CDH4

Page 12: Big Data Benchmarking with RDMA solutions

© 2013 Mellanox Technologies 12

E5-26x0 (Sandy Bridge) Machines• Dual Socket• 4+ cores each socket• 32GB+ of DRAM

Disk Drives• At least 5 x 1TB, SAS, 10K RPM

Hadoop Configuration• At least one Name Node + Job Tracker• At least 4 Data Nodes

Installation:• Your selection of Hadoop Distribution or other Big Data solution (Such as Cassandra)

Networking• ConnectX-3 VPI card, FDR, 40GbE and 10GbE• SwitchX based systems: MSX6036F, MSX1036B and MSX1016 • Mellanox’s FDR, 40GbE and 10GbE Cable Solutions

http://www.mellanox.com/related-docs/whitepapers/WP_Deploying_Hadoop.pdf

Simple Building Block for Big Data Solution

Page 13: Big Data Benchmarking with RDMA solutions

© 2013 Mellanox Technologies 13

EMC 1000-Node Analytic Platform Accelerates Industry's Hadoop Development 24 PetaByte of physical storage

• Half of every written word since inception of mankind Mellanox VPI Solutions

Test Drive Your Big Data

2X Faster Hadoop Job Run-TimeHadoop

Acceleration

High Throughput, Low Latency, RDMA Critical for ROI

Page 14: Big Data Benchmarking with RDMA solutions

© 2013 Mellanox Technologies 14

Thank You