Accelerate Your Business with Deep Learning

©2019 Mellanox Technologies. All rights reserved.

From Development to Deployment: Distributed AI Environments Depend on HPE and Mellanox

Executive Summary

When it comes to finding insight into massive amounts of both structured and unstructured data (images, text, voice, videos), machine learning is of principal importance for both research and business. Data analytics has become an essential function within many high-performance clusters, enterprise data centers, clouds, and hyperscale platforms.

Machine learning is a pillar of today’s technological world and offers solutions that enable better and more accurate decision-making based on the great amounts of data being collected. Machine learning encompasses a wide range of applications, ranging from security, finance, and image and voice recognition, to self-driving cars and smart cities.

Optimized for production environments and training at scale, the HPE Apollo 6500 Gen10 System with NVIDIA® Tesla V100 GPUs and a Mellanox end-to-end 100Gb/s interconnect fabric delivers exceptional performance for machine learning.

Overview

HPE Apollo 6500 Systems with state-of-the-art NVIDIA GPUs, connected by Mellanox InfiniBand with support for GPUDirect RDMA (GDR) to accelerate neural network training, have become an industry-leading platform for scaling out deep learning (DL) frameworks such as Caffe, Caffe2, Chainer, MXNet, TensorFlow, and PyTorch.

AI development and deployment to production can be complex and requires the right technology and methodology to be successful. HPE provides a single source for both. Working closely with world-class partners, HPE delivers technology along with guidance from resources such as the HPE Deep Learning Cookbook and consultative services.

SOLUTION BRIEF


Figure 1: Machine learning leverages vast amounts of data to unlock actionable insights to drive new opportunities for a broad range of business and research applications


The Solution

The following products comprise the combined solution:

The HPE Apollo 6500 Gen10 System is an ideal DL platform that provides performance and flexibility with industry-leading GPUs, fast GPU interconnects, high-bandwidth fabric, and a configurable GPU topology to match varied workloads. The HPE Apollo 6500 System provides rock-solid reliability, availability, and serviceability (RAS) features and includes up to eight GPUs per server, next generation NVIDIA NVLink™ for fast GPU-to-GPU communication, support for Intel® Xeon® Scalable processors, and a choice of high-speed/low-latency fabric. It is also workload-enhanced using flexible configuration capabilities.

The HPE Apollo 6500 Gen10 System supports up to eight NVIDIA Tesla V100 SXM2 16GB or 32GB GPU modules. Powered by NVIDIA Volta architecture, the Tesla V100 is the world’s most advanced data center GPU, designed to accelerate AI, HPC, and graphics workloads. Each Tesla V100 GPU processor offers the performance of up to 100 CPUs in a single GPU and can deliver 15.7 TFLOPS of single-precision performance and 125 TFLOPS of DL performance, for a total of one PFLOPS when fully populated with eight Tesla V100 GPUs. The tested architecture leverages NVIDIA NVLink technology to provide higher bandwidth and scalability for multi-GPU configurations. A single V100 GPU supports up to six NVIDIA NVLink connections for GPU-to-GPU communication, for a total of 300 GB/sec.
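The aggregate figures quoted above follow from simple multiplication. The short Python sketch below reproduces them from the per-GPU numbers in this brief. The per-link NVLink rate of 50 GB/s (25 GB/s in each direction) is an assumption taken from NVIDIA's published Volta specifications rather than from this brief, and all constant and function names are illustrative.

```python
# Per-GPU figures quoted in this brief for the NVIDIA Tesla V100 SXM2.
FP32_TFLOPS_PER_GPU = 15.7   # single-precision throughput
DL_TFLOPS_PER_GPU = 125.0    # deep learning throughput
GPUS_PER_SERVER = 8          # fully populated HPE Apollo 6500 Gen10
NVLINK_LINKS_PER_GPU = 6

# Assumption (not stated in this brief): each NVLink connection carries
# 25 GB/s in each direction, i.e. 50 GB/s bidirectional.
NVLINK_GB_PER_SEC_PER_LINK = 50.0


def server_dl_pflops(gpus=GPUS_PER_SERVER):
    """Aggregate deep learning throughput of one server, in PFLOPS."""
    return gpus * DL_TFLOPS_PER_GPU / 1000.0


def server_fp32_tflops(gpus=GPUS_PER_SERVER):
    """Aggregate single-precision throughput of one server, in TFLOPS."""
    return gpus * FP32_TFLOPS_PER_GPU


def nvlink_gb_per_sec_per_gpu(links=NVLINK_LINKS_PER_GPU):
    """Total NVLink bandwidth available to a single GPU, in GB/s."""
    return links * NVLINK_GB_PER_SEC_PER_LINK


if __name__ == "__main__":
    print(f"DL throughput per server:   {server_dl_pflops():.1f} PFLOPS")
    print(f"FP32 throughput per server: {server_fp32_tflops():.1f} TFLOPS")
    print(f"NVLink bandwidth per GPU:   {nvlink_gb_per_sec_per_gpu():.0f} GB/s")
```

These are peak figures derived from the quoted specifications; sustained throughput on a real training job will be lower.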

Networking: Mellanox 100Gb EDR InfiniBand

When GPU workloads and data sets scale beyond a single Apollo 6500, a high-performance network fabric is critical for maintaining fast inter-node communication, as well as for enabling the external storage system to deliver full bandwidth to the GPU servers. For networking, Mellanox switches, cables, and network adapters provide industry-leading performance and flexibility for an Apollo 6500 System in a DL solution.

Mellanox is an industry-leading supplier of high-performance Ethernet and InfiniBand interconnects for high-performance GPU clusters used for DL workloads and for storage interconnect. With such technologies as remote direct memory access (RDMA) and GPUDirect, Mellanox enables excellent machine learning scalability and efficiency at network speeds from 10 to 100 Gbps. The InfiniBand network provides a high-performance interconnect between multiple GPU servers as well as providing network connectivity to the shared storage solution.
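To illustrate why link speed matters once training spans several servers, the sketch below estimates a lower bound on the time needed to synchronize gradients across nodes. It assumes a bandwidth-optimal ring all-reduce, in which each node sends and receives 2(N-1)/N times the buffer size; this brief does not state which collective algorithm was used. The estimate ignores latency, protocol overhead, and overlap with computation, and the 100 MB gradient buffer is a hypothetical value chosen only for illustration.

```python
def ring_allreduce_seconds(buffer_bytes, nodes, link_gbits_per_sec):
    """Idealized ring all-reduce time, limited only by link bandwidth.

    Each node transfers 2 * (nodes - 1) / nodes * buffer_bytes in total
    (a reduce-scatter phase followed by an all-gather phase).
    """
    if nodes < 2:
        return 0.0  # a single node has nothing to exchange
    bytes_per_sec = link_gbits_per_sec * 1e9 / 8
    traffic_bytes = 2 * (nodes - 1) / nodes * buffer_bytes
    return traffic_bytes / bytes_per_sec


if __name__ == "__main__":
    gradient_bytes = 100e6  # hypothetical 100 MB gradient buffer
    for gbps in (10, 25, 100):
        t = ring_allreduce_seconds(gradient_bytes, nodes=4, link_gbits_per_sec=gbps)
        print(f"{gbps:>3} Gb/s link, 4 nodes: {t * 1e3:.1f} ms per synchronization")
```

Under these assumptions, four nodes need roughly 12 ms per synchronization at 100 Gb/s and roughly 120 ms at 10 Gb/s, a cost that is paid on every training step.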


Figure 2: HPE Apollo 6500 Gen10 System

Figure 3: NVIDIA Tesla V100 GPU Accelerator

Figure 4: Networking: Mellanox 100Gb EDR InfiniBand


Benchmark Results

Training was performed using ImageNet 2012 with both the ResNet-50 and VGG16 models to demonstrate the scalable performance and efficiency of HPE Apollo 6500 Gen10 Systems with NVIDIA Tesla V100 SXM2 16GB GPUs, and to compare the scalability and performance of Mellanox EDR 100Gb/s InfiniBand against 100Gb/s TCP.
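Scalability and efficiency in comparisons of this kind are commonly expressed as the ratio of measured throughput to ideal linear scaling. The sketch below shows one conventional way to compute that ratio and the relative speedup between two interconnects. The function names are illustrative, and the throughput values in the usage section are placeholders, not measurements from this brief.

```python
def scaling_efficiency(throughput_n, throughput_1, n_units):
    """Measured throughput on n_units divided by ideal linear scaling."""
    if n_units < 1 or throughput_1 <= 0:
        raise ValueError("n_units must be >= 1 and throughput_1 must be > 0")
    return throughput_n / (n_units * throughput_1)


def relative_speedup(throughput_a, throughput_b):
    """How many times faster configuration A is than configuration B."""
    if throughput_b <= 0:
        raise ValueError("throughput_b must be > 0")
    return throughput_a / throughput_b


if __name__ == "__main__":
    # Placeholder values in images/sec; NOT results from this brief.
    single_node = 1000.0
    four_nodes_fabric_a = 3600.0
    four_nodes_fabric_b = 2400.0

    eff_a = scaling_efficiency(four_nodes_fabric_a, single_node, 4)
    eff_b = scaling_efficiency(four_nodes_fabric_b, single_node, 4)
    print(f"Fabric A efficiency: {eff_a:.0%}")
    print(f"Fabric B efficiency: {eff_b:.0%}")
    print(f"A over B speedup:    {relative_speedup(four_nodes_fabric_a, four_nodes_fabric_b):.2f}x")
```

An efficiency of 100% means throughput grows in direct proportion to the number of nodes; values below that reflect time lost to communication and synchronization.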


Figure 5: ResNet-50 XLA True Synthetic Data (Distributed Nodes)

Figure 6: ResNet-50 XLA True Synthetic (Efficiency)


Figure 7: VGG16 Synthetic Data (Distributed Nodes)

Figure 8: VGG16 Synthetic Data (Efficiency)

© Copyright 2019. Mellanox Technologies. All rights reserved. Mellanox and Mellanox logo are registered trademarks of Mellanox Technologies, Ltd. All other trademarks are property of their respective owners.

350 Oakmead Parkway, Suite 100, Sunnyvale, CA 94085 Tel: 408-970-3400 • Fax: 408-970-3403 www.mellanox.com

SB MLNX-15-060298 Ver 1.0

Summary

By taking advantage of the advanced system hardware architecture of the HPE Apollo 6500 with NVIDIA Tesla V100 SXM2 16GB GPUs and the Mellanox EDR 100Gb/s InfiniBand smart interconnect, end users can expect a performance improvement of between 1.3x and 3x or more over 100Gb/s Ethernet. The native RDMA, GPUDirect, and advanced offload capabilities of the InfiniBand interconnect deliver the highest efficiency and scalable performance for AI workloads, along with the highest data center return on investment and the fastest possible data insights.

System Hardware and Software Components

(4) HPE Apollo 6500 systems configured with (8) NVIDIA Tesla V100 SXM2 16GB, (2) HPE DL360 Gen10 Intel Xeon-Gold 6134 (3.2 GHz/8-core/130 W) CPUs, (24) DDR4-2666 CAS-19-19-19 Registered Memory Modules, HPE 1.6 TB NVMe SFF (2.5”) SSD, HPE InfiniBand EDR 100 Gbps 2-port 841QSFP28 Adapters, HPE Mellanox InfiniBand EDR 100 Gb/sec v2 36-port Switch (SB7890), Ubuntu 16.04, MLNX_OFED 4.5-1.0, Mellanox OFED GPUDirect RDMA 1.0-8, Docker 18.09.1, CUDA SDK 10.0, TensorFlow 1.12, Horovod, TensorFlow Benchmarks v1.12 compatible.

Resources

HPE White Paper: Accelerate time to value and AI insights https://www.hpe.com/us/en/resources/storage/requirements-distributed-ai.html

HPE Deep Learning solutions hpe.com/info/deep-learning

HPE Apollo 6500 Gen10 System https://hpe.com/servers/apollo6500

HPE Deep Learning Cookbook developer.hpe.com/platform/hpe-deep-learning-cookbook/home

HPE/NVIDIA Alliance page hpe.com/us/en/solutions/hpc-high-performance-computing/nvidia-collaboration.html

NVIDIA Volta architecture nvidia.com/en-us/data-center/volta-gpu-architecture/

NVIDIA Tesla nvidia.com/en-us/data-center/tesla/

NVIDIA GPU Cloud nvidia.com/en-us/gpu-cloud/

Mellanox Technologies mellanox.com
