
Enabling Performance-per-Watt Gains in High-Performance Cluster Computing

Appro Xtreme-X™ Supercomputer with the Intel® Xeon® Processor E5-2600 Product Family

Designed to meet the growing global demand for high-performance computing solutions, Appro’s next-generation Xtreme-X™ Supercomputer delivers superior performance-per-watt and reduced I/O latency with the flexible, efficient and highly versatile new Intel® Xeon® processor E5-2600 product family.

Flexibility, Scalability and Performance-per-Watt

In today’s datacenters, flexibility, scalability and performance-per-watt are all key factors influencing platform selection. By incorporating the Intel® Xeon® processor E5-2600 product family into their Xtreme-X™ Supercomputers, Appro brings significant flexibility to HPC workload configurations that include capacity, hybrid, data intensive and capability computing. Built upon industry standard optimized server platforms, the new Appro Xtreme-X™ Supercomputer enables high compute density, increased performance-per-watt, high-performance network connectivity, streamlined I/O and diverse storage options. For high-performance networking topologies, the Appro Xtreme-X™ Supercomputer enables flexibility, offering Fat Tree (InfiniBand or Ethernet) or 3D Torus (InfiniBand) in single- or dual-rail configurations. The system also provides significant savings in power consumption through the intelligent design of the blade server platform featuring large, shared low-power fans, 208V or 277V platinum-rated power supplies, high-efficiency cooling, and integrated console management functionality. In addition, it provides the flexibility to take full advantage of liquid cooling solutions and system configuration options to meet ever-increasing, wide-ranging data center architecture requirements.

Appro HPC Software Stack

The Appro Xtreme-X™ Supercomputer is also tightly integrated with the Appro HPC Software Stack. The Appro HPC Software Stack offers fully supported versions of many open source and commercial cluster compilers, tools, schedulers, and libraries, including the Appro Cluster Engine™ (ACE) management suite. ACE offers server, cluster, storage and network management features combined with job scheduling, failover, load balancing and revision control capabilities, and support for multiple Linux operating systems.

Built Upon the Appro GreenBlade™ Server

Appro GreenBlade™ servers provide a scalable and modular building block foundation for the Appro Xtreme-X™ Supercomputer. Each dual-socket Appro GreenBlade™ server features two 8-core Intel® Xeon® processors with Intel® Hyper-Threading Technology, each capable of handling up to 16 simultaneous computing threads. Thanks to Intel’s latest processor, Appro has achieved significant performance-per-watt advantages and support for up to 16 DIMMs per two-socket server configuration, in addition to significant improvements in memory speeds (up to 1600 MHz) over the previous generation. This improved memory bandwidth makes the platform an ideal upgrade for Appro customers looking to run memory-intensive HPC applications.

“Appro is focused on driving High Performance Computing forward. The strong partnership between Appro and Intel has resulted in one of the best open-standards supercomputer platform designs based on the latest Intel® Xeon® processor E5-2600 product family,” said John Lee, Vice President of Advanced Technology Solutions Group of Appro. “This close partnership helped our customers provide early feedback related to platform designs and technologies, enabling fast-to-market cutting-edge supercomputing solutions available now for advanced scientific discovery.”

Impact of Intel® Integrated I/O

Inside the Intel® Xeon® processor E5-2600 product family, a new feature called Intel® Integrated I/O brings the I/O subsystem onto the processor die for the first time ever in an Intel Xeon processor. The result is a significant reduction in I/O latency, with up to 80 PCIe lanes per two-socket server and support for the PCI Express* 3.0 standard. Each processor also includes an integrated DDR3 memory controller (IMC) with four memory channels capable of supporting up to three ECC Registered DIMMs or three unbuffered ECC DIMMs per memory channel. The integrated I/O controller provides each socket with 40 PCI Express Gen3 lanes controlled by ten PCI Express Master Controllers.
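The per-server totals quoted in this white paper follow directly from the per-socket figures. The sketch below works through that arithmetic in Python. The lane, channel, DIMM and memory-speed values come from the text; the 8-byte width of a DDR3 channel is an assumption that the paper does not state, and the constant and function names are illustrative only.

```python
# Figures quoted in the white paper.
SOCKETS_PER_SERVER = 2
PCIE_LANES_PER_SOCKET = 40
MEMORY_CHANNELS_PER_SOCKET = 4
MAX_DIMMS_PER_CHANNEL = 3
MEMORY_SPEED_MT_S = 1600  # "up to 1600 MHz" DDR3, read here as mega-transfers/s

# Assumption (not stated in the source): a DDR3 channel is 64 bits = 8 bytes wide.
BYTES_PER_TRANSFER = 8


def pcie_lanes_per_server() -> int:
    """Total PCI Express lanes in a two-socket server."""
    return PCIE_LANES_PER_SOCKET * SOCKETS_PER_SERVER


def max_dimms_per_server() -> int:
    """Maximum DIMM count the processor family supports per two-socket server."""
    return MEMORY_CHANNELS_PER_SOCKET * MAX_DIMMS_PER_CHANNEL * SOCKETS_PER_SERVER


def peak_memory_bandwidth_gb_s(sockets: int = 1) -> float:
    """Theoretical peak DDR3 bandwidth in GB/s (decimal) under the 8-byte assumption."""
    bytes_per_second = (
        MEMORY_SPEED_MT_S * 1e6 * BYTES_PER_TRANSFER * MEMORY_CHANNELS_PER_SOCKET
    )
    return bytes_per_second * sockets / 1e9


if __name__ == "__main__":
    print("PCIe lanes per server:", pcie_lanes_per_server())
    print("Max DIMMs per server:", max_dimms_per_server())
    print("Peak memory bandwidth per socket (GB/s):", peak_memory_bandwidth_gb_s())
```

These are theoretical ceilings; sustained bandwidth on a real system is lower and depends on DIMM population and access pattern.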

Enhanced Capabilities Enabled by New Microarchitecture

With the release of the Intel® Xeon® processor E5-2600 product family, several key system components, including the CPU, Integrated Memory Controller (IMC), and Integrated I/O Module (II/O), have been combined into a single processor package. A number of exclusive features complete the platform and enable a host of benefits, all designed to enable superior performance-per-watt as compared to previous generations. Each socket supports two Intel® QuickPath Interconnect (QPI) point-to-point links capable of up to 8.0 GT/s, up to 40 lanes of Gen 3 PCI Express* links capable of 8.0 GT/s, and 4 lanes of DMI2/PCI Express* Gen 2 interface with a peak transfer rate of 5.0 GT/s.

Intel® Xeon® Processor E5-2600 Product Family Feature Summary:

• Up to 8 execution cores and 16 threads per socket with Intel® Hyper-Threading Technology

• Up to 20 MB Cache (2.5 MB per core)

• Intel® QuickPath Interconnect (QPI)

• Up to 24 DIMMs per two-socket server to support multiple data-intensive Virtual Machines (VMs)

• Faster maximum memory speeds

• Intel® Advanced Vector Extensions (Intel® AVX) accelerates vector and floating point computations by doubling peak throughput rates

• Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI)¹ enable pervasive encryption

• Intel® Trusted Execution Technology (Intel® TXT) for stronger security in virtual and cloud environments

• Intel® Intelligent Power Technology dynamically manages CPU and memory energy states


Speed Enhanced with Intel® QuickPath Interconnect (QPI)

Intel® QuickPath Interconnect is a high-speed, packetized, point-to-point interconnect available in Intel’s latest processor generations. Together with an integrated I/O platform architecture, this distributed shared memory architecture supports higher bandwidth and lower platform latency in a highly efficient design intended to enable excellent interconnect performance. A snoop protocol optimized for low latency and high scalability combines with packet and lane structures that enable quicker transaction completions. Intel® QuickPath Interconnect also includes a cache coherency protocol designed to keep the distributed memory and caching structures coherent during system operation. Low-latency source snooping and a scalable home snoop behavior are both supported, while the coherency protocol provides for direct cache-to-cache transfers for minimal latency.

Stepping up to a New Generation

Appro saw a direct impact from a number of exclusive features available in the Intel® Xeon® processor E5-2600 product family, including Intel® Integrated I/O, Intel® Data Direct I/O Technology, support for the PCI Express* 3.0 Specification and Intel® Advanced Vector Extensions (Intel® AVX). Taken together, these features enabled Appro to achieve significant performance-per-watt enhancements over Appro Xtreme-X™ Supercomputers featuring previous-generation Intel® Xeon® 5600 series processors.

Higher Performance, Lower Power Consumption

In upgrading from the previous-generation Intel® Xeon® processor 5600 series to HPC servers featuring the new Intel® Xeon® E5-2600 product family, Appro was able to achieve significant gains in performance-per-watt, notably on the High-Performance Linpack 2.0 and Weather Research and Forecasting Model 3.0 (WRF) benchmarks.

[Figure: Benchmark Cluster Configuration. Appro Xtreme-X™ Supercomputer featuring the Intel® Xeon® E5-2600 Product Family. Components shown: Ethernet network switch, NFS SSD global file server, InfiniBand (IB) FDR network switch, storage system, compute nodes (Appro GreenBlade™, Intel® E5-2670, 64GB memory) and an Appro Cluster Engine™ management node. Link types: 1GigE, 10GigE, FDR IB.]

Complete hardware and configuration details: see note 2.


High-Performance Linpack (HPL) 2.0 Benchmark Results

The High-Performance Linpack (HPL) 2.0 benchmark provides a method of measuring the floating point processing capabilities of a high-performance computing configuration. By running a program that solves a system of linear equations, HPL tests the ability of each processor in a configuration to perform dense matrix multiplications, providing an effective example of a challenging parallel application with a complex inter-processor data communication pattern.

HPL provides an ideal environment for the demonstration and testing of a number of parallel software coding techniques and the hardware platforms designed to support them, in this case those featuring Intel® Xeon® processors with Intel® Hyper-Threading Technology. Specifically tested is the ability of an application to use distributed memory and a message-passing protocol to exchange information.
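To make the HPL description concrete, the following is a minimal single-process sketch in Python of the kind of work the benchmark times: solving a dense linear system and converting the elapsed time into a floating point rate. It is not HPL itself, which distributes a blocked LU factorization across nodes using message passing. The function names are illustrative, and the operation count 2/3·n³ + 3/2·n² is the figure conventionally used when reporting HPL rates, stated here as an assumption because the white paper does not give it.

```python
import random
import time


def solve_dense(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k to row k.
        pivot = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[pivot] = m[pivot], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hpl_flop_count(n):
    """Nominal operation count for an n-by-n solve (assumed HPL convention)."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def max_residual(a, x, b):
    """Largest absolute entry of a*x - b, as a basic correctness check."""
    return max(
        abs(sum(aij * xj for aij, xj in zip(row, x)) - bi)
        for row, bi in zip(a, b)
    )


if __name__ == "__main__":
    n = 120
    rng = random.Random(0)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    rate = hpl_flop_count(n) / elapsed / 1e9
    print(f"n={n} residual={max_residual(a, x, b):.2e} rate={rate:.4f} GFLOP/s")
```

Pure Python reaches only a tiny fraction of hardware peak; the point is the structure of the measurement (solve, verify, divide operations by time), not the number it prints.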

Weather Research and Forecasting Model 3.0 (WRF) Benchmark Results

The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs. It features multiple dynamical cores, a 3-dimensional variational (3DVAR) data assimilation system, and a software architecture allowing for computational parallelism and system extensibility. WRF is suitable for a broad spectrum of applications across scales ranging from meters to thousands of kilometers.³

The official benchmark version of WRF 3.0 was used to evaluate the Appro Xtreme-X™ Supercomputer featuring Intel® Xeon® E5-2670 processors running the Continental United States 2.5 km resolution dataset. The benchmark was configured to use the distributed memory layout, with all input and output files residing on the Storage Lustre global parallel file system. The benchmark was run ‘as is’ with only Fortran and C compiler flags being specified for code optimizations.

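[Figure: WRF Benchmark Results. Chart comparing the Intel® Xeon® 5560 and the Intel® Xeon® E5-2670 at 128, 256 and 512 cores. Gains labeled on the chart: 148% increase, 122% increase, 108% increase.]

Complete hardware and configuration details: see note 2.

[Figure: High Performance Linpack Results. Chart comparing the Intel® Xeon® 5560 and the Intel® Xeon® E5-2670 at 128, 256 and 512 nodes. Gains labeled on the chart: 73% better, 73% better, 77% better.]

Complete hardware and configuration details: see note 2.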



Intel® Integrated I/O in Practice

By integrating the I/O subsystem onto the processor die, Appro Xtreme-X™ Supercomputers featuring the Intel® Xeon® processor E5-2600 product family were able to achieve significant performance-per-watt gains as compared to previously published High-Performance Linpack (HPL) 2.0 and WRF 3.0 CONUS 2.5KM benchmarks. Intel® Integrated I/O comes standard in the Appro Xtreme-X™ Supercomputer with the PCI Express interface supporting up to 40 lanes offering outstanding speeds at 8 GT/s (no 8b/10b encoding), an x16 interface bifurcated down to two x8 or four x4 (or combinations) and an x8 interface bifurcated down to two x4.
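The remark about dropping 8b/10b encoding matters because line encoding determines how much of the raw transfer rate carries data. The Python sketch below estimates per-lane and per-link bandwidth from the transfer rates quoted in this paper. The 128b/130b encoding for PCI Express 3.0 and 8b/10b for Gen 2 come from the PCI Express specifications rather than from this paper, the function and table names are illustrative, and the (8, 4, 4) split is one reading of "or combinations", so treat all of these as assumptions.

```python
def lane_bandwidth_gb_s(gt_per_s: float, payload_bits: int, encoded_bits: int) -> float:
    """Usable bandwidth of one lane, one direction, in GB/s (decimal).

    gt_per_s is the raw transfer rate; payload_bits/encoded_bits is the
    line-encoding efficiency (for example 8/10 or 128/130).
    """
    return gt_per_s * payload_bits / encoded_bits / 8


# Assumed encodings: 128b/130b for Gen 3, 8b/10b for Gen 2.
GEN3_LANE_GB_S = lane_bandwidth_gb_s(8.0, 128, 130)
GEN2_LANE_GB_S = lane_bandwidth_gb_s(5.0, 8, 10)

# Bifurcation options described in the text, as tuples of link widths.
# The mixed (8, 4, 4) entry is an illustrative reading of "or combinations".
BIFURCATIONS = {
    16: [(16,), (8, 8), (4, 4, 4, 4), (8, 4, 4)],
    8: [(8,), (4, 4)],
}


def link_bandwidth_gb_s(width: int, per_lane: float = GEN3_LANE_GB_S) -> float:
    """Usable bandwidth of a link of the given lane width, one direction."""
    return width * per_lane


if __name__ == "__main__":
    print(f"Gen 3 lane: {GEN3_LANE_GB_S:.3f} GB/s, Gen 2 lane: {GEN2_LANE_GB_S:.3f} GB/s")
    for slot, options in BIFURCATIONS.items():
        for widths in options:
            per_link = ", ".join(f"x{w}={link_bandwidth_gb_s(w):.2f}" for w in widths)
            print(f"x{slot} slot as {widths}: {per_link} GB/s")
    print(f"40 Gen 3 lanes per socket: {link_bandwidth_gb_s(40):.2f} GB/s per direction")
```

Under these assumptions a Gen 3 lane carries roughly twice the data of a Gen 2 lane even though the raw rate rises only from 5.0 to 8.0 GT/s, because far less of the signal is spent on encoding overhead.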

Intel® Data Direct I/O Technology

With Intel® Data Direct I/O Technology inside, direct data transfers from storage to cache are possible, reducing the need for performance-sapping memory accesses. Inside Appro Xtreme-X™ Supercomputers, when data flows faster, processor cores can stay more productive and benchmarking applications like Linpack and WRF can achieve, and sustain, higher gigaflops while consuming less energy.

PCI Express* 3.0 Specification

By stepping up to the PCI Express* 3.0 specification, the latest generation of the Appro Xtreme-X™ Supercomputer can support increased I/O capabilities enabled by technologies including FDR InfiniBand, high-speed PCI Express flash devices, or co-processors/accelerators.

The Appro Xtreme-X™ Supercomputer provides superior memory and system bandwidth between processors and the I/O interface through increased interconnect bandwidth and message rates—effectively reducing latency and maximizing throughput in multi-node systems. The result is greater flexibility and performance—particularly when managing memory-intensive application workloads.

Intel® Advanced Vector Extensions (Intel® AVX)

Intel® Advanced Vector Extensions (Intel® AVX) are new instructions that improve performance for applications like Linpack that are reliant on floating point. With the latest Intel® Xeon® processor E5-2600 product family, Appro Xtreme-X™ Supercomputers are better suited to highly demanding floating point intensive applications. Appro noted the impact of Intel® AVX enabled by the latest-generation processors from Intel on a specific set of customers: “We observed significant gains in the second half of 2011 when we first introduced servers featuring Intel® AVX, and look forward to what this technology enables for our customers running floating point operations like scientific simulations, financial analytics and 3D modeling and analysis,” said Steve Lyness, VP of HPC Solutions Group for Appro.
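The claim that Intel® AVX doubles peak floating point throughput can be illustrated with a theoretical peak calculation for the benchmark cluster. The Python sketch below uses the clock speed, core count, socket count and node count from the configuration details in note 2. The per-cycle figures (8 double-precision operations per core per cycle with AVX, 4 with the earlier 128-bit SSE instructions) are assumptions based on general knowledge of these processor generations and do not appear in this paper; the names are illustrative.

```python
# From the hardware configuration details in this white paper (note 2).
CLOCK_GHZ = 2.6
CORES_PER_SOCKET = 8
SOCKETS_PER_NODE = 2
NODES = 32

# Assumptions, not stated in the source: double-precision floating point
# operations per core per cycle with 256-bit AVX versus 128-bit SSE.
DP_FLOPS_PER_CYCLE_AVX = 8
DP_FLOPS_PER_CYCLE_SSE = 4


def peak_gflops(flops_per_cycle: int, nodes: int = 1) -> float:
    """Theoretical double-precision peak in GFLOP/s for the given node count."""
    return CLOCK_GHZ * flops_per_cycle * CORES_PER_SOCKET * SOCKETS_PER_NODE * nodes


def hpl_efficiency(measured_gflops: float, nodes: int) -> float:
    """Fraction of the AVX theoretical peak achieved by a measured HPL result."""
    return measured_gflops / peak_gflops(DP_FLOPS_PER_CYCLE_AVX, nodes)


if __name__ == "__main__":
    node_avx = peak_gflops(DP_FLOPS_PER_CYCLE_AVX)
    node_sse = peak_gflops(DP_FLOPS_PER_CYCLE_SSE)
    print(f"Per-node peak with AVX: {node_avx:.1f} GFLOP/s")
    print(f"Per-node peak at the same clock with SSE: {node_sse:.1f} GFLOP/s")
    print(f"{NODES}-node peak with AVX: {peak_gflops(DP_FLOPS_PER_CYCLE_AVX, NODES):.1f} GFLOP/s")
```

A measured HPL result can be passed to `hpl_efficiency` to see how close a run comes to this ceiling; the paper does not publish absolute GFLOP/s figures, so no measured value is shown here.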

Aligned in Innovation

As a Platinum member of the Intel® Technology Provider Program, Appro benefits from direct alignment with Intel and early insights into the latest computing innovations from a global leader in silicon innovation. “It is a pleasure to collaborate with Appro to deliver the next-generation Intel® Xeon® processor E5-2600 product family, as well as solutions featuring Intel® Server Boards, which are optimized for memory bandwidth performance and maximum density, with a flexible I/O configuration,” said Lisa Graff, Vice President and General Manager of the Datacenter Platform Engineering Group at Intel.

Summary

By incorporating the latest generation of the highly versatile Intel® Xeon® E5-2600 processors into the heart of Appro’s HPC computing solutions, Appro was able to achieve significant gigaflop-per-watt (and per-dollar) gains when running High-Performance Linpack (HPL) 2.0 and Weather Research and Forecasting Model 3.0 (WRF) benchmarks when upgrading from systems featuring the previous-generation Intel® Xeon® processor 5500 series.


About Appro

Appro is a leading developer of innovative supercomputing solutions uniquely positioned to support High-Performance Computing (HPC) markets. Over the past 20 years, Appro has grown from an OEM HPC server manufacturer to a complete innovative solution provider serving world-class customers in Government Research Labs, University/Academic, Financial, Energy and Manufacturing. As a Platinum Member of the Intel® Technology Provider Program, Appro has early access to product and technology roadmaps, training and resources. Through this collaboration, Appro leads the HPC marketplace by combining open platforms with the latest technologies to provide building blocks for supercomputing clusters.

About the Appro Xtreme-X™ Supercomputer

The Appro Xtreme-X™ Supercomputer is based on industry standard, optimized server platforms featuring extreme compute density, increased performance/watt, and multiple high-performance network connectivity, I/O and disk options, with a choice of the latest processor technologies along with specialized professional services for medium to large-scale deployments.

About the Intel® Xeon® E5 Processor

Designed to sit at the heart of flexible and efficient data centers, the Intel® Xeon® processor E5-2600 product family is designed to meet diverse computing needs. Delivering the best combination of performance, built-in capabilities, and cost-effectiveness, servers featuring these processors support everything from virtualization and cloud computing to design automation or real-time financial transactions. New Intel® Integrated I/O helps to reduce latency and eliminate data bottlenecks, resulting in streamlined operations and increased agility.

Learn more by visiting: www.appro.com

To learn more, visit: appro.com/product/intel_main.asp

To learn more, visit: www.intel.com/xeonE5


Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

1 Intel® AES-NI requires a computer system with an AES-NI enabled processor, as well as non-Intel software to execute the instructions in the correct sequence. AES-NI is available on Intel® Core™ i5-600 Desktop Processor Series, Intel® Core™ i7-600 Mobile Processor Series, and Intel® Core™ i5-500 Mobile Processor Series. For availability, consult your reseller or system manufacturer. For more information, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni/

2 Source: Appro. Hardware Configuration Details: Subrack: Appro GreenBlade SR5110, 5U, 10 blades; Nodes: 32; CPUs: Intel® Xeon® E5-2670 processor; 2.6 GHz; 8 cores/socket, 16 cores/node; Memory: 64 GB, 1600 MHz, DDR3, 8 GB DIMMS; InfiniBand* HCA: One (1) Mellanox MT27500 (ConnectX-3) FDR; InfiniBand Interconnect Topology: Single-Rail, Full Bi-section Bandwidth, Fat Tree; Infiniband Switches: ox 36-port Switch-X Infiniband FDR Switch; Management Node: One (1) Intel® X5560; Management Network: Two (2) Brocade* FCX648 48x1G Ethernet switch, Brocade 4x10GBE SFP+ Modules for FCX648.

Software Configuration Details: Cluster Management: Appro Cluster Engine (ACE 1.3.0); Queuing System: Sun* Grid Engine (SGE 6.2u5); Message Passing: Intel® MPI (IMPI Version 3.0 Update 2); Compilers (C/Fortran): Intel (Cluster Studio Version 12.1); Math Libraries: Intel® Math Kernel Library (Intel® MKL).

3 Source: http://www.wrf-model.org/index.php

Intel, the Intel logo, Xeon and Xeon Inside are trademarks of Intel Corporation in the U.S. and/or other countries.

* Other names and brands may be claimed as the property of others.

Copyright © 2012 Intel Corporation. All rights reserved. 0312/GIP/CAF/PDF Please Recycle 326983-001US