
Dell Accelerates the Business of HPC
Sponsored Whitepaper

June 20, 2016


© TIRIAS RESEARCH ALL RIGHTS RESERVED

This paper is the first of a series describing Dell’s initiatives and product strategy in commercial and mid-sized research applications of high performance computing (HPC). The focus in this paper is on academic and research institutions that run a multitude of workloads. Subsequent papers will explore Dell’s focus on HPC business buyers in vertical markets, such as life sciences and manufacturing.

Executive Summary

HPC buying behavior has changed greatly from previous patterns. Commercial and mid-sized general research HPC customers are typically focused on product development and multitenant research instead of massive basic research projects. The front sections of this paper summarize the current state of the overall HPC market; the later sections introduce Dell’s recent participation in the HPC market and describe representative Dell HPC general research customer use cases.

Dell is engaged in a number of initiatives and partnerships to address the specific needs of the growing mid-market HPC customer community in order to extend HPC into the enterprise. These include forming a Dell HPC Community, being a founding member of OpenHPC, focusing on market expertise for specific HPC vertical markets, funding the expansive Dell HPC Innovation Lab, integrating in-memory analytics solutions, and offering the HPC community enterprise-class financing, deployment, and support services.

Dell’s HPC System for Research helps Dell understand academic and research institutions that run a multitude of workloads, so that it can better serve them with scalable and affordable HPC resources. Many of these customers are mid-sized and smaller organizations.

HPC is Still Evolving

There are many definitions for HPC and supercomputing. Most of them compare high performance systems to more “normal” enterprise or personal computing systems – of course, both normal and high-performance are moving targets over time.

Figure 1: Dell’s System for Research in Dell’s HPC Innovation Lab [Source: Dell]

TIRIAS Research has a more functional description – in general, HPC systems are designed to accurately simulate reality. Specifically, a large portion of HPC systems are designed to accurately simulate some aspect of our physical, three-dimensional (3D) world. HPC systems are designed to break up physical spaces into smaller pieces. These smaller pieces are called “voxels” (a mash-up of “volume” and “pixel”). Each voxel is described by its position in the simulation space, its resolution (height, width, depth), and a set of mathematical rules that describe what happens in the voxel during each time-slice of a simulation (and considering what happens in adjacent voxels).
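To make that description concrete, the sketch below steps a toy 3D voxel grid forward in time, with each voxel's new value depending on its six neighbors. This is a generic stencil-style illustration in Python/NumPy, not any particular HPC code; the grid size, time slice, and diffusion-like update rule are assumptions chosen only to show the structure of voxel-based simulation.

```python
import numpy as np

# Toy voxel grid: a 64x64x64 block of space with one scalar value per voxel.
# Grid size, time slice, and the diffusion-style rule are illustrative assumptions.
nx = ny = nz = 64
field = np.zeros((nx, ny, nz))
field[nx // 2, ny // 2, nz // 2] = 1.0   # an initial "hot spot"

dt, alpha = 0.1, 0.2                     # time slice and coupling to neighbors

def step(f):
    """Advance the grid one time slice; each voxel looks at its six neighbors."""
    neighbor_sum = (
        np.roll(f, 1, axis=0) + np.roll(f, -1, axis=0) +
        np.roll(f, 1, axis=1) + np.roll(f, -1, axis=1) +
        np.roll(f, 1, axis=2) + np.roll(f, -1, axis=2)
    )
    return f + dt * alpha * (neighbor_sum - 6.0 * f)

for _ in range(100):                     # run 100 time slices
    field = step(field)

print("total quantity after 100 steps:", round(float(field.sum()), 6))
```

In a real HPC code the grid is distributed across many nodes, and each time slice requires exchanging the voxels along the boundaries between neighboring nodes, which is part of what drives the interconnect requirements discussed later in this paper.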

Current generation HPC systems can model useful components of real-world systems, for example automotive crash test simulations, local weather predictions, and molecular dynamics, and they can model larger systems with commercially useful simulation size, time resolution and scale, and model complexity. There are some classes of HPC that are designed to attack problems in pattern matching (such as genomics), cryptography, number theory and other abstract mathematical concepts, but the bulk of HPC system design investment is directed at simulating reality – weather, biology, product design (from consumer goods to weapons), etc.

The reason HPC performance is still a moving target is that current HPC systems are still not capable of running realistic simulations in real time for meaningfully large systems and volumetric spaces. The difficult part for many people to understand is how far away IT still is from simulating important small-scale physical systems, such as a fully functional single-cell organism living in human blood or a complete automobile and its local neighborhood, including local weather conditions.

The driving factors behind HPC architecture today remain as they have been for decades (the rough arithmetic sketch below shows how quickly these factors compound):

- Larger volumetric spaces with finer volumetric precision
- Longer time scales with finer time slices
- Increasingly complex models with more intricate interactions between model elements
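As a back-of-the-envelope illustration, with all numbers assumed for the example: halving the voxel edge length in three dimensions and halving the time slice multiplies the work per run by sixteen, before any increase in per-voxel model complexity.

```python
# Back-of-the-envelope scaling; every number here is an illustrative assumption.
voxels_before = 1000**3        # 1,000 voxels along each edge of a cubic volume
voxels_after  = 2000**3        # halving the voxel edge length doubles each dimension
steps_before  = 10_000         # time slices in a run
steps_after   = 20_000         # halving the time slice doubles the step count

work_before = voxels_before * steps_before
work_after  = voxels_after * steps_after

print(f"work per run grows by {work_after / work_before:.0f}x")   # prints 16x
# More intricate per-voxel interactions multiply on top of this factor.
```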

While HPC performance is still a moving target, it has passed the point where simulation size and complexity are useful only to researchers. As HPC performance continues to increase, more commercial uses will open up.

Scaling Performance is Getting Harder

HPC hardware architectures are settling on a core of best practices. One key best practice in simulating physical spaces is to parallelize voxel processing by scaling out the size of an HPC cluster. The supercomputer Top500 [1] list now includes HPC clusters with hundreds of thousands of processors that consume tens of megawatts of power. This is not a sustainable trajectory in many ways – power consumption, capital expenses and operational complexity are at the top of the list – therefore many governments have funded initiatives [2] to pioneer the next generation of architecture.

[1] http://www.top500.org/

Figure 2: [Source images: NASA]

Another best practice is the use of compute accelerators, in particular graphics processing units (GPUs), to parallelize and accelerate voxel calculations. Graphics acceleration technology was originally created to more efficiently render increasingly complex two-dimensional (2D) surfaces onto computer monitors – rows and columns of pixels. GPUs were then designed to render complex 3D volumes onto 2D pixel arrays, and as pixel processing grew more sophisticated, the HPC community discovered that GPUs could accelerate many types of simple simulations. GPU designers leaned into this trend and designed more flexible pixel processing into their parallel pixel pipelines, and the result was general purpose GPUs (GPGPUs). Intel designed a different type of highly parallel processing with its Xeon Phi architecture, and for some HPC algorithms it has similar acceleration behavior.

Not only do GPGPUs accelerate voxel processing, they also lower the power consumption per voxel compared with mainstream processors. As a result, eight of the top 25 clusters listed in the June 2016 supercomputer Top500 list use highly parallel accelerators; six of those use NVIDIA GPGPUs and three use Intel’s Xeon Phi (one uses both).

HPC cluster performance is also highly dependent on the network interconnect fabric and storage architectures. There are many permutations of processor, accelerator, network and storage architectures, and no clear best practices for the growing class of commercial HPC customers to leverage.

Understanding Results of Simulations is Getting More Difficult

HPC simulations are now modeling many aspects of our physical world at a level of complexity where people cannot see the subtle nuances of model behavior by looking at images or time-lapse movies of a simulation run. It is even harder for people to evaluate the subtle differences in behavior of two simulations with slightly different initial conditions or voxel behavior.

[2] http://www.exascaleinitiative.org/

Figure 3: NVIDIA's P100 GPU module [Source: TIRIAS Research]

Figure 4: Intel’s Xeon Phi processor chip and module [Source: TIRIAS Research]

Most commercial HPC simulations are now run tens or hundreds of times with varying simulation characteristics. Correlating subtle changes in simulation behavior to the simulation math or initial conditions is becoming impossible for mere mortal humans.

Pattern analytics is being pressed into service in HPC applications to close this feedback loop and produce more accurate and faster simulations. Because comparing simulation results adds time and power consumption on top of running the simulations, accelerating pattern analytics is now vitally important in commercial HPC applications.
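As a hedged illustration of what this kind of bulk comparison looks like, the sketch below scores a batch of simulation runs against a baseline run and ranks the ones that drift furthest, which is exactly the sort of task that outgrows visual inspection. The arrays stand in for simulation output, and the deviation metric is an assumption, not a description of any specific tool.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for simulation output: one flattened field per run.
# In practice these would be loaded from each run's output files.
baseline = rng.normal(size=100_000)
runs = {f"run_{i:03d}": baseline + rng.normal(scale=0.01 * i, size=baseline.size)
        for i in range(1, 21)}

def rms_deviation(field, reference):
    """Root-mean-square deviation from the baseline, one number per run."""
    return float(np.sqrt(np.mean((field - reference) ** 2)))

scores = {name: rms_deviation(field, baseline) for name, field in runs.items()}

# Flag the runs that drift furthest from the baseline for closer inspection.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{name}: RMS deviation {score:.4f}")
```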

In-memory big data processing techniques and deep learning algorithms are being deployed to close the gap. “In-memory” systems implement very large physical memory spaces to load as much of a data set into memory as possible. This accelerates many operations that use different portions of a large data set by reducing data transfers between storage and memory.
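A minimal PySpark sketch of the in-memory idea: load a large result set once, pin it in cluster memory, and run several analyses against it without re-reading storage. The file path and column names are assumptions used only for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("in-memory-analytics").getOrCreate()

# Hypothetical path and schema: a table of per-run simulation results.
results = spark.read.parquet("/data/simulation_results.parquet")

# Pin the data set in memory so later queries avoid re-reading storage.
results.cache()
results.count()          # materializes the cache

# Several analyses now run against the in-memory copy.
results.groupBy("run_id").avg("peak_stress").show()
print(results.filter(results.peak_stress > 1.0e6).count())

spark.stop()
```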

Commercial and Mid-Sized HPC Customers Are Different

Classic HPC customers are usually typecast as expensive, well-staffed, cost-plus government research projects or fixed-budget academic research purchases dependent on “free” grad student time for everything from assembly to administration and applications coding.

Commercial HPC customers are development focused – they are in business to solve problems and bring solutions to market. These commercial customers want to see fast time-to-results, low operational cost per result, consistent performance, and operational flexibility in multitenant provisioning and metering of HPC resources.

Many new HPC customers are already buying cloud services, either from public clouds or managed private clouds. These customers expect the same multitenant ease of provisioning, administration and management for their HPC clusters as they do for their public and private cloud infrastructure. They also expect standard commercial support and maintenance contracts.

However, HPC hardware requirements are very different from cloud infrastructure requirements. Generic, processor-based scale-out architectures work very well for many cloud workloads, but that is not the case for evolving HPC workloads. As mentioned above, HPC architectures are not yet mature and will continue to evolve for decades.

Vendors who succeed in the commercial and mid-sized HPC markets will borrow cloud architecture concepts to create multitenant HPC clusters, which in effect will become HPC private clouds. Commercial HPC customers want to charge back time on their HPC cloud to internal customers, which means that execution time and storage utilization must be metered so they can be billed.
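A toy sketch of that metering and chargeback idea follows: metered core-hours and storage are converted into an internal bill per tenant. The rates and usage records are hypothetical; in practice a job scheduler and the storage system supply the metered numbers.

```python
# Hypothetical internal chargeback rates (illustrative assumptions only).
RATE_PER_CORE_HOUR = 0.05      # currency units per CPU core-hour
RATE_PER_TB_MONTH  = 20.00     # currency units per terabyte-month of storage

# Metered usage per internal customer, as a scheduler and storage system might report it.
usage = {
    "crash-simulation-group": {"core_hours": 120_000, "tb_months": 35.0},
    "materials-research":     {"core_hours":  40_000, "tb_months": 12.5},
}

for tenant, u in usage.items():
    bill = (u["core_hours"] * RATE_PER_CORE_HOUR
            + u["tb_months"] * RATE_PER_TB_MONTH)
    print(f"{tenant}: {bill:,.2f}")
```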

This new class of HPC customers has adopted open source software frameworks, such as the Linux operating system and the OpenStack cloud framework. Unlike research-oriented HPC, development-focused customers also leverage commercial off-the-shelf (COTS) software applications whenever possible. Commercial customers don’t want to write the best possible simulation themselves; they want to run useful, supported simulations in a production environment for their vertical market. It is a customer’s understanding of their vertical market that drives their simulation variations. However, their choice of COTS software provider often determines the mix of compute, storage and networking they must deploy, while the simulation scope and budget jointly determine the size of the HPC hardware deployment and the phases of installation.

Equally important, these new HPC customers do not want to spend calendar quarters installing, verifying, and tuning their HPC cloud to extract optimal performance. Their IT staff is neither budgeted nor equipped to do the experimentation required to find optimal configuration settings for a complex pile of compute, network and storage gear.

Dell’s HPC Investments

Dell’s self-declared mission is to “democratize HPC.” Dell uses the word “democratize” in the sense of making HPC cluster ownership accessible to more and smaller organizations over time. This is not a purchase price strategy – HPC technology is evolving too fast for a hardware pricing race to the bottom. Dell’s HPC strategy is more in line with traditional IT buying practices; Dell is creating packages of HPC solutions that can be sized and configured for individual customers. Similar to traditional IT, Dell is offering finance and service packages on top of hardware and software. And while traditional IT suppliers can talk with many different CIO organizations, the HPC market is different, so Dell has formed an HPC community to better listen to and learn from its customer base.

Newly Formed HPC Community

The kick-off meeting for Dell’s HPC Community [3], held in Austin, Texas in mid-April 2016, was large and well attended. The Dell HPC Community will meet again at ISC 2016 [4] and then again at SC16 [5]. Although there is an independent and precompetitive HPC Advisory Council [6], Dell has the benefit of hosting its own channel for listening to HPC customers for direct market and operational feedback.

[3] http://www.dellhpc.org/program-agenda.html
[4] http://www.isc-hpc.com/
[5] http://sc16.supercomputing.org/

Founding Member of OpenHPC

The OpenHPC Collaborative Project [7] is a precompetitive community effort among processor IP designers, chip vendors, system vendors, software vendors, and research labs to standardize an open source HPC software stack. Many of the components of OpenHPC have been deployed for years at one or more member sites. A key goal is to find the right combinations of these tools that work together to form a consistent, “close to the metal,” low-latency and yet vendor-agnostic distribution. This distribution must be updated, integrated, tested and verified by the community. OpenHPC includes compute and I/O drivers, message passing libraries, software tools for developers and administrators, and also performance testing tools.
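For a concrete taste of the message-passing layer in such a stack, here is a minimal MPI example using the mpi4py bindings. OpenHPC packages MPI libraries and related tools, but this particular snippet is a generic illustration rather than part of the OpenHPC distribution itself.

```python
# Minimal MPI example (mpi4py); run with, e.g.: mpirun -n 4 python mpi_sum.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()     # this process's ID within the job
size = comm.Get_size()     # total number of MPI processes

# Each rank contributes a value; the sum is reduced onto rank 0.
local_value = rank + 1
total = comm.reduce(local_value, op=MPI.SUM, root=0)

if rank == 0:
    print(f"{size} ranks, reduced sum = {total}")
```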

Focus on Vertical Market Expertise

Dell has hired HPC subject matter experts to create the Dell HPC Systems solutions portfolio. Dell announced its HPC System for Research (the focus of this paper), HPC System for Life Sciences, and HPC System for Manufacturing, and is working on future solutions for other markets as well. We plan to dig deeper into Dell’s vertical markets in future papers.

HPC Innovation Lab

Dell’s HPC Innovation Lab is not just a simple software “try before you buy” facility; it is a focal point for Dell’s joint R&D activities with partners and system integrators, as well as for coordination with customers. The lab is housed in a 13,000 square-foot shared facility containing over 1,000 servers of different form factors and generations. Whenever Dell investigates a new HPC technology, it brings it into this lab to understand its impact on the system and on performance. The focus is on the design, development and integration of HPC systems, with an emphasis on the software stack, plus compute, interconnect, and storage performance analysis and performance tuning down to BIOS settings.

[6] http://www.hpcadvisorycouncil.com/
[7] http://www.openhpc.community/

Figure 5: Dell's HPC Innovation Lab [Source: Dell]


The lab hosts Dell’s Zenith HPC system. Zenith is designed on Intel’s Scalable Systems Framework (SSF) [8] and today contains 256 two-socket (2P) nodes using Intel Xeon E5-2697 v4 processors, 128 GB of memory per node, OmniPath Architecture (OPA) adapters connected in a non-blocking OPA fabric, and 480 TB of Dell HPC NFS storage. Dell says that the system is performing at 270 TFLOPS today, which almost qualifies it for the June 2016 Top500 supercomputer list (entry number 500 is now at 286 TFLOPS). That is already impressive, and Dell has plans to double the size of this system by the end of the year.
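A rough, hedged reconstruction of how that 270 TFLOPS figure relates to the hardware: the arithmetic below estimates the cluster's theoretical double-precision peak from published Xeon E5-2697 v4 specifications (18 cores per socket, 2.3 GHz base clock, 16 double-precision FLOPS per core per cycle with AVX2 FMA) and then computes the implied HPL efficiency. Sustained clocks under AVX load are typically lower than the base clock, so treat this as an order-of-magnitude check rather than Dell's own sizing math.

```python
# Rough theoretical-peak estimate for 256 nodes of dual Xeon E5-2697 v4.
# Clock and FLOPs-per-cycle figures are nominal; real AVX clocks are lower.
nodes            = 256
sockets_per_node = 2
cores_per_socket = 18
base_clock_hz    = 2.3e9
dp_flops_per_cycle_per_core = 16     # AVX2 with two FMA units

peak_tflops = (nodes * sockets_per_node * cores_per_socket
               * base_clock_hz * dp_flops_per_cycle_per_core) / 1e12
print(f"theoretical peak ~ {peak_tflops:.0f} TFLOPS")          # ~339 TFLOPS

measured_hpl_tflops = 270.0                                    # figure cited by Dell
print(f"implied HPL efficiency ~ {measured_hpl_tflops / peak_tflops:.0%}")   # ~80%
```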

Dell uses Zenith to prototype and characterize the performance of advanced technologies in the Innovation Lab, for general HPC use and specifically for target vertical markets, such as genomics [9] and manufacturing [10]. Zenith is used for co-development with partners and for customer evaluation of software scalability and performance. Zenith has been used to create proofs of concept blending HPC and cloud technologies, HPC and Big Data analytics, OpenStack distributions for HPC, Dell’s Hadoop analytics framework distribution running on a Lustre file system, and many more. As a side note, Dell is a gold member of, and long-standing contributor to, the OpenStack Foundation.

Dell’s HPC Innovation Lab was involved with the design, development and performance analysis of Dell’s newly announced PowerEdge C6320p server [11], based on the latest “Knights Landing” (code-named “KNL,” now the 7200 series) generation of Intel Xeon Phi processors. The big difference between the Intel Xeon Phi 7200 series and previous Xeon Phi generations is that the 7200 series can act as a stand-alone processor, not just a coprocessor. There are four single-socket Xeon Phi 7200 nodes in the 2U PowerEdge C6320p chassis. The chassis also integrates Dell’s integrated Remote Access Controller 8 (iDRAC8) with the Xeon Phi 7200 to automate systems management just as if the new Xeon Phi were a mainstream Xeon processor. These systems will appear in both TACC’s Stampede 1.5 and Stampede 2 systems, described below. Dell says that it will target the Xeon Phi 7200 series at specific classes of applications, such as computational finance, molecular dynamics, and weather simulation.

Dell’s HPC Innovation Lab works closely with Dell’s office of the CTO for forward-looking technology exploration, and has evaluated ARM processors, RDMA over Converged Ethernet (RoCE), special-purpose compute accelerators, new file systems, and other architectural concepts.

[8] http://www.intel.com/content/www/us/en/high-performance-computing/product-solutions.html
[9] https://www.dell.com/learn/us/en/555/hpcc/high-performance-computing-life-sciences
[10] http://www.dell.com/en-us/work/learn/assets/business~solutions~whitepapers~en/documents~digital-manufacturing-vrtx-tech-whitepaper.pdf
[11] http://www.dell.com/us/business/p/poweredge-c6320p/pd

"Nobody else buys a system for

the joy of putting it together and

tuning it; that's what we do."

-- Garima Kochhar, Dell Systems

Sr. Principal Engineer


Dell’s focus for this lab is not far-future basic research, but rather the practical aspects of commercializing leading-edge technology at scale.

Dell’s HPC Innovation Lab also collaborates with Dell’s Extreme Scale Infrastructure (ESI) group; both labs are located on the same campus. One of ESI’s eminently practical general R&D projects is Dell’s recently disclosed Triton water cooling pilot project. Triton is notable for cooling the processors in a server rack using the inflow water supply pressure – it requires no pumps at the rack level, and for small numbers of racks it can operate directly from the standard pressure of a commercial municipal water supply. While the server sled portion of the cooling system is designed to cool Intel Xeon E5 processors operating at 200 W each, it could easily be modified to cool leading-edge 250 W to 300 W GPGPUs such as NVIDIA’s Tesla P100 module [12]. Triton is an example of technology developed in the ESI lab for extreme-scale customers that is also applicable to many HPC customers.

Dell’s HPC Innovation Lab also creates the direct experience base for Dell’s “HPC System Builder,” Dell’s internal sizing tool for recommending properly sized systems, taking into account configuring, provisioning and running those systems. HPC System Builder is operational for research and life sciences customers today, and Dell will add manufacturing to it soon.

Integration with In-Memory Analytic Solutions

A few years ago, in-memory analytics would not have been important to mention in a paper about the HPC market, but today in-memory analytics are being used by many mid-sized organizations to analyze HPC simulation runs and identify the features and patterns across simulations that people need to pay attention to.

Dell has been doing a lot in this space: it is shipping the Dell In-Memory Appliance for Cloudera Enterprise with Apache Spark, a Cloudera Apache Hadoop reference architecture, plus Statistica Big Data Analytics from Dell.

[12] http://www.nextplatform.com/2016/04/21/nvidias-tesla-p100-steals-machine-learning-cpu/

Figure 6: Dell Triton server sled [Source: Dell]


Dell enables Cloudera’s Hadoop MapReduce analytics to directly access HPC results in a Lustre file system data store [13], using the Bright Cluster Manager (BCM) tool to deploy and configure the hybrid cluster, plus Intel’s Hadoop Adapter for Lustre (HAL) plug-in. The result is that large data sets do not have to be moved from Lustre to the Hadoop Distributed File System (HDFS), which consumes time, power, and bandwidth.
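The snippet below is not Intel's HAL plug-in; it is only a hedged sketch of the underlying idea, using Spark to read analysis input in place from a POSIX-mounted parallel file system path instead of first copying it into HDFS. The mount point and directory name are assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("analyze-in-place").getOrCreate()

# Hypothetical Lustre mount point visible on every node of the hybrid cluster.
# Reading it in place avoids a bulk copy into HDFS before analysis can start.
results = spark.read.parquet("file:///lustre/project42/simulation_output/")

results.groupBy("run_id").count().show()

spark.stop()
```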

Dell is also collaborating closely with SAP to develop mid-market SAP HANA [14] in-memory database and analytics solutions, including SAP HANA Edge and SAP Predictive Analytics. Dell develops appliances for SAP HANA, and then works with SAP to build platforms for vertical markets, such as SAP’s Foundation for Health [15].

Financing, Deployment and Support Services

Dell Financial Services (DFS) [16] can facilitate purchases across a wide range of customer sizes and budgets. While not directly related to technology, Dell’s ability to directly assist mid-market customers in financing data center build-out, including HPC hardware, software, and services, should not be underestimated.

Dell offers customers the option to have Dell install, configure, and integrate [17] new Dell data center equipment, remotely manage [18] and support that equipment through its lifecycle, and then remove and retire the equipment as it reaches the end of its lifecycle.

Dell enables enterprise-class management using Dell’s iDRAC, OpenManage, Active System Manager, and Lifecycle Controller products.
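For a flavor of what programmatic management through iDRAC looks like, the sketch below queries a server's Redfish REST endpoint (exposed by recent iDRAC firmware) using Python's requests library. The address, credentials, and the decision to skip TLS verification are placeholders for illustration, not a recommended production configuration.

```python
import requests

# Hypothetical management controller address and credentials; replace with real values.
IDRAC = "https://192.0.2.10"
AUTH = ("root", "changeme")

# List the systems exposed by the management controller via the Redfish API.
resp = requests.get(f"{IDRAC}/redfish/v1/Systems", auth=AUTH, verify=False)
resp.raise_for_status()

for member in resp.json().get("Members", []):
    system = requests.get(f"{IDRAC}{member['@odata.id']}", auth=AUTH, verify=False).json()
    print(system.get("Model"), system.get("PowerState"),
          system.get("Status", {}).get("Health"))
```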

Focus on End-Customer Enablement

At SC15 Dell announced three market initiatives: one to make HPC more widely available to smaller companies and researchers, and the other two to dive into vertical market applications for life sciences and manufacturing. TIRIAS Research will cover the vertical market applications in subsequent papers; our focus here is on democratizing HPC in the general research market.

[13] http://i.dell.com/sites/doccontent/business/solutions/whitepapers/en/Documents/DellHPCStorageWithIntelEELustre.pdf
[14] https://hana.sap.com/abouthana.html
[15] https://help.sap.com/platform_health
[16] https://dfs.dell.com/Pages/DFSHomePage.aspx
[17] http://www.dell.com/learn/in/en/inbsd1/services/deployment-services?s=bsd
[18] https://www.dell.com/en-us/work/learn/assets/legal~service-descriptions~en/documents~remote-hpc-cluster-management-service-en.pdf


Dell HPC System for Research

One of Dell’s HPC System Builder configuration models is designed to support general research customers. Building on the description above, the HPC System Builder tool provides guidance to rapidly and accurately size a customer purchase for desired general performance targets, from 4 to 1,024 compute nodes per system, and then to optimize the operations of the installation (including optimizing BIOS configurations). Customers can tune their deployment and operations for performance, efficiency, or a balance of the two. Dell can scale performance based on High Performance Linpack (HPL) sustained or theoretical TFLOPS performance, or to a specific customer node type and count requirement.
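To illustrate the kind of sizing arithmetic such a tool automates (this is a generic sketch, not Dell's HPC System Builder), the example below works backwards from a sustained-TFLOPS target to a node count, given an assumed per-node peak and an assumed HPL efficiency.

```python
import math

def nodes_for_target(target_sustained_tflops, per_node_peak_tflops, hpl_efficiency):
    """Smallest node count whose expected sustained HPL performance meets the target."""
    sustained_per_node = per_node_peak_tflops * hpl_efficiency
    return math.ceil(target_sustained_tflops / sustained_per_node)

# Illustrative assumptions: ~1.3 TFLOPS peak per two-socket node, 80% HPL efficiency.
for target in (25, 100, 500):
    n = nodes_for_target(target, per_node_peak_tflops=1.3, hpl_efficiency=0.80)
    print(f"~{n} nodes for {target} sustained TFLOPS")
```

A real sizing exercise also has to balance the interconnect, storage and power against the node count, which is part of what the tool's configuration and operations guidance described above covers.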

Much of Dell’s emphasis for mid-sized research customers is on delivering a multitenant HPC cluster with balanced throughput. Applications on a single cluster can span a wide range of research interests in modeling, rendering and analysis, from complex simulations to analyzing machine-generated data from sensor systems and scientific instruments.

Dell’s HPC System for Research marquee customers include:

University of Cambridge: Wilkes [19]

The Wilkes cluster became operational in late 2013 and debuted at #2 on the Green500 list at 3,631 MFLOPS/Watt. It contains 128 Dell PowerEdge T620 servers with 256 NVIDIA Tesla K20 GPUs interconnected by 256 Mellanox Connect InfiniBand NICs, and it is attached to a 4 PB custom Lustre file system. The cluster has 183 TFLOPS of CPU performance and 240 TFLOPS of GPU performance. Wilkes is housed in a water-cooled data center implementing evaporative coolers and back-of-rack water heat exchangers, yielding a spot PUE of 1.075.

[19] http://www.hpc.cam.ac.uk/services/wilkes

Figure 7: Dell HPC System for Research with PowerEdge R430 nodes [Source: Dell]

Figure 8: Wilkes [Source: University of Cambridge]

The University of Cambridge HPC Solution Centre [20] is a cloud-based resource available to U.K.-based small and medium businesses (SMBs) to foster national competitiveness in a global economy. Wilkes also supports the international Square Kilometre Array (SKA) [21] radio telescope project together with CHPC.

Centre for High Performance Computing (CHPC): Lengau (Cheetah) [22]

Lengau is a new HPC cluster built from 1,039 Dell PowerEdge C6320 servers connected by Mellanox EDR InfiniBand NICs in 19 racks, with 5 PB of attached storage. Each of the C6320 servers contains four dual-socket Xeon E5 server nodes. This CPU-only cluster is ranked 120th on the Top500 list at 782 TFLOPS. Lengau will support South African science, including SKA, and it will also be available to private, non-academic users to boost national economic competitiveness.

Indiana University (IU) Pervasive Technology Institute (PTI): Jetstream [23]

Jetstream is a geographically distributed half-PFLOPS cloud based on the OpenStack cloud framework and the KVM hypervisor. It links IU’s cluster to an identical cluster at TACC (below) and a small test cluster at the University of Arizona. The IU and TACC quarter-PFLOPS clusters are each built from 320 dual-socket Dell PowerEdge M630 blade server nodes using Xeon E5-2600 v4 family processors. Each node has 128 GB of memory and 2 TB of local storage, for a total of 40 TB of memory and a 960 TB storage system per cluster. The combined system has recently become available for production workloads.

[20] http://www.dell.com/learn/uk/en/ukbsdt1/hpcc/cambridge-hpc-solution-centre
[21] https://www.skatelescope.org/
[22] http://www.chpc.ac.za/index.php/news2/203-chpc-unveils-petascale-machine
[23] http://jetstream-cloud.org/partners.php

Figure 9: CHPC Lengau [Source: Dell]

Figure 10: Jetstream [Source: IU PTI]

Jetstream resources are scheduled through the U.S. National Science Foundation’s (NSF) XSEDE [24] program. Jetstream enables creating customized virtual machines and features the ability to initiate interactive computing sessions on the cluster, essentially virtual Linux desktops running in Jetstream virtual machines, with screens delivered to smartphones and tablets across cellular networks or to PCs on slow network connections.
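Because Jetstream is built on OpenStack, provisioning a customized virtual machine can be scripted with the standard openstacksdk client, roughly as sketched below. The cloud name, image, flavor, and key pair are placeholders; Jetstream's actual allocation and access workflow goes through XSEDE and its web interfaces.

```python
import openstack

# "my-cloud" refers to an entry in clouds.yaml; all names below are placeholders.
conn = openstack.connect(cloud="my-cloud")

server = conn.create_server(
    name="research-desktop",
    image="my-research-image",   # a customized research image
    flavor="m1.medium",
    key_name="my-keypair",
    wait=True,
    auto_ip=True,                # attach a floating IP so the VM is reachable
)
print(server.name, server.status)
```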

Jetstream’s anticipated user base includes historically black colleges and universities, minority-serving institutions, tribal colleges, and higher education institutions in EPSCoR states [25]. A wide variety of “long tail” applications are planned for Jetstream, including biology, earth science, geographic information services (GIS), building network analytics tools, social sciences, and others.

San Diego Supercomputer Center (SDSC): Comet [26]

Comet is a Dell-integrated 2 PFLOPS (peak) cluster using Dell PowerEdge C6320 servers connected by Mellanox FDR InfiniBand. Each C6320 chassis contains four dual-socket nodes using Intel Xeon E5-2680 v3 processors, with 128 GB of memory and 320 GB of solid-state drive (SSD) storage per node. There are 18 PowerEdge C6320 chassis in each of 27 racks. The Comet cluster contains 247 TB of total memory and 634 TB of total SSD capacity. SDSC’s Data Oasis parallel file storage system is being upgraded to 7.6 PB of storage.

Comet is scheduled through XSEDE, and it also supports NSF’s target of long-tail, modest-scale users, with a focus on genomics, social sciences, and economics.

The cluster also contains 36 GPU nodes with two NVIDIA Tesla K80 cards each, and four large-memory nodes, each containing 1.5 TB of global memory. These nodes support specific applications, such as visualization, molecular dynamics simulations, and genome assembly.

Comet implements single root I/O virtualization (SR-IOV) and virtual LAN (VLAN) technologies, which means that researchers can quickly carve out virtual sub-clusters that behave as stand-alone hardware clusters – they can run their own OS and software stacks. The overall cluster is designed so that many small node-count clusters can run simultaneously, which boosts overall cluster utilization and efficiency, as well as availability to researchers.

[24] https://www.xsede.org/overview
[25] https://www.nsf.gov/od/oia/programs/epscor/nsf_oiia_epscor_eligible.jsp
[26] http://www.sdsc.edu/services/hpc/hpc_systems.html#comet

Figure 11: Comet [Source: SDSC]

Texas Advanced Computing Center (TACC)

TACC’s resources are also scheduled through XSEDE, and they also support NSF’s target of long-tail, modest-scale users. TACC focuses on natural and social sciences, engineering, technology, medicine, and many other applications.

Stampede [27]

The Stampede cluster contains 6,400 Dell PowerEdge C8220 dual-processor nodes, each using Intel Xeon E5-2680 processors with 32 GB of memory, for 2.2 PFLOPS of peak CPU performance. Those C8220 chassis also contain 6,880 previous-generation Intel Xeon Phi SE10P coprocessors, which contribute an additional 7.4 PFLOPS of peak accelerator performance. Stampede ranks 12th on the June 2016 Top500 list at 5.2 PFLOPS sustained performance.

There are also 128 NVIDIA GPU cards for remote visualization, plus 16 more Dell servers for large data analysis, each containing 1 TB of shared memory and two GPUs. The cluster is connected by Mellanox FDR InfiniBand.

Stampede 1.5

This upgrade to the original Stampede system adds 500 Intel Xeon Phi 7250 based Dell nodes to the existing cluster. The NICs use an OmniPath (OPA) bridge to InfiniBand. This is a revised plan; it replaced an earlier upgrade plan that called for adding Xeon Phi 7200 series add-in cards to existing server chassis. TACC is currently evaluating OPA and the first wave of pre-production Dell Xeon Phi 7250 based systems; even so, the evaluation system ranks 116th on the June 2016 Top500 list.

[27] https://www.tacc.utexas.edu/systems/stampede

Figure 12: Inside Stampede [Source: TIRIAS Research]

Figure 13: Front of Stampede [Source: TACC]

Stampede 2 [28]

This recently announced cluster will deploy in phases during 2017 and 2018 and will deliver a peak performance of up to 18 PFLOPS. Stampede 2 will implement future Dell servers using a mix of Xeon CPUs and Xeon Phi 7200 series processors connected by OPA. The final phase of the project will be among the first wave of systems to use Intel’s 3D XPoint non-volatile main memory technology.

Conclusion

Dell’s strengths in the traditional enterprise IT and cloud computing markets directly apply to the modern HPC market. This was not the case several years ago, but now cloud customers are deploying increasingly sophisticated and intelligent services at scale. These services are pushing the state of the art in processor, accelerator (including GPUs and many other types), storage and networking technologies. As cloud services push technology forward, the HPC market benefits via lower costs and better power efficiency.

TIRIAS Research predicts that more types of simulations and more complex simulations will waterfall into more affordable HPC deployments over the next few decades.

Dell’s investments in HPC innovation and Dell’s deep relationships with HPC research centers serving smaller customers put Dell in a good position to benefit from this waterfall effect by understanding commercial and mid-market customers’ business needs and serving them with scalable and affordable HPC resources.

[28] https://www.tacc.utexas.edu/-/stampede-2-drives-the-frontiers-of-science-and-engineering-forward

Copyright TIRIAS Research LLC 2016. All rights reserved. Reproduction in whole or in part is prohibited without written permission from TIRIAS Research LLC.

This report is the property of TIRIAS Research LLC and is made available only upon these terms and conditions. The contents of this report represent the interpretation and analysis of statistics and information that is either generally available to the public or released by responsible agencies or individuals. The information contained in this report is believed to be reliable but is not guaranteed as to its accuracy or completeness. TIRIAS Research LLC reserves all rights herein. Reproduction or disclosure in whole or in part is permitted only with the written and express consent of TIRIAS Research LLC.