Cray CS400-LC Brochure


Upload: dangdat

Post on 14-Feb-2017


Page 1: Cray CS400-LC Brochure

Smarter Cluster Supercomputing from the Supercomputer Experts

● Lowers energy costs; datacenter PUE of 1.1 or lower
● Capable of up to 80 percent heat capture

Maximize Your Productivity with Flexible, High-Performance Cray® CS400™ Liquid-Cooled Cluster Supercomputers

In science and business, as soon as one question is answered another is waiting. And with so much depending on fast, accurate answers to complex problems, you need reliable high performance computing (HPC) tools matched to your specific tasks.

Understanding that time is critical and that not all HPC problems are created equal, we developed the Cray® CS400™ cluster supercomputer series. These systems are industry standards-based, highly customizable, easy to manage, and purposefully designed to handle the broadest range of medium- to large-scale simulations and data-intensive workloads.

All CS400 components have been carefully selected, optimized and integrated to create a powerful, reliable high-performance compute environment capable of scaling to over 27,000 compute nodes and 46 peak petaflops.

Flexible node configurations featuring the latest processor and interconnect technologies mean you can get to the solution faster by tailoring a system to your specific HPC applications’ needs. Innovations in packaging, power, cooling and density translate to superior energy efficiency and compelling price/performance. Expertly engineered system management software instantly boosts your productivity by simplifying system administration and maintenance, even for very large systems.

Cray has long been a leader in delivering tightly integrated supercomputer systems for large-scale deployments. With the CS400 system, you get that same Cray expertise and productivity in a flexible, standards-based and easy-to-manage cluster supercomputer.

CS400-LC™ Cluster Supercomputer: Liquid-Cooled and Designed for Your Workload

The CS400-LC™ system is our direct-to-chip warm water-cooled cluster supercomputer. Designed for significant energy savings, it features liquid-cooling technology that uses heat exchangers instead of chillers to cool system components. Compared to traditional air-cooled clusters, the CS400-LC system can deliver three times the energy efficiency, with typical payback cycles ranging from immediate to one year.

Along with lowering operational costs, the CS400-LC system offers the latest x86 processor technologies from Intel in a highly scalable package. Industry-standard server nodes and components have been optimized for HPC and paired with a comprehensive HPC software stack, creating a unified system that excels at capacity- and data-intensive workloads.

Innovative Liquid Cooling Keeps Your System Cool and Energy Costs Low

Designed to minimize power consumption without compromising performance, the CS400-LC cluster supercomputer uses an innovative heat exchange system to cool system processors and memory.

The heat exchange cooling process starts with a coolant distribution unit (CDU), connected to each rack, and two separate cooling loops. One loop delivers warm or cool facility water to the CDU, where the heat is exchanged and the now-hot facility water exits the other end of the loop. A second loop repeats the process at the server level. A double-sealed low-pressure secondary loop, with dripless quick connects, cools the critical server components. It delivers cooled liquid to the servers where pump/cold plate units atop the processors capture the heat, and the now-hot liquid circulates back to the CDU for heat exchange. Facility water and server loop liquid never mix: liquid-to-liquid heat exchangers within the CDU transfer heat between the loops.
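The heat carried by a liquid loop like the ones described above follows the standard sensible-heat relation Q = m_dot × c × ΔT. The sketch below is only an illustration of that relation, not a Cray sizing method: the heat load and temperature rise are hypothetical inputs, and the specific heat of plain water is used as an approximation for the server-loop coolant.

```python
# Approximate specific heat of liquid water in J/(kg*K).
# Assumption: the server-loop coolant (deionized water with additives)
# is treated as plain water for this rough estimate.
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0


def coolant_mass_flow_kg_per_s(heat_load_w: float, delta_t_k: float) -> float:
    """Mass flow needed to carry heat_load_w with a coolant temperature rise of delta_t_k.

    Rearranges Q = m_dot * c * delta_T into m_dot = Q / (c * delta_T).
    """
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_load_w / (WATER_SPECIFIC_HEAT_J_PER_KG_K * delta_t_k)


if __name__ == "__main__":
    # Hypothetical example values, not taken from the brochure:
    # 30 kW of captured heat and a 10 K rise across the loop.
    flow = coolant_mass_flow_kg_per_s(30_000.0, 10.0)
    print(f"Required flow: {flow:.2f} kg/s")
```

A larger permitted temperature rise lowers the required flow, which is consistent with the low-flow server loop described in the text.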

This isolated dual-loop design safeguards the nodes. First, the server loop is low pressure and low flow, so server loop components are not subject to the high pressure of the facility loop. Second, the server loop is prefilled with nonconductive, deionized water containing additives to prevent corrosion.

Since it requires less powerful fans on the servers and fewer air conditioning units in the facility, the CS400-LC system reduces typical energy consumption by 50 percent with predicted power usage effectiveness (PUE) of 1.1 or lower. The system can also capture up to 80 percent of heat from the server components for possible reuse. Additionally, leak detection and prevention features are tightly integrated with the system remote monitoring and reporting capabilities.
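To make the PUE and heat-capture figures concrete: PUE is the ratio of total facility power to IT equipment power, so a PUE of 1.1 means roughly 10 percent overhead on top of the IT load. The following sketch works through that arithmetic. The 1,000 kW IT load and 10 kW component heat are hypothetical example values, and the function names are illustrative only.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("it_equipment_kw must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(it_equipment_kw: float, pue_value: float) -> float:
    """Non-IT power (cooling, power distribution, etc.) implied by a given PUE."""
    return it_equipment_kw * (pue_value - 1.0)


def captured_heat_kw(component_heat_kw: float, capture_fraction: float = 0.80) -> float:
    """Heat captured by the liquid loop; 0.80 is the brochure's 'up to' figure."""
    if not 0.0 <= capture_fraction <= 1.0:
        raise ValueError("capture_fraction must be between 0 and 1")
    return component_heat_kw * capture_fraction


if __name__ == "__main__":
    it_load_kw = 1000.0  # hypothetical IT load
    print(f"Overhead at PUE 1.1: {overhead_kw(it_load_kw, 1.1):.1f} kW")
    print(f"Heat captured from 10 kW: {captured_heat_kw(10.0):.1f} kW")
```

Because the 80 percent value is an upper bound, the captured heat computed this way is a best case rather than a typical result.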

Choice of Flexible, Scalable Configurations

Flexibility is at the heart of the Cray CS400-LC system design. At the system level, the CS400-LC cluster is built on the Cray® GreenBlade™ platform. Comprising server blades and chassis, the platform is designed to provide mix-and-match building blocks for easy, flexible configurations, at both the node and whole system level. Among its advantages, the GreenBlade platform offers high density (up to 60 compute nodes per 42U rack), excellent memory capacity (up to 1,024 GB per node), many power and cooling efficiencies and a built-in management module for industry-leading reliability.

The CS400-LC system features the latest Intel® Xeon® processors. It offers multiple interconnect and network topology options, maximum bandwidth, local storage, many network-attached file system options and the ability to integrate with the Cray® Sonexion® scale-out Lustre® system, providing fast, high-performance scratch and primary storage.

Within this framework, the Cray CS400-LC system can be tailored to multiple purposes — from an all-purpose cluster, to one suited for shared memory parallel tasks, to a system optimized for hybrid compute- and data-intensive workloads.

Nodes are divided by function into compute and service nodes. Compute nodes run parallel MPI and/or OpenMP tasks with maximum efficiency, while service nodes provide I/O and login functions.

Compute nodes feature two Intel Xeon processors per node and up to 1,024 gigabytes of memory. Each node can host one local hard drive.

With industry-standard components throughout, each system configuration can be replicated over and over to create a reliable and powerful large-scale system.

CS400-LC Hardware Configuration Options

● Two-socket x86 Intel Xeon processors
● Large memory capacity per node
● Multiple interconnect options: 3D torus/fat tree, single/dual rail FDR InfiniBand
● Local hard drives in each server
● Choice of network-attached file systems and Lustre-based parallel file storage systems

[Figure: cooling loop diagram. Labels: Outdoor Dry Cooler, Facility Water, Coolant Distribution Unit (CDU), Low Pressure Server Loop, Server Cooler]

Easy, Comprehensive Manageability

A flexible system is only as good as your ability to use it. The Cray CS400-LC cluster supercomputer offers two key productivity-boosting tools: a customizable HPC cluster software stack and the Cray Advanced Cluster Engine (ACE™) system management software.

Cray HPC Cluster Software Stack

HPC Programming Tools
● Development & Performance Tools: Cray PE on CS, Intel Parallel Studio XE Cluster Edition, PGI Cluster Development Kit®, GNU Toolchain, NVIDIA® CUDA®
● Application Libraries: Cray® LibSci™, LibSci_ACC, Intel® MPI, IBM Platform MPI, MVAPICH2, OpenMPI
● Debuggers: Rogue Wave TotalView®, Allinea DDT, MAP, Intel® IDB, PGI PGDBG®, GNU GDB

Schedulers, File Systems and Management
● Resource Management / Job Scheduling: SLURM, Adaptive Computing Moab®, Maui, Torque, Altair PBS Professional, IBM Platform™ LSF®, Grid Engine
● File Systems: Lustre®, NFS, GPFS, Panasas PanFS®, Local (ext3, ext4, XFS)
● Cluster Management: Cray® Advanced Cluster Engine (ACE™) Management Software

Operating Systems and Drivers
● Drivers & Network Mgmt.: Accelerator Software Stack & Drivers, OFED
● Operating Systems: Linux® (Red Hat, CentOS)

The HPC cluster software stack consists of a range of software tools compatible with most open source and commercial compilers, debuggers, schedulers and libraries. Also available as part of the software stack is the Cray Programming Environment, which includes the Cray Compiling Environment, Cray Scientific and Math Libraries, and Performance Measurement and Analysis Tools.

Cray® Advanced Cluster Engine (ACE™)
Hierarchical, Scalable Framework for Management, Monitoring and File Access

Management
● Hierarchical management infrastructure
● Divides the cluster into multiple logical partitions, each with unique personality
● Revision system with rollback
● Remote management and remote power control
● GUI and CLI to view/change/control, monitor health; plug-in capability
● Automatic server/network discovery
● Scalable, fast, diskless booting
● High availability, redundancy, failover

Monitoring
● Cluster event data available in real-time without affecting job performance
● Node, IB network status
● BIOS, HCA information
● Disk, memory, PCIe errors
● Temperatures, fan speeds
● Load averages
● Memory and swap usage
● Sub-rack and node power
● I/O status

File Access
● RootFS: high-speed, cached access to root file system allowing for scalable booting
● High-speed network access to external storage
● ACE-managed, high-availability NFS storage

The Advanced Cluster Engine (ACE) management software simplifies cluster management for large scale-out environments with extremely scalable network, server, cluster and storage management capabilities. Command line (CLI) and graphical user interface (GUI) options provide flexibility for the cluster administrator. An easy-to-use ACE GUI connects directly to the ACE daemon on the management server and can be executed on a remote system. With ACE, a large system is almost as easy to understand and manage as a workstation.

ACE at a Glance

● Simplifies compute, network and storage management
● Supports multiple network topologies and diskless configurations with optional local storage
● Provides network failover with high scalability
● Integrates easily with standards-based HPC software stack components
● Manages heterogeneous nodes with different software stacks
● Monitors node and network health, power and component temperatures

Built-in Energy Efficiencies and Reliability Features Lower Your TCO

Energy efficiency features, combined with our long-standing expertise in meeting the reliability demands of very large, high-usage deployments, mean you get more work done for less.

In addition to liquid cooling, CS400-LC options for additional power and cost savings include high-efficiency load-balancing power supplies and a 480V power distribution unit with a choice of 208V or 277V three-phase power supplies. This means you can use industry-standard 208V and 230V power as well as 277V (single phase of a 480V three-phase input) and reduce the power loss caused by step-down transformers and resistive losses as the power is delivered from the wall directly to the rack.
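The 277V figure follows from standard three-phase arithmetic: in a wye-connected system, the line-to-neutral voltage equals the line-to-line voltage divided by the square root of three. The short sketch below illustrates this relationship. It assumes a balanced wye connection, which the brochure does not state explicitly, and the function name is illustrative.

```python
import math


def line_to_neutral_voltage(line_to_line_v: float) -> float:
    """Single-phase (line-to-neutral) voltage of a balanced wye three-phase supply."""
    if line_to_line_v <= 0:
        raise ValueError("line_to_line_v must be positive")
    return line_to_line_v / math.sqrt(3.0)


if __name__ == "__main__":
    # 480 V three-phase input yields roughly 277 V per phase.
    print(f"480 V line-to-line -> {line_to_neutral_voltage(480.0):.1f} V line-to-neutral")
```

Feeding the power supplies at 277V avoids an additional step-down stage, which is the source of the reduced transformer loss mentioned above.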

Reliability is built into the system design, starting with our careful selection of boards and components. Then multiple levels of redundancy and fault tolerance ensure the system meets your uptime needs. The CS400-LC cluster has redundant power, cooling and management servers and redundant networks, all with failover capabilities.

Intel® Xeon® Processor E5-2600 Product Family

The Intel Xeon processor is at the heart of the agile, efficient datacenter. Built on Intel’s industry-leading microarchitecture based on the 14nm second-generation Tri-Gate transistor technology, the Intel Xeon processor supports high-speed DDR4 memory technology with increased bandwidth, larger density and lower voltage over previous generations. Support for PCI Express (PCIe) 3.0 ports improves I/O bandwidth, offering extra capacity and flexibility for storage and networking connections. The processor delivers energy efficiency and performance that adapts to the most complex and demanding workloads.

© 2014-2015 Cray Inc. All rights reserved. Specifications are subject to change without notice. Cray is a registered trademark of Cray Inc. All other trademarks mentioned herein are the properties of their respective owners. 20160322EMS

Cray Inc. 901 Fifth Avenue, Suite 1000 Seattle, WA 98164 Tel: 206.701.2000 Fax: 206.701.2500 www.cray.com

Cray® CS400-LC™ Specifications

Architecture: Liquid-cooled cluster architecture, up to 60 nodes per 42U rack

Processor, Coprocessor and Accelerators: Support for 12-core, 64-bit, Intel® Xeon® processor E5-2600 v4 product family

Memory: Up to 1,024 GB registered ECC DDR4 RAM per compute node using 16 x 64GB DDR4 DIMMs

Interconnect and Networks:
● External I/O interface: 10 GbE Ethernet
● FDR InfiniBand with ConnectIB®, QDR True Scale Host Channel Adapters or Intel® Omni-Path Host Fabric Interface
● Options for single- or dual-rail fat tree or 3D torus

System Administration: Advanced Cluster Engine (ACE™)
● Complete remote management capability
● Graphical and command line system administration
● System software version rollback capability
● Redundant management servers with automatic failover
● Automatic discovery and status reporting of interconnect, server and storage hardware
● Ability to detect hardware and interconnect topology configuration errors
● Cluster partitioning into multiple logical clusters, each capable of hosting a unique software stack
● Remote server control (power on/off, cycle) and remote server initialization (reset, reboot, shut down)
● Scalable fast diskless booting for large node systems and root file systems for diskless nodes

Reliable, Available, Serviceable (RAS):
● Redundant power, cooling and management servers with failover capabilities
● Redundant networks (InfiniBand, GbE and 10 GbE) with failover
● All critical components easily accessible and hot swappable

Resource Management and Job Scheduling: Options for SLURM, Altair PBS Professional, IBM® Platform™ LSF®, Adaptive Computing Torque, Maui and Moab, and Grid Engine

File System:
● Cray® Sonexion®, NFS, Local FS (Ext3, Ext4, XFS)
● Lustre® and Panasas® PanFS® available as global file systems
● Cray® TAS, an open, capacity-optimized, tiered data system

Disk Storage: Full line of FC-attached disk arrays with support for FC, SATA disk drives and SSDs

Operating System: Red Hat, SUSE or CentOS

Performance Monitoring Tools: Open source packages such as HPCC, Perfctr, IOR, PAPI/IPM, netperf

Compilers, Libraries and Tools:
● Options for Open MPI, MVAPICH2 or Intel MPI Libraries
● Cray Compiler Environment (CCE), Cray LibSci, PGI, Intel Cluster Toolkit compilers, NVIDIA® CUDA™, CUDA C/C++, Fortran
● OpenCL, DirectCompute Toolkits, GNU, DDT, TotalView, OFED programming tools and many others

Power:
● Up to 38 kW per cabinet depending on configuration
● 208V/230V/277V power
● Optional 480V power distribution with 277V power supplies

Liquid Cooling Features:
● Low-pressure secondary loop completely isolated from primary datacenter liquid loop
● Field-serviceable cooling kits with integrated pressure and leak detection with remote monitoring

Cabinet Dimensions (HxWxD): 82.40” (2,093 mm) H x 23.62” (600 mm) W x 59.06” (1,500 mm) D standard 42U/19” rack cabinet

Cabinet Weight: 1,739 lbs.
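
Several of the headline figures in this brochure can be cross-checked with simple arithmetic. The sketch below derives per-node and per-rack values from the stated maximums. It assumes that the maximum density, memory, power and scale figures can all apply to the same configuration, which the brochure does not claim, so the results are rough bounds rather than specifications. All names are illustrative.

```python
import math

# Figures quoted in the brochure (all are stated as "up to" or "over" values).
NODES_PER_RACK = 60            # compute nodes per 42U rack
DIMMS_PER_NODE = 16            # DDR4 DIMMs per compute node
DIMM_SIZE_GB = 64              # capacity of each DIMM
MAX_CABINET_POWER_KW = 38.0    # power per cabinet
MAX_COMPUTE_NODES = 27_000     # "over 27,000" compute nodes
PEAK_PETAFLOPS = 46.0          # peak petaflops at full scale


def memory_per_node_gb() -> int:
    """Memory per node from the DIMM count and DIMM size."""
    return DIMMS_PER_NODE * DIMM_SIZE_GB


def memory_per_rack_gb() -> int:
    """Memory in a fully populated rack at maximum memory per node."""
    return NODES_PER_RACK * memory_per_node_gb()


def average_power_per_node_w() -> float:
    """Cabinet power divided evenly across nodes (includes any shared overhead)."""
    return MAX_CABINET_POWER_KW * 1000.0 / NODES_PER_RACK


def racks_needed(nodes: int) -> int:
    """Minimum number of racks for a node count at maximum density."""
    return math.ceil(nodes / NODES_PER_RACK)


def peak_tflops_per_node() -> float:
    """Peak teraflops per node implied by the full-scale figures."""
    return PEAK_PETAFLOPS * 1000.0 / MAX_COMPUTE_NODES


if __name__ == "__main__":
    print(memory_per_node_gb())               # 1024
    print(memory_per_rack_gb())               # 61440
    print(racks_needed(MAX_COMPUTE_NODES))    # 450
    print(f"{average_power_per_node_w():.0f} W per node")
    print(f"{peak_tflops_per_node():.2f} peak TFLOPS per node")
```

The memory result matches the 1,024 GB per node stated in the specifications, and the rack count shows that a 27,000-node system would need at least 450 cabinets at maximum density.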