cscfi computing services 12/2014
Computing Services portfolio of CSC - IT Center for Science Ltd. 12/2014 edition
CSC Computing Services
Olli-Pekka Lehto
Development Manager
Computing Platforms
@ople
December 11th 2014
CSC Computing Capacity 1989–2014
CSC Computing Capacity 2012-2014
[Chart: stacked bars of computing capacity per system for 2012, 2013 and 2014, y-axis 0 to 3000; systems: Bull, Taito, Sisu, Vuori, Louhi; data labels: 244, 1700, 180, 600, 240]
Phase 1 installed
Louhi retired
Phase 2 installed
Vuori retired
3.4x
5.6x
19.2x
Total performance:
2.54 PFlop/s
CSC is the most powerful academic
computing facility in the Nordics
CSC Computing Services
Performance: Sisu
– Massive parallelism
– Fast interconnect
Capacity: Taito
– General use
– Large memory
– >100 applications
Accelerated: Taito extension
– Visualization
– Special codes
– Nvidia GPU
– Intel Xeon Phi
Cloud: cPouta
– Build your own
– OpenStack IaaS
Hosting: Kajaani, Espoo
– Efficient and secure datacenters
– Virtual and physical servers
Storage Services
– Backup
– Archiving
– Fast parallel storage
CSC Computing Services
Performance: Sisu
– 40512 cores
– 1700 TFlops
Capacity: Taito
– 18880 cores
– 600 TFlops
Accelerated: Taito extension
– 76 Nvidia K40 GPU
– 90 Intel Xeon Phi 7120X
– 240 TFlops
Cloud: cPouta
– Dynamically provisioned from Taito
Hosting: Kajaani, Espoo
Storage Services
– >4PB, ~100GB/s
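The per-system peak figures on this slide add up to the total performance quoted earlier. A minimal sketch of that sum in Python, using only the numbers from the slide (the variable names are illustrative):

```python
# Peak performance per system in TFlop/s, as quoted on the slide.
peak_tflops = {"Sisu": 1700, "Taito": 600, "Taito extension": 240}

total_tflops = sum(peak_tflops.values())
total_pflops = total_tflops / 1000  # 1 PFlop/s = 1000 TFlop/s

print(f"Total: {total_tflops} TFlop/s = {total_pflops:.2f} PFlop/s")
```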
New in 2014: Xeon Haswell E5 CPUs
Intel Xeon E5-2690v3 2.6 GHz
– 12 cores/CPU (+50%)
– AVX2 instructions (2x max flops/GHz)
– DDR4 memory
– “Energy-to-solution” at best 1/3 of Sandy Bridge
We are one of the earliest adopters
– Sisu upgraded 7/2014
– Taito upgraded 12/2014
Sisu Cray XC upgrade
8 new cabinets
Haswell CPUs
More memory per node
>7x performance
~2x energy consumption
40512 cores
108TB memory
680kW
15384kg
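The figures above can be combined into a rough energy-efficiency estimate. This sketch assumes the 1700 TFlops peak quoted for Sisu and treats the 680 kW figure as the power drawn at that peak, which the slide does not state explicitly:

```python
# Rough energy efficiency of the upgraded Sisu.
# Assumption: 680 kW is the power drawn while delivering the 1700 TFlop/s peak.
peak_flops = 1700e12   # 1700 TFlop/s
power_watts = 680e3    # 680 kW

gflops_per_watt = peak_flops / power_watts / 1e9

print(f"{gflops_per_watt:.2f} GFlop/W")
```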
Sisu – Cray XC40
Designed for computationally intensive tasks
40512 cores, 64GB RAM / node
#37 on the November 2014 Top500 list
Aries interconnect with 7 TB/s bisection bandwidth
Cray development tools
Comprehensive set of scalable applications
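As a cross-check of the 1700 TFlops quoted for Sisu, theoretical peak can be estimated as cores x clock x floating-point operations per cycle. The sketch below assumes the 2.6 GHz base clock from the Haswell slide and 16 double-precision flops per cycle per core (two 256-bit FMA units), which is an assumption not stated in the slides:

```python
# Theoretical peak = cores * clock (Hz) * flops per cycle per core.
CORES = 40512
CLOCK_HZ = 2.6e9        # base clock of the E5-2690v3
FLOPS_PER_CYCLE = 16    # assumption: double precision, AVX2 with two FMA units

peak_tflops = CORES * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12

print(f"Estimated peak: {peak_tflops:.0f} TFlop/s")
```

The estimate lands slightly below 1700 TFlops, consistent with the slide figure being rounded.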
Cray XC blade
4 dual CPU nodes (96 cores)
64GB RAM per node
Aries Router
(500GB/s switching capability)
Power
Net
Blade in XC Rack
48 blades
384 CPUs
4608 cores
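The blade and rack figures follow directly from the node layout. A small sketch of the arithmetic, which also recovers the node count and total memory under the assumption that every Sisu node is an identical 24-core, 64GB node and that 1 TB = 1000 GB:

```python
CORES_PER_CPU = 12
CPUS_PER_NODE = 2
NODES_PER_BLADE = 4
BLADES_PER_RACK = 48

cores_per_node = CORES_PER_CPU * CPUS_PER_NODE
cores_per_blade = cores_per_node * NODES_PER_BLADE
cpus_per_rack = BLADES_PER_RACK * NODES_PER_BLADE * CPUS_PER_NODE
cores_per_rack = cpus_per_rack * CORES_PER_CPU

# Assumption: all 40512 cores sit in identical 64GB nodes.
total_cores = 40512
nodes = total_cores // cores_per_node
memory_tb = nodes * 64 / 1000

print(cores_per_blade, cpus_per_rack, cores_per_rack, nodes, memory_tb)
```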
Aries Interconnect Cabling
Aries Interconnect Topology
– 2-dimensional all-to-all network within a group
– All-to-all network between groups
– Optical uplinks to inter-group network
Source: Robert Alverson, Cray, Hot Interconnects 2012 keynote
Aries Bisection Bandwidth
~7 TB/s =
1.75 x average European consumer IP traffic in 2013
OR
9 000 000 x 1080p Netflix streams
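The Netflix comparison can be sanity-checked by dividing the bisection bandwidth by the number of streams. This sketch uses only the two numbers from the slide and assumes decimal units (1 TB = 10^12 bytes):

```python
# Per-stream bitrate implied by the comparison on the slide.
bisection_bytes_per_s = 7e12   # ~7 TB/s
streams = 9_000_000

mbit_per_stream = bisection_bytes_per_s * 8 / streams / 1e6

print(f"{mbit_per_stream:.1f} Mbit/s per stream")
```

The result is a little over 6 Mbit/s per stream.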
Taito
HP cluster for general use
– 576 nodes: dual-CPU 8-core E5-2670 (Sandy Bridge), 64GB RAM per node
– 400 nodes: dual-CPU 12-core E5-2690v3 (Haswell), 128GB RAM per node, new HP Apollo 6000 chassis and blades
– Big memory nodes: 10 x 256GB; 2 x 1.5TB
56 Gbit/s FDR InfiniBand interconnect
Large selection of applications
taito-shell for instant interactive use
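The two main node types listed above roughly account for the core count and peak performance quoted for Taito. The sketch assumes a 2.6 GHz base clock for both CPU models, 8 double-precision flops per cycle per core on Sandy Bridge and 16 on Haswell, and leaves out the big memory nodes, so the totals are approximate:

```python
# (nodes, CPUs per node, cores per CPU, flops per cycle per core)
# The flops-per-cycle values are assumptions, not taken from the slides.
partitions = {
    "Sandy Bridge": (576, 2, 8, 8),
    "Haswell": (400, 2, 12, 16),
}
CLOCK_HZ = 2.6e9  # assumed base clock for both CPU models

total_cores = 0
total_tflops = 0.0
for name, (nodes, cpus, cores, flops_per_cycle) in partitions.items():
    partition_cores = nodes * cpus * cores
    total_cores += partition_cores
    total_tflops += partition_cores * CLOCK_HZ * flops_per_cycle / 1e12

print(total_cores, f"{total_tflops:.0f} TFlop/s")
```

Both results come out close to, but slightly under, the 18880 cores and 600 TFlops quoted earlier; the nodes not itemized here presumably make up part of the difference.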
Taito Extension
Bull DLC 715
– Direct warm-water cooling
Special processors for computing
– 72 Nvidia Tesla K40 GPGPU
– 90 Intel Xeon Phi 7120X
High-performance and energy efficient
– Porting and optimization of applications needed
GPUs can be used for visualization
– For example by using VirtualGL
Connected to the Taito cluster
Taito Extension
Energy-efficiency of Systems
[Chart: energy efficiency in GFlop/W, y-axis 0 to 3.5, for Vuori, Sisu P1, Taito P1, Sisu P2, Taito P2 and Bull]
Taito and Sisu user interfaces
NoMachine NX virtual desktop
(NX Web-client in beta testing)
Unix shell & X-forwarding
Scientists’ User Interface
https://sui.csc.fi
cPouta Cloud
IaaS cloud service for HPC
Use cases
– Running HTTP and other servers
– Non-CentOS Linux or Windows OS needed
– Superuser access needed
– Agile service development (“DevOps”)
Tuned for HPC
– Provisioned from Taito nodes
– Powerful CPUs, interconnect, storage
Simple web interface, CLI, REST API
https://research.csc.fi/pouta-iaas-cloud
Creating an Instance in cPouta
Creating a Virtual Volume in cPouta
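Besides the web interface, the same two operations can be done through the REST API. The sketch below only builds the JSON request bodies and sends nothing; it assumes cPouta exposes the standard OpenStack Compute (`POST /servers`) and Block Storage (`POST /volumes`) APIs unchanged, and every name and ID in the usage lines is a hypothetical placeholder:

```python
import json


def server_request(name, image_id, flavor_id, network_id, key_name):
    """Build the JSON body for POST /servers in the OpenStack Compute API."""
    return {
        "server": {
            "name": name,
            "imageRef": image_id,
            "flavorRef": flavor_id,
            "networks": [{"uuid": network_id}],
            "key_name": key_name,
        }
    }


def volume_request(name, size_gb):
    """Build the JSON body for POST /volumes in the OpenStack Block Storage API."""
    return {"volume": {"name": name, "size": size_gb}}


# Hypothetical placeholder values; real IDs come from the cloud itself.
instance_body = server_request(
    "demo-vm", "IMAGE_ID", "FLAVOR_ID", "NETWORK_ID", "my-keypair"
)
volume_body = volume_request("demo-volume", 10)

print(json.dumps(instance_body, indent=2))
print(json.dumps(volume_body, indent=2))
```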
Storage Services
HPC Storage (~4PB, ~100GB/s)
– Lustre parallel filesystem
– DDN SFA10k and SFA12k storage arrays
– Capacity and performance scalable
Cloud storage (in 2015)
– Ceph filesystem
– Software-defined storage (SDS)
– Block and object storage
Archive (iRODS), tape backup
Kajaani Datacenter
30MW power
PUE 1.05-1.2
3000 m² floorspace
99% free cooling
Modular Datacenter (MDC):
Easily Expandable, Highly Efficient
Interior view of the MDC
Cooling Technologies
Water-air hybrid (Sisu)
Air (Taito)
Direct warm-water (Bull Taito extension)
PUE: 1.2
PUE: 1.05
PUE: 1.02
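PUE (power usage effectiveness) is total facility power divided by IT equipment power, so a PUE of 1.02 means 2% overhead for cooling and power distribution. A minimal sketch of what the three values imply, using a hypothetical IT load of 680 kW borrowed from the Sisu slide (assuming that figure is IT power only):

```python
def facility_power_kw(it_power_kw, pue):
    """Total facility power implied by PUE = total power / IT power."""
    return it_power_kw * pue


IT_LOAD_KW = 680.0  # hypothetical IT load, borrowed from Sisu's quoted 680 kW

for pue in (1.2, 1.05, 1.02):
    total = facility_power_kw(IT_LOAD_KW, pue)
    overhead = total - IT_LOAD_KW
    print(f"PUE {pue}: total {total:.1f} kW, overhead {overhead:.1f} kW")
```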
Why CSC?
High-performance, latest technologies
Secure environment (ISO27001:2013)
Finnish, non-profit organization
Ecologically sustainable infrastructure
Competitive and simple pricing model
Excellent network connectivity
Everything under one roof
– Cloud, traditional HPC, visualization, accelerators
– Various storage solutions, EUDAT, RDA
– Consulting, training, porting, optimization
ISO27001:2013 certification
Reflects our commitment to security
– Risk management
– Leadership
– Technical solutions and documentation
– Continual improvement
– Recovery planning
– Security as part of company culture
Covers nearly all ICT services and datacenters
– Cloud services to be certified in early 2015
Leveraging Best-of-breed Open Source Software
Deployment & Clustering, Cloud, Operating Systems, Monitoring, Storage, Queuing
Logstash
Graphite
Near future development
cPouta improvements, including:
– Oversubscribed instances (web servers etc.)
– Docker support
– ISO27001:2013 certification
New ePouta service
– Productization of Biomedinfra
– Secure computing for organizations
– No direct visibility to Internet (VPN / OPN)
– Possible to extend your local resources seamlessly
Services for data-intensive computing
– Hadoop/MapReduce optimized systems
– SSD storage
Backup slides
Intro to HPC architecture
Supercomputers in Olden Days
Supercomputers Today
Commodity technologies
– Server clusters
– Linux
– Ethernet, InfiniBand
– x86
Proprietary solutions in very high-end
– BlueGene, Cray, NEC
Cloud services on the rise
– Especially for modest compute needs
Use is constantly spreading to new fields
– Skilled people needed!
Basic Supercomputer Architecture
Frontend nodes
Interconnect network
Compute nodes
Storage servers
Storage system
Internet
Management nodes
Management network