AWS Webcast - An Introduction to High Performance Computing on AWS

KD Singh, AWS Solutions Architect

Agenda

• High performance and high throughput computing on AWS
• Integrating on-premise HPC environments with AWS
• HPC ecosystem – partners and tools
• Demo

HPC and HTC on AWS

Concepts, Patterns & Practices

Take a typical big computation task…

…that an average cluster is too small for (or simply takes too long to complete)…

…optimization of algorithms can give some leverage…

…and complete the task in hand…

Applying a large cluster…

…can sometimes be overkill

AWS instance clusters can be balanced to the job in hand…

…neither too large…

…nor too small…

…with multiple clusters running at the same time

“…HPC clusters are too small when you need them most, and too large the rest of the time”

Jason Stowe, Cycle Computing

Why AWS for HPC?

Low cost with flexible pricing

Efficient clusters

Unlimited infrastructure

Faster time to results

Concurrent Clusters on-demand

Increased collaboration

Benefits of Agility

Rigid On-Premises Resources: capacity follows Predicted Demand, so any gap to Actual Demand shows up as Waste or Customer Dissatisfaction

Elastic Cloud-Based Resources: Resources scaled to Actual Demand

Cost Benefits of HPC in the Cloud

Pay As You Go Model
• Use only what you need
• Multiple pricing models

On-Premises: Capital Expense Model
• High upfront capital cost
• High cost of ongoing support

Many Pricing Models to Support Different Workloads

Reserved: Make a low, one-time payment and receive a significant discount on the hourly charge. For committed utilization.

Free Tier: Get started on AWS with free usage & no commitment. For POCs and getting started.

On-Demand: Pay for compute capacity by the hour with no long-term commitments. For spiky workloads, or to define needs.

Spot: Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand. For time-insensitive or transient workloads.

Dedicated: Launch instances within Amazon VPC that run on hardware dedicated to a single customer. For highly sensitive or compliance-related workloads.
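The Spot model described above (bid for capacity, pay the fluctuating Spot Price) can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the hourly prices are made up, the function name simulate_spot is hypothetical, and an hour in which the Spot Price exceeds the bid is simply treated as an hour the instance does not run. The real service's interruption behavior is more involved than this.

```python
from typing import List, Tuple


def simulate_spot(bid: float, hourly_spot_prices: List[float]) -> Tuple[int, float]:
    """Return (hours_run, total_cost) for one instance under a simplified Spot model."""
    hours_run = 0
    total_cost = 0.0
    for price in hourly_spot_prices:
        if price > bid:
            # Spot Price is above the bid: assume the instance does not run this hour.
            continue
        hours_run += 1
        # The charge is the Spot Price for that hour, not the bid.
        total_cost += price
    return hours_run, total_cost


# Hypothetical hourly Spot Prices in USD, for illustration only.
prices = [0.25, 0.5, 1.0, 0.25]
hours, cost = simulate_spot(bid=0.5, hourly_spot_prices=prices)
print(f"ran {hours} of {len(prices)} hours, cost ${cost:.2f}")
```

A workload that tolerates the skipped hour fits this model well, which is why the slide positions Spot for time-insensitive or transient work.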

Customers running HPC Workloads on AWS

484.14 TFLOPS

76th fastest supercomputer in the world, June 2014 Top500 list

26,496-core cluster of C3 instances

On-Demand Supercomputer!

• 8 Regions; 156,314 cores; 16,788 instances

• 1.21 petaFLOPS RPeak

• 264 compute years in 18 hours

• Supercomputing environment worth $68M, at a cost of $33K

“Supercomputing simulation employs 156,000 Amazon processor cores

To simulate 205,000 molecules as quickly as possible for a USC simulation, Cycle Computing fired up a mammoth amount of Amazon servers around the globe.” [1]

[1] c|net news, http://news.cnet.com/8301-1001_3-57611919-92/supercomputing-simulation-employs-156000-amazon-processor-cores/
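As a rough consistency check on the figures quoted for this run, the arithmetic can be sketched as follows. The interpretation is an assumption, not something the slide states: a "compute year" is taken to mean one core busy for a 365-day year, and every core is assumed to have been available for the full 18 hours.

```python
# Figures quoted on the slide.
cores = 156_314
instances = 16_788
wall_clock_hours = 18
compute_years = 264
equivalent_cost_usd = 68_000_000
actual_cost_usd = 33_000

# Assumption: one compute year = one core for 365 days.
HOURS_PER_YEAR = 365 * 24

# Upper bound on core-hours if all cores ran for the whole window.
core_hours_available = cores * wall_clock_hours
# Core-hours implied by the "264 compute years" figure.
core_hours_reported = compute_years * HOURS_PER_YEAR

implied_utilization = core_hours_reported / core_hours_available
cost_ratio = equivalent_cost_usd / actual_cost_usd
avg_cores_per_instance = cores / instances

print(f"core-hours available: {core_hours_available:,}")
print(f"core-hours reported:  {core_hours_reported:,}")
print(f"implied utilization:  {implied_utilization:.0%}")
print(f"cost ratio:           {cost_ratio:,.0f}x")
print(f"cores per instance:   {avg_cores_per_instance:.1f}")
```

Under these assumptions the reported compute years fit within the core-hours the cluster could have delivered, so the slide's numbers are at least mutually plausible.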

Characterizing HPC

Loosely Coupled: Embarrassingly parallel, Elastic, Batch workloads

Tightly Coupled: Interconnected jobs, Network sensitivity, Job specific algorithms

Supporting Services: Data management, Task distribution, Workflow management

Feature: Details

Flexible: Run Windows or Linux distributions

Scalable: Wide range of instance types from micro to cluster compute

Machine Images: Configurations can be saved as machine images (AMIs) from which new instances can be created

Full control: Full root or administrator rights

Secure: Full firewall control via Security Groups

Monitoring: Publishes metrics to CloudWatch

Inexpensive: On-Demand, Reserved and Spot instance types

VM Import/Export: Import and export VM images to transfer configurations in and out of EC2

Compute

Elastic Compute Cloud (EC2)

Basic unit of compute capacity

Range of CPU, memory & local disk options

35+ instance types available, from micro to cluster compute

c3.8xlarge

c3.2xlarge

c3.large

Vertical Scaling

Automation & Control

ec2-run-instances ami-xxxxxxxx \
    --instance-count 3 \
    --availability-zone eu-west-1a \
    --instance-type m3.medium

http://docs.amazonwebservices.com/AWSEC2/latest/CommandLineReference/

CLI, API and Console

Scripted configurations

Auto Scaling

as-create-auto-scaling-group MyGroup \
    --launch-configuration MyConfig \
    --availability-zones eu-west-1a \
    --min-size 2 \
    --max-size 200

Automatic re-sizing of compute clusters based upon demand

Monitoring & Alerting

CloudWatch alerts based upon CPU load, memory, I/O & user defined triggers

Trigger scaling policy
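The idea of an alert triggering a scaling policy can be sketched as a small sizing function. This is an illustrative assumption rather than how Auto Scaling is implemented: the function desired_capacity and the jobs_per_instance parameter are hypothetical, and the number of pending jobs stands in for a user defined trigger. The bounds of 2 and 200 are the --min-size and --max-size values from the command shown earlier.

```python
import math


def desired_capacity(pending_jobs: int, jobs_per_instance: int,
                     min_size: int = 2, max_size: int = 200) -> int:
    """Size a cluster to the pending work, clamped to the group's min and max."""
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be positive")
    # Instances needed to cover all pending jobs, rounded up.
    needed = math.ceil(pending_jobs / jobs_per_instance)
    # Never shrink below min_size or grow beyond max_size.
    return max(min_size, min(max_size, needed))


# Hypothetical queue depths, assuming each instance handles 4 jobs at a time.
for jobs in (0, 100, 10_000):
    print(jobs, "pending jobs ->", desired_capacity(jobs, jobs_per_instance=4), "instances")
```

Clamping keeps an idle cluster at its minimum footprint and caps spend during a burst, which is the behavior the min and max sizes express.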

Elastic Capacity

Time: +00h, <10 cores

Time: +24h, >1500 cores

Time: +72h, <10 cores

Time: +120h, >600 cores

Computational Chemistry project for Cancer treatment

Estimated computation time: 39 years

Estimated project cost: $40 million

87,000 Core AWS Cluster

Spot Instances

Completed in 9 hours

Total Cost $4,232
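The cost figures for this project can be turned into a per-core-hour number with a few lines of arithmetic. This is a sketch under one stated assumption: all 87,000 cores are taken to have run for the full 9 hours, which gives an upper bound on core-hours and therefore a lower bound on the cost per core-hour. The helper name cost_per_core_hour is hypothetical.

```python
def cost_per_core_hour(total_cost_usd: float, cores: int, hours: float) -> float:
    """Cost per core-hour, assuming every core ran for the whole duration."""
    return total_cost_usd / (cores * hours)


# Figures quoted on the slide.
estimated_cost_usd = 40_000_000
actual_cost_usd = 4_232
cluster_cores = 87_000
wall_clock_hours = 9

core_hours_upper_bound = cluster_cores * wall_clock_hours
per_core_hour = cost_per_core_hour(actual_cost_usd, cluster_cores, wall_clock_hours)
savings_factor = estimated_cost_usd / actual_cost_usd

print(f"core-hours (upper bound): {core_hours_upper_bound:,}")
print(f"cost per core-hour:       ${per_core_hour:.4f}")
print(f"estimate vs actual cost:  {savings_factor:,.0f}x")
```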

AWS Big Data Portfolio

When data sets and data analytics need to scale to the point that you have to start innovating around how to collect, store, organize, analyze and share it

COLLECT | STORE | ANALYZE | SHARE

Services shown: Import/Export, Direct Connect, Kinesis, S3, Glacier, DynamoDB, Redshift, EC2, EMR, Data Pipeline

Analyzed more than 3 billion data points in 2.8 seconds instead of weeks or months

SEC used Tradeworx and the AWS Cloud to create an analytics platform at 10% the cost of a traditional environment in less than 4 months

AWS gives Tradeworx the ability to collect and analyze billions of data points over years, allowing the SEC to reconstruct any market event, down to the individual record

What if you need to:

Implement MPI?

Code for GPUs?

Tightly coupled

Enhanced Networking EC2 Instances

Single Root I/O Virtualization (SR-IOV)

Higher packets per second, lower latencies, low network jitter

Implement HVM process execution

10 Gigabit Ethernet

R3 instances: Intel Xeon E5-2670 v2 2.5 GHz, 32 vCPUs, 640 GB SSD local disk, 244 GB RAM

C3 instances: Intel Xeon E5-2680 v2 2.8 GHz, 32 vCPUs, 640 GB SSD local disk, 60 GB RAM

I2 instances: Intel Xeon E5-2670 v2 2.5 GHz, 32 vCPUs, 1.6 TB SSD local disk, 244 GB RAM

Tightly coupled

Network Placement Groups

Cluster instances can be launched within a Placement Group. All instances launched in a Placement Group have low latency, full bisection, 10 Gbps bandwidth between instances.

Compute-intensive clinical trial simulations that previously took 60 hours are finished in only 1.2 hours on the AWS Cloud

BMS used AWS to build a secure, self-provisioning portal for hosting research so scientists can run clinical trial simulations on-demand while BMS is able to establish rules that keep compute costs low.

Running simulations 98% faster has led to more efficient and less costly clinical trials and better conditions for patients.

http://aws.amazon.com/solutions/case-studies/bristol-myers-squibb/

GPU Computing

GPU compute instances

Intel® Xeon processors

NVIDIA GPUs

CUDA, OpenCL frameworks

Cluster GPU CG1: Intel Xeon X5570, 16 vCPUs, 10 Gigabit Ethernet, 2x NVIDIA Tesla Fermi M2050 (448 cores each)

G2 instances: Intel Xeon E5-2670 2.5 GHz, 8 vCPUs, on-board hardware encoder, 1,536 CUDA cores, 15 GB RAM, 4 GB video memory

CUDA & OpenCL

Massive parallel clusters running in GPUs

NVIDIA GRID and Tesla cards in specialized instance types

National Taiwan University

50 x cg1.4xlarge instance types

100 NVIDIA Tesla M2050

“Our purpose is to break the record of solving the shortest vector problem (SVP) in Euclidean lattices…the vectors we found are considered the hardest SVP anyone has solved so far.”

Prof. Chen-Mou Cheng, the Principal Investigator of Fast Crypto Lab

$2,300 for using 100 Tesla M2050 for ten hours

Coming Soon…

New Compute-Optimized EC2 Instances

C4 family

C4 instances: Intel Xeon E5-2666 v3 Haswell (custom), 2.9 GHz, up to 3.5 GHz with Turbo Boost, 36 vCPUs, 60 GB RAM

Larger and Faster Elastic Block Store (EBS) Volumes

Up to 16TB per volume

Up to 10,000 baseline IOPS per volume

Up to 20,000 provisioned IOPS per volume

Middleware Services

Data management

Fully managed SQL, NoSQL and object storage

Relational Database Service: Fully managed database (MySQL, Oracle, MSSQL)

DynamoDB: NoSQL, schemaless, provisioned throughput database

S3: Object datastore up to 5 TB per object, 99.999999999% durability

Moving computation closer to the data

“Big Data” changes the dynamic of computation and data sharing

Collection | Computation | Collaboration

Services shown: Direct Connect, Import/Export, S3, DynamoDB, EC2, GPUs, Elastic MapReduce, CloudFormation, Simple Workflow, S3, Zocalo

Middleware Services

Feeding workloads

Using highly available Simple Queue Service to feed EC2 nodes

Amazon SQS

Processing task/processing trigger

Processing results
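The queue-fed pattern above, where tasks go into a queue, worker nodes pull from it and results come back, can be sketched in a few lines. To keep the example self-contained and runnable, an in-process queue.Queue and threads stand in for Amazon SQS and EC2 nodes. That substitution is an assumption made purely for illustration, and the function name run_workers and the squaring "work" are hypothetical.

```python
import queue
import threading
from typing import Dict, Iterable


def run_workers(tasks: Iterable[int], num_workers: int = 4) -> Dict[int, int]:
    """Feed tasks through a queue to a pool of workers and collect their results."""
    task_queue: "queue.Queue[int]" = queue.Queue()    # stand-in for the task queue
    result_queue: "queue.Queue[tuple]" = queue.Queue()  # stand-in for the results queue
    for task in tasks:
        task_queue.put(task)

    def worker() -> None:
        # Each worker keeps pulling tasks until the queue is drained.
        while True:
            try:
                n = task_queue.get_nowait()
            except queue.Empty:
                return
            result_queue.put((n, n * n))  # placeholder computation

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All workers have finished, so the results queue can be drained safely.
    results: Dict[int, int] = {}
    while not result_queue.empty():
        n, squared = result_queue.get()
        results[n] = squared
    return results


print(run_workers(range(10)))
```

Because workers only pull work when they are free, adding or removing workers changes throughput without any change to the producer, which is what makes the pattern a good fit for elastic clusters.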

Middleware Services

Coordinating workloads & task clusters

Handle long running processes across many nodes and task steps with Simple Workflow

Task A

Task B (Auto-scaling)

Task C
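A multi-step workflow of the kind shown here, with a first task, a middle task that fans out across many workers, and a final task that combines the results, can be sketched as below. This is not the Simple Workflow API: a local thread pool stands in for the auto-scaled middle step, and the function names and the summing "work" are hypothetical, chosen only to make the sketch runnable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def task_a(data: List[int], chunk_size: int) -> List[List[int]]:
    """Step 1: split the input into independent chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def task_b(chunk: List[int]) -> int:
    """Step 2: process one chunk (placeholder work: sum it)."""
    return sum(chunk)


def task_c(partials: List[int]) -> int:
    """Step 3: combine the partial results."""
    return sum(partials)


def run_workflow(data: List[int], chunk_size: int = 3, max_workers: int = 4) -> int:
    chunks = task_a(data, chunk_size)
    # The middle step runs in parallel; the pool size plays the role of
    # the number of nodes an auto-scaled step would have available.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(task_b, chunks))
    return task_c(partials)


print(run_workflow(list(range(10))))
```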

Grid Engine

cfncluster

LSF

OpenLava

Bright Cluster Manager

Integrated Solutions

Cloud isn’t an ‘all or nothing’ choice

On-Premises Resources (Legacy Data Centers) | Integration | Cloud Resources

Integrating AWS with your existing on-premises infrastructure

On-premises (Legacy Data Centers): Active Directory, Shibboleth / SAML, Network Configuration, Encryption, Backup Appliances, Your On-Premises Apps

AWS: Users & Access Rules (IAM), Your Private Network (VPC), Encryption (S3, RDS, HSM), Backups (Storage Gateway), Your Cloud Apps

Connectivity: AWS Direct Connect, VPN

Example HPC Design Pattern

[Diagram labels: AZ-1, AZ-2; Public and Private subnets; Customer Gateway; VPN Gateway; VPN Connection; Internet Gateway; Internet; Amazon S3; Master; Spot; Clustered Storage Server]

AWS HPC Partners and Tools

HPC Software on AWS Marketplace

HPC Partners and Apps

HPC Applications and Tools

Use your current development tools

NVIDIA CUDA drivers pre-loaded

Intel MPI and Intel MKL® libraries

OpenMPI and MPICH2

Applications/Services

MathWorks MATLAB, Intel Lustre, OrangeFS, Ansys Fluent, COMSOL, OpenFOAM etc.

Use your favorite batch scheduler and configuration management tools

cfncluster, Univa, Sun Grid Engine, HTCondor, MIT StarCluster, Torque, Slurm, Rocks+ (StackIQ), AWS CloudFormation, OpenLava, Chef, Puppet, ElastiCluster

Popular HPC Workloads on AWS

Oil and Gas: Seismic Data Processing; Reservoir Simulations, Modeling

Manufacturing & Engineering: Computational Fluid Dynamics (CFD); Finite Element Analysis (FEA)

Life Sciences: Genome Analysis; Molecular Modeling; Protein Docking

Media & Entertainment: Transcoding and Encoding; DRM, Encryption; Rendering

Scientific Computing: Computational Chemistry; High Energy Physics; Stochastic Modeling; Quantum Analysis; Climate Models

EDA: Simulation; Verification

CloudFormation cluster (cfncluster) demo

https://github.com/awslabs/cfncluster
