AWS Webcast - An Introduction to High Performance Computing on AWS
DESCRIPTION
High Performance Computing (HPC) allows scientists and engineers to solve complex science, engineering, and business problems using applications that require high-bandwidth, low-latency networking and very high compute capabilities. Learn how the AWS cloud can cost-effectively provide the scalable computing resources, storage services, and analytic tools that enable running various kinds of HPC workloads. Who should attend? Engineers, architects, product managers, data scientists, high performance computing specialists, and researchers from industry and academia, along with technically-minded business stakeholders looking to put data to work for their organization.
TRANSCRIPT
KD Singh, AWS Solutions Architect
Agenda
• High performance and high throughput computing on AWS
• Integrating on-premises HPC environments with AWS
• HPC ecosystem: partners and tools
• Demo
HPC and HTC on AWS
Concepts, Patterns & Practices
Take a typical big computation task that is too large for an average cluster (or that such a cluster simply takes too long to complete). Optimizing the algorithms can give some leverage and complete the task in hand, but applying a large cluster can sometimes be overkill.
AWS instance clusters can be balanced to the job in hand: neither too large nor too small, with multiple clusters running at the same time.
“…HPC clusters are too small when you need them most, and too large the rest of the time.”
Jason Stowe, Cycle Computing
Why AWS for HPC?
• Low cost with flexible pricing
• Efficient clusters
• Unlimited infrastructure
• Faster time to results
• Concurrent clusters on-demand
• Increased collaboration
Benefits of Agility
[Figure: Rigid on-premises resources are sized to predicted demand, so actual demand below that level causes waste and actual demand above it causes customer dissatisfaction; elastic cloud-based resources are scaled to actual demand.]
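The waste and dissatisfaction in the figure can be put into numbers. The sketch below, using made-up hourly demand, counts idle core-hours (waste) and unmet core-hours (dissatisfaction) for a rigid cluster sized at a hypothetical 600 cores, and for an elastic cluster that, as a simplifying assumption, resizes each hour to the previous hour's demand. All names and numbers are illustrative, not taken from the webcast.

```python
# Hypothetical hourly demand, in cores, for one bursty HPC workload.
HOURLY_DEMAND = [100, 200, 400, 800, 1200, 1000, 600, 300, 100]


def gaps(capacity, demand):
    """Return (idle core-hours, unmet core-hours) for per-hour capacity vs. demand."""
    idle = sum(max(c - d, 0) for c, d in zip(capacity, demand))
    unmet = sum(max(d - c, 0) for c, d in zip(capacity, demand))
    return idle, unmet


def rigid_gaps(demand, provisioned):
    """Rigid cluster: the same capacity every hour, sized to predicted demand."""
    return gaps([provisioned] * len(demand), demand)


def elastic_gaps(demand):
    """Elastic cluster that resizes each hour to the demand seen one hour earlier."""
    capacity = [demand[0]] + demand[:-1]  # assumption: one-hour scaling lag
    return gaps(capacity, demand)


if __name__ == "__main__":
    print("rigid (600 cores):", rigid_gaps(HOURLY_DEMAND, 600))
    print("elastic (1h lag): ", elastic_gaps(HOURLY_DEMAND))
```

With these numbers the rigid cluster leaves 1,900 core-hours idle and 1,200 unmet, while the lagged elastic cluster leaves 1,100 of each; tracking demand more closely shrinks both.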
Cost Benefits of HPC in the Cloud
• Pay-as-you-go model: use only what you need; multiple pricing models
• On-premises capital expense model: high upfront capital cost; high cost of ongoing support
Many Pricing Models to Support Different Workloads
• Reserved: Make a low, one-time payment and receive a significant discount on the hourly charge. For committed utilization.
• Free Tier: Get started on AWS with free usage and no commitment. For POCs and getting started.
• On-Demand: Pay for compute capacity by the hour with no long-term commitments. For spiky workloads, or to define needs.
• Spot: Bid for unused capacity, charged at a Spot Price that fluctuates based on supply and demand. For time-insensitive or transient workloads.
• Dedicated: Launch instances within Amazon VPC that run on hardware dedicated to a single customer. For highly sensitive or compliance-related workloads.
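The choice between On-Demand and Reserved pricing comes down to utilization. A minimal sketch, assuming a reserved option with a one-time payment plus a discounted hourly rate charged only for hours actually used, and hypothetical prices ($1,000 upfront, $0.20/h discounted vs. $0.50/h On-Demand). Real AWS reserved terms and prices differ and change over time; Spot and Dedicated are not modeled.

```python
HOURS_PER_YEAR = 8760  # one-year term of 365 days


def break_even_hours(upfront, reserved_hourly, on_demand_hourly):
    """Hours of use after which upfront + discounted hourly beats On-Demand."""
    if on_demand_hourly <= reserved_hourly:
        raise ValueError("the discounted rate must be below the On-Demand rate")
    return upfront / (on_demand_hourly - reserved_hourly)


def cheaper_option(hours, upfront, reserved_hourly, on_demand_hourly):
    """Compare total cost for a given number of hours used."""
    reserved_cost = upfront + reserved_hourly * hours
    on_demand_cost = on_demand_hourly * hours
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    hours = break_even_hours(1000, 0.20, 0.50)
    print(f"break-even after {hours:.0f} h ({hours / HOURS_PER_YEAR:.0%} of a year)")
```

With those figures the break-even point is about 3,333 hours, roughly 38% of a year; a cluster expected to run longer than that would favor the reserved option.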
Customers running HPC Workloads on AWS
On-Demand Supercomputer!
• 484.14 TFLOPS: 76th fastest supercomputer in the world (June 2014 Top500 list)
• 26,496-core cluster of C3 instances
• 8 Regions; 156,314 cores; 16,788 instances
• 1.21 petaFLOPS Rpeak
• 264 compute years in 18 hours
• A supercomputing environment worth $68M, at a cost of $33K
“Supercomputing simulation employs 156,000 Amazon processor cores
To simulate 205,000 molecules as quickly as possible for a USC simulation, Cycle Computing fired up a mammoth amount of Amazon servers around the globe.” [1]
[1] CNET News: http://news.cnet.com/8301-1001_3-57611919-92/supercomputing-simulation-employs-156000-amazon-processor-cores/
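A quick consistency check on the figures above, assuming "compute years" means core-years and a 365.25-day year (both are assumptions; the source does not define the unit):

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: 365.25-day year, 8,766 hours


def core_hours(compute_years):
    """Convert compute years (read as core-years) into core-hours."""
    return compute_years * HOURS_PER_YEAR


def average_cores(compute_years, wall_clock_hours):
    """Average number of cores busy over the wall-clock run."""
    return core_hours(compute_years) / wall_clock_hours


def cost_per_core_hour(total_cost, compute_years):
    return total_cost / core_hours(compute_years)


if __name__ == "__main__":
    print(core_hours(264))                   # 2,314,224 core-hours
    print(average_cores(264, 18))            # 128,568 cores on average
    print(cost_per_core_hour(33_000, 264))   # roughly $0.014 per core-hour
```

An average of about 128,568 cores sits below the 156,314-core peak, which is what one would expect if the cluster ramped up and down during the 18 hours; under these assumptions the run cost roughly 1.4 cents per core-hour.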
Characterizing HPC
• Loosely coupled: embarrassingly parallel, elastic, batch workloads
• Tightly coupled: interconnected jobs, network sensitivity, job-specific algorithms
• Supporting services: data management, task distribution, workflow management
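Loosely coupled, embarrassingly parallel work splits into tasks that share no state and never communicate, so they can run on any number of instances and be combined at the end. The sketch below illustrates the pattern with a Monte Carlo estimate of π: each task gets its own seed and sample count, and only the hit counts are summed. A thread pool stands in for a fleet of instances; because of Python's global interpreter lock it shows the structure rather than a real speed-up, and the task sizes are arbitrary.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, samples):
    """One independent task: count random points inside the unit quarter-circle."""
    rng = random.Random(seed)  # private generator per task, so tasks share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_tasks=8, samples_per_task=50_000, workers=4):
    """Scatter independent tasks to a worker pool, then gather and combine results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total_hits = sum(pool.map(count_hits, range(num_tasks),
                                  [samples_per_task] * num_tasks))
    return 4.0 * total_hits / (num_tasks * samples_per_task)


if __name__ == "__main__":
    print(estimate_pi())
```

Because each task is seeded independently, the result is identical whether one worker or many run the tasks, which is what makes this kind of workload easy to spread across whatever capacity is available.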
Feature Details
• Flexible: Run Windows or Linux distributions
• Scalable: Wide range of instance types from micro to cluster compute
• Machine Images: Configurations can be saved as machine images (AMIs) from which new instances can be created
• Full control: Full root or administrator rights
• Secure: Full firewall control via Security Groups
• Monitoring: Publishes metrics to CloudWatch
• Inexpensive: On-Demand, Reserved and Spot instance types
• VM Import/Export: Import and export VM images to transfer configurations in and out of EC2
Compute: Elastic Compute Cloud (EC2)
• Basic unit of compute capacity
• Range of CPU, memory & local disk options
• 35+ instance types available, from micro to cluster compute
Vertical scaling: c3.large → c3.2xlarge → c3.8xlarge
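Vertical scaling means moving the same job to a larger or smaller instance size. A minimal sketch of choosing the smallest C3 size that meets a job's vCPU and memory needs; the table lists the C3 sizes with their vCPU and memory (GiB) figures, which should be checked against current EC2 documentation, and `smallest_fit` is a hypothetical helper.

```python
# (instance type, vCPUs, memory in GiB), ordered from smallest to largest.
C3_SIZES = [
    ("c3.large", 2, 3.75),
    ("c3.xlarge", 4, 7.5),
    ("c3.2xlarge", 8, 15.0),
    ("c3.4xlarge", 16, 30.0),
    ("c3.8xlarge", 32, 60.0),
]


def smallest_fit(vcpus_needed, mem_gib_needed, sizes=C3_SIZES):
    """Return the first (smallest) size satisfying both requirements, or None."""
    for name, vcpus, mem_gib in sizes:
        if vcpus >= vcpus_needed and mem_gib >= mem_gib_needed:
            return name
    return None  # the job needs to scale out (more instances), not up


if __name__ == "__main__":
    print(smallest_fit(6, 10))
    print(smallest_fit(2, 20))
```

Memory can be the binding constraint: a 2-vCPU job needing 20 GiB still lands on c3.4xlarge, and a job that outgrows the largest size returns `None`, signalling that horizontal scaling is needed instead.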
Automation & Control
ec2-run-instances ami-xxxxxxxx \
  --instance-count 3 \
  --availability-zone eu-west-1a \
  --instance-type m3.medium
http://docs.amazonwebservices.com/AWSEC2/latest/CommandLineReference/
• CLI, API and Console
• Scripted configurations
Auto Scaling
as-create-auto-scaling-group MyGroup \
  --launch-configuration MyConfig \
  --availability-zones eu-west-1a \
  --min-size 2 \
  --max-size 200
Automatic re-sizing of compute clusters based upon demand
Monitoring & Alerting
CloudWatch alerts based upon CPU load, memory, I/O & user-defined triggers
Trigger scaling policy
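A scaling policy maps a CloudWatch metric to a change in group size. The sketch below simulates a simple threshold policy: add instances when average CPU is high, remove them when it is low, and always stay within the `--min-size 2` / `--max-size 200` bounds from the Auto Scaling command above. The thresholds, the step of 10 instances and the CPU readings are hypothetical, and Auto Scaling features such as cooldowns and alarm evaluation periods are not modeled.

```python
def next_capacity(current, cpu_percent, min_size=2, max_size=200,
                  scale_out_above=70.0, scale_in_below=30.0, step=10):
    """Apply one threshold-based scaling decision, clamped to the group bounds."""
    if cpu_percent > scale_out_above:
        desired = current + step
    elif cpu_percent < scale_in_below:
        desired = current - step
    else:
        desired = current
    return max(min_size, min(max_size, desired))


def simulate(cpu_readings, start=2):
    """Return the group size after each (hypothetical) CPU reading."""
    history, current = [], start
    for cpu in cpu_readings:
        current = next_capacity(current, cpu)
        history.append(current)
    return history


if __name__ == "__main__":
    print(simulate([90, 95, 80, 50, 20, 10, 10]))
```

The group grows from 2 to 32 instances while CPU stays high, holds steady in the 30 to 70% band, and shrinks back to the 2-instance floor once the load drops.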
Elastic Capacity
• Time +00h: <10 cores
• Time +24h: >1500 cores
• Time +72h: <10 cores
• Time +120h: >600 cores
Computational chemistry project for cancer treatment
• Estimated computation time: 39 years
• Estimated project cost: $40 million
• 87,000-core AWS cluster using Spot Instances
• Completed in 9 hours
• Total cost: $4,232
AWS Big Data Portfolio
Big data: when data sets and data analytics need to scale to the point that you have to start innovating around how to collect, store, organize, analyze and share them.
COLLECT | STORE | ANALYZE | SHARE
Services: Import/Export, Direct Connect, Kinesis, S3, Glacier, DynamoDB, Redshift, EC2, EMR, Data Pipeline
Analyzed more than 3 billion data points in 2.8 seconds instead of weeks or months.
The SEC used Tradeworx and the AWS Cloud to create an analytics platform at 10% of the cost of a traditional environment, in less than 4 months.
AWS gives Tradeworx the ability to collect and analyze billions of data records over years, allowing the SEC to reconstruct any market event down to the individual record.
Tightly coupled
What if you need to:
• Implement MPI?
• Code for GPUs?
Enhanced Networking EC2 Instances
• Single Root I/O Virtualization (SR-IOV)
• Higher packets per second, lower latencies, low network jitter
• Uses HVM (hardware virtual machine) virtualization
• 10 Gigabit Ethernet
• R3 instances: Intel Xeon E5-2670 v2, 2.5 GHz; 32 vCPUs; 640 GB local SSD disk; 244 GB RAM
• C3 instances: Intel Xeon E5-2680 v2, 2.8 GHz; 32 vCPUs; 640 GB local SSD disk; 60 GB RAM
• I2 instances: Intel Xeon E5-2670 v2, 2.5 GHz; 32 vCPUs; 6.4 TB local SSD disk (8 x 800 GB); 244 GB RAM
Tightly coupled
Network Placement Groups
Cluster instances can be launched within a Placement Group. All instances launched in a Placement Group have low-latency, full-bisection 10 Gbps bandwidth between instances.
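Why tightly coupled jobs care about latency: take a simple latency-only model in which each iteration divides the compute evenly across N nodes and then performs a collective operation such as a tree-based allreduce taking ⌈log₂N⌉ message steps. As N grows, the communication term quickly dominates. The per-message latencies below are hypothetical values chosen to show the effect, not measured EC2 figures, and bandwidth costs are ignored.

```python
import math


def iteration_time(n_nodes, work_seconds, latency_seconds):
    """Evenly divided compute plus a log2(N)-step collective (latency term only)."""
    compute = work_seconds / n_nodes
    comm_steps = math.ceil(math.log2(n_nodes)) if n_nodes > 1 else 0
    return compute + comm_steps * latency_seconds


def speedup(n_nodes, latency_seconds, work_seconds=1.0):
    """Speed-up over a single node under the model above."""
    return work_seconds / iteration_time(n_nodes, work_seconds, latency_seconds)


if __name__ == "__main__":
    for latency in (50e-6, 500e-6):  # hypothetical per-message latencies
        print(f"latency {latency * 1e6:.0f} us: "
              f"speedup on 1024 nodes = {speedup(1024, latency):.0f}x")
```

With 1,024 nodes, cutting the modeled latency from 500 µs to 50 µs raises the speed-up from about 167x to about 677x, which is the motivation for placement groups and enhanced networking.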
Compute-intensive clinical trial simulations that previously took 60 hours are finished in only 1.2 hours on the AWS Cloud.
http://aws.amazon.com/solutions/case-studies/bristol-myers-squibb/
BMS used AWS to build a secure, self-provisioning portal for hosting research, so scientists can run clinical trial simulations on demand while BMS is able to establish rules that keep compute costs low. Running simulations in 98% less time has led to more efficient and less costly clinical trials and better conditions for patients.
GPU Computing
GPU compute instances
• Intel® Xeon processors
• NVIDIA GPUs
• CUDA, OpenCL frameworks
• Cluster GPU (CG1) instances: Intel Xeon X5570; 16 vCPUs; 10 Gigabit Ethernet; 2x NVIDIA Tesla Fermi M2050 GPUs, 448 cores each
• G2 instances: Intel Xeon E5-2670, 2.5 GHz; 8 vCPUs; on-board hardware encoder; 1,536 CUDA cores; 15 GB RAM, 4 GB video memory; CUDA & OpenCL
CUDA & OpenCL
• Massively parallel clusters running on GPUs
• NVIDIA GRID and Tesla cards in specialized instance types
National Taiwan University
• 50 x cg1.4xlarge instances
• 100 NVIDIA Tesla M2050 GPUs
“Our purpose is to break the record of solving the shortest vector problem (SVP) in Euclidean lattices…the vectors we found are considered the hardest SVP anyone has solved so far.”
Prof. Chen-Mou Cheng, Principal Investigator of Fast Crypto Lab
$2,300 for using 100 Tesla M2050 GPUs for ten hours
Coming Soon…
New Compute-Optimized EC2 Instances: C4 family
• C4 instances: custom Intel Xeon E5-2666 v3 (Haswell); 36 vCPUs; 60 GB RAM; 2.9 GHz, up to 3.5 GHz with Turbo Boost
Larger and Faster Elastic Block Store (EBS) Volumes
• Up to 16 TB per volume
• Up to 10,000 baseline IOPS per volume
• Up to 20,000 provisioned IOPS per volume
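IOPS figures translate into throughput only once an I/O size is fixed: throughput = IOPS × I/O size. A minimal sketch, assuming 16 KiB I/Os and ignoring per-volume and per-instance throughput limits, estimates how long a full sequential read of a dataset would take at the 20,000 provisioned IOPS quoted above. Both helpers are hypothetical.

```python
def throughput_mib_per_s(iops, io_size_kib):
    """Throughput implied by an IOPS rate at a fixed I/O size."""
    return iops * io_size_kib / 1024


def hours_to_read(dataset_gib, iops, io_size_kib):
    """Time for one full pass over a dataset, ignoring any throughput caps."""
    seconds = dataset_gib * 1024 / throughput_mib_per_s(iops, io_size_kib)
    return seconds / 3600


if __name__ == "__main__":
    print(throughput_mib_per_s(20_000, 16))        # MiB/s at 16 KiB per I/O
    print(round(hours_to_read(1024, 20_000, 16), 2))  # hours to read 1 TiB
```

At 20,000 IOPS of 16 KiB each, throughput is 312.5 MiB/s, so reading 1 TiB takes a little under an hour (about 0.93 h); larger I/O sizes raise throughput, up to whatever limits the volume and instance impose.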
Middleware Services
Data management: fully managed SQL, NoSQL and object storage
• Relational Database Service (RDS): fully managed database (MySQL, Oracle, Microsoft SQL Server)
• DynamoDB: NoSQL, schemaless, provisioned-throughput database
• S3: object datastore, up to 5 TB per object, 99.999999999% durability
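Objects approaching the 5 TB limit have to be uploaded in parts, since a single PUT is limited to 5 GB. S3 multipart uploads allow at most 10,000 parts, each between 5 MiB and 5 GiB (the last part may be smaller). The sketch computes the smallest part size that fits an object into 10,000 parts; `min_part_size_mib` and `part_count` are hypothetical helpers, not part of any AWS SDK.

```python
import math

MAX_PARTS = 10_000
MIN_PART_MIB = 5            # minimum size for every part except the last
MAX_PART_MIB = 5 * 1024     # 5 GiB


def min_part_size_mib(object_size_mib):
    """Smallest whole-MiB part size that keeps the upload within MAX_PARTS."""
    part = max(MIN_PART_MIB, math.ceil(object_size_mib / MAX_PARTS))
    if part > MAX_PART_MIB:
        raise ValueError("object too large for a multipart upload")
    return part


def part_count(object_size_mib):
    return math.ceil(object_size_mib / min_part_size_mib(object_size_mib))


if __name__ == "__main__":
    five_tib = 5 * 1024 * 1024  # object size in MiB
    print(min_part_size_mib(five_tib), part_count(five_tib))
```

For a 5 TiB object this gives 525 MiB parts and 9,987 parts in total; small objects simply use the 5 MiB minimum.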
Moving computation closer to the data
“Big Data” changes the dynamic of computation and data sharing.
• Collection: Direct Connect, Import/Export, S3, DynamoDB
• Computation: EC2, GPUs, Elastic MapReduce, CloudFormation, Simple Workflow
• Collaboration: S3, Zocalo
Feeding workloads
Using the highly available Simple Queue Service (SQS) to feed EC2 nodes
[Diagram: Amazon SQS delivers task/processing triggers to processing nodes, which return processing results.]
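The queue-feeding pattern decouples producers of work from the nodes that process it: tasks go onto a queue, any number of workers pull from it, and results come back on a second queue. The sketch uses Python's in-process `queue.Queue` and threads as a stand-in for Amazon SQS and EC2 nodes (an assumption made to keep it runnable offline). It does not model SQS features such as visibility timeouts or message retention, and the "processing" step (squaring a number) is a placeholder.

```python
import queue
import threading


def worker(tasks, results):
    """Pull tasks until a None sentinel arrives; push (task_id, result) pairs."""
    while True:
        item = tasks.get()
        if item is None:
            break
        task_id, payload = item
        results.put((task_id, payload * payload))  # placeholder for real processing


def run(payloads, num_workers=4):
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task_id, payload in enumerate(payloads):  # producer: enqueue all work
        tasks.put((task_id, payload))
    for _ in threads:                              # one stop sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    collected = dict(results.get() for _ in range(results.qsize()))
    return [collected[i] for i in range(len(payloads))]  # restore task order


if __name__ == "__main__":
    print(run([1, 2, 3, 4, 5]))
```

Results arrive in whatever order the workers finish, so each carries its task ID and the caller reorders them; the same idea applies when results come back through a real queue.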
Coordinating workloads & task clusters
Handle long-running processes across many nodes and task steps with Simple Workflow.
[Diagram: numbered workflow steps across Task A, Task B (Auto Scaling) and Task C]
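A workflow like the Task A / Task B / Task C example is a dependency graph: a step can start once every step it depends on has finished. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to compute which tasks can run together. The graph, in which a hypothetical step A fans out to three parallel B tasks (the auto-scaled stage) that all feed step C, is an illustration rather than the diagram's exact layout, and this is not the Simple Workflow API.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
WORKFLOW = {
    "B1": {"A"},
    "B2": {"A"},
    "B3": {"A"},
    "C": {"B1", "B2", "B3"},
}


def execution_waves(graph):
    """Group tasks into waves; every task in a wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies satisfied
        waves.append(ready)
        sorter.done(*ready)                 # mark the wave as finished
    return waves


if __name__ == "__main__":
    for step, wave in enumerate(execution_waves(WORKFLOW), start=1):
        print(step, wave)
```

The middle wave is where an auto-scaled worker group pays off: its tasks are independent of each other, so capacity can grow for that stage and shrink again before step C.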
Integrated Solutions: Grid Engine, cfncluster, LSF, OpenLava, Bright Cluster Manager
Integration
Cloud isn’t an ‘all or nothing’ choice
[Diagram: legacy data centers and on-premises resources integrated with cloud resources]
Integrating AWS with your existing on-premises infrastructure
• On-premises: Active Directory, Shibboleth / SAML; network configuration; encryption; backup appliances; your on-premises apps in legacy data centers
• AWS: users & access rules (IAM); your private network (VPC); encryption (S3, RDS, HSM); backups (Storage Gateway); your cloud apps
• Connectivity: AWS Direct Connect, VPN
Example HPC Design Pattern
[Diagram: two Availability Zones (AZ-1 and AZ-2), each with one public and two private subnets; a SpotMaster node, Spot compute instances and clustered storage servers; an Internet gateway to the Internet; a VPN gateway linked to the customer gateway over a VPN connection; and Amazon S3.]
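Laying out the public and private subnets in a design like this starts with splitting the VPC's address range. A sketch using Python's standard `ipaddress` module, assuming a hypothetical 10.0.0.0/16 VPC carved into /24 subnets with one public and two private subnets per Availability Zone. AWS reserves five addresses in every subnet, so each /24 leaves 251 usable addresses.

```python
import ipaddress

# Hypothetical layout: one public and two private subnets per Availability Zone.
SUBNET_ROLES = [
    ("AZ-1", "public"), ("AZ-1", "private"), ("AZ-1", "private"),
    ("AZ-2", "public"), ("AZ-2", "private"), ("AZ-2", "private"),
]
AWS_RESERVED_PER_SUBNET = 5  # addresses AWS keeps in every subnet


def plan_subnets(vpc_cidr="10.0.0.0/16", new_prefix=24):
    """Assign consecutive, non-overlapping blocks of the VPC range to each role."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    return [(az, role, next(blocks)) for az, role in SUBNET_ROLES]


def usable_hosts(subnet):
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    for az, role, subnet in plan_subnets():
        print(az, role, subnet, usable_hosts(subnet))
```

A larger prefix (for example /20) for the private subnets would leave more room for big Spot fleets; the /24 choice here is only for readability.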
AWS HPC Partners and Tools
HPC Software on AWS Marketplace
HPC Partners and Apps
HPC Applications and Tools
Use your current development tools:
• NVIDIA CUDA drivers pre-loaded
• Intel MPI and Intel MKL® libraries
• OpenMPI and MPICH2
Applications/services: MathWorks MATLAB, Intel Lustre, OrangeFS, ANSYS Fluent, COMSOL, OpenFOAM, etc.
Use your favorite batch scheduler and configuration management tools: cfncluster, Univa / Sun Grid Engine, HTCondor, MIT StarCluster, Torque, Slurm, Rocks+ (StackIQ), AWS CloudFormation, OpenLava, Chef, Puppet, Elasticluster
Popular HPC Workloads on AWS
• Oil and gas: seismic data processing, reservoir simulations, modeling
• Manufacturing & engineering: computational fluid dynamics (CFD), finite element analysis (FEA)
• Life sciences: genome analysis, molecular modeling, protein docking
• Media & entertainment: transcoding and encoding, DRM, encryption, rendering
• Scientific computing: computational chemistry, high energy physics, stochastic modeling, quantum analysis, climate models
• EDA: simulation, verification