magellan a test bed to explore cloud computing for science shane canon lawrence berkeley lab uc...

30
Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Upload: kelsi-boykin

Post on 02-Apr-2015

222 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

MagellanA Test Bed to Explore Cloud

Computing for Science

Shane Canon Lawrence Berkeley Lab

UC Cloud SummitApril 19, 2011

Page 2: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

MagellanExploring Cloud Computing

Co-located at two DOE-SC Facilities• Argonne Leadership Computing

Facility (ALCF) • National Energy Research

Scientific Computing Center (NERSC)

• Funded by DOE under the American Recovery and Reinvestment Act (ARRA)

2

Page 3: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Magellan Scope

• Mission– Determine the appropriate role for private cloud

computing for DOE/SC midrange workloads

• Approach– Deploy a test bed to investigate the use of cloud

computing for mid-range scientific computing– Evaluate the effectiveness of cloud computing

models for a wide spectrum of DOE/SC applications

3

Page 4: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

What is a Cloud?Definition

According to the National Institute of Standards & Technology (NIST)…

• Resource pooling. Computing resources are pooled to serve multiple consumers.

• Broad network access. Capabilities are available over the network.

• Measured Service. Resource usage is monitored and reported for transparency.

• Rapid elasticity. Capabilities can be rapidly scaled out and in (pay-as-you-go)

• On-demand self-service. Consumers can provision capabilities automatically.

4

Page 5: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

SU SU SU SU

720 nodes, 5760 cores in 9 Scalable Units (SUs) 61.9 TeraflopsSU = IBM iDataplex rack with 640 Intel Nehalem cores

SU SU SU SU SU

Magellan Test Bed at NERSCPurpose-built for Science Applications

Load Balancer

I/O

I/O

NERSC Global Filesystem

8G FCNetworkLogin

NetworkLogin

QDR IB Fabric

10G Ethernet

14 I/O nodes(shared)

18 Login/network nodes

HPSS (15PB)

Internet 100-G Router

ANI

1 Petabyte with GPFS

5

Page 6: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Magellan Computing Models

Purpose Comments

Mix of node types and queues. Future: Dynamic provisioning, VMs, and virtual private clusters

Can expand based on demand. Supports: VMs, block storage

MapReduce. Both configured with HDFS

6

Page 7: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Magellan Research Agenda and Lines of Inquiry

• Are the open source cloud software stacks ready for DOE HPC science?

• Can DOE cyber security requirements be met within a cloud?

• Are the new cloud programming models useful for scientific computing?

• Can DOE HPC applications run efficiently in the cloud? What applications are suitable for clouds?

• How usable are cloud environments for scientific applications?

• When is it cost effective to run DOE HPC science in a cloud?

• What are the ramifications for data intensive computing?

7

Page 8: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Application Performance

• Can parallel applications run effectively in virtualized environments?

• How critical are high-performance interconnects that are available in current HPC systems?

• Are some applications better suited than others?

8

Can DOE HPC applications run efficiently in the cloud? What applications are suitable for clouds?

Page 9: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Application PerformanceApplication Benchmarks

GAMESS GTC IMPACT fvCAM MAESTRO2560

2

4

6

8

10

12

14

16

18

Carver

Franklin

Lawrencium

EC2-Beta-Opt

Amazon EC2

Ru

nti

me

Rel

ativ

e to

Mag

ella

n (

no

n-V

M)

Amazon CC

Page 10: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Application PerformanceApplication Benchmarks

MILC PARATEC0

10

20

30

40

50

60

Carver

Franklin

Lawrencium

EC2-Beta-Opt

Amazon EC2

Ru

nti

me

rela

tive

to

Car

ver

Amazon CC

Page 11: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Application PerformanceEarly Findings and Next Steps

Early Findings:• Benchmarking efforts demonstrate the importance of high-

performance networks to tightly coupled applications• Commercial offerings optimized for web applications are poorly

suited for even small (64 core) MPI applications

Next Steps:• Analyze price-performance in the cloud compared with traditional

HPC centers• Analyze workload characteristics for applications running on

various mid-range systems• Examine how performance compares at larger scales• Gathering additional data running in commercial clouds

11

Page 12: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

12

Programming Models

• Platform as a Service models have appeared that provide their own Programming and Model– for parallel processing of large data sets– Examples include Hadoop and Azure

• Common constructs – MapReduce: map and reduce functions– Queues, Tabular Storage, Blob storage

Are the new cloud programming models useful for scientific computing?

Page 13: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Programming ModelsHadoop for Bioinformatics

• Bioinformatics using MapReduce– Researchers at the Joint Genome Institute

have developed over 12 applications written in Hadoop and Pig

– Constructing end-to-end pipeline to perform gene-centric data analysis of large metagenome data sets

– Complex operations that generate parallel execution can be described in a few dozen lines of Pig

13

Page 14: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

300 400 500 600 700 800 900 1000 11000

2

4

6

8

10

12 Teragen (1TB) HDFS

Linear (HDFS)

Exponen-tial (HDFS)

GPFS

Linear (GPFS)

Exponen-tial (GPFS)

Number of maps

Tim

e (m

inu

tes)

Programming ModelsEvaluating Hadoop for Science

• Benchmarks such as Teragen and Terasort– evaluation of different file systems and storage options

• Ported applications to use Hadoop Streaming– Bioinformatics, Climate100 data analysis

14

Page 15: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Programming ModelsEarly Findings and Next Steps

Early Findings:• New models are useful for addressing data intensive computing• Hides complexity of fault tolerance• High-level languages can improve productivity• Challenge in casting algorithms and data formats into the new

model

Next Steps:• Evaluate scaling of Hadoop and HDFS• Evaluate Hadoop with alternate file systems• Identify other applications that can benefit from these

programming models

15

Page 16: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

User Experience

• How difficult is it to port applications to Cloud environments?

• How should users manage their data and workflow?

16

How usable are cloud environments for scientific applications?

Page 17: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

User ExperienceUser Community

• Magellan has a broad set of users– Various domains and projects (MG-RAST, JGI,

STAR, LIGO, ATLAS, Energy+)– Various workflow styles (serial, parallel) and

requirements– Recruiting new projects to run on cloud

environments

• Three use cases discussed today– Joint Genome Institute– MG-RAST - Deep Soil sequencing– STAR – Streamed real-time data analysis

17

STAR

Page 18: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

User ExperienceJGI on Magellan

• Magellan resources made available to JGI to facilitate disaster recovery efforts– Used up to 120 nodes– Linked sites over layer-2 bridge across

ESnet SDN link– Manual provisioning took ~1 week including

learning curve– Operation was transparent to JGI users

• Practical demonstration of HaaS– Reserve capacity can be quickly provisioned

(but automation is highly desirable)– Magellan + ESnet were able to support

remote departmental mission computing

18

Page 19: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Early Science - STAR

Details• STAR performed Real-time

analysis of data coming from RHIC at BNL

• First time data was analyzed in real-time to such a high degree

• Leveraged existing OS image from NERSC system

• Used 20 8-core instances to keep pace with data from the detector

• STAR is pleased with the results

19

Page 20: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

User ExperienceEarly Findings and Next Steps

Early Findings:• IaaS clouds can require significant system administration

expertise and are difficult to debug due to lack of tools.• Image creation and management are a challenge• I/O performance is poor• Workflow and Data management are problematic and time

consuming• Projects were eventually successful, simplifying further use of

cloud computing

Next Steps:• Gather additional use cases• Deploying fully configured virtual clusters• Explore other models to deliver customized environments• Improve tools to simplify deploying private virtual clusters

20

Page 21: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

ConclusionsCloud Potential

• Enables rapid prototyping at a larger scale than the desktop without the time consuming requirement for an allocation and account

• Supports tailored software stacks• Supports different levels of service• Supports surge computing• Facilitates resource pooling

– But DOE HPC clusters are typically saturated

21

Page 22: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

ConclusionsCloud Challenges

• Open source cloud software stacks are still immature, but evolving rapidly

• Current MPI-based application performance can be poor even at small scales due to interconnect

• Cloud programming models can be difficult to apply to legacy applications

• New security mechanisms and potentially policies are required for insuring security in the cloud

22

Page 23: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

ConclusionsNext Steps

• Characterize mid-range applications for suitability to cloud model

• Cost analysis of cloud computing for different workloads

• Finish performance analysis including IO performance in cloud environments

• Support the Advanced Networking Initiative (ANI) research projects

• Final Magellan Project report

23

Page 24: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Thanks to Many

• Lavanya Ramakrishnan• Iwona Sakrejda• Tina Declerck• Others

– Keith Jackson– Nick Wright– John Shalf– Krishna Muriki

(not picture)

24

Page 25: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Thank you!

Contact Info:Shane [email protected]

Page 26: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Attractive Features of the Cloud

• On-demand access to compute resources– Cycles from a credit card! Avoid lengthly procurements.

•  Overflow capacity to supplement existing systems– Berkeley Water Center has analysis that far exceeds the

capacity of desktops

• Customized and controlled environments– Supernova Factory codes have sensitivity to OS/compiler

version

• Parallel programming models for data intensive science– Hadoop (data parallel, parametric runs)

• Science Gateways (Software as a Service)– Deep Sky provides an Astrophysics community data base

26

Page 27: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Magellan Research Agenda and Lines of Inquiry

• What are the unique needs and features of a science cloud?

• What applications can efficiently run on a cloud?

• Are cloud computing Programming Models such as Hadoop effective for scientific applications?

• Can scientific applications use a data-as-a-service or software-as-a-service model?

• What are the security implications of user-controlled cloud images?

• Is it practical to deploy a single logical cloud across multiple DOE sites?

• What is the cost and energy efficiency of clouds?

27

Page 28: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

It’s All Business

• Cloud computing is a business model• It can be used on HPC systems as well

as traditional clouds (ethernet clusters)• Can get on-demand elasticity through:

– Idle hardware (at ownership cost)– Sharing cores/nodes (at performance cost)

• How high a premium will you pay for it?

28

Page 29: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

Is an HPC Center a Cloud?

• Resource pooling. • Broad network access. • Measured Service. • Rapid elasticity.

– Usage can grow/shrink; pay-as-you-go.

• On-demand self-service. – Users cannot demand (or pay for) more

service than their allocation allows– Jobs often wait for hours or days in queues

29

HPC Centers ?

X

Page 30: Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon Lawrence Berkeley Lab UC Cloud Summit April 19, 2011

What HPC Can Learn from Clouds

• Need to support surge computing– Predictable: monthly processing of

genome data; nightly processing of telescope data

– Unpredictable: computing for disaster recovery; response to facility outage

• Support for tailored software stack • Different levels of service

– Virtual private cluster: guaranteed service– Regular: low average wait time– Scavenger mode, including preemption

30