On-Demand Cloud Computing for Life Sciences Research and Education


Upload: matthew-vaughn

Post on 14-Apr-2017


Page 1: On-Demand Cloud Computing for Life Sciences Research and Education

Funded by the National Science Foundation Award #ACI-1445604

On-Demand Cloud Computing for Life Sciences Research and Education

Matthew Vaughn (@mattdotvaughn)
ORCID 0000-0002-1384-4283
Director, Life Science Computing
Texas Advanced Computing Center
PI @ Jetstream | CyVerse | Araport | CODE@TACC

Page 2: On-Demand Cloud Computing for Life Sciences Research and Education

Overview

• What is Cloud Computing?
• Why do we need Cloud Computing?
• What is Jetstream?
• How is Jetstream different from CyVerse Atmosphere?
• How can one get started using Jetstream?

Page 3: On-Demand Cloud Computing for Life Sciences Research and Education

What is Cloud Computing?

“…A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management.”

NIST Definition of Cloud Computing
http://www.nist.gov/itl/csd/cloud-102511.cfm

Page 4: On-Demand Cloud Computing for Life Sciences Research and Education

Why do we need Cloud Computing?

Page 5: On-Demand Cloud Computing for Life Sciences Research and Education

SCIENTIFIC COMPUTING THEN

• C/C++/FORTRAN/PERL/SHELL
• MPI
• LAPACK/BLAS/PETSC
• GRID ENGINE
• UNIX*
• X86/PPC/SPARC
• COPROCESSORS

Page 6: On-Demand Cloud Computing for Life Sciences Research and Education

SCIENTIFIC COMPUTING NOW

LANGUAGES

• Python 2 & 3
• R
• Julia
• Perl
• Matlab
• Java
• Scala, Clojure, etc.
• .NET
• C/C++
• Swift
• Haskell
• Go
• JavaScript

FRAMEWORKS

• MapReduce: Hadoop, Storm, Pachyderm, Cloudera
• Event & Streaming: Kinesis, Azure Stream Analytics, Camel, Streambase
• Deep/Machine Learning: Watson, Azure BI, TensorFlow, Caffe
• In-memory parsing: Kognitio, Apache Spark
• Containers: Docker, Rocket, MESOS, Kubernetes
• Cloud: AWS, GCE, OpenStack, vCloud, Azure

HARDWARE

• Many-core computing - 50-100 threads/node*
  – Xeon / Xeon Phi
  – GPU
  – OpenPower
  – ARM
  – ShenWei
  – Google TPU
• Multi-level memory architecture
• Hierarchical storage
• FPGAs
• Quantum-like systems

Page 7: On-Demand Cloud Computing for Life Sciences Research and Education

Why don’t we all use Cloud?

Cloud computing is complex to adopt

• It can require a high degree of technical knowledge
• Users can incur unexpected costs
• Data management and movement can be challenging
• There’s no systematic support for adopting cloud technologies or services

Page 8: On-Demand Cloud Computing for Life Sciences Research and Education

Jetstream is a Managed Cloud tailored to science and engineering

Page 9: On-Demand Cloud Computing for Life Sciences Research and Education

Funded by the National Science Foundation Award #ACI-1445604

http://jetstream-cloud.org/

Supported by NSF ACI, operated by your friends

Construction
Application / Community Leads
Management & Operations
Vendors

Page 10: On-Demand Cloud Computing for Life Sciences Research and Education

It’s built to address multiple user perspectives

Role: Laboratory scientist
• Learned how to use the shell and how to work with Linux
• Mastered using R to develop plots for his manuscript
• Published VM to IU ScholarWorks to allow reproducible analysis

Role: Informatics specialist
• Launches an instance and has full sudo access to customize
• Developed software with R and Python library dependencies
• Updates it regularly by creating VM image snapshots

Role: Core facility staff
• Linked several Jetstream instances with Apache Hadoop
• Worked with XSEDE ECSS to import existing Amazon image
• Built a simple science gateway to allow others to use his tools

Page 11: On-Demand Cloud Computing for Life Sciences Research and Education


Jetstream serves the “Long Tail”

Leadership/national class machines
Institutional HPC, HTC systems

Page 12: On-Demand Cloud Computing for Life Sciences Research and Education


Jetstream serves the “Long Tail”

Researchers, developers, and scientists who:
– Need between 1 and a few hundred cores
  • RIGHT NOW
  • For the foreseeable future, but not forever
– Want to fully customize the OS and configuration for their research computing environment
– Are working with cloud-native applications & workflows
– Use interactive mode for their computing & analytics

Page 13: On-Demand Cloud Computing for Life Sciences Research and Education


Jetstream serves the “Long Tail”

• Jetstream is also helpful for:
  – Science gateway operators
    • Run web applications, databases, and services (front-end)
    • Use it for on-demand provisioned compute capacity
  – STEM educators teaching a variety of subjects
    • Create a reference VM appliance
    • Provision an entire classroom
    • Minimize need for local IT support
• It democratizes access to cloud-native technologies and approaches – no credit card or PO needed

Page 14: On-Demand Cloud Computing for Life Sciences Research and Education

Jetstream is an innovative combination of powerful hardware and sophisticated software

Page 15: On-Demand Cloud Computing for Life Sciences Research and Education


Hardware Systems Overview

Page 16: On-Demand Cloud Computing for Life Sciences Research and Education

Platform Overview

[Architecture diagram. Labels: Web App; Atmosphere API; Globus Auth; Atmo Services; XSEDE Services; OpenStack + CEPH clouds at Indiana University, TACC, and potentially others]

Page 17: On-Demand Cloud Computing for Life Sciences Research and Education

Jetstream is easy-to-use Cyberinfrastructure

Page 18: On-Demand Cloud Computing for Life Sciences Research and Education
Page 19: On-Demand Cloud Computing for Life Sciences Research and Education
Page 20: On-Demand Cloud Computing for Life Sciences Research and Education
Page 21: On-Demand Cloud Computing for Life Sciences Research and Education
Page 22: On-Demand Cloud Computing for Life Sciences Research and Education

GateOne Web Shell

• Zero install SSH client
• Allows tablets or Windows to easily use the system
• Supports screen-casting and terminal sharing

Can also use native SSH client with SSH keys

Page 23: On-Demand Cloud Computing for Life Sciences Research and Education

Jetstream is programmable Cyberinfrastructure

Page 24: On-Demand Cloud Computing for Life Sciences Research and Education

Jetstream enables access modes uncommon in academic research computing

Web Services

• OpenStack API
  – Compatible with all clients + libraries
  – Python, Java, Go + CLI
• Atmosphere API – Pre-release
  – Preview docs @ http://docs.atmospherev2.apiary.io/
  – CLI later in 2017

Finicky

Automation, Orchestration, and Workflow

• Marathon/MESOS
• Docker Compose, Machine, Swarm
• Kubernetes
• CloudMan & Elasticluster
• OpenStack Magnum

Configuration Management Tools

• Vagrant / Terraform
• Chef
• Ansible
• Puppet

In Dev

Page 25: On-Demand Cloud Computing for Life Sciences Research and Education

Annotate a Genome Using WQ-Maker

Data analysis in RStudio

Run your own JupyterHub

Publish a VM with DOI

Operate a Galaxy server

Teach a Bioinformatics class

Experiment with Deep Learning

Master Docker or K8S

Mine data using ArcGIS

Develop web services

Page 26: On-Demand Cloud Computing for Life Sciences Research and Education

Jetstream is a little different from CyVerse Atmosphere (but not too much)

Page 27: On-Demand Cloud Computing for Life Sciences Research and Education


Compared to CyVerse Atmosphere, Jetstream…

• Is allocated via XSEDE competitive process
• Gets its support from XSEDE-sponsored staff
• Uses XSEDE credentials and SSH keys for login
• Supports more APIs and automation
• Has more, bigger virtual machine hosts & better networking
• Is not integrated with CyVerse Data Store

Page 28: On-Demand Cloud Computing for Life Sciences Research and Education

How can one get started using Jetstream?

Page 29: On-Demand Cloud Computing for Life Sciences Research and Education


It’s pretty easy:
• Sign up for a free XSEDE account (portal.xsede.org)
• Get an allocation (startups take 1 business day to approve)
• Start using it at use.jetstream-cloud.org
• Read more and get help at www.jetstream-cloud.org
• Cite Jetstream in your papers (instructions on web)
• Follow the project on Twitter @jetstream_cloud

Page 30: On-Demand Cloud Computing for Life Sciences Research and Education

Questions and Discussion

Slides at slideshare.net/mattdotvaughn

Page 31: On-Demand Cloud Computing for Life Sciences Research and Education

Spin Up Your Own Cluster

elasticluster start slurm-js-iu

“Launch a 3-node SLURM cluster with Gluster storage”

Cluster in a box
• Slurm, SGE, Torque
• CentOS, Ubuntu
• Hadoop
• Gluster, Ceph, NFS
• Ansible

-> XSEDE XNIT-compatible clusters

Page 32: On-Demand Cloud Computing for Life Sciences Research and Education

Use Marathon, MESOS, & Docker

Page 33: On-Demand Cloud Computing for Life Sciences Research and Education


What comes next?

• Passed acceptance (handily) in May
• OpenStack clouds, CEPH storage, and software components functional
  – Substantial software integration and development required post-acceptance. Terra incognita!
• Running in “Early operations mode”
  – Extra maintenance days
  – A few rough edges (especially for power users)
• Full production - Sep 1, 2016
• More features and capabilities will come
• Public roadmap soon

Page 34: On-Demand Cloud Computing for Life Sciences Research and Education

Free tier makes it really easy to get started on public cloud$

Our idea: Let any user with an active XSEDE User Portal account use a small (but functional) slice of Jetstream
• Get an XSEDE account
• Sign in to XSEDE User Portal
• Click “Trial Jetstream Access” button
• Get access to Jetstream in about 30 minutes

Page 35: On-Demand Cloud Computing for Life Sciences Research and Education


Jetstream in context

COMET
• HTC system
• GPGPUs
• Virtual Clusters
• Gateways

WRANGLER
• Data Intensive
• Portal-based configuration
• Services
• Data sets

BRIDGES
• Large Memory
• Data Intensive Computing
• Managed VM

JETSTREAM
• Self-service VM
• Gateways
• Minimal disk
• Extensible

Page 36: On-Demand Cloud Computing for Life Sciences Research and Education


Jetstream in context (2)

Blue Waters & Stampede 1/2
Comet, Bridges, Regional HPC, Public Cloud$

Page 37: On-Demand Cloud Computing for Life Sciences Research and Education


Summary

Jetstream is a public, production cloud platform

• Offers on-demand interactive shell + VNC
• Supports configurable software environments
• Enables configurable computing resources
• Encourages work with cloud-native software
• Empowers novice, intermediate, & expert users

Page 38: On-Demand Cloud Computing for Life Sciences Research and Education

[email protected]

@mattdotvaughn

www.slideshare.net/mattdotvaughn

@jetstream_cloud

http://use.jetstream-cloud.org/

http://portal.xsede.org

https://github.com/jetstream-cloud

Page 39: On-Demand Cloud Computing for Life Sciences Research and Education


How can I use Jetstream?

• An XSEDE User Portal (XUP) account is required. They are free! Get one at https://portal.xsede.org
• Read the Allocations Overview - https://portal.xsede.org/allocations-overview
• Write a successful allocation request – start with a Startup or Education request - https://portal.xsede.org/successful-requests

Page 40: On-Demand Cloud Computing for Life Sciences Research and Education


Where can I get help or learn more?

• Production:
  – Web app: http://use.jetstream-cloud.org/
  – User guides: https://portal.xsede.org/jetstream
  – XSEDE KB: https://portal.xsede.org/knowledge-base
  – Email: [email protected]
  – Campus Champions: https://www.xsede.org/campus-champions
  – Training Videos / Virtual Workshops (TBD)

Page 41: On-Demand Cloud Computing for Life Sciences Research and Education

Expanding NSF XD’s reach and impact

Around 299,000 researchers, educators, & learners received NSF support in 2012-2013
– Only 1.5% completed a computation, data analysis, or visualization task on XSEDE program resources
– Less than 3% had an XSEDE Portal account
– 70% of researchers surveyed* claimed to be resource constrained

Why don’t they use XSEDE systems?
– Activation energy is pretty high
– HPC resources are scarce and not well-matched to their needs
– They just don’t need that much capability

* https://www.xsede.org/xsede-nsf-release-cloud-survey-report
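To make the percentages on this slide concrete, here is a minimal arithmetic sketch. It assumes the 1.5% and 3% figures apply to the full population of roughly 299,000 people, which the slide implies but does not state outright; the function name `headcount` is illustrative only.

```python
# Rough head counts implied by the slide's percentages.
# Assumption: both percentages are taken over the ~299,000 NSF-supported people.
NSF_SUPPORTED = 299_000


def headcount(total: int, percent: float) -> int:
    """Return the number of people that `percent` of `total` represents."""
    return round(total * percent / 100)


ran_a_task = headcount(NSF_SUPPORTED, 1.5)      # completed a task on XSEDE resources
portal_ceiling = headcount(NSF_SUPPORTED, 3.0)  # upper bound on portal account holders
never_ran_a_task = NSF_SUPPORTED - ran_a_task

if __name__ == "__main__":
    print(f"Completed a task on XSEDE resources: about {ran_a_task:,}")
    print(f"Held a portal account: fewer than {portal_ceiling:,}")
    print(f"Did not run a task on XSEDE resources: about {never_ran_a_task:,}")
```

Under that assumption, about 4,485 people used XSEDE resources, and more than 294,000 did not.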

Page 42: On-Demand Cloud Computing for Life Sciences Research and Education


What is Jetstream?

A production cloud platform for NSF-sponsored researchers

• Provides on-demand interactive computing and analysis
• Enables configurable environments and architectures
• Supports computational reproducibility and sharing
• Democratizes access to cloud-native software
• Focused on ease of use for all adopters

Expands the community of users who benefit from NSF investment in shared cyberinfrastructure

Page 43: On-Demand Cloud Computing for Life Sciences Research and Education

Flavor      vCPUs   RAM (GB)   Storage (GB)   Per Node
m.tiny          1          2             20         46
m.small         2          4             40         23
m.medium        6         16            130          7
m.large        10         30            230          4
m.xlarge       22         60            460          2
m.xxlarge      44        120            920          1
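As an illustration of how the flavor table might be used when sizing an instance, the sketch below picks the smallest flavor that satisfies a request. The `Flavor` type and `smallest_fitting_flavor` helper are hypothetical names and not part of any Jetstream or OpenStack API; RAM and storage are assumed to be in GB.

```python
from typing import NamedTuple, Optional


class Flavor(NamedTuple):
    name: str
    vcpus: int
    ram_gb: int       # assumed unit: GB
    storage_gb: int   # assumed unit: GB
    per_node: int     # how many VMs of this flavor fit on one host


# Values copied from the flavor table, ordered smallest to largest.
FLAVORS = [
    Flavor("m.tiny", 1, 2, 20, 46),
    Flavor("m.small", 2, 4, 40, 23),
    Flavor("m.medium", 6, 16, 130, 7),
    Flavor("m.large", 10, 30, 230, 4),
    Flavor("m.xlarge", 22, 60, 460, 2),
    Flavor("m.xxlarge", 44, 120, 920, 1),
]


def smallest_fitting_flavor(vcpus: int, ram_gb: int,
                            storage_gb: int = 0) -> Optional[Flavor]:
    """Return the first (smallest) flavor meeting every requirement, or None."""
    for flavor in FLAVORS:
        if (flavor.vcpus >= vcpus
                and flavor.ram_gb >= ram_gb
                and flavor.storage_gb >= storage_gb):
            return flavor
    return None


if __name__ == "__main__":
    print(smallest_fitting_flavor(vcpus=4, ram_gb=8))
```

A request for 4 vCPUs and 8 GB of RAM resolves to m.medium, since m.small offers only 2 vCPUs; a request larger than m.xxlarge returns None.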

VM Host Configuration
• Dual Intel E5-2680v3 “Haswell”
• 24 physical cores/node @ 2.5 GHz (Hyperthreading on)
• 128 GB RAM
• Dual 1 TB local disks
• 10 Gb/s dual uplink NIC
• Running CentOS 7 + KVM Hypervisor
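One way to read the "Per Node" column against this host configuration is to total the vCPUs and RAM of a fully packed host. The sketch below assumes two hardware threads per physical core (so 48 threads per host) and RAM figures in GB; the slides do not say why the totals stop short of the host limits, so this is only a consistency check.

```python
# Flavor data copied from the flavor table: (name, vCPUs, RAM in GB, VMs per node).
FLAVORS = [
    ("m.tiny", 1, 2, 46),
    ("m.small", 2, 4, 23),
    ("m.medium", 6, 16, 7),
    ("m.large", 10, 30, 4),
    ("m.xlarge", 22, 60, 2),
    ("m.xxlarge", 44, 120, 1),
]

HOST_THREADS = 24 * 2  # assumption: hyperthreading gives 2 threads per physical core
HOST_RAM_GB = 128


def packing_totals():
    """Return {flavor name: (total vCPUs, total RAM in GB)} for a fully packed host."""
    return {name: (vcpus * per_node, ram_gb * per_node)
            for name, vcpus, ram_gb, per_node in FLAVORS}


def fits_on_host(total_vcpus, total_ram_gb):
    """True when the packed VMs stay within the host's threads and RAM."""
    return total_vcpus <= HOST_THREADS and total_ram_gb <= HOST_RAM_GB


if __name__ == "__main__":
    for name, (vcpus, ram_gb) in packing_totals().items():
        print(f"{name:10s} {vcpus:3d} vCPUs {ram_gb:4d} GB "
              f"fits={fits_on_host(vcpus, ram_gb)}")
```

Every flavor packs to between 40 and 46 vCPUs and at most 120 GB of RAM, all within the assumed 48 threads and the stated 128 GB.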

Hardware Specifics

CEPH Storage
• 20x Dell 730xd per cloud
• 2x 10 Gb/s bonded NIC per 730xd
• Running CEPH 0.94.5 Hammer
• Configured as OpenStack Storage

• Storage is XSEDE-allocated
• Implemented on backend as OpenStack Volumes
• Each user gets 10 volumes up to 500 GB total storage
• Exploring object storage as well but that’s in the future
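The per-user storage limit can be expressed as a simple check. This is a sketch under one reading of the slide: at most 10 volumes per user, with their combined size capped at 500 GB. The function name `can_create_volume` is hypothetical and does not correspond to any OpenStack or Jetstream call.

```python
MAX_VOLUMES = 10     # from the slide: each user gets 10 volumes
MAX_TOTAL_GB = 500   # assumption: 500 GB is the combined cap across all volumes


def can_create_volume(existing_sizes_gb, new_size_gb):
    """Return True if adding a volume of new_size_gb stays within both limits."""
    if new_size_gb <= 0:
        raise ValueError("volume size must be positive")
    if len(existing_sizes_gb) + 1 > MAX_VOLUMES:
        return False
    return sum(existing_sizes_gb) + new_size_gb <= MAX_TOTAL_GB


if __name__ == "__main__":
    print(can_create_volume([100, 200], 200))  # exactly reaches the 500 GB cap
    print(can_create_volume([100, 200], 201))  # one GB over the cap
```

Either limit can block a request: ten small volumes exhaust the volume count long before the capacity cap is reached.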

Page 44: On-Demand Cloud Computing for Life Sciences Research and Education

Key components: What is OpenStack?

• Free and open source software for creating private and public clouds

• Started in 2010 by Rackspace and NASA, now managed by OpenStack Foundation

• Widespread adoption across industry and public sector

• Modular architecture with a 6-month update cycle

https://www.openstack.org/

Page 45: On-Demand Cloud Computing for Life Sciences Research and Education

Key components: Everything else

Component        Function
Atmosphere       Web application + middleware for providing user-friendly provider-agnostic IaaS
Globus Auth      Provides powerful identity and access management functions that are easily integrated into web and mobile applications
XSEDE Services   Centralized account management, allocation, and reporting
CEPH             Distributed object store and file system designed to provide excellent performance, reliability and scalability