On-Demand Cloud Computing for Life Sciences Research and Education
Funded by the National Science Foundation, Award #ACI-1445604
Matthew Vaughn (@mattdotvaughn)
ORCID 0000-0002-1384-4283
Director, Life Science Computing
Texas Advanced Computing Center
PI @ Jetstream | CyVerse | Araport | CODE@TACC
Overview
• What is Cloud Computing?
• Why do we need Cloud Computing?
• What is Jetstream?
• How is Jetstream different from CyVerse Atmosphere?
• How can one get started using Jetstream?
What is Cloud Computing?
“…A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management.”
NIST Definition of Cloud Computing
http://www.nist.gov/itl/csd/cloud-102511.cfm
Why do we need Cloud Computing?
SCIENTIFIC COMPUTING THEN
• C/C++/FORTRAN/PERL/SHELL
• MPI
• LAPACK/BLAS/PETSC
• GRID ENGINE
• UNIX*
• X86/PPC/SPARC
• COPROCESSORS
SCIENTIFIC COMPUTING NOW
LANGUAGES
• Python 2 & 3
• R
• Julia
• Perl
• MATLAB
• Java
• Scala, Clojure, etc.
• .NET
• C/C++
• Swift
• Haskell
• Go
• JavaScript
FRAMEWORKS
• MapReduce: Hadoop, Storm, Pachyderm, Cloudera
• Event & Streaming: Kinesis, Azure Stream Analytics, Camel, Streambase
• Deep/Machine Learning: Watson, Azure BI, TensorFlow, Caffe
• In-memory parsing: Kognitio, Apache Spark
• Containers: Docker, Rocket, Mesos, Kubernetes
• Cloud: AWS, GCE, OpenStack, vCloud, Azure
HARDWARE
• Many-core computing - 50-100 threads/node*
• Xeon / Xeon Phi
• GPU
• OpenPower
• ARM
• ShenWei
• Google TPU
• Multi-level memory architecture
• Hierarchical storage
• FPGAs
• Quantum-like systems
Why don’t we all use Cloud?
Cloud computing is complex to adopt
• It can require a high degree of technical knowledge
• Users can incur unexpected costs
• Data management and movement can be challenging
• There’s no systematic support for adopting cloud technologies or services
Jetstream is a Managed Cloud tailored to science and engineering
http://jetstream-cloud.org/
Supported by NSF ACI, operated by your friends
[Figure: partner logos grouped under Construction, Application / Community Leads, Management & Operations, and Vendors]
It’s built to address multiple user perspectives
User accomplishments, by role

Laboratory scientist
• Learned how to use the shell and how to work with Linux
• Mastered using R to develop plots for his manuscript
• Published VM to IU ScholarWorks to allow reproducible analysis

Informatics specialist
• Launches an instance and has full sudo access to customize
• Developed software with R and Python library dependencies
• She updates it regularly by creating VM image snapshots

Core facility staff
• Linked several Jetstream instances with Apache Hadoop
• Worked with XSEDE ECSS to import existing Amazon image
• Built a simple science gateway to allow others to use his tools
Jetstream serves the “Long Tail”
[Figure labels: Leadership/national class machines; Institutional HPC, HTC systems]
Jetstream serves the “Long Tail”
Researchers, developers, and scientists who:
– Need between 1 and a few hundred cores
  • RIGHT NOW
  • For the foreseeable future, but not forever
– Want to fully customize the OS and configuration for their research computing environment
– Are working with cloud-native applications & workflows
– Use interactive mode for their computing & analytics
Jetstream serves the “Long Tail”
• Jetstream is also helpful for:
– Science gateway operators
  • Run web applications, databases, and services (front-end)
  • Use it for on-demand provisioned compute capacity
– STEM educators teaching a variety of subjects
  • Create a reference VM appliance
  • Provision an entire classroom
  • Minimize need for local IT support
• It democratizes access to cloud-native technologies and approaches: no credit card or PO needed
Jetstream is an innovative combination of powerful hardware and sophisticated software
Hardware Systems Overview
Platform Overview
[Diagram layers, top to bottom]
• Web App
• Atmosphere API, Globus Auth
• Atmo Services, XSEDE Services
• OpenStack + CEPH at Indiana University
• OpenStack + CEPH at TACC
• OpenStack + CEPH at potentially others
Jetstream is easy-to-use Cyberinfrastructure
GateOne Web Shell
• Zero install SSH client
• Allows tablets or Windows to easily use the system
• Supports screen-casting and terminal sharing
Can also use native SSH client with SSH keys
Jetstream is programmable Cyberinfrastructure
Jetstream enables access modes uncommon in academic research computing
Web Services
• OpenStack API
  • Compatible with all clients + libraries
  • Python, Java, Go + CLI
• Atmosphere API – Pre-release
  • Preview docs @ http://docs.atmospherev2.apiary.io/
  • CLI later in 2017
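As a rough illustration of what "compatible with all clients + libraries" can look like from Python, the sketch below builds (but does not send) the HTTP request that the OpenStack Compute API expects when launching a server, using only the standard library. The endpoint URL, token, and the image, flavor, and network IDs are placeholders invented for this example; the slides do not give real Jetstream values, and a real client library would normally handle authentication for you.

```python
import json
import urllib.request


def build_create_server_request(compute_url, token, name,
                                image_id, flavor_id, network_id):
    """Build a POST /servers request for the OpenStack Compute API.

    Nothing is sent over the network; the caller decides whether to
    pass the result to urllib.request.urlopen().
    """
    body = {
        "server": {
            "name": name,
            "imageRef": image_id,      # ID of the VM image to boot
            "flavorRef": flavor_id,    # ID of the instance size (flavor)
            "networks": [{"uuid": network_id}],
        }
    }
    return urllib.request.Request(
        url=compute_url.rstrip("/") + "/servers",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,     # token issued by the identity service
        },
        method="POST",
    )


# All values below are hypothetical placeholders, not real Jetstream settings.
request = build_create_server_request(
    compute_url="https://cloud.example.org:8774/v2.1",
    token="PLACEHOLDER-TOKEN",
    name="my-analysis-vm",
    image_id="IMAGE-ID-PLACEHOLDER",
    flavor_id="FLAVOR-ID-PLACEHOLDER",
    network_id="NETWORK-ID-PLACEHOLDER",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Building the request separately from sending it keeps the example safe to run anywhere and makes the JSON body easy to inspect.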
Finicky
Automation, Orchestration, and Workflow
• Marathon/Mesos
• Docker Compose, Machine, Swarm
• Kubernetes
• CloudMan & Elasticluster
• OpenStack Magnum
Configuration Management Tools
• Vagrant / Terraform
• Chef
• Ansible
• Puppet
In Dev
Annotate a Genome Using WQ-Maker
Data analysis in RStudio
Run your own JupyterHub
Publish a VM with DOI
Operate a Galaxy server
Teach a Bioinformatics class
Experiment with Deep Learning
Master Docker or K8S
Mine data using ArcGIS
Develop web services
Jetstream is a little different from CyVerse Atmosphere (but not too much)
Compared to CyVerse Atmosphere, Jetstream…
• Is allocated via XSEDE competitive process
• Gets its support from XSEDE-sponsored staff
• Uses XSEDE credentials and SSH keys for login
• Supports more APIs and automation
• Has more, bigger virtual machine hosts & better networking
• Is not integrated with CyVerse Data Store
How can one get started using Jetstream?
It’s pretty easy:
• Sign up for a free XSEDE account (portal.xsede.org)
• Get an allocation (startups take 1 business day to approve)
• Start using it at use.jetstream-cloud.org
• Read more and get help at www.jetstream-cloud.org
• Cite Jetstream in your papers (instructions on web)
• Follow the project on Twitter @jetstream_cloud
Questions and Discussion
Slides at slideshare.net/mattdotvaughn
Spin Up Your Own Cluster
elasticluster start slurm-js-iu
“Launch a 3-node SLURM cluster with Gluster storage”
Cluster in a box
• Slurm, SGE, Torque
• CentOS, Ubuntu
• Hadoop
• Gluster, Ceph, NFS
• Ansible -> XSEDE XNIT compatible clusters
Use Marathon, Mesos, & Docker
What comes next?
• Passed acceptance (handily) in May
• OpenStack clouds, CEPH storage, and software components functional
– Substantial software integration and development required post-acceptance. Terra incognita!
• Running in “Early operations mode”
– Extra maintenance days
– A few rough edges (especially for power users)
• Full production - Sep 1, 2016
• More features and capabilities will come
• Public roadmap soon
Free tier makes it really easy to get started on public cloud$
Our idea: Let any user with an active XSEDE User Portal account use a small (but functional) slice of Jetstream
• Get an XSEDE account
• Sign in to XSEDE User Portal
• Click “Trial Jetstream Access” button
• Get access to Jetstream in about 30 minutes
Jetstream in context

COMET
• HTC system
• GPGPUs
• Virtual Clusters
• Gateways

WRANGLER
• Data Intensive
• Portal-based configuration
• Services
• Data sets

BRIDGES
• Large Memory
• Data Intensive Computing
• Managed VM

JETSTREAM
• Self-service VM
• Gateways
• Minimal disk
• Extensible
Jetstream in context (2)
[Figure labels: Blue Waters & Stampede 1/2; Comet, Bridges, Regional HPC, Public Cloud$]
Summary
Jetstream is a public, production cloud platform
• Offers on-demand interactive shell + VNC
• Supports configurable software environments
• Enables configurable computing resources
• Encourages work with cloud-native software
• Empowers novice, intermediate, & expert users
@mattdotvaughn
www.slideshare.net/mattdotvaughn
@jetstream_cloud
http://use.jetstream-cloud.org/
http://portal.xsede.org
https://github.com/jetstream-cloud
How can I use Jetstream?
• An XSEDE User Portal (XUP) account is required. They are free! Get one at https://portal.xsede.org
• Read the Allocations Overview - https://portal.xsede.org/allocations-overview
• Write a successful allocation request – start with a Startup or Education request - https://portal.xsede.org/successful-requests
Where can I get help or learn more?
• Production:
– Web app: http://use.jetstream-cloud.org/
– User guides: https://portal.xsede.org/jetstream
– XSEDE KB: https://portal.xsede.org/knowledge-base
– Email: [email protected]
– Campus Champions: https://www.xsede.org/campus-champions
– Training Videos / Virtual Workshops (TBD)
Expanding NSF XD’s reach and impact
Around 299,000 researchers, educators, & learners received NSF support in 2012-2013
– Only 1.5% completed a computation, data analysis, or visualization task on XSEDE program resources
– Less than 3% had an XSEDE Portal account
– 70% of researchers surveyed* claimed to be resource constrained
Why don’t they use XSEDE systems?
– Activation energy is pretty high
– HPC resources are scarce and not well-matched to their needs
– They just don’t need that much capability
* https://www.xsede.org/xsede-nsf-release-cloud-survey-report
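To put the percentages above into headcounts, here is a small worked calculation. It only restates the slide's own figures (299,000 people, 1.5%, and "less than 3%"); the function name is made up for this sketch. The 70% figure is left out because it refers to survey respondents, whose number the slide does not give.

```python
from fractions import Fraction

NSF_SUPPORTED_PEOPLE = 299_000  # "around 299,000" per the slide


def headcount(total, percent):
    """Convert a percentage of `total` into a whole number of people.

    Fraction avoids floating-point rounding surprises for values like 1.5%.
    """
    return int(total * Fraction(percent) / 100)


used_xsede_resources = headcount(NSF_SUPPORTED_PEOPLE, "1.5")
portal_account_upper_bound = headcount(NSF_SUPPORTED_PEOPLE, "3")

print(used_xsede_resources)        # 4485
print(portal_account_upper_bound)  # 8970
```

In other words, roughly 4,500 people ran something on XSEDE resources, and fewer than about 9,000 even had a portal account.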
What is Jetstream?
A production cloud platform for NSF-sponsored researchers
• Provides on-demand interactive computing and analysis
• Enables configurable environments and architectures
• Supports computational reproducibility and sharing
• Democratizes access to cloud-native software
• Focused on ease of use for all adopters
Expands the community of users who benefit from NSF investment in shared cyberinfrastructure
Flavor      vCPUs   RAM (GB)   Storage (GB)   Per Node
m.tiny      1       2          20             46
m.small     2       4          40             23
m.medium    6       16         130            7
m.large     10      30         230            4
m.xlarge    22      60         460            2
m.xxlarge   44      120        920            1
VM Host Configuration
• Dual Intel E5-2680v3 “Haswell”
• 24 physical cores/node @ 2.5 GHz (Hyperthreading on)
• 128 GB RAM
• Dual 1 TB local disks
• 10 Gb dual uplink NIC
• Running CentOS 7 + KVM Hypervisor
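The "Per Node" column in the flavor table can be reproduced from the host configuration with a simple packing calculation. The sketch below assumes that 2 of the 48 hardware threads on each host are held back (for example for the hypervisor); the slides do not say this, but it is the value that makes the arithmetic match the table. Names such as RESERVED_THREADS are illustrative only.

```python
PHYSICAL_CORES_PER_NODE = 24  # from the VM host configuration
THREADS_PER_CORE = 2          # hyperthreading is on
RESERVED_THREADS = 2          # ASSUMPTION: not stated in the slides

# vCPU counts per flavor, copied from the flavor table
FLAVOR_VCPUS = {
    "m.tiny": 1,
    "m.small": 2,
    "m.medium": 6,
    "m.large": 10,
    "m.xlarge": 22,
    "m.xxlarge": 44,
}


def instances_per_node(vcpus, usable_threads):
    """How many instances of a flavor fit if each vCPU gets one thread."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return usable_threads // vcpus


usable_threads = PHYSICAL_CORES_PER_NODE * THREADS_PER_CORE - RESERVED_THREADS
packing = {
    flavor: instances_per_node(vcpus, usable_threads)
    for flavor, vcpus in FLAVOR_VCPUS.items()
}

for flavor, count in packing.items():
    print(f"{flavor:<10} {count}")
```

Under that assumption the results are 46, 23, 7, 4, 2, and 1, the same as the table, which suggests instances are packed without oversubscribing CPU threads.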
Hardware Specifics
CEPH Storage
• 20x Dell 730xd per cloud
• 2x10 Gb/s bonded NIC per 730xd
• Running CEPH 0.94.5 Hammer
• Configured as OpenStack Storage
• Storage is XSEDE-allocated
• Implemented on backend as OpenStack Volumes
• Each user gets 10 volumes up to 500GB total storage
• Exploring object storage as well but that’s in the future
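The per-user storage limit above (10 volumes, 500 GB total) is easy to reason about with a small check. This is only a sketch of the rule as stated on the slide, not how Jetstream or OpenStack enforces quotas internally; the function and constant names are invented for the example.

```python
MAX_VOLUMES_PER_USER = 10    # from the slide
MAX_TOTAL_STORAGE_GB = 500   # from the slide


def can_create_volume(existing_sizes_gb, requested_gb):
    """Return True if one more volume of `requested_gb` stays within quota.

    `existing_sizes_gb` is a list of the sizes (in GB) of the volumes the
    user already has.
    """
    if requested_gb <= 0:
        raise ValueError("requested_gb must be positive")
    within_count = len(existing_sizes_gb) + 1 <= MAX_VOLUMES_PER_USER
    within_size = sum(existing_sizes_gb) + requested_gb <= MAX_TOTAL_STORAGE_GB
    return within_count and within_size


# A user with three volumes totalling 350 GB
current = [100, 200, 50]
print(can_create_volume(current, 150))  # True: exactly reaches 500 GB
print(can_create_volume(current, 151))  # False: would exceed 500 GB
print(can_create_volume([1] * 10, 1))   # False: would be an 11th volume
```

Both limits apply at once, so many small volumes can hit the count limit long before the size limit.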
Key components: What is OpenStack?
• Free and open source software for creating private and public clouds
• Started in 2010 by Rackspace and NASA, now managed by OpenStack Foundation
• Widespread adoption across industry and public sector
• Modular architecture with a 6-month update cycle
https://www.openstack.org/
Key components: Everything else

Component        Function
Atmosphere       Web application + middleware for providing user-friendly provider-agnostic IaaS
Globus Auth      Provides powerful identity and access management functions that are easily integrated into web and mobile applications
XSEDE Services   Centralized account management, allocation, and reporting
CEPH             Distributed object store and file system designed to provide excellent performance, reliability and scalability