The iPlant Collaborative Cyberinfrastructure · 2010-11-08

The iPlant Collaborative Cyberinfrastructure aka Development of Public Cyberinfrastructure to Support Plant Science Nirav Merchant University of Arizona



The iPlant Collaborative Cyberinfrastructure

aka Development of Public Cyberinfrastructure to Support Plant Science

Nirav Merchant
University of Arizona


PowerPoint Does Rocket Science--and Better Techniques for Technical Reports Essay by Edward Tufte


What is iPlant?

•  iPlant's mission is to build the CI to support plant biology's Grand Challenge solutions

•  Grand Challenges were not defined in advance, but identified through engagement with the community

•  A virtual organization with Grand Challenge teams relying on national cyberinfrastructure

•  Long term focus on sustainable food supply, climate change, biofuels, ecological stability, etc.

•  Hundreds of participants globally… Working group members at >50 US institutions, USDA, DOE, etc.


Brief History

•  Formally approved by National Science Board – 12/2007
•  Funding by NSF – February 1st, 2008
•  iPlant Kickoff Conference at CSHL – April 2008
   o  ~200 participants
•  Grand Challenge Workshops – Sept-Dec 2008
•  CI workshop – Jan 2009
•  Grand Challenge White Paper Review – March 2009
•  Project Recommendations – March 2009
•  Project Kickoffs – May 2009 & August 2009
•  First Release of Discovery Environments – April 2010


The paradigm shift

•  Classic paradigm: You produce data, analyze, interpret (end to end)

•  Conventional paradigm: Consortium/centers produce data and you consume it

•  New paradigm: Consortium/centers have produced data and are creating “cyber infrastructure” to tackle the “grand challenge”


GC Projects Recommended by the iPlant Board of Directors, March 2009

Initial Projects:

Plant Tree of Life – iPToL – May '09
+ Taxonomic Intelligence
+ APWeb2
+ Social Networking Website

Genotype to Phenotype – iPG2P – Aug '09
+ Image Analysis Platform


•  Trait Evolution, Brian O'Meara
   –  Post-tree analysis and mapping of ancestral traits
•  Tree Reconciliation, Todd Vision
   –  Large-scale reconciliation of gene trees, co-evolving parasites, etc., with species trees
•  Big Trees, Alexandros Stamatakis
   –  HPC phylogenetic inference with 500K taxa
•  Tree Visualization, Michael Sanderson; Karen Cranston
   –  Cross cutting group for the viz needs of all
•  Data Integration, Val Tannen, Bill Piel
   –  Cross cutting group for the data integration needs of all
•  Data Assembly, Doug Soltis, Pam Soltis, Michael Donoghue
   –  Community and network building, data assembly


iPlant Genotype to Phenotype Working Groups

•  NextGen Sequencing
   –  Establishing an informatics pipeline that will allow the plant community to process NextGen sequence data
•  Statistical Inference
   –  Developing a platform using advanced computational approaches to statistically link genotype to phenotype
•  Modeling Tools
   –  Developing a framework to support tools for the construction, simulation and analysis of computational models of plant function at various scales of resolution and fidelity
•  Visual Analytics
   –  Generating, adapting, and integrating visualization tools capable of displaying diverse types of data from laboratory, field, in silico analyses and simulations
•  Data Integration
   –  Investigating and applying methods for describing and unifying data sets into virtual systems that support iPG2P activities


What is Cyberinfrastructure? (Originally about TeraGrid)

[Figure labels: "It's a Grid!", "It's Storage!", "It's a Common Software Environ!", "It's a Network!", "They are HPC Centers!", "It's Apps and Support!", "And More!: Viz, Facilities, Data collections"]

It was six men of Indostan, To learning much inclined,
Who went to see the elephant, (Though all of them were blind),
That each by observation Might satisfy his mind.

WWW.TERAGRID.ORG


The iPlant Cyberinfrastructure

Physical Infrastructure
   Compute, Storage, Persistent Virtual Machines
   TeraGrid, Open Science Grid, UA/ASU/TACC

iPlant Middleware
   Job Submission, Workflow Management, Service/Data APIs
   iRODS, Grid Technologies, Condor, RESTful Services

iPlant Discovery Environments
   Grand Challenge Workflows, iPlant Interfaces
   Third Party Tools, iPlant-built Tools, Community Contributed Tools and Data

User

Build a CI that’s robust, leverages national infrastructure, and can grow through community contribution!


Open Source Philosophy, Commercial Quality Process

•  iPlant is open in every sense of the word:
   –  Open access to source
   –  Open API to build a community of contributors
   –  Open standards adopted wherever possible
   –  Open access to data (where users so choose)

•  iPlant code design, implementation, and quality control will be based on best industrial practice


Portfolio of Activities

•  Maintaining a balance of “past, present, future” strategies
   –  “Past”: make services, systems, and support available to existing bioinformatics projects, either to enhance them or simply make critical tools more widely available.
   –  “Present”: build the best bioinformatics software tools that today's technologies can provide.
   –  “Future”: track emerging technologies, and where appropriate stimulate research into the creation and use of those technologies.


Portfolio of Activities

•  In a nutshell:
   –  12 working groups in the two grand challenges, each of which is defining requirements for DE development. Each group not only holds discussions that lead to final projects, but also spawns prototyping efforts, tech eval projects, tool support projects, etc.
   –  Services group: provide cycles, storage, hosting, etc. to users.
   –  A comprehensive technology evaluation program to find, borrow, or build relevant technologies, headlined by the semantic web effort.
   –  A number of ancillary projects related to grand challenges, e.g. APWeb, high throughput image analysis.
   –  The Core development/integration effort.


Systems and Services

•  Provide access for problems like these on large scale systems

•  Provide the storage infrastructure for biological data (again, in support of existing projects)

•  Provide cloud style VM infrastructure for service hosting.


iPlant: Connecting Users, Ideas and Resources

The core foundation comprises four layers:

•  Data layer
•  Registry and Integration layer
•  Compute and Analysis layer
•  Interaction and Collaboration layer


iPlant: Using proven technologies

•  Data layer: providing access to raw and ingested data sets, including high throughput data transfers
   •  iRODS
   •  GridFTP, Aspera
   •  DSpace (DuraSpace), Open Archives Initiative
   •  Content Distribution Networks (CDN)
   •  High performance storage @ TACC (Lustre)
   •  MySQL and Postgres database clusters
   •  Connection to established data sources (NCBI, TAIR, Gramene)
   •  Connection to DataONE, DataNet initiatives
   •  Cloud style storage (similar to Amazon S3 and Walrus)


iPlant: Using proven technologies

•  Registry and Integration layer: connecting services, data and metadata elements with semantic understanding
   •  Metadata catalog management
   •  Provenance tracking (W7 model)
   •  Integrated Registry and Service discovery servers
   •  Data Client and Data Provider Ontology Development Kit
   •  Semantic Architecture (OWL based SSWAP)
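
The provenance bullet above names the W7 model, which describes a data item's history through seven questions: what, when, where, how, who, which and why. As a minimal sketch of how a metadata catalog entry might carry such a trail, assuming a hypothetical schema (the class name, field comments and sample values below are illustrative and not taken from iPlant's implementation):

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class W7Record:
    """One provenance event keyed by the seven W7 questions (hypothetical schema)."""
    what: str       # the event that happened to the data
    when: datetime  # time of the event
    where: str      # site or storage resource where it happened
    how: str        # action or method that caused the event
    who: str        # person or agent responsible
    which: str      # instrument or software used
    why: str        # reason for the event


def catalog_entry(dataset_id: str, history: list[W7Record]) -> dict:
    """Bundle a dataset identifier with its provenance trail, oldest event first."""
    ordered = sorted(history, key=lambda record: record.when)
    return {"dataset": dataset_id, "provenance": [asdict(r) for r in ordered]}


# Illustrative values only; none of these describe a real dataset.
events = [
    W7Record("assembled", datetime(2010, 5, 2, tzinfo=timezone.utc), "TACC",
             "batch job", "analyst", "assembler v1", "build contigs"),
    W7Record("uploaded", datetime(2010, 5, 1, tzinfo=timezone.utc), "UA",
             "file transfer", "analyst", "transfer client", "share raw reads"),
]
entry = catalog_entry("example-reads-001", events)
```

Keeping each event immutable and sorting by time means the trail can be appended to and replayed without ambiguity about ordering.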


iPlant: Using proven technologies

•  Compute and Analysis layer: connecting tasks with scalable platforms and algorithms
   •  Virtualization (Xen clusters)
   •  High Performance Computing at TACC and TeraGrid
   •  Grid (Condor, BOINC, Gearman)
   •  Cloud (Eucalyptus, Nimbus)
   •  Reconfigurable Hardware (GPGPU, FPGA)
   •  Checkpoint & Restart (DMTCP)
   •  Scaling and parallelizing code (MPI)
   •  Workflow engines (DAGMan, Pegasus, Kepler)
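
The workflow engines listed above share one core idea: a job runs only after the jobs it depends on have finished, and independent jobs can run side by side. The sketch below illustrates that dependency ordering with Python's standard `graphlib` module (Python 3.9+). It does not use or imitate the interface of DAGMan, Pegasus or Kepler, and the step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "fetch_reads": set(),
    "quality_filter": {"fetch_reads"},
    "assemble": {"quality_filter"},
    "annotate": {"assemble"},
    "build_tree": {"assemble"},
    "report": {"annotate", "build_tree"},
}


def run_in_waves(dag):
    """Group steps into waves; all steps within one wave could run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make output stable
        waves.append(ready)
        sorter.done(*ready)  # a real engine would mark steps done as jobs finish
    return waves


waves = run_in_waves(workflow)
for number, steps in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(steps)}")
```

In this example `annotate` and `build_tree` land in the same wave because both depend only on `assemble`, which is where a scheduler would gain parallelism.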


iPlant: Using proven technologies

•  Interaction and Collaboration layer: providing end user access to unified services and data, from API to large scale visualization
   •  Google Web Toolkit (GWT driven front end)
   •  Messaging bus (Java Mule, RabbitMQ, XMPP/Jabber)
   •  RESTful web services (web API access)
   •  Single sign-on/identity management (Shibboleth, OAuth?)
   •  Transparent HPC integration (TeraGrid science gateway and TACC resources)
   •  Integration with desktop applications (via web services)
   •  Collaboration platforms (OpenMeetings, WebEx, wiki, Mailman)
   •  Shared analysis (shared workflows, desktop view)
   •  Sharing data (DOI, persistent URL, CDN, social networks)
   •  Large scale visualization (Large Tree, ParaView, SAGE)
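
A messaging bus, as listed above, lets components react to events without calling each other directly. The following is a minimal in-process sketch of the publish/subscribe pattern, assuming a hypothetical `MessageBus` class and topic name; it stands in for a broker and does not reproduce the API of Mule, RabbitMQ or XMPP.

```python
from collections import defaultdict
from typing import Callable


class MessageBus:
    """Tiny in-process stand-in for a message broker (hypothetical, not a real broker API)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        """Register a handler to be called for every message on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to each handler on the topic; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = MessageBus()
notifications: list[str] = []
audit_log: list[dict] = []

# Two independent components react to the same event without knowing each other.
bus.subscribe("job.finished", lambda m: notifications.append(f"Job {m['id']} is done"))
bus.subscribe("job.finished", audit_log.append)

delivered = bus.publish("job.finished", {"id": 42, "status": "ok"})
```

The publisher only knows the topic name, so a front end, a logger or a notification service can be added later by subscribing, with no change to the code that submits jobs.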


Storage Services

•  We have also begun offering storage to a number of projects connected to the grand challenges in some way, as well as for internal iPlant use.
   –  iRODS interface
   –  Corral at TACC, a local storage array at UA

•  Data arriving now for 1KP project, Gates C3/C4 project.


Cloud Services

•  iPlant is now offering “cloud” style hosting services.

•  Dynamically launch virtual servers hosted by iPlant.

•  Still in prototype



Arrival of “As a Service” models

SaaS: Software as a Service
(e.g. Clustering/Assembly is a service)

IaaS: Infrastructure as a Service
(get computer time with a credit card and with a Web interface like EC2)

PaaS: Platform as a Service
IaaS plus core software capabilities on which you build SaaS
(e.g. Hadoop/MapReduce is a Platform)

Cyberinfrastructure is “Research as a Service”


What do working groups want?

•  Wiki
•  Shared storage
•  WebEx
•  CMS
•  Google apps
•  Machine for prototyping/development
•  Change management s/w (git/svn)
•  Access to compute grid/cluster


What iPlant wants

•  Ability to integrate single sign-on (SSO) with all services we offer (API, cloud, grid, iRODS, etc.)

•  Leverage credentials from users' home institutions

•  Lower the barrier to access while still being secure

•  Emphasis on ease of access to “research as a service”
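
The single sign-on goal above amounts to this: a user authenticates once, and every service accepts the resulting credential. The sketch below shows that idea with an HMAC-signed token built from the Python standard library. It is an assumption-laden illustration only: it is not Shibboleth or OAuth, and the function names, claim fields and hard-coded secret are hypothetical.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-only-shared-secret"  # assumption: a real deployment would never hard-code this


def issue_token(user: str, home_institution: str, expires_at: int) -> str:
    """Identity provider side: sign a small claim set once, at login."""
    claims = json.dumps(
        {"user": user, "idp": home_institution, "exp": expires_at},
        sort_keys=True,
    ).encode()
    signature = hmac.new(SECRET, claims, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(claims).decode() + "." + signature


def verify_token(token: str, now: int):
    """Service side: return the claims if the token is authentic and unexpired, else None."""
    try:
        encoded, signature = token.rsplit(".", 1)
        claims = base64.urlsafe_b64decode(encoded.encode())
    except ValueError:  # covers a missing separator and malformed base64
        return None
    expected = hmac.new(SECRET, claims, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    data = json.loads(claims)
    return data if data["exp"] > now else None


# One login, then the same token is presented to every service.
token = issue_token("alice", "example.edu", expires_at=2_000)
services = ["api", "cloud", "grid", "irods"]
accepted = {name: verify_token(token, now=1_000) is not None for name in services}
```

Because each service only verifies a signature and an expiry time, none of them needs to store or see the user's password, which is one way to lower the barrier to access while keeping the check strict.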



Phases of a project

•  Enthusiasm
•  Disillusionment
•  Panic
•  Search for the guilty
•  Punishment of the innocent
•  Praise and honor for the non-participants