The iPlant Collaborative Community Cyberinfrastructure for Life Science Jason Williams Cold Spring Harbor Laboratory, iPlant www.iPlantCollaborative.org

Post on 25-Dec-2015

Page 1

The iPlant Collaborative Community Cyberinfrastructure for Life Science

Jason Williams Cold Spring Harbor Laboratory, iPlant

www.iPlantCollaborative.org

Page 2

The iPlant Collaborative

Vision

How can we prepare for science we can’t anticipate?

Page 3

The iPlant Collaborative

Vision

Enable life science researchers and educators to use and extend cyberinfrastructure to understand and ultimately predict the complexity of biological systems

Page 4

The iPlant Collaborative

Vision

Fulfilling our vision will mean enabling access to datasets and tools

Environmental data

Phenotype data

Phylogenetic Inferences

Ecological Models

Crop Models

Association Studies

Molecular Networks

Genomic data and analysis:
• Sequencing/assembly
• Transcriptome profiling
• Variants
• Functional annotation
• Proteomics

Page 5

The iPlant Collaborative

Vision

Genomic data

Environmental data

Phenotype data

Phylogenetic Inferences

Ecological Models

Crop Models

Association Studies

Molecular Networks

Predictive and synthetic

Knowledge gathering

Retrodictive insights

Genomic data and analysis:
• Sequencing/assembly
• Transcriptome profiling
• Variants
• Functional annotation
• Proteomics

This means working with a vast landscape of data and tools:

Page 6

The iPlant Collaborative

Vision

Genomic data

Environmental data

Phenotype data

Phylogenetic Inferences

Ecological Models

Crop Models

Association Studies

Molecular Networks

Predictive and synthetic

Knowledge gathering

Retrodictive insights

Navigating this landscape requires cyberinfrastructure:

Page 7

The iPlant Collaborative

What is cyberinfrastructure?

Cyberinfrastructure consists of computing systems, data storage systems, instruments and data repositories, visualization environments, and people, linked together by software and networks to improve research productivity and enable breakthroughs not otherwise possible. --Craig Stewart

iPlant makes computation, data storage, cloud services, and software tools easily available to informaticians and researchers, leveraging existing CI investments.

Page 8

Biological Cyberinfrastructure

The Problem of Big Data in Biology

Page 9

Biological Cyberinfrastructure

The Problem of Big Data in Biology

Page 10

The iPlant Collaborative

Where iPlant is today and where we are going

• Initial funding in 2008

• Almost 2 years of community input gathering – software development starts in 2009

• Major CI components appear late 2010

• Finished 5th year

• Recommended for second 5-year term

• > 9000 users

• > 20K analysis jobs in 2012 (> 10K HPC jobs)

• 500 terabytes of user data

Image from: http://adammclane.com/2011/12/06/bottlenecks/

Page 11

The iPlant Collaborative

Where iPlant is today and where we are going

iPlant Renewed by NSF

September 2013 begins next 5-year period

Scientific Advisory Board

Focus on Genotype-Phenotype science

NSF recommended expansion of scope beyond plants

Page 12

The iPlant Collaborative

What we have to offer you

• Data Management & Storage Resources

• Access to High Performance Computing Resources

• Tool Integration System

• Application Programming Interfaces (APIs)

• Cloud Computing Resources

• Genotype To Phenotype Science Enablement Portfolio

• Tree of Life Science Enablement Portfolio

• Image Analysis Platform

• Support for Molecular Breeding Platform (IBP)

• Support for AgMIP

Page 13

How iPlant CI Enables Discovery

Challenge: Create an easy-to-use platform powerful enough to handle data-intensive biology

Many bioinformatics tools are “off limits” to those without specialized computational backgrounds.

Page 14

How iPlant CI Enables Discovery

Solution: Discovery Environment

An extensible platform for science

• High-powered computing

• Data sharing/collaboration

• Easy-to-use interface

• Virtually limitless apps

• Analysis history (provenance)

Page 15

How iPlant CI Enables Discovery

What the Discovery Environment means to bench biologists

“In one week I was able to align my RNA-Seq samples using a method that had previously took me a month on the bioinformatics laboratory computers…”

Richard Barker – Univ. Wisconsin, Madison

Page 16

How iPlant CI Enables Discovery

Challenge: Collaborate and access software on demand

Frustrated bioinformaticians serving the needs of several users

+ works well / powerful

- expensive / complex

Cartoon: http://phdhumor.blogspot.com/2008/12/on-lazy-day-for-bioinformatician.html

Page 17

How iPlant CI Enables Discovery

Solution: Atmosphere

On-demand computing resource built on a cloud infrastructure

• Virtual machine pre-configured with: software, memory requirements, processing power

• iPlant authentication, storage, and HPC capabilities

• Build custom images/appliances and share with community

• Cross-platform desktop access to GUI applications in the cloud (using VNC)

Page 18

How iPlant CI Enables Discovery

What Atmosphere means to bioinformaticians

“What my users used to call me for, they now do on their own through Atmosphere. Now I can scale up my user community”

Nathan Miller, Univ. Wisconsin, Madison

• BLAST 400k transcripts against NCBI nr in 36 h vs. 2 months

• Use iPlant Data Store to move 1500 high-res images per day for analysis

“iPlant is a great equalizer.” Mike Covington, UC Davis
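
The BLAST figure on this slide can be turned into a rough speedup factor. The sketch below is only illustrative arithmetic: the slide gives "36 h vs. 2 months", and the conversion of "2 months" to 60 days is an assumption made here, as is the helper name `speedup`.

```python
def speedup(before_hours: float, after_hours: float) -> float:
    """Return how many times shorter the new runtime is than the old one."""
    if before_hours <= 0 or after_hours <= 0:
        raise ValueError("runtimes must be positive")
    return before_hours / after_hours


# Assumption: "2 months" is read as 60 days of wall-clock time.
HOURS_BEFORE = 60 * 24
# From the slide: the same BLAST job finished in 36 hours.
HOURS_AFTER = 36

ratio = speedup(HOURS_BEFORE, HOURS_AFTER)
print(f"Approximate speedup: {ratio:.0f}x")
```

Under that assumption the job runs roughly 40 times faster; a different reading of "2 months" shifts the factor proportionally.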

Page 19

How iPlant CI Enables Discovery

Challenge: Navigate biology’s “Data deluge”

HT image data – GBs per day

HT sequence data – TBs per run

Page 20

How iPlant CI Enables Discovery

Solution: iPlant Data Store

All data within the same platform: speed and accessibility

• Access your data from multiple iPlant services

• Automatic, redundant data backup between University of Arizona and University of Texas (NSF Data management plan)

• Multiple ways to share data with collaborators

• Multi-threaded high speed transfers

• Default 100 GB allocation. >1 TB allocations available with justification

Source               Time (s)
CD                   320
Berkeley Server      150
External Drive       36*
USB 2.0 Flash        30
iPlant Data Store    18*
My Computer          15

Page 21

How iPlant CI Enables Discovery

Solution: iPlant Data Store

“The ability to transport 2TB of data overnight using the iRODS system was particularly helpful because previously, we had been mailing hard drives which is not an optimal solution to sharing big data.”

James Koltes, Iowa State
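
To put the quote in perspective, the sustained transfer rate implied by "2TB overnight" can be estimated. This is a back-of-the-envelope sketch, not a measurement: the 12-hour window for "overnight" and the decimal definition of a terabyte (10^12 bytes) are assumptions, and `required_throughput` is a hypothetical helper name.

```python
def required_throughput(total_bytes: float, seconds: float) -> float:
    """Return the average rate in bytes per second needed to move total_bytes in the given time."""
    if seconds <= 0:
        raise ValueError("transfer window must be positive")
    return total_bytes / seconds


# From the quote: 2 TB of data. Assumption: 1 TB = 10**12 bytes.
TOTAL_BYTES = 2 * 10**12
# Assumption: "overnight" is taken as a 12-hour window.
WINDOW_SECONDS = 12 * 3600

rate = required_throughput(TOTAL_BYTES, WINDOW_SECONDS)
print(f"{rate / 1e6:.1f} MB/s, or about {rate * 8 / 1e6:.0f} Mbit/s sustained")
```

With these assumptions the transfer needs roughly 46 MB/s sustained for the whole window; a shorter window raises the required rate proportionally.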

Page 22

How iPlant CI Enables Discovery

Many more applications not covered here…

• DNA Subway: Annotation, DNA Barcoding, RNA-Seq

• Standalone Apps: TNRS, TreeViewer, PhytoBisque, etc.

• iPlant Semantic Web – “Intelligent” workflow authoring

• Foundation API: For programmers embedding iPlant CI capabilities

Page 23

Highlighted Objectives and Deliverables

Community-identified priorities

• Increased interoperability with other data providers – e.g. BioMarts, CoGe, MaizeGDB

• Data discovery through interaction with trait repositories (trait/plant ontologies)

• Workflows for variant discovery – SNP detection pipelines

• Scalable Genome Assembly Workflows – expanded capabilities with MAKER, InterProScan

• iPlant Data Commons – Resources for storage, data conversion, and metadata

Page 24

The iPlant Collaborative

Your colleagues

Leadership Team: Steve Goff – UA, Dan Stanzione – TACC, Matthew Vaughn – TACC, Nirav Merchant – UA, Doreen Ware – CSHL, Michael Schatz – CSHL, David Micklos – CSHL, Ann Stapleton – UNC Wilmington, Ron Vetter – UNC Wilmington

Staff: Greg Abram, Sonali Aditya, Ritu Arora, Roger Barthelson, Rob Bovill, Brad Boyle, Gordon Burleigh, John Cazes, Mike Conway, Victor Cordero, Rion Dooley, Aaron Dubrow, Andy Edmonds, Dmitry Fedorov, Melyssa Fratkin, Michael Gatto, Utkarsh Gaur, Cornel Ghiban, Steve Gregory, Matthew Hanlon, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, EunSook Jeong, Logan Johnson, Chris Jordan, Kathleen Kennedy, Mohammed Khalfan, David Knapp, Lars Koersterk, Sangeeta Kuchimanchi, Kristian Kvilekval, Sue Lauter, Tina Lee, Andrew Lenards, Monica Lent, Zhenyuan Lu, Eric Lyons, Aaron Marcuse-Kubitz, Naim Matasci, Sheldon McKay, Robert McLay, Nathan Miller, Steve Mock, Martha Narro, Shannon Oliver, Benoit Parmentier, Jmatt Peterson, Dennis Roberts, Paul Sarando, Jerry Schneider, Bruce Schumaker, Edwin Skidmore, Brandon Smith, Mary Margaret Sprinkle, Sriram Srinivasan, Josh Stein, Lisa Stillwell, Jonathan Strootman, Peter Van Buren, Hans Vasquez-Gross, Rebeka Villarreal, Ramona Walls, Liya Wang, Anton Westveld, Jason Williams, John Wregglesworth, Weijia Xu

Faculty Advisors & Collaborators: Ali Akoglu, Kobus Barnard, Timothy Clausner, Brian Enquist, Damian Gessler, Ruth Grene, John Hartman, Matthew Hudson, David Lowenthal, B.S. Manjunath, David Neale, Brian O’Meara, Sudha Ram, David Salt, Mark Schildhauer, Doug Soltis, Pam Soltis, Edgar Spalding, Alexis Stamatakis, Steve Welch

Postdocs: Barbara Banbury, Christos Noutsos, Solon Pissis, Brad Ruhfel

Students: Peter Bailey, Jeremy Beaulieu, Devi Bhattacharya, Storme Briscoe, YaDi Chen, David Choi, Barbara Dobrin, John Donoghue, Yekatarina Khartianova, Chris La Rose, Amgad Madkour, Aniruddha Marathe, Andre Mercer, Kurt Michaels, Zack Pierce, Andrew Predoehl, Sathee Ravindranath, Kyle Simek, Gregory Striemer, Jason Vandeventer, Nicholas Woodward, Kuan Yang

Page 25

The iPlant Collaborative

Workshop Goals (and considerations)

• Demonstrate some of the ways iPlant CI can advance your science

• Familiarize you with iPlant tools and services

• Help you identify the best way to get started

• Workshop is fast-paced

• Use the handouts and other resources to complete what we don’t finish (+30 people/sharing limited bandwidth!)