iMicrobe and iVirus: Extending the iPlant Cyberinfrastructure from Plants to Microbes

Bonnie Hurwitz, PhD Arizona Health Sciences Center Extending the iPlant Cyberinfrastructure: From Plants to Microbes

Upload: bonnie-hurwitz

Post on 10-May-2015


DESCRIPTION

iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes. Overview of work underway to add applications and computational analysis pipelines to iPlant for metagenomics and microbial ecology.

TRANSCRIPT

Page 1: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Bonnie Hurwitz, PhD, Arizona Health Sciences Center

Extending the iPlant Cyberinfrastructure: From Plants to Microbes

Page 2: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

The iPlant Collaborative: Community Cyberinfrastructure for Life Science

http://www.iplantcollaborative.org

Page 3: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

iVirus and iMicrobe

Joaquin Ruiz, PhD, Dean, College of Science; Darren Boss; Devesh Chourasiya

Funding, Staff

Matt Sullivan, PhD

Shane Burgess, PhD, Dean, CALS

Page 4: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

The iPlant Collaborative

Vision

Enable life science researchers and educators to use and extend cyberinfrastructure to understand and ultimately predict the complexity of biological systems

Page 5: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

How iPlant CI Enables Discovery

Challenge: Create an easy-to-use platform powerful enough to handle data-intensive biology

Many bioinformatics tools are “off limits” to those without specialized computational backgrounds.

Page 6: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

iPlant is a collaborative virtual organization

The iPlant Collaborative Who makes up iPlant?

Page 7: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

The iPlant Collaborative How is iPlant funded?

iPlant Renewed by NSF

•  September 2013 begins next 5-year period
•  Scientific Advisory Board
•  Focus on genotype-phenotype science
•  NSF recommended expansion of scope beyond plants

 

Page 8: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

iPlant collaborates to enable access to the solutions that work the best for the community…

The iPlant Collaborative Who does iPlant collaborate with?

Page 9: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

How iPlant CI Enables Discovery: Overview of resources

End Users

Computational Users

Teragrid, XSEDE

✓  Storage
✓  Computation
✓  Hosting
✓  Web Services
✓  Scalability

Building a platform that can support diverse and constantly evolving needs.

Page 10: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

iPlant Data Store

✓  Initial 100 GB allocation – TB allocations available
✓  Automatic data backup
✓  Easy upload/download and sharing

The resources you need to share and manage data with your lab, colleagues and community

Page 11: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Discovery Environment

Hundreds of bioinformatics Apps in an easy-to-use interface

✓  A platform that can run almost any bioinformatics application
✓  Seamlessly integrated with data and high performance computing
✓  User extensible: add your own applications

Page 12: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Agave API

Fully customize iPlant resources

✓  Science-as-a-service platform
✓  Define your own compute and storage resources (local and iPlant)
✓  Build your own app store of scientific code and workflows

Page 13: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Atmosphere

Cloud computing for the life sciences

✓  Simple: One-click access to more than 100 virtual machine images
✓  Flexible: Fully customize your software setup
✓  Powerful: Integrated with iPlant computing and data resources

Page 14: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

DNA Subway

Educational workflows for Genomes, DNA Barcoding, RNA-Seq

✓  Commonly used bioinformatics tools in streamlined workflows
✓  Teach important concepts in biology and bioinformatics
✓  Inquiry-based experiments for novel discovery and publication of data

Page 15: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Bisque

Image analysis, management, and metadata

✓  Secure image storage, analysis, and data management
✓  Integrate existing applications or create new ones
✓  Custom visualization and image handling routines and APIs

Page 16: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

iMicrobe and iVirus Leverage the iPlant Cyberinfrastructure

Typical End Users

Computational Users

Teragrid, XSEDE

Using iPlant for:

✓  Storage
✓  Computation
✓  Analysis
✓  App dev.
✓  Pipeline dev.
✓  Code distrib.
✓  Data Discoverability

Page 17: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

What’s Under the Hood? Stampede: High-Level Overview

•  Base Cluster (Dell/Intel/Mellanox):
   –  Intel Sandy Bridge processors
   –  Dell dual-socket nodes w/32GB RAM (2GB/core)
   –  6,400 nodes
   –  56 Gb/s Mellanox FDR InfiniBand interconnect
   –  More than 100,000 cores, 2.2 PF peak performance

•  Co-Processors:
   –  Intel Xeon Phi “MIC” Many Integrated Core processors
   –  Special release of “Knight’s Corner” (61 cores)
   –  All MIC cards are on site at TACC; more than 6000 installed; final installation ongoing for formal summer acceptance
   –  7+ PF peak performance

•  Max Total Concurrency:
   –  exceeds 500,000 cores
   –  1.8M threads

•  Entered production operations on January 7, 2013
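The base-cluster figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted on the slide and assumes that the 2GB/core figure applies uniformly to each node's 32GB of RAM; the variable names are illustrative.

```python
# Figures quoted on the slide.
NODES = 6_400
RAM_PER_NODE_GB = 32
RAM_PER_CORE_GB = 2

# Derived values: cores in each dual-socket node, and cores in the base cluster.
cores_per_node = RAM_PER_NODE_GB // RAM_PER_CORE_GB
total_cores = NODES * cores_per_node

print(f"cores per node: {cores_per_node}")
print(f"base cluster cores: {total_cores:,}")
```

Under that assumption the derived total is consistent with the slide's "more than 100,000 cores".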

Page 18: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

iMicrobe/ iVirus: New App Development

June 2013 – May 2014: 13 new Apps; 1 high-throughput analysis pipeline

Page 19: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Forging Ahead with iPlant

•  Build a metagenomics toolkit
•  Streamline metagenomics workflows
•  Enable high-throughput computing
•  Provide key datasets for computation

Page 20: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

iPlant Data Store

The resources you need to share and manage data with your lab, colleagues and community

Page 21: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Overview of the iPlant Data Store: Some Complications of Big Data

•  Difficult/slow transfers
•  Expense for storage/backup
•  Difficult to share and publish
•  Metadata
•  Analysis

Page 22: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

iPlant Supports the Life Cycle of Data

[Figure: data life cycle. Stages: Store, Markup, Search, Transfer, Analyze, Visualize, Collaborate, Share. Items: Data, Results A, Results B, Algo1, Algo2. Phases: Pre-Publication, Post-Publication.]

Page 23: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Teragrid XSEDE

Overview of the iPlant Data Store: Scalable, Reliable, Redundant, High-performance

•  Access your data from multiple iPlant services
•  Automatic data backup (redundant between University of Arizona and University of Texas)
•  Multiple ways to share data with collaborators
•  Multi-threaded high speed transfers
•  Default 100GB allocation. >1TB allocations available with justification

Page 24: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Overview of the iPlant Data Store: Some important items we won’t see

Source             Destination    Copy Method    Time (seconds)
CD                 My Computer    cp             320
Berkeley Server    My Computer    scp            150
External Drive     My Computer    cp             36
USB2.0 Flash       My Computer    cp             30
iDS                My Computer    iget           18
My Computer        My Computer    cp             15

Close to optimum conditions; transfer between Univ. of Arizona and UC Berkeley

100GB: 29m15s (1 GB / 17.5 seconds)
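The timings on this slide can be turned into relative speedups and an approximate transfer rate. The sketch below copies the times from the table and assumes decimal units (1 GB = 1000 MB), which the slide does not state; the function names are illustrative.

```python
# Transfer times in seconds, copied from the table on the slide.
transfer_seconds = {
    "CD (cp)": 320,
    "Berkeley Server (scp)": 150,
    "External Drive (cp)": 36,
    "USB2.0 Flash (cp)": 30,
    "iDS (iget)": 18,
    "My Computer (cp)": 15,
}


def speedup(slow, fast):
    """How many times faster the `fast` route is than the `slow` route."""
    return transfer_seconds[slow] / transfer_seconds[fast]


def seconds_per_gb(total_gb, minutes, seconds):
    """Average seconds per GB for a transfer of total_gb gigabytes."""
    return (minutes * 60 + seconds) / total_gb


def throughput_mb_per_s(sec_per_gb, mb_per_gb=1000):
    """Average rate in MB/s; mb_per_gb=1000 is an assumed unit convention."""
    return mb_per_gb / sec_per_gb


rate = seconds_per_gb(100, 29, 15)  # the 100GB transfer in 29m15s
print(f"iget vs scp speedup: {speedup('Berkeley Server (scp)', 'iDS (iget)'):.1f}x")
print(f"{rate:.2f} s/GB, about {throughput_mb_per_s(rate):.0f} MB/s")
```

The 100GB figure works out to 17.55 seconds per GB, which matches the rounded "1 GB / 17.5 seconds" on the slide.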

   

Page 25: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Discovery Environment

Hundreds of bioinformatics Apps in an easy-to-use interface

Page 26: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Overview of the iPlant Discovery Environment

Through the Discovery Environment you have:

•  High-powered computing
•  iPlant data store
•  Easy to use interface
•  Virtually limitless apps
•  Analysis history (provenance)

Page 27: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

What can you do in the iPlant DE?

Scalable platform for powerful computing, data, and application resources

•  Navigate the components of the DE
•  Access and manipulate data
•  Start and complete an analysis
•  Track your analysis and see your results

Page 28: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Why is iPlant DE Scalable?

Democratize your code

•  Rich platform for bioinformatics: ~400 apps (and counting)
•  Data co-localized with analysis
•  Easy to use interface, with access to support
•  Easy to integrate and customize your own tools

Page 29: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Discovery Environment Example

Goal: Create a metagenomic assembly.

Task 1: Upload metagenomic fasta file to your personal data store
Task 2: Run quality control on your raw sequence reads
Task 3: Find and select an assembly tool (e.g. MetaVelvet)
Task 4: Specify parameters and your input files. Run the assembly App.
Task 5: Monitor the progress of your analysis and save parameters.
Task 6: View your results.

Page 30: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Sequence Quality Control in the iPlant DE

Page 31: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Genome, Metagenome, and Transcriptome Assembly

Genome and Metagenome Assembly: ALLPATHS-LG, Newbler, SOAPdenovo, Velvet, MetaVelvet, ABySS, SPA, Digital Norm., IDBA-UD

Transcriptome Assembly
•  De novo: Trinity, SOAPdenovo-Trans, Velvet/Oasis, Trans-ABySS
•  Reference-guided: Tophat, Cufflinks

Key: In the DE

Page 32: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Where is the sample data?

Page 33: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Where is the Assembly App?

Page 34: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Specify Data and Assembly Parameters

Page 35: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Specify Run Settings

Page 36: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Track Analyses and Results

Page 37: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

What about Annotations?

•  Annotations are descriptions of features on contigs in a genome / metagenome
   –  Ab initio gene predictions
   –  Protein homology (GenBank nr, SIMAP)
   –  Curated protein resources (COG, KEGG, …)

•  Secondary annotations
   –  InterPro Scan (Pfam, PIR, Prosite, …)
   –  GO and other ontologies
   –  Pathway Mapping (KEGG, MetaCyc, EcoCyc)

Page 38: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Assembly & Annotation at iPlant

Inputs: Meta-Genome input, Evidence input

Genome and Metagenome Assembly: ALLPATHS-LG, Newbler, SOAPdenovo, Velvet, MetaVelvet, ABySS, SPA, Digital Norm., IDBA-UD

Transcriptome Assembly
•  De novo: Trinity, SOAPdenovo-Trans, Velvet/Oasis, Trans-ABySS
•  Reference-guided: Tophat, Cufflinks

Ab initio Gene Prediction: Glimmer, Prodigal, FragGeneScan, Metagene, MetaGeneMark

Conversion Tools: tophat2gff, cufflinks2gff

Annotation
•  Primary: BLAST, k-mer based
•  Secondary: InterProScan, InterPro2GO

Visualization: JBrowse, Web-Apollo

Data Commons: Genomes and Metagenomes; Proteins / Genes; Reference Annotations; Metadata (in iRODS)

Key: At TACC; In the DE; Under Development

✓  Storage
✓  Computation
✓  Analysis
✓  Data Access
✓  Code Distr.
✓  Query by metadata

Page 39: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes
Page 40: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

The Louis Pasteur Method: We can’t “see” all bacteria using culture-based approaches

Razumov (1932) “The Great Plate Anomaly.”

Page 41: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

The Post-Genomic Era: from Pasteur to CSI

[Figure labels: Isolate, Genomics, Community, Metagenomics]

Page 42: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Making Sense of Metagenomes

Environmental Sample → Extract DNA → library creation → High throughput sequencing → Assemble reads → Gene Prediction → Compare to known proteins → Function, Taxonomy

Page 43: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Viromes are dominated by the Unknown

Panels: Photic, Aphotic

Bacteria 5%, Eukaryota 1%, Archaea 0%, Viruses 3%, Unknown 91%

Viruses 7%, Bacteria 4%, Eukaryota 1%, Archaea 0%, Unknown 88%

We need new tools!

Hurwitz BL & Sullivan MB. (2013) The Pacific Ocean Virome (POV). PLoS One. 8: e57355.

Page 44: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Phage Function based on Environment

PCpipe: a Vignette in Viral Metagenomics

Page 45: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Organizing the Unknown

[Figure labels: Input reads, Assemble, Find Genes, Cluster Genes, Protein Clusters, BIN]

Yooseph S, et al. (2007) The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol 5(3):e16.

Page 46: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

27K High-Confidence Viral Protein Clusters

GOS 50%, POV + GOS 22%, POV 28%, Isolate Phage 1%

2X environmental viral protein clusters

70% of data now included

Hurwitz BL & Sullivan MB. (2013) The Pacific Ocean Virome (POV). PLoS One. 8: e57355.

Page 47: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Ocean Microbial Communities Vary by Environmental Factors

Pacific Ocean Virome: Geographic Region, Location on a Transect, Season, Depth

Hurwitz BL & Sullivan MB. (2013) The Pacific Ocean Virome (POV). PLoS One. 8: e57355.

Page 48: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Protein Clusters group by photic zone

[Figure: sample-by-sample comparison of shared protein clusters (PCs). Panels: Photic vs Photic, Aphotic vs Photic, Aphotic vs Aphotic, Photic vs Aphotic. Scale: Many PCs shared, Some PCs shared, Few PCs shared. Samples are grouped as Aphotic and Photic.]

Sample labels: LJ4O, LJ12A, LJ4D, LJ4A, M6O1K, M7O4K, LF26D, LJ12O, LF26O, LF26A, LJ26O, LA26A, LA26O, LA26D, LJ26D, LJ12D, M3MD, GDS, GFS, M4OS, M5OD, LJ4S, LJ12S, LJ26S, LA26S, LF26S, M2MS, M1CS, SFSS, SFDS, SFCS, STCS

Hurwitz BL, Brum J, and Sullivan MB. Depth Stratified Functional and Taxonomic Niche Specialization in the ‘Core’ and ‘Flexible’ Pacific Ocean Virome. In Review.

Page 49: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Host Genes that Promote Viral Replication

Niche Defining Photic Core:

•  Fe-S cluster biogenesis and function
•  DNA/Protein biosynthesis and repair
•  Host “wake-up”
•  Energy production in photosynthesis

Hurwitz BL, Hallam S, Sullivan MB. (2013) Metabolic Reprogramming by Viruses in the Sunlit and Dark Ocean. Genome Biology, 14, R123.

Hurwitz BL, Brum J, and Sullivan MB. Depth Stratified Functional and Taxonomic Niche Specialization in the ‘Core’ and ‘Flexible’ Pacific Ocean Virome. In Review.

Page 50: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Adaptive for High Pressure Environments

Niche Defining Aphotic Core:

•  DNA replication initiation
•  DNA repair
•  Motility
•  Energy production in the TCA cycle

Hurwitz BL, Hallam S, Sullivan MB. (2013) Metabolic Reprogramming by Viruses in the Sunlit and Dark Ocean. Genome Biology, 14, R123.

Hurwitz BL, Brum J, and Sullivan MB. Depth Stratified Functional and Taxonomic Niche Specialization in the ‘Core’ and ‘Flexible’ Pacific Ocean Virome. In Review.

Page 51: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

PCpipe: creating protein clusters for viral ecology

iPlant Discovery Environment: Automated Workflows

Stages:
•  QC sequences: FASTQ_shrinker
•  Assembly part 1: Velveth
•  Assembly part 2: Velvetg
•  Find Genes: MetaGeneMark
•  pcpipe part 1: Cd-hit-2d
•  pcpipe part 2: Cd-hit
•  Input to Analyses: Blastx to nr, QIIME, Rarefaction

Data: New.fastq, New.a.faa, POV PCs, POV + Novel PCs

Page 52: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Creating Workflows: Easy as 1-2-3-4

1.  Select the Apps
2.  Order the Apps
3.  Map Outputs to Inputs
4.  Run the analysis
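The four steps can be sketched in a few lines of Python. This is only an illustration of the idea of chaining apps by mapping outputs to inputs, not the Discovery Environment's implementation; the `App` class, `run_workflow`, and the toy apps are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class App:
    """Hypothetical stand-in for an app with one named input and one named output."""
    name: str
    input_name: str
    output_name: str
    run: Callable[[str], str]


def run_workflow(apps: List[App], initial: Dict[str, str]) -> Dict[str, str]:
    """Run apps in order; each app's input must already exist as data."""
    data = dict(initial)
    for app in apps:
        if app.input_name not in data:
            raise ValueError(f"{app.name}: nothing upstream produces {app.input_name!r}")
        data[app.output_name] = app.run(data[app.input_name])
    return data


# 1. Select the apps (toy functions that only tag the text they receive).
qc = App("qc", "reads", "clean_reads", lambda s: f"qc({s})")
assemble = App("assemble", "clean_reads", "contigs", lambda s: f"assemble({s})")
find_genes = App("find_genes", "contigs", "orfs", lambda s: f"genes({s})")

# 2. Order the apps. 3. Outputs map to inputs through shared names.
workflow = [qc, assemble, find_genes]

# 4. Run the analysis.
results = run_workflow(workflow, {"reads": "New.fastq"})
print(results["orfs"])
```

Putting the apps in an order where an input has no upstream producer raises an error before that app runs, which is the kind of check the mapping step exists to catch.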

Page 53: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Create a New Workflow

Page 54: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Provide Workflow Information

Page 55: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Select the Apps

Page 56: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Add the Apps

Page 57: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Remove an App

Page 58: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Order the Apps

Page 59: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Map Outputs to Inputs

New.a.faa, POV PCs

Page 60: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

A New Workflow

Page 61: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Run the Workflow

User’s ORFs

POV PCs

Page 62: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Automated workflows cannot use Apps that run on the HPC
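One way to plan around this restriction is to split a pipeline into runs of steps that can be automated and steps that must be launched separately on the HPC. The sketch below is a hypothetical illustration: the step names and HPC flags are made up for the example and do not describe any particular iPlant App.

```python
from itertools import groupby

# (step name, runs_on_hpc) pairs; names and flags are illustrative only.
steps = [
    ("qc", False),
    ("assemble", False),
    ("blast", True),
    ("annotate", False),
]


def split_segments(steps):
    """Group consecutive steps by whether they run on the HPC."""
    return [
        ("hpc" if on_hpc else "workflow", [name for name, _ in group])
        for on_hpc, group in groupby(steps, key=lambda step: step[1])
    ]


segments = split_segments(steps)
for kind, names in segments:
    print(kind, names)
```

Each "workflow" segment could be built as one automated workflow, with the "hpc" segments run as separate jobs in between.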

Page 63: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Gotchas in the PCpipe Workflow

pcpipe workflow stages:
•  QC sequences: FASTQ_shrinker
•  Assembly part 1: Velveth
•  Assembly part 2: Velvetg
•  Find Genes: MetaGeneMark
•  pcpipe part 1: Cd-hit-2d
•  pcpipe part 2: Cd-hit
•  Annotation: Protein annotation, Secondary annotation

Data: New.fastq, New.a.faa, POV PCs, POV + Novel PCs

Foundation API: Runs on XSEDE (HPC); cannot be used in a workflow

Foundation API: Runs on XSEDE

Page 64: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

An Integrated PCPipe

iPlant App; iMicrobe adapter; iMicrobe condor node

Pipeline Management; Foundation Code; HPC Job distribution

Input 1: User ORFs
Input 2: Existing Protein Clusters

Step 1: cd-hit-2d (on condor)
Step 2: cd-hit (on condor)
Step 3: extract proteins in novel PCs (on condor)
Step 4: BLAST vs SIMAP (on TACC)
Step 5: SIMAP Annotation (on condor)

Output 1: ORFs in existing clusters
Output 2: ORFs in new clusters
Output 3: Annotation for new clusters
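The logic of the first two steps, recruiting user ORFs to existing protein clusters and then clustering whatever is left, can be illustrated with a toy example. This sketch does not call cd-hit-2d or cd-hit and does not reproduce their algorithms: it swaps in a naive ungapped identity score and a simple greedy, longest-first clustering, and all function names, sequences, and the 0.8 threshold are assumptions made for illustration.

```python
def identity(a, b):
    """Naive ungapped identity: matching positions / longer length."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def recruit(orfs, representatives, threshold):
    """Stage 1 (the cd-hit-2d role): assign ORFs to existing clusters."""
    assigned, leftover = {}, {}
    for name, seq in orfs.items():
        hit = next(
            (cid for cid, rep in representatives.items()
             if identity(seq, rep) >= threshold),
            None,
        )
        if hit is None:
            leftover[name] = seq
        else:
            assigned[name] = hit
    return assigned, leftover


def cluster_novel(orfs, threshold):
    """Stage 2 (the cd-hit role): greedy clustering, longest sequence first."""
    clusters = []  # list of (representative sequence, member names)
    for name, seq in sorted(orfs.items(), key=lambda kv: -len(kv[1])):
        for rep, members in clusters:
            if identity(seq, rep) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((seq, [name]))
    return [members for _, members in clusters]


# Made-up sequences standing in for existing cluster representatives and user ORFs.
existing = {"PC1": "MKTAYIAKQR", "PC2": "MSDNELLKKA"}
user_orfs = {
    "orf1": "MKTAYIAKQK",
    "orf2": "GGGGWWWWPP",
    "orf3": "GGGGWWWWPA",
    "orf4": "MSDNELLKKA",
}

in_existing, unassigned = recruit(user_orfs, existing, threshold=0.8)
novel = cluster_novel(unassigned, threshold=0.8)
print(in_existing)  # ORFs in existing clusters
print(novel)        # ORFs in new clusters
```

The two results correspond to Output 1 and Output 2 on the slide; annotating the new clusters (steps 4 and 5) is outside this sketch.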

Page 65: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

PCPipe App

Existing PCs (POV)

Directory of User defined ORFs

Page 66: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Collaborating with iPlant

•  Solve computational bottlenecks
•  Make tools easier to use
•  Share Data
•  Provide community input

Collaboration

Page 67: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

Questions or Comments?

Bonnie Hurwitz, PhD

Page 68: iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

PCpipe: Protein Cluster Pipeline

Steps in iPlant DE

Stages:
•  QC sequences: FASTQ_shrinker
•  Assembly: Velvet
•  Find Genes: Prodigal
•  pcpipe part 1: Cd-hit-2d
•  pcpipe part 2: Cd-hit
•  Gene Annotation: SIMAP, GO, PFAM…

Data: New.fastq, ORFs, PCs, PCs + Novel PCs

(HPC or Cloud)