apollo: scalable & collaborative curation of genomes - biocuration 2015

APOLLO: Scalable and collaborative genome curation. Monica Munoz-Torres, PhD | @monimunozto. Nathan Dunn, Colin Diesh*, Deepak Unni*, Seth Carbon, Heiko Dietze, Christopher Mungall, Nicole Washington, Ian Holmes*, Christine Elsik*, and Suzanna E. Lewis. Berkeley Bioinformatics Open-Source Projects, Genomics Division, Lawrence Berkeley National Laboratory. 8th International Biocuration Conference. Beijing, China. 24 April, 2015

Upload: monica-munoz-torres

Post on 17-Jul-2015

TRANSCRIPT

Page 1: Apollo: Scalable & collaborative curation of genomes - Biocuration 2015

APOLLO: Scalable and collaborative genome curation

Monica Munoz-Torres, PhD | @monimunozto
Nathan Dunn, Colin Diesh*, Deepak Unni*, Seth Carbon, Heiko Dietze, Christopher Mungall, Nicole Washington, Ian Holmes*, Christine Elsik*, and Suzanna E. Lewis
Berkeley Bioinformatics Open-Source Projects
Genomics Division, Lawrence Berkeley National Laboratory
8th International Biocuration Conference. Beijing, China. 24 April, 2015

Page 2

OUTLINE

• LAST TIME: where we left off last year
• IMPROVEMENTS: architecture, scalability, features
• COLLABORATIONS: JBrowse & GenSAS
• FUTURE PLANS: what lies on the horizon

Page 3

APOLLO: genome annotation editing tool

• Web based, integrated with JBrowse.
• Supports real-time collaboration!
• Automatic generation of ready-made computable data.
• Supports annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repeats.
• Intuitive annotation, gestures, and pull-down menus to create and edit transcript and exon structures, insert comments (CV, freeform text), GO terms, etc.

Page 4

DETAILS FROM OUR LAST UPDATE

• ~100 institutions worldwide
• > 60 genomes across the tree of life: from plants to arthropods, to fungi, to fish and other vertebrates including human, cattle, and dog

[Image credits: © BroadInstitute.org; Nature Rev Gen 2009; © alexanderwild.com; © outdooralabama.com; National Agricultural Library]

Page 5

LESSONS WE HAVE LEARNED

What we have learned:
• Collaborative work distills invaluable knowledge
• We must enforce strict rules and formats
• We must evolve with the data
• A little training goes a long way
• NGS poses additional challenges

Page 6

HIGHLIGHTED IMPROVEMENTS: scalability

• Easier deployment, more detailed documentation
• Supports multiple organisms per server, improved comparative tools
• Easier to query the data and build extensions
• More flexible user interface via removable side-dock with customizable tabs; better search functionality, validation checks, and editing capability
• Allows larger set of sequence annotations based on the Sequence Ontology
• Offers fine-grained user and group level permissions

Page 7

NEW APOLLO ARCHITECTURE: simpler, more flexible

Web-based client + annotation-editing engine + server-side data service

[Architecture diagram, Apollo v2.0. Labels shown:]
• Annotators
• Google Web Toolkit (GWT) / Bootstrap; JBrowse (Dojo / jQuery)
• REST / JSON; WebSockets
• Annotation Engine (Server)
• Shiro; LDAP; OAuth
• Annotations; Security; Preferences; Organisms; Tracks
• Single Data Store: PostgreSQL, MySQL, MongoDB, ElasticSearch
• JBrowse Data for Organism 1 and Organism 2: BAM, BED, VCF, GFF3, BigWig
• Load genomic evidence for selected organism

Page 8

NEW APOLLO ARCHITECTURE: simpler, more flexible

[Architecture diagram as on page 7, Apollo v2.0, with one callout added:]

NEW! Grails controllers (J2EE servlet) route requests to the appropriate JBrowse data directory for a given organism.
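Editor's sketch of the routing idea on this slide: a request names an organism, and the server resolves it to that organism's JBrowse data directory. This is a minimal Python illustration, not Apollo's Grails code; the organism names, directory paths, and the function `resolve_track_path` are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from organism name to its JBrowse data directory.
# In a real deployment this would come from the server's organism records.
ORGANISM_DATA_DIRS = {
    "organism_1": "/data/jbrowse/organism_1",
    "organism_2": "/data/jbrowse/organism_2",
}


def resolve_track_path(organism: str, relative_path: str) -> str:
    """Map (organism, relative file path) to a path inside that organism's data directory."""
    if organism not in ORGANISM_DATA_DIRS:
        raise KeyError(f"unknown organism: {organism}")
    rel = PurePosixPath(relative_path)
    # Refuse absolute paths and parent-directory hops so a request
    # cannot escape the organism's own data directory.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"illegal path: {relative_path}")
    return str(PurePosixPath(ORGANISM_DATA_DIRS[organism]) / rel)


if __name__ == "__main__":
    print(resolve_track_path("organism_1", "tracks/genes.gff3"))
```

Keeping one data directory per organism behind a single lookup is what lets one server host several organisms, as the scalability slides describe.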

Page 9

NEW APOLLO ARCHITECTURE: simpler, more flexible

[Architecture diagram as on page 7, Apollo v2.0, with one callout added:]

NEW! A single, queryable datastore houses annotations (Single Data Store: PostgreSQL, MySQL, MongoDB, ElasticSearch).
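Editor's sketch of what a single, queryable annotation store makes possible: one query can filter by organism, sequence, and coordinates. The example uses an in-memory SQLite database as a stand-in for the PostgreSQL or MySQL options named on the slide. The table layout, column names, and sample rows are assumptions for illustration, not Apollo's schema.

```python
import sqlite3


def build_store() -> sqlite3.Connection:
    """Create an in-memory store with a hypothetical 'annotation' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotation ("
        "organism TEXT, seq TEXT, name TEXT, type TEXT, "
        "fmin INTEGER, fmax INTEGER)"
    )
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("organism_1", "chr1", "geneA", "gene", 100, 500),
            ("organism_1", "chr1", "geneB", "gene", 800, 1200),
            ("organism_2", "chr1", "geneC", "gene", 150, 400),
        ],
    )
    return conn


def overlapping(conn, organism, seq, start, end):
    """Names of annotations on one organism's sequence that overlap [start, end)."""
    rows = conn.execute(
        "SELECT name FROM annotation "
        "WHERE organism = ? AND seq = ? AND fmin < ? AND fmax > ? "
        "ORDER BY fmin",
        (organism, seq, end, start),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = build_store()
    print(overlapping(store, "organism_1", "chr1", 400, 900))
```

Because all organisms share one store, the same query shape works per organism or across organisms by dropping the organism filter.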

Page 10

HIGHLIGHTED IMPROVEMENTS: scalability

• Improvements to architecture: easier deployment, better documentation
• Supports multiple organisms per server, improved comparative tools
• Easier to query the data and build extensions
• More flexible user interface via removable side-dock with customizable tabs; better search functionality, validation checks, and editing capability
• Allows larger set of sequence annotations based on the Sequence Ontology
• Offers fine-grained user and group level permissions

Page 11

HIGHLIGHTED IMPROVEMENTS: removable side dock with customizable tabs

[Screenshot of the side dock. Tabs shown: Tracks, Organism, Users, Groups, Preferences, Annotations, Reference Sequence]

Page 12

HIGHLIGHTED IMPROVEMENTS: annotation details, exon boundaries, data export

[Screenshot of the Annotations and Reference Sequences tabs, with numbered callouts 1, 2, 3]

Page 13

HIGHLIGHTED IMPROVEMENTS: visible in the Apollo window

Automatically calculates upstream and downstream acceptor and donor sites.
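Editor's sketch of how candidate splice sites near an exon boundary could be located. The slide does not say how Apollo computes them; this example assumes the canonical GT (donor) and AG (acceptor) dinucleotides on the forward strand, and the function name `nearest_sites` is hypothetical.

```python
def nearest_sites(seq, pos, motif):
    """Return (upstream, downstream): 0-based start indices of the
    occurrences of `motif` closest to `pos` on either side, or None
    where no occurrence exists."""
    seq = seq.upper()
    motif = motif.upper()
    hits = [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]
    upstream = max((i for i in hits if i < pos), default=None)
    downstream = min((i for i in hits if i > pos), default=None)
    return upstream, downstream


if __name__ == "__main__":
    sequence = "AAGTCCAGGTTTAGA"
    boundary = 5  # hypothetical exon boundary position
    print("donor (GT):", nearest_sites(sequence, boundary, "GT"))
    print("acceptor (AG):", nearest_sites(sequence, boundary, "AG"))
```

A curator adjusting an exon edge would typically pick between the nearest upstream and downstream candidates, which is why both are returned.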

Page 14

OTHER IMPROVEMENTS: behind the scenes

https://github.com/GMOD/Apollo

Page 15

APOLLO: demonstration

See Apollo Demonstration Video at: https://youtu.be/VgPtAP_fvxY

Page 16

COLLABORATIONS: Apollo is open-source and extensible

Apollo users can add software to support their own workflow.

Examples:
• GenSAS, the Genome Sequence Annotation Server: whole-genome structural annotation pipeline.
• i5K Workspace@NAL: space to display and share genome assemblies & gene models, and conduct manual annotation efforts.

Page 17

FUTURE PLANS: currently working on

Page 18

JOIN US

Please bring your suggestions, requests, and contributions to:
Nathan Dunn, Apollo Technical Lead
http://GenomeArchitect.org/

Special thanks to:
• Stephen Ficklin, GenSAS, Washington State University
• Deepak Unni and Colin Diesh, Apollo Developers, University of Missouri
• Suzi Lewis, Principal Investigator, BBOP
• Eric Yao, JBrowse, UC Berkeley

Page 19

• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
• § Christine G. Elsik (PI). University of Missouri.
• * Ian Holmes (PI). University of California Berkeley.
• Arthropod genomics community: i5K Steering Committee (esp. Sue Brown (Kansas State)), Alexie Papanicolaou (UWS), BGI, Oliver Niehuis (1KITE http://www.1kite.org/), and the Honey Bee Genome Sequencing Consortium.
• Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI; by Contract No. 60-8260-4-005 from the National Agricultural Library (NAL) at the United States Department of Agriculture (USDA); and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
• Insect images used with permission: http://AlexanderWild.com
• For your attention, thank you!

BBOP
Web Apollo: Nathan Dunn, Colin Diesh §, Deepak Unni §
Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze

Thanks!
NAL at USDA: Monica Poelchau, Christopher Childers, Gary Moore
HGSC at BCM: fringy Richards, Dan Hughes, Kim Worley
JBrowse: Eric Yao *

Web Apollo: http://GenomeArchitect.org
i5K: http://arthropodgenomes.org/wiki/i5K
GO: http://GeneOntology.org