EBI is an Outstation of the European Molecular Biology Laboratory.
Every genome deserves a home
Dan Lawson, EMBL-EBI
Disclosure - my background
VectorBase http://www.vectorbase.org
• NIAID-funded Bioinformatics Resource Center focused on arthropod vectors of human pathogens
• Collaborates with sequencers and community on primary (1°) annotation
• Community resource, ‘One stop shop’
Ensembl Genomes http://www.ensemblgenomes.org
• Extending Ensembl across taxonomic space
• 5 taxonomic portals to present genome assemblies and annotation
• Integrated resource for cross-species interrogation
Find a home for every genome
Every genome deserves a home
• Sequencing the genome of your favourite species is a beginning
• You will want to make your genome:
• Useful to your group/community
• Useful to other communities
• You will (hopefully) want to update/improve:
• Assembly (new sequencing technologies, mapping strategies)
• Gene predictions (new models, correct existing models, delete unsupported models)
• Gene annotation (add gene names/symbols, descriptions)
• Data richness (new high-throughput datasets, xrefs to relevant resources)
Finding a home for every genome
• All genomes deserve a home
• Houses
• Apartments/Flats
• Dormitories/Barracks
Genomic information infrastructure after the deluge. Julian Parkhill, Ewan Birney and Paul Kersey. Genome Biology 2010, 11:402. http://genomebiology.com/2010/11/7/402
Anatomy of a home
• Genome browser
• Similarity searches
• BLAST/BLAT
• Query tools
• Simple keyword
• Complex queries
• Downloads
[Figure labels: Similarity searches, Query tool, Downloads, Browser, Compara]
Finding a home
• Factors to take into account when choosing a home for your genome
• Required functionality
• Data access (Bulk download, tailored download, computational)
• Visualization (Genome browser)
• Search (Sequence based, simple keyword queries, complex queries)
• Extendability for new data types (e.g. NGS transcriptomics, variation)
• Resources required for maintenance
• Compute/servers
• Staff (with appropriate skills)
Tier 2 databases: VectorBase
• One of 4 NIAID Bioinformatics Resource Centers
• Integrated genomic resource for arthropod vectors of human pathogens
• Collaboration of 3 European and 3 US Institutes
• VectorBase is:
• Both service provider and content generator
• A collator of genomic information
• A genome annotation group (gene structure prediction)
• A provider of tools for browsing and data mining vector genomes
• A helpdesk for community queries
• Responsible for data submissions to the public archival databases
• Committed to regular release cycles (5-6 releases per year)
VectorBase highlights 2012
• Website orientated around data rather than species
• Consolidation of legacy sections
• Faceted universal search
• Scalable handling of:
• organism
• strain
• assembly
• gene set
• Ensembl genome browser
• Extensive user data upload facilities
• More species
• Community Annotation Portal overhaul
Tier 3 databases: Ensembl Genomes
Ensembl Genomes release 18 (http://metazoa.ensembl.org)
• 43 species
• Stakeholders:
• VectorBase
• FlyBase
• WormBase
• BeetleBase
• Hymenoptera Genome Database
• Other highlights
• Lepidoptera (3 spp., one more to come)
• Sole location of a number of arthropod genomes
Ensembl Genomes - home analogy
• Integration into the Ensembl relational database schema
• Genome browser
• Data centric views
• Downloads
• Similarity searches (Blast/Blat)
• Comparative analysis with other species
• Programmatic access (Perl API)
• BioMart query tool
• Data consistency across species
Benefits of inclusion in Ensembl Genomes
• Integration with a wide range of other species
• Ability to include other data types
• Variation
• Functional genomics
• Alignments
• Community data sets (configuration of site)
• BAMs (RNA-seq, re-sequencing)
• VCFs (SNPs, CNVs)
• Wiggle plots for regulatory elements, ChIP-seq, etc.
• User addition of data sets (temporary visualization)
• Downstream usage by 3rd party tools/analyses
Choosing a solution
• Look at existing solutions
“Off the shelf”
• Generic Model Organism Database project (http://www.gmod.org/wiki/Main_Page)
• Ensembl (http://www.ensembl.org)
“Roll your own”
• Content Management Systems (Drupal)
• Wikis (many flavours)
Publicise your resource
• Meetings
• Mailing lists
• Publication
• NAR Database issue
• a little bit of SEO
• Google/Bing etc.
• Social media
Make your data available in common formats
Just as people of different nationalities use a lingua franca to communicate, we need shared formats to exchange data
Sequences
• Fasta format
• http://www.ebi.ac.uk/help/formats.html
Assembly
• AGP (Golden Path)
• GenBank http://www.ncbi.nlm.nih.gov/projects/genome/assembly/agp/AGP_Specification.shtml
Annotation
• GFF3 (Generic Feature Format version 3)
• Sequence Ontology http://www.sequenceontology.org/gff3.shtml
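As a rough illustration of why a shared annotation format helps, the sketch below reads one GFF3 feature line into a dictionary. It assumes the nine tab-separated columns and percent-encoded `key=value` attributes described in the Sequence Ontology specification linked above. The function name `parse_gff3_line` and the example record (`scaffold_1`, `gene0001`) are invented for illustration and do not come from the talk.

```python
from urllib.parse import unquote

# Column names for the nine tab-separated GFF3 fields.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    # Coordinates are 1-based and inclusive.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds ';'-separated key=value pairs with percent-encoding.
    # Multi-valued attributes (comma-separated) are left as one string here.
    attributes = {}
    if record["attributes"] != ".":
        for pair in record["attributes"].split(";"):
            if pair:
                key, _, value = pair.partition("=")
                attributes[unquote(key)] = unquote(value)
    record["attributes"] = attributes
    return record


# Hypothetical feature line, made up for this example.
example = ("scaffold_1\texample\tgene\t1200\t3400\t.\t+\t.\t"
           "ID=gene0001;Name=hypothetical%20gene")
gene = parse_gff3_line(example)
print(gene["type"], gene["start"], gene["end"], gene["attributes"]["Name"])
```

Because the columns and attribute syntax are fixed by the format, the same few lines work on annotation from any resource that exports valid GFF3.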
Bulk downloads are not an afterthought...
• The provision of data as bulk downloads should not be an afterthought for your project
• Make data available in common formats
• Be responsive to community needs (in terms of alternative formats, other data types)
• Run quality assurance over the download files
• Completeness
• Within files
• Across files
• ‘Round trip’ data where possible - “I have a dream”
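One way to read the "within files" and "across files" checks is sketched below: confirm that every feature in a GFF3 download refers to a sequence present in the FASTA download, and that its coordinates fit inside that sequence. This is an assumed interpretation rather than the QA procedure used by any particular resource. The function names and toy data are hypothetical, and linear (non-circular) sequences are assumed.

```python
def fasta_lengths(lines):
    """Return {sequence_id: length} from an iterable of FASTA lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first word after '>'.
            current = line[1:].split()[0]
            if current in lengths:
                raise ValueError(f"duplicate sequence id: {current}")
            lengths[current] = 0
        elif current is None:
            raise ValueError("sequence data before first FASTA header")
        else:
            lengths[current] += len(line)
    return lengths


def check_gff3_against_fasta(gff3_lines, lengths):
    """Return a list of human-readable problems found in the GFF3 lines."""
    problems = []
    for number, line in enumerate(gff3_lines, start=1):
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            problems.append(f"line {number}: expected 9 columns, got {len(fields)}")
            continue
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        if seqid not in lengths:
            problems.append(f"line {number}: unknown sequence '{seqid}'")
        elif not (1 <= start <= end <= lengths[seqid]):
            problems.append(
                f"line {number}: {start}-{end} outside 1-{lengths[seqid]} on '{seqid}'")
    return problems


# Toy inputs, invented for illustration.
fasta = [">scaffold_1 example", "ACGTACGTAC", "ACGT", ">scaffold_2", "GGCC"]
gff3 = [
    "##gff-version 3",
    "scaffold_1\tex\tgene\t1\t14\t.\t+\t.\tID=g1",
    "scaffold_2\tex\tgene\t2\t9\t.\t-\t.\tID=g2",   # runs past the sequence end
    "scaffold_3\tex\tgene\t1\t3\t.\t+\t.\tID=g3",   # sequence not in the FASTA
]
lengths = fasta_lengths(fasta)
problems = check_gff3_against_fasta(gff3, lengths)
for problem in problems:
    print(problem)
```

Running checks like this before each release catches mismatched files before users do.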
but by far the most important thing is
Submission to the public archival databases
Why submit to the public archival databases?
• Visibility
• Integration with the widest possible community
• xrefs back to your resource
• Longevity
• Funding for INSDC is always going to be more secure than your database
• Accreditation
• Publication
• Many funders and journals require submission prior to publication
• NCBI/EBI/UCSC Browser agreement
• Only assemblies submitted to INSDC can be visualised through these resources
Personally - I don’t consider a genome to be in the public domain until it has been submitted to INSDC
Submission makes you do a number of things
Requirement to conform to standards
• Some are mandatory, some advisory
• Opportunity to capture metadata
• Minimum information about a genome sequence (MIGS)
Encourages good practice
Explicit nomenclature and versioning
• Caveat that you need to make updates!
GenBank nomenclature
• BioProject accessions
• WGS accessions
• Assembly accessions
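The three accession types have recognisably different shapes, which makes them easy to tell apart in scripts and cross-references. The sketch below uses deliberately loose regular expressions based on commonly seen forms: `PRJ` plus a two-letter source code and digits for BioProject, `GCA_`/`GCF_` plus nine digits and a version for assemblies, and a letter prefix plus digits for WGS records. These patterns are assumptions for illustration, not an official validator. The example strings other than the BioProject are made up, and `PRJNA163993` assumes the accession mirrors the numeric ID in the i5k BioProject URL.

```python
import re

# Loose, illustrative patterns; the archives' own documentation is authoritative.
ACCESSION_PATTERNS = {
    "BioProject": re.compile(r"PRJ[NED][A-Z]\d+"),
    "Assembly": re.compile(r"GC[AF]_\d{9}\.\d+"),
    # Four-letter (older) or six-letter (newer) project prefix followed by
    # a two-digit version and a zero-padded contig number.
    "WGS": re.compile(r"[A-Z]{4}\d{8,10}|[A-Z]{6}\d{9,11}"),
}


def classify_accession(text):
    """Return the accession type whose pattern matches the whole string, else None."""
    for kind, pattern in ACCESSION_PATTERNS.items():
        if pattern.fullmatch(text):
            return kind
    return None


# Only the shape of these strings matters; the last three are invented.
for accession in ("PRJNA163993", "GCA_000000000.1", "ABCD01000001", "not-an-accession"):
    print(accession, classify_accession(accession))
```

The version suffix on an assembly accession (`.1`, `.2`, ...) is what makes updates explicit once a new assembly is submitted.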
i5k BioProject at INSDC
• We encourage communities to submit data to the appropriate public archival databases (GenBank/ENA/DDBJ, the Sequence Read Archive (SRA), etc.)
• We encourage you to join us and add your project when submitting data to INSDC
• http://www.ncbi.nlm.nih.gov/bioproject/163993
Encourage collaboration
• “Many cooks spoil the broth” vs. “Many hands make light work”
• Send your genome to school to learn
• Encourage collaboration within your community
• Encourage the next generation of researchers
• Don’t be afraid to ask “experts” for specific help
• Fort Lauderdale agreement
• Outcome from a 2003 meeting
• Sequencing group reserves right to publish
• Strike a balance between fair use (i.e. no pre-emptive publication) and early disclosure.
http://arthropodgenomes.org
arthropodgenomes.org
• > 600 registered users from 178 institutes worldwide
• 30 community resources/databases
• ≅800 species nominated by individuals, consortia, museums or societies
Built around Person & Organism pages
Stakeholders - Databases
• Outreach opportunity
• Includes species (living in this home)
• Contact details for the project
• Contact details for the developers
• References
Stakeholders - Resources
• Outreach opportunity
• Includes species (living in this home)
• Contact details for the project
• Contact details for the developers
• References
Encourage collaboration
Finding “experts” from outside your community
Genome papers, supplemental data
Future challenges
• Scaling bioinformatics infrastructure to deal with 1000s of genomes
• Centralised or federated models
• Democratisation of genome analysis
• “Best practices” for genome assembly & annotation
• Metrics for assessing genome assemblies and annotations
• e.g. Assemblathon (http://assemblathon.org)
• Facilitating and improving community involvement in genome projects
• e.g. VectorBase Community Annotation Portal (CAP), WebApollo.
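The slides do not name a specific assembly metric. As one familiar example of the kind of metric meant, the sketch below computes N50: the length of the shortest contig or scaffold in the smallest set of longest sequences that together cover at least half of the assembly. The function name and the toy lengths are assumptions for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from longest to shortest until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy assembly: total length 290, so half is 145; 80 + 70 = 150 reaches it.
scaffold_lengths = [80, 70, 50, 40, 30, 20]
print(n50(scaffold_lengths))  # prints 70
```

N50 only describes contiguity and says nothing about correctness, which is part of why agreed metrics for assemblies and annotations remain a challenge.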