Dr. Giulietta M. Spudich
Ensembl Outreach Team
Introduction to Genomes
with Ensembl
Objectives
What information about a gene can I find?
What about a region of the genome?
How do I navigate the data?
Introduction
1977: first genome to be sequenced (5 kb)
2004: finished human sequence (3 Gb)
Large amounts of raw DNA sequence data
Genome Sequencing
Fragment
BAC clones
Sequence
Contigs
Assemble
Scaffolds
Assemble
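As a toy illustration of the "assemble" step, the sketch below merges overlapping sequence fragments into a longer contig by repeatedly joining the pair with the longest exact overlap. It assumes error-free fragments from a single strand, and the function names `overlap` and `greedy_assemble` are hypothetical; real assemblers handle sequencing errors, repeats and both strands, and work quite differently at scale.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the longest overlap.

    Stops when no pair overlaps by at least min_len bases and returns
    whatever sequences (toy "contigs") remain.
    """
    seqs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)  # (overlap length, index of left, index of right)
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return seqs
        merged = seqs[i] + seqs[j][k:]
        seqs = [s for n, s in enumerate(seqs) if n not in (i, j)] + [merged]


# Three overlapping fragments, given out of order.
reads = ["GTGAGGGGACAG", "CGCCGGGAGAAG", "GAGAAGCGTGAGG"]
contigs = greedy_assemble(reads)
print(contigs)
```

The greedy strategy is chosen here only because it is short to read; it can produce wrong joins when the sequence contains repeats.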
CGCCGGGAGAAGCGTGAGGGGACAGATTTGTGACCGGCGCGGTTTTTGTCAG
CTTACTCCGGCCAAAAAAGAACTGCACCTCTGGAGCGGACTTATTTACCAAGCA
TTGGAGGAATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATT
GCACTGCTGCGCCTCTGCTGCGCCTCGGGTGTCTTTTGCGGCGGTGGGTCGC
CGCCGGGAGAAGCGTGAGGGGACAGATTTGTGACCGGCGCGGTTTTTGTCAG
CTTACTCCGGCCAAAAAAGAACTGCACCTCTGGAGCGGACTTATTTACCAAGCA
TTGGAGGAATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATT
TTTTGAAATTTTTAAGACACGCTGCAACAAAGCAGATTTAGGACCAATAAGTCTT
AATTGGTTTGAAGAACTTTCTTCAGAAGCTCCACCCTATAATTCTGAACCTGCAG
ACTAAAATGGATCAAGCAGATGATGTTTCCTGTCCACTTCTAAATTCTTGTCTTAG
AAGAATCTGAACATAAAAACAACAATTACGAACCAAACCTATTTAAAACTCCACAA
AGGAAACCATCTTATAATCAGCTGGCTTCAACTCCAATAATATTCAAAGAGCAAG
GGCTGACTCTGCCGCTGTACCAATCTCCTGTAAAAGAATTAGATAAATTCAAATT
AGACTTAGGAAGGAATGTTCCCAATAGTAGACTAAAAGTCTTCGCACAGTGAAAT
CGCCGGGAGAAGCGTGAGGGGACAGATTTGTGACCGGCGCGGTTTTTGTCAG
CTTACTCCGGCCAAAAAAGAACTGCACCTCTGGAGCGGACTTATTTACCAAGCA
TTGGAGGAATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATT
ACTAAAATGGATCAAGCAGATGATGTTTCCTGTCCACTTCTAAATTCTTGTCTTAG
AATTGGTTTGAAGAACTTTCTTCAGAAGCTCCACCCTATAATTCTGAACCTGCAG
TGAAAGTCCTGTTGTTCTACAATGTACACATGTAACACCACAAAGAGATAAGTCA
Genome sequence
21 May 2012
The Ensembl genome browser:
making it interesting
Regulation
Gene
Allele
Conserved sequence
Figure adapted from the ENCODE project www.nature.com/nature/focus/encode/
• Splice variants, proteins, non-coding RNA
• Small and large scale sequence variation, phenotype associations
• Whole genome alignments, protein trees
• Potential promoters and enhancers, DNA methylation
• User upload, custom data
Genome Browsers
• Ensembl Genome Browsers
http://www.ensemblgenomes.org
• NCBI Map Viewer
http://www.ncbi.nlm.nih.gov/mapview/
• UCSC Genome Browser
http://genome.ucsc.edu
Ensembl is Used Worldwide
Top users:
UK
US
Canada
China
France
Germany
Italy
Japan
Spain
Data Volume Challenge
• UniProtKB/Swiss-Prot (reviewed)
536,029 (25,871 human) protein sequences
• UniProtKB/TrEMBL
22,128,511 (217,918)
www.uniprot.org
• NCBI RefSeq (reviewed)
15,744,232 (24,539)
NP_006570
NM_006579
Q8IU82
A consensus set of protein-coding sequences
• Reaching a consensus coding sequence (CCDS) set for human and mouse.
• 26,473 (human), 22,187 (mouse) (as of Sept 2011)
• If you see a “CCDS ID”, the coding sequence is agreed upon.
Genome Res. 2009 Jul;19(7):1316-23. Epub 2009 Jun 4
What are the gold transcripts?
UTR Coding Intron
Ensembl
• Automatic annotation pipeline: gene building all at once (whole genome)
VEGA/Havana (human, mouse, zebrafish)
• Manual curation: reviewed by experts
VEGA: Vertebrate Genome Annotation
Genes and Transcripts in Ensembl
High Quality:
• CCDS transcripts
• Ensembl/Havana merged (gold) transcripts
Ensembl/Havana
• Transcripts are from:
Ensembl
Havana
Ensembl/Havana
Ensembl (20_)
Havana (00_)
Both (“gold”)
Havana (00_)
Gene Names in Ensembl
• ENSG### Ensembl Gene ID
• ENST### Ensembl Transcript ID
• ENSP### Ensembl Peptide ID
• ENSE### Ensembl Exon ID
• For non-human species a species code is inserted after "ENS":
MUS (Mus musculus) for mouse: ENSMUSG###
DAR (Danio rerio) for zebrafish: ENSDARG###
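The naming scheme above can be read mechanically. The following is a minimal sketch that splits an Ensembl-style ID into species code, feature type and number, using only the letters and species codes listed on this slide. The function name `parse_stable_id` and the example IDs are placeholders chosen for illustration, and the pattern deliberately ignores details not covered here (such as version suffixes).

```python
import re

# Feature-type letters listed on the slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# Species codes named on the slide; human IDs carry no species code.
SPECIES_CODES = {"": "human", "MUS": "mouse", "DAR": "zebrafish"}

# "ENS" + optional species code + one feature letter + digits.
STABLE_ID = re.compile(r"^ENS([A-Z]*?)([GTPE])(\d+)$")


def parse_stable_id(stable_id: str) -> dict:
    """Split an Ensembl-style stable ID into its parts."""
    match = STABLE_ID.match(stable_id)
    if match is None:
        raise ValueError(f"not an Ensembl-style stable ID: {stable_id!r}")
    code, letter, number = match.groups()
    return {
        "species": SPECIES_CODES.get(code, f"unknown species code {code!r}"),
        "feature": FEATURE_TYPES[letter],
        "number": number,
    }


# Placeholder IDs, not meant to refer to particular genes.
for example in ("ENSG00000000001", "ENSMUSG00000000001", "ENSDART00000000001"):
    print(example, parse_stable_id(example))
```

Species codes that are not in the small lookup table are reported as unknown rather than rejected, since the slide lists only two of them.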
Ensembl Features
• The gene set
• Comparative analysis
• Variation and regulation
• BioMart (data export)
• Display of external data (DAS)
• Programmatic access via the Perl API
• Open Source
Objectives
What information about a gene can I find?
What about a region of the genome?
How do I navigate the data?
See our coursebook for walk-throughs and
exercises using our browser:
http://www.ensembl.org/info/website/tutorials/coursebook.pdf
Variation
• Nucleotide level
  • Single nucleotide polymorphism (SNP)
  • Small insertions and deletions (InDels)
  • Microsatellites (short tandem repeats)
• Structural
  • Copy number variations (CNV)
  • Large insertions and deletions
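To make the categories on this slide concrete, here is a rough sketch that labels a variant from its reference and alternative alleles. The function name `classify_variant` is hypothetical, and the 50-base cut-off between "small" and "large" is an assumption made for illustration, not a figure from this presentation. Microsatellites and copy number variations need more context than two allele strings, so they are not handled.

```python
def classify_variant(ref: str, alt: str, large_threshold: int = 50) -> str:
    """Label a variant by comparing reference and alternative allele strings.

    large_threshold is an assumed cut-off (in bases) between small and
    large insertions/deletions.
    """
    if ref == alt:
        raise ValueError("reference and alternative alleles are identical")
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    diff = len(alt) - len(ref)
    if diff == 0:
        # Same length but more than one base: not a category on the slide.
        return "substitution"
    kind = "insertion" if diff > 0 else "deletion"
    size = "large" if abs(diff) >= large_threshold else "small"
    return f"{size} {kind}"


print(classify_variant("A", "G"))
print(classify_variant("A", "ATG"))
print(classify_variant("ATG", "A"))
```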
Sequence displays
Gene: Sequence
Transcript: Exons
Transcript: cDNA
Comparative Genomics
69 species in Ensembl release 67 (e!67)
Ensembl tools
Phenotype for a gene
How is all this information organised?
• Ensembl Views (Website)
• Ensembl Database (open source)
• BioMart "data mining tool"
Help and documentation
• Comments and questions?
• Mailing lists [email protected], [email protected]
• Course online www.ensembl.info/ecourse
• Our tutorials page www.ensembl.org/info/website/tutorials
• YouTube channel www.youtube.com/user/EnsemblHelpdesk
Follow us
• Facebook www.facebook.com/Ensembl.org
• Twitter https://twitter.com/Ensembl
• Come visit our blog! www.ensembl.info
Publications
• Flicek, P. et al. Ensembl 2012. Nucleic Acids Res 40:D84-90 (2012) http://nar.oxfordjournals.org/content/40/D1/D84.long
• Xosé M. Fernández-Suárez and Michael K. Schuster Using the Ensembl Genome Server to Browse Genomic Sequence Data. Current Protocols in Bioinformatics 1.15.1-1.15.48 (2010) www.ncbi.nlm.nih.gov/pubmed/20521244
• Giulietta M Spudich and Xosé M Fernández-Suárez Touring Ensembl: A practical guide to genome browsing BMC Genomics 11:295 (2010) www.biomedcentral.com/1471-2164/11/295
http://www.ensembl.org/info/about/publications.html
Ensembl Team
Ensembl: Paul Flicek (EBI), Steve Searle (Wellcome Trust Sanger Institute)
Software: Andy Yates, Stephen Keenan, Monika Komorowska, Rhoda Kinsella, Thomas Maurel, Kieron Taylor
Comparative Genomics: Javier Herrero, Kathryn Beal, Stephen Fitzgerald, Leo Gordon, Matthieu Muffato, Miguel Pignatelli
Regulation: Ian Dunham, Ikhlak Ahmed, Nathan Johnson, Thomas Juettemann, Steven Wilder
Variation: Fiona Cunningham, Laurent Gil, Sarah Hunt, Will McLaren, Graham Ritchie, Anja Thormann
Analysis and Annotation: Bronwen Aken, Amonida Zadissa, Dan Barrell, Susan Fairley, Carlos García Girón, Thibaut Hourlier, Andreas Kähäri, Rishi Nag, Magali Ruffier, Simon White
Web Team: Anne Parker, Ridwan Amode, Simon Brent, Bethan Pritchard, Harpreet Riat, Dan Sheppard, Steve Trevanion
Outreach: Giulietta M. Spudich, Jeff Almeida-King, Denise Carvalho-Silva, Bert Overduin, Michael Schuster
Ensembl Genomes: Paul Kersey, Paul Derwent, Jay Humphrey, Arnaud Kerhornou, Eugene Kulesha, Nick Langridge, Uma Maheswari, Mark McDowall, Michael Nuhn, Helder Pedro, Claudia Rato da Silva, Dan Staines, Iliana Toneva
Ensembl Strategy: Ewan Birney, Richard Durbin, Paul Flicek, Jen Harrow, Tim Hubbard, Glenn Proctor, Steve Searle