Dr. Giulietta M. Spudich Ensembl Outreach Team Introduction to Genomes with Ensembl


Page 1: Introduction to Genomes with Ensembl - Tufts Universitysites.tufts.edu/cbi/files/2013/01/Introduction2ENSEMBL.pdf · 2013-01-14 · Ensembl Paul Flicek (EBI), Steve Searle (Wellcome


Dr. Giulietta M. Spudich

Ensembl Outreach Team

Introduction to Genomes with Ensembl


Objectives

What information about a gene can I find?

What about a region of the genome?

How do I navigate the data?


Introduction

1977: first genome to be sequenced (5 kb)

2004: finished human sequence (3 Gb)

Large amounts of raw DNA sequence data


Genome Sequencing

[Figure: Fragment → BAC clones → Sequence → Contigs → Assemble → Scaffolds → Assemble]
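The "assemble" steps on this slide can be illustrated with a toy example. The sketch below is not how production assemblers work; it is a minimal greedy overlap merge, and the function names `overlap` and `greedy_assemble`, the `min_len` parameter, and the three short reads are all made up for illustration (the reads are cut from the start of the sequence shown on the next slide).

```python
from itertools import permutations


def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the largest overlap.

    Toy model only: assumes error-free reads from one strand and no repeats.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for a, b in permutations(range(len(contigs)), 2):
            size = overlap(contigs[a], contigs[b], min_len)
            if size > best[0]:
                best = (size, a, b)
        size, a, b = best
        if size == 0:
            return contigs  # nothing left to merge
        merged = contigs[a] + contigs[b][size:]
        contigs = [c for i, c in enumerate(contigs) if i not in (a, b)] + [merged]


# Hypothetical reads, deliberately out of order.
reads = ["CGTGAGGG", "CGCCGGGAGAAG", "GAGAAGCGTG"]
print(greedy_assemble(reads))
```

Reads that share no overlap of at least `min_len` bases stay as separate contigs, which loosely mirrors why gaps remain between contigs until scaffolding.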


CGCCGGGAGAAGCGTGAGGGGACAGATTTGTGACCGGCGCGGTTTTTGTCAG

CTTACTCCGGCCAAAAAAGAACTGCACCTCTGGAGCGGACTTATTTACCAAGCA

TTGGAGGAATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATT

GCACTGCTGCGCCTCTGCTGCGCCTCGGGTGTCTTTTGCGGCGGTGGGTCGC

CGCCGGGAGAAGCGTGAGGGGACAGATTTGTGACCGGCGCGGTTTTTGTCAG

CTTACTCCGGCCAAAAAAGAACTGCACCTCTGGAGCGGACTTATTTACCAAGCA

TTGGAGGAATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATT

TTTTGAAATTTTTAAGACACGCTGCAACAAAGCAGATTTAGGACCAATAAGTCTT

AATTGGTTTGAAGAACTTTCTTCAGAAGCTCCACCCTATAATTCTGAACCTGCAG

ACTAAAATGGATCAAGCAGATGATGTTTCCTGTCCACTTCTAAATTCTTGTCTTAG

AAGAATCTGAACATAAAAACAACAATTACGAACCAAACCTATTTAAAACTCCACAA

AGGAAACCATCTTATAATCAGCTGGCTTCAACTCCAATAATATTCAAAGAGCAAG

GGCTGACTCTGCCGCTGTACCAATCTCCTGTAAAAGAATTAGATAAATTCAAATT

AGACTTAGGAAGGAATGTTCCCAATAGTAGACTAAAAGTCTTCGCACAGTGAAAT

CGCCGGGAGAAGCGTGAGGGGACAGATTTGTGACCGGCGCGGTTTTTGTCAG

CTTACTCCGGCCAAAAAAGAACTGCACCTCTGGAGCGGACTTATTTACCAAGCA

TTGGAGGAATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATT

ACTAAAATGGATCAAGCAGATGATGTTTCCTGTCCACTTCTAAATTCTTGTCTTAG

AATTGGTTTGAAGAACTTTCTTCAGAAGCTCCACCCTATAATTCTGAACCTGCAG

TGAAAGTCCTGTTGTTCTACAATGTACACATGTAACACCACAAAGAGATAAGTCA

Genome sequence


21 May 2012

The Ensembl genome browser: making it interesting

[Figure labels: Regulation, Gene, Allele, Conserved sequence]

Figure adapted from the ENCODE project www.nature.com/nature/focus/encode/

• Splice variants, proteins, non-coding RNA

• Small and large scale sequence variation, phenotype associations

• Whole genome alignments, protein trees

• Potential promoters and enhancers, DNA methylation

• User upload, custom data


Genome Browsers

• Ensembl Genome Browsers

http://www.ensemblgenomes.org

• NCBI Map Viewer

http://www.ncbi.nlm.nih.gov/mapview/

• UCSC Genome Browser

http://genome.ucsc.edu


Ensembl is Used Worldwide


Top users:

UK

US

Canada

China

France

Germany

Italy

Japan

Spain


Data Volume Challenge

• UniProtKB/Swiss-Prot (reviewed)

536,029 protein sequences (25,871 human)

• UniProtKB/TrEMBL

22,128,511 protein sequences (217,918 human)

www.uniprot.org

• NCBI RefSeq (reviewed)

15,744,232 (24,539 human)

Example accessions: NP_006570 and NM_006579 (RefSeq), Q8IU82 (UniProt)


A consensus set of protein-coding sequences

• Reaching a consensus coding sequence (CCDS) set for human and mouse.

• 26,473 (human), 22,187 (mouse) (as of Sept 2011)

• If you see a “CCDS ID”, the coding sequence is agreed upon.

Genome Res. 2009 Jul;19(7):1316-23. Epub 2009 Jun 4


What are the gold transcripts?

UTR Coding Intron


Ensembl

• Automatic annotation pipeline: gene building all at once (whole genome)

VEGA/Havana (human, mouse, zebrafish)

• Manual curation: reviewed by experts

VEGA: Vertebrate Genome Annotation


Genes and Transcripts in Ensembl

High Quality:

• CCDS transcripts

• Ensembl/Havana merged (gold) transcripts


Ensembl/Havana

• Transcripts are from:

Ensembl (20_)

Havana (00_)

Ensembl/Havana: both (“gold”)


Gene Names in Ensembl

• ENSG### Ensembl Gene ID

• ENST### Ensembl Transcript ID

• ENSP### Ensembl Peptide ID

• ENSE### Ensembl Exon ID

• For non-human species, a species code is inserted after ENS:

MUS (Mus musculus) for mouse: ENSMUSG###

DAR (Danio rerio) for zebrafish: ENSDARG###
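The naming scheme on this slide can be sketched in Python. This is a minimal illustration and not part of any Ensembl API: the function `parse_stable_id`, the `StableId` tuple, and the two lookup tables are hypothetical names, only the species codes mentioned on the slide (MUS, DAR) are listed, and the digits in the example IDs are placeholders.

```python
import re
from typing import NamedTuple, Optional

# Feature-type letters listed on the slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# Species codes from the slide only; an empty code means human.
SPECIES_CODES = {"": "human", "MUS": "M. musculus", "DAR": "Danio rerio"}


class StableId(NamedTuple):
    species_code: str
    feature_type: str
    number: str


# ENS, an optional species code, one feature-type letter, then digits.
_PATTERN = re.compile(r"^ENS([A-Z]*?)([GTPE])(\d+)$")


def parse_stable_id(stable_id: str) -> Optional[StableId]:
    """Split an Ensembl-style stable ID into its parts, or return None."""
    match = _PATTERN.match(stable_id)
    if match is None:
        return None
    species_code, letter, number = match.groups()
    return StableId(species_code, FEATURE_TYPES[letter], number)


for example in ("ENSG00000000001", "ENSMUSG00000000001", "ENSDART00000000001"):
    parsed = parse_stable_id(example)
    species = SPECIES_CODES.get(parsed.species_code, "not listed on the slide")
    print(example, "->", parsed.feature_type, "/", species)
```

The species code is matched lazily so that codes containing G, T, P or E are still separated from the feature-type letter that follows them.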


Ensembl Features

• The gene set.

• Comparative analysis

• Variation and regulation

• BioMart (data export)

• Display of external data (DAS)

• Programmatic access via the Perl API

• Open Source


Objectives

What information about a gene can I find?

What about a region of the genome?

How do I navigate the data?

See our coursebook for walk-throughs and exercises using our browser:

http://www.ensembl.org/info/website/tutorials/coursebook.pdf


Variation

• Nucleotide level

  • Single nucleotide polymorphisms (SNPs)

  • Small insertions and deletions (InDels)

  • Microsatellites (short tandem repeats)

• Structural

  • Copy number variations (CNVs)

  • Large insertions and deletions
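As a rough sketch of the nucleotide-level versus structural distinction, the function below sorts a variant into one of the slide's classes from its reference and alternate alleles. The name `classify_variant` and the 50-base cutoff between "small" and "large" are assumptions chosen for illustration; the slide gives no threshold. Microsatellites and copy number variations are not handled, since they cannot be recognised from a single allele pair.

```python
def classify_variant(ref: str, alt: str, large_threshold: int = 50) -> str:
    """Classify a variant from its reference and alternate allele strings.

    `large_threshold` (in bases) is an assumed cutoff, not an Ensembl rule.
    """
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"
    size_change = abs(len(ref) - len(alt))
    if size_change == 0:
        return "other"  # identical alleles or same-length substitutions
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    scale = "large" if size_change >= large_threshold else "small"
    return f"{scale} {kind}"


print(classify_variant("A", "G"))
print(classify_variant("A", "ATT"))
print(classify_variant("A" + "C" * 60, "A"))
```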


Sequence displays

Gene: Sequence

Transcript: Exons

Transcript: cDNA


Comparative Genomics

69 species in Ensembl release 67 (e!67)


Ensembl tools


Phenotype for a gene


How is all this information organised?

• Ensembl Views (Website)

• Ensembl Database (open source)

• BioMart 'DataMining tool'


Help and documentation

• Comments and questions?

[email protected]

• Mailing lists [email protected], [email protected]

• Course online www.ensembl.info/ecourse

• Our tutorials page www.ensembl.org/info/website/tutorials

• YouTube channel www.youtube.com/user/EnsemblHelpdesk


Follow us

• Facebook www.facebook.com/Ensembl.org

• Twitter https://twitter.com/Ensembl

• Come visit our blog! www.ensembl.info


Publications

• Flicek, P. et al. Ensembl 2012. Nucleic Acids Res 40:D84-90 (2012) http://nar.oxfordjournals.org/content/40/D1/D84.long

• Xosé M. Fernández-Suárez and Michael K. Schuster. Using the Ensembl Genome Server to Browse Genomic Sequence Data. Current Protocols in Bioinformatics 1.15.1-1.15.48 (2010) www.ncbi.nlm.nih.gov/pubmed/20521244

• Giulietta M. Spudich and Xosé M. Fernández-Suárez. Touring Ensembl: A practical guide to genome browsing. BMC Genomics 11:295 (2010) www.biomedcentral.com/1471-2164/11/295

http://www.ensembl.org/info/about/publications.html


Ensembl Team

Ensembl: Paul Flicek (EBI), Steve Searle (Wellcome Trust Sanger Institute)

Software: Andy Yates, Stephen Keenan, Monika Komorowska, Rhoda Kinsella, Thomas Maurel, Kieron Taylor

Comparative Genomics: Javier Herrero, Kathryn Beal, Stephen Fitzgerald, Leo Gordon, Matthieu Muffato, Miguel Pignatelli

Regulation: Ian Dunham, Ikhlak Ahmed, Nathan Johnson, Thomas Juettemann, Steven Wilder

Variation: Fiona Cunningham, Laurent Gil, Sarah Hunt, Will McLaren, Graham Ritchie, Anja Thormann

Analysis and Annotation: Bronwen Aken, Amonida Zadissa, Dan Barrell, Susan Fairley, Carlos García Girón, Thibaut Hourlier, Andreas Kähäri, Rishi Nag, Magali Ruffier, Simon White

Web Team: Anne Parker, Ridwan Amode, Simon Brent, Bethan Pritchard, Harpreet Riat, Dan Sheppard, Steve Trevanion

Outreach: Giulietta M. Spudich, Jeff Almeida-King, Denise Carvalho-Silva, Bert Overduin, Michael Schuster

Ensembl Genomes: Paul Kersey, Paul Derwent, Jay Humphrey, Arnaud Kerhornou, Eugene Kulesha, Nick Langridge, Uma Maheswari, Mark McDowall, Michael Nuhn, Helder Pedro, Claudia Rato da Silva, Dan Staines, Iliana Toneva

Ensembl Strategy: Ewan Birney, Richard Durbin, Paul Flicek, Jen Harrow, Tim Hubbard, Glenn Proctor, Steve Searle