

Page 1

http://tinyurl.com/VEPCrete

Training materials

• Ensembl training materials are protected by a CC BY license: http://creativecommons.org/licenses/by/4.0/

• If you wish to re-use these materials, please credit Ensembl for their creation

• If you use Ensembl for your work, please cite our papers: http://www.ensembl.org/info/about/publications.html

Page 2

Exploring Genes and Variants with Ensembl

Helen Sparrow

Ensembl Outreach

EMBL-EBI

Page 3

Today: 11.00 - 11.45

• Introduction to Genome Browsers

• Genes and Transcripts

• Variation

14.00 - 15.30 or 16.00 - 17.30

• Browsing

• Genes and Transcripts

• Variants

• Assembly Converter

Page 4

Tomorrow: 11.00 - 11.45

• Variant Effect Predictor (VEP)

The VEP determines the effects of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts and protein sequence, as well as on regulatory regions.
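For readers who want to try VEP outside the web interface, the sketch below builds a query for the VEP endpoint of the Ensembl REST service for a single-base change. It assumes the public server at https://rest.ensembl.org and the `vep/:species/region/:region/:allele` URL layout; the helper names `vep_region_url` and `fetch_json` are hypothetical, and the live request (which needs network access) is left commented out.

```python
import json
import urllib.request

# Assumption: the public Ensembl REST server.
REST_SERVER = "https://rest.ensembl.org"


def vep_region_url(chrom: str, start: int, end: int, allele: str,
                   species: str = "human", strand: int = 1) -> str:
    """Build a VEP region query URL (hypothetical helper).

    The region is written as chrom:start-end:strand and is followed by the
    alternate allele, e.g. 9:22125503-22125503:1/C for a single base change.
    """
    region = f"{chrom}:{start}-{end}:{strand}"
    return f"{REST_SERVER}/vep/{species}/region/{region}/{allele}"


def fetch_json(url: str):
    """GET a URL, asking for JSON, and return the decoded body (needs network)."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = vep_region_url("9", 22125503, 22125503, "C")
    print(url)
    # results = fetch_json(url)  # uncomment to query the live service
```

Keeping URL construction separate from the network call makes the query easy to inspect, or to test offline, before sending it.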

Page 5

Objectives

• What is Ensembl?

• What type of data can you get in Ensembl?

• How to navigate the Ensembl browser website

• How to use Ensembl tools

• Where to go for help and documentation

Page 6

Course materials

tinyurl.com/VEPCrete

• Presentation

• Coursebook (screenshots of demo)

• Exercises and Answers

Page 7

Introduction

Why do we need genome browsers?

1977: 1st genome to be sequenced (5 kb)

2004: ‘Finished’ human sequence (3 Gb)

2015: Completion of 1000 Genomes project

Page 8

Why do we need Genome Browsers?

[Slide image: pages of raw, unannotated genomic DNA sequence]

● Buried in the genome sequence is information about:

○ Function: transcripts and proteins

○ Expression: regulatory elements

○ Variation: SNPs, structural variants

Page 9

Why do we need Genome Browsers?

[Slide image: the same raw DNA sequence, overlaid with links to three genome browsers]

http://ensembl.org

http://www.ncbi.nlm.nih.gov/mapview

http://genome.ucsc.edu

Page 10

Ensembl - more than a browser

● Import genome assemblies

● Annotate genomes

- Genes and transcripts

- Variants

- Regulatory features

- Comparative genomics

● Display and export

● Tools

Page 11

Ensembl Features

• Gene builds for ~70 species

• Gene trees

• Regulatory build (ENCODE)

• Variation display and VEP

• Display of user data

• BioMart (data export)

• Programmatic access via the APIs

• Completely open source
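As a small illustration of programmatic access, the sketch below builds an Ensembl REST query that looks up a gene by its symbol. It assumes the `lookup/symbol/:species/:symbol` endpoint on https://rest.ensembl.org and the `content-type` query parameter; the helper name is hypothetical, and the request itself is only described in a comment because it needs network access.

```python
from urllib.parse import quote

# Assumption: the public Ensembl REST server.
REST_SERVER = "https://rest.ensembl.org"


def lookup_symbol_url(symbol: str, species: str = "homo_sapiens") -> str:
    """Build a REST URL that looks up a gene by symbol (hypothetical helper)."""
    return (f"{REST_SERVER}/lookup/symbol/{quote(species)}/{quote(symbol)}"
            "?content-type=application/json")


if __name__ == "__main__":
    print(lookup_symbol_url("BRCA2"))
    # With network access, the URL could be fetched with
    # urllib.request.urlopen and the body decoded with json.loads.
```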

Page 12

Reference Genome Assemblies

GRCh38

● www.ensembl.org (UCSC equivalent is hg38)

● Current, supported

GRCh37: 250 gaps

● http://grch37.ensembl.org/ (UCSC hg19)

● Limited data updates

NCBI36: 150,000 gaps

● http://may2009.archive.ensembl.org/ (UCSC hg18)

● No updates
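The correspondence between assembly names, UCSC names and Ensembl sites can be captured in a small lookup table. This is a hypothetical helper built only from the values on this slide; it does not convert coordinates between assemblies, which is the job of the Assembly Converter.

```python
# Values copied from the slide: UCSC name -> (GRC/NCBI name, Ensembl site, status).
ASSEMBLIES = {
    "hg38": ("GRCh38", "http://www.ensembl.org/", "current, supported"),
    "hg19": ("GRCh37", "http://grch37.ensembl.org/", "limited data updates"),
    "hg18": ("NCBI36", "http://may2009.archive.ensembl.org/", "no updates"),
}


def ensembl_site_for(ucsc_name: str) -> str:
    """Return the Ensembl site that displays a UCSC assembly such as 'hg19'."""
    try:
        return ASSEMBLIES[ucsc_name.lower()][1]
    except KeyError:
        raise ValueError(f"unknown UCSC assembly name: {ucsc_name!r}") from None


def ucsc_name_for(assembly: str) -> str:
    """Reverse lookup: GRC/NCBI assembly name (e.g. 'GRCh37') -> UCSC name."""
    for ucsc, (grc, _site, _status) in ASSEMBLIES.items():
        if grc.lower() == assembly.lower():
            return ucsc
    raise ValueError(f"unknown assembly name: {assembly!r}")
```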

Page 13

Reference Genome contigs

[Diagram: the reference genome assembled from contigs, labelled AL, BL, CM and IM (e.g. AL476, BL102)]

Page 14

EBI is an Outstation of the European Molecular Biology Laboratory.

Genes and Transcripts

Page 15

Ensembl and Havana annotation

• Automatic annotation (Ensembl)

• Manual annotation (Havana)

Page 16

Automatic gene annotation

• Genome-wide determination using the Ensembl automated pipeline

• Predictions based on experimental (biological) data

• Predictions based on the genomic sequence (ab initio)

Page 17

Transcript annotation

Page 18

Other species

• Infer genes from homology to other species

• E.g. predict genes in one species by mapping cDNAs/proteins from another species to its genome

• RNA-seq data

Page 19

Manual gene annotation

• Genes determined case by case by human annotators

• Genome-wide

• Genes list

• vega.sanger.ac.uk

Page 20

GENCODE

• The GENCODE gene set is the merge of:

  - Ensembl automatically annotated genes

  - Havana manually annotated genes

• The merged gene set is the default gene set for:

  - ENCODE

  - 1000 Genomes

  - many other major projects

Page 21

Merged ‘Golden’ transcripts

• Identical annotation from Ensembl (automatic) and Havana (manual)

• Higher confidence and quality

CCDS transcripts

• Consensus coding sequence (CCDS) set

• Agreement between EBI, WTSI, UCSC and NCBI
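A toy version of the "identical annotation" idea: the hypothetical helpers below call two or more transcript models equivalent when their exon coordinates match exactly. This is a teaching simplification, not the procedure Ensembl, Havana or the CCDS collaborators actually use; in particular, CCDS agreement concerns the coding region, which this sketch does not model separately.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end) in genomic coordinates


def same_structure(model_a: List[Exon], model_b: List[Exon]) -> bool:
    """True when two transcript models have exactly the same exons."""
    return sorted(model_a) == sorted(model_b)


def all_agree(models: Dict[str, List[Exon]]) -> bool:
    """True when every group's model has the same exon structure."""
    structures = {tuple(sorted(model)) for model in models.values()}
    return len(structures) == 1
```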

Page 22

Transcript views

Page 23

Ensembl stable IDs

• ENSG########### Ensembl Gene ID

• ENST########### Ensembl Transcript ID

• ENSP########### Ensembl Peptide ID

• ENSE########### Ensembl Exon ID

• For non-human species, a three-letter species code is inserted after "ENS":

MUS (Mus musculus) for mouse: ENSMUSG###

DAR (Danio rerio) for zebrafish: ENSDARG###

http://www.ensembl.org/info/genome/stable_ids/index.html
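The stable ID layout is regular enough to check with a regular expression. The sketch below assumes exactly eleven digits, an optional three-letter species code after "ENS", and an optional ".version" suffix; IDs from other projects or unusual species may not follow this pattern, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "ENS" + optional 3-letter species code + feature letter
# + 11 digits + optional ".version".
STABLE_ID = re.compile(r"^ENS([A-Z]{3})?([GTPE])(\d{11})(?:\.(\d+))?$")
FEATURES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}


class StableId(NamedTuple):
    species_code: Optional[str]  # None for human, e.g. "MUS" for mouse
    feature: str                 # gene, transcript, peptide or exon
    number: str                  # the 11-digit part, kept as text
    version: Optional[int]       # None when no ".version" suffix is given


def parse_stable_id(identifier: str) -> StableId:
    """Split an Ensembl-style stable ID into its parts (hypothetical helper)."""
    match = STABLE_ID.match(identifier.strip())
    if match is None:
        raise ValueError(f"not an Ensembl-style stable ID: {identifier!r}")
    code, letter, number, version = match.groups()
    return StableId(code, FEATURES[letter], number, int(version) if version else None)
```

Keeping the digits as text preserves the leading zeros, which are part of the identifier.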

Page 24

Why Gene Ontology (GO)?

[Diagram: overlapping descriptions of innate immunity: non-specific immunity, phagocyte, complement, cytokines, natural killer cells, mast cells]

• Multiple terms for the same thing

• Gene descriptions too specific

Page 25

GO terms form a controlled vocabulary

GO:0045087 - innate immune response

Innate immune responses are defense responses mediated by germline-encoded components that directly recognise components of potential pathogens.
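The benefit of a controlled vocabulary is that many free-text labels resolve to one term ID. The sketch below uses a hypothetical synonym table built from the immunology labels on the previous slide; mapping them all to GO:0045087 is for illustration only and is not GO's official synonym list.

```python
# Hypothetical synonym table for illustration; not GO's official synonyms.
SYNONYMS = {
    "innate immune response": "GO:0045087",
    "innate immunity": "GO:0045087",
    "non-specific immunity": "GO:0045087",
}


def normalise(label: str) -> str:
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def to_go_id(label: str) -> str:
    """Map a free-text label to its controlled-vocabulary term ID."""
    key = normalise(label)
    if key not in SYNONYMS:
        raise KeyError(f"no controlled term recorded for {label!r}")
    return SYNONYMS[key]
```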

Page 26

GO terms are hierarchical

GO:0006955 immune response
  GO:0045087 innate immune response
    GO:0006957 complement activation, alternative pathway
    GO:0001867 complement activation, lectin pathway
    GO:0009814 defence response, incompatible interaction
    GO:0042381 hemolymph coagulation
    GO:0009682 induced systemic resistance
    GO:0002227 innate immune response in mucosa
    GO:0035420 MAPK cascade involved in innate immune response
    GO:0035006 melanisation defence response
    GO:0002228 natural killer cell mediated immunity
    GO:0045824 negative regulation of innate immune response
    GO:0009626 plant-type hypersensitive response
    GO:0045089 positive regulation of innate immune response
    GO:0045088 regulation of innate immune response
    GO:0034341 response to interferon-gamma
    GO:0034340 response to type I interferon
    GO:0034342 response to type II interferon
    GO:0009616 virus induced gene silencing
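Because the terms form a hierarchy, a gene annotated with a specific term is generally taken to be annotated with its broader ancestors as well (for is_a and part_of links). The sketch below shows that idea on a few of the terms listed, assuming a simplified parent table read from the slide rather than the real GO graph; `PARENTS`, `ancestors` and `propagate` are hypothetical names.

```python
from typing import Dict, List, Set

# Simplified parent links read off the slide. Real GO edges are typed
# (is_a, part_of, regulates, ...) and a term can have several parents.
PARENTS: Dict[str, List[str]] = {
    "GO:0045087": ["GO:0006955"],  # innate immune response -> immune response
    "GO:0006957": ["GO:0045087"],  # complement activation, alternative pathway
    "GO:0002227": ["GO:0045087"],  # innate immune response in mucosa
    "GO:0002228": ["GO:0045087"],  # natural killer cell mediated immunity
}


def ancestors(term: str) -> Set[str]:
    """Return every term reachable by following parent links upwards."""
    found: Set[str] = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return found


def propagate(annotations: Set[str]) -> Set[str]:
    """Expand a gene's annotations with all of their ancestor terms."""
    expanded = set(annotations)
    for term in annotations:
        expanded |= ancestors(term)
    return expanded
```

The `found` set doubles as a visited set, so a term reachable by two routes is only expanded once.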

Page 27

Questions?

Page 28

Variation

Page 29

Variant sources

http://www.ensembl.org/info/genome/variation/sources_documentation.html#homo_sapiens

Page 30

Variant types

1) Small-scale changes affecting one or a few nucleotides of a gene

• Small deletions and insertions (DIPs or indels)

• Single nucleotide polymorphisms (SNPs)

[Figure: two aligned sequences showing SNPs and indels]

AGACTTGACCTGTCT-AACTGGA

TGACTTGAC-TGTCTGAACGGGA

2) Large-scale (>50 bp) changes in chromosomal structure (structural variants)

• Copy number variants (CNVs)

• Large deletions/duplications, insertions, translocations

[Figure: schematic of a deletion, a duplication, an insertion and a translocation]
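The size-based split between small variants and structural variants can be sketched as a rough classifier working from the reference and alternate alleles. It assumes "-" marks an empty allele (the gap character in the alignment above) and applies the slide's >50 bp cut-off to the longer allele; the function name and the "substitution"/"indel" labels for multi-base changes are illustrative choices, not VEP's exact classification.

```python
def classify_variant(ref: str, alt: str, sv_threshold: int = 50) -> str:
    """Roughly classify a variant from its reference and alternate alleles.

    '-' stands for an empty allele (ref '-' with alt 'A' is an insertion).
    Anything whose longer allele exceeds sv_threshold bp is called structural.
    """
    ref = "" if ref == "-" else ref.upper()
    alt = "" if alt == "-" else alt.upper()
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    if max(len(ref), len(alt)) > sv_threshold:
        return "structural variant"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if not ref:
        return "insertion"
    if not alt:
        return "deletion"
    if len(ref) == len(alt):
        return "substitution"  # several bases replaced, length unchanged
    return "indel"  # bases replaced and length changed
```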

Page 31

Variant consequences

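[Figure: a transcript from its ATG start codon to its poly-A tail (AAAAAAA), labelled with where variants can fall: 5’ upstream, regulatory, 5’ UTR, coding (missense or synonymous), splice site, intronic, 3’ UTR and 3’ downstream]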
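Where a variant falls along a transcript can be sketched with a few coordinate comparisons. The helper below is a simplification that assumes a single plus-strand transcript, 1-based inclusive coordinates and a 5 kb upstream/downstream window (an assumption here; VEP's own distance is configurable). It ignores splice sites and regulatory regions and returns Sequence Ontology-style term names; the function name is hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based, inclusive


def region_consequence(pos: int, exons: List[Exon], cds_start: int, cds_end: int,
                       distance: int = 5000) -> str:
    """Return an SO-style term for where `pos` falls relative to a
    plus-strand transcript. Splice sites and minus-strand genes are ignored."""
    exons = sorted(exons)
    tx_start, tx_end = exons[0][0], exons[-1][1]
    if pos < tx_start:
        return "upstream_gene_variant" if tx_start - pos <= distance else "intergenic_variant"
    if pos > tx_end:
        return "downstream_gene_variant" if pos - tx_end <= distance else "intergenic_variant"
    if not any(start <= pos <= end for start, end in exons):
        return "intron_variant"
    if pos < cds_start:
        return "5_prime_UTR_variant"
    if pos > cds_end:
        return "3_prime_UTR_variant"
    return "coding_sequence_variant"
```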

Page 32

http://www.ensembl.org/info/docs/variation/predicted_data.html

SO consequence terms
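Within the coding sequence, whether a single-base change is synonymous or missense depends on the codon it hits. The sketch below applies the standard genetic code to one codon (read on the coding strand) and returns SO-style terms such as `missense_variant` and `stop_gained`. It is a simplification that ignores start codons, frameshifts and transcript context, and the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); '*' marks a stop codon.
# Amino acids are listed for codons in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def coding_consequence(codon: str, offset: int, alt: str) -> str:
    """Classify a single-base change at `offset` (0, 1 or 2) within `codon`.

    The codon is read on the coding strand. Start codons, frameshifts and
    multi-base changes are deliberately out of scope.
    """
    codon, alt = codon.upper(), alt.upper()
    if len(codon) != 3 or offset not in (0, 1, 2) or len(alt) != 1 or alt not in "ACGT":
        raise ValueError("expected a 3-base codon, an offset of 0-2 and one base")
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[codon[:offset] + alt + codon[offset + 1:]]
    if ref_aa == alt_aa:
        return "stop_retained_variant" if ref_aa == "*" else "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"
```

For example, GAA and GAG both encode glutamate, so a third-position A>G change is synonymous, while GAA>TAA creates a stop codon.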

Page 33

Reference alleles

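[Diagram: the reference genome assembled from contigs (labelled AL, BL, CM and IM), with contig BL102 highlighted]

BL102: AGTCGTAGCTAGCTAGGCCATAGGCGA

• Frequency of T = 0.05, frequency of G = 0.95

• G is the allele in all primates

• T causes disease susceptibility

T is the allele in the contig used, so T is the reference allele, G is the alternate allele, and the alleles are written T/G. The reference allele is therefore not necessarily the most common or the ancestral allele.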
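The point of this example, that the reference allele is simply the base present in the reference contig, can be made concrete with a short sketch. It uses the BL102 sequence and the frequencies from the slide; the function name and its 1-based position argument are assumptions for illustration.

```python
from typing import Dict


def describe_alleles(reference_seq: str, position: int,
                     frequencies: Dict[str, float]) -> Dict[str, object]:
    """Report reference/alternate alleles at a 1-based position.

    The reference allele is whatever base the reference sequence carries,
    regardless of how common it is in the population.
    """
    if not 1 <= position <= len(reference_seq):
        raise ValueError("position is outside the reference sequence")
    ref = reference_seq[position - 1].upper()
    alternates = sorted(allele for allele in frequencies if allele != ref)
    major = max(frequencies, key=frequencies.get)
    return {
        "reference": ref,
        "alternates": alternates,
        "alleles": "/".join([ref] + alternates),
        "reference_is_major": ref == major,
    }


if __name__ == "__main__":
    contig = "AGTCGTAGCTAGCTAGGCCATAGGCGA"  # BL102 sequence from the slide
    print(describe_alleles(contig, 14, {"T": 0.05, "G": 0.95}))
```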

Page 34

Allele strand

+ve strand: AGTCGTAGCTAGC[T/G]AGGCCATAGGCGA

-ve strand: TCGCCTATGGCCT[A/C]GCTAGCTACGACT

Exon sequence: TATGGCCT[A/C]GCTAGC

• Alleles in database = T/G

• Alleles in gene = A/C

• Alleles = A/C on the -ve strand, or T/G on the +ve strand

• Alleles reported as A/C or T/G often lack further information about which strand they refer to
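The strand bookkeeping above comes down to reverse-complementing sequence and complementing alleles. A minimal sketch, assuming plain A/C/G/T sequence (other characters, such as "-" or N, pass through unchanged) and alleles written in the X/Y notation used on the slide; the helper names are hypothetical.

```python
# Map each base to its Watson-Crick partner; other characters are left as-is.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def flip_alleles(alleles: str) -> str:
    """Convert 'X/Y' alleles to the opposite strand, e.g. 'T/G' -> 'A/C'."""
    return "/".join(reverse_complement(allele) for allele in alleles.split("/"))


if __name__ == "__main__":
    print(reverse_complement("AGTCGTAGCTAGCTAGGCCATAGGCGA"))
    print(flip_alleles("T/G"))
```

Applied to the BL102 sequence, this reproduces the -ve strand line with A at the variant position, and flips the database alleles T/G to the gene's A/C.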

Page 35

Questions?

Page 36

Help and documentation

Course online: http://www.ebi.ac.uk/training/online/subjects/11

Tutorials: www.ensembl.org/info/website/tutorials

Videos: www.youtube.com/user/EnsemblHelpdesk

Email us: [email protected]

Page 37

Host a FREE Workshop!

• Invite one of our outreach team to teach at your institution for free (apart from the trainers’ expenses)

• E-mail us: [email protected]

Browser Course

½-2 day course on the Ensembl browser, aimed at wet-lab scientists. 1-2 trainers.

API course

2-4 day course on the Ensembl APIs (Perl or REST), aimed at bioinformaticians. 1-4 trainers.

Page 38

Acknowledgements

The entire Ensembl team

Funding

Co-funded by the European Union
