EBI is an Outstation of the European Molecular Biology Laboratory.
Ensembl online training series
2016
Helen Sparrow, Ensembl Outreach team
EMBL-EBI
Course Objectives
● What is Ensembl?
● What types of data you can get in Ensembl
● How to navigate the Ensembl browser website
● Where to go for help and documentation
This webinar course
Date | Webinar topic | Instructor
24th March | Introduction to Ensembl | Emily Perry
31st March | Ensembl genes | Denise Carvalho-Silva
7th April | Data export with BioMart | Helen Sparrow
14th April | Variation data in Ensembl and the Ensembl VEP | Denise Carvalho-Silva
21st April | Comparing genes and genomes with Ensembl Compara | Helen Sparrow
28th April | Finding features that regulate genes – the Ensembl Regulatory Build | Emily Perry
5th May | Uploading your data to Ensembl and advanced ways to access Ensembl data | Ben Moore
Questions?
• Use the Chat box in the webinar interface
• My Ensembl colleagues will respond
• There’s no threading so please start responses with @username
Emily Perry, Denise Carvalho-Silva, Ben Moore
Structure
Presentation: How we produce/process the data
Demo: Viewing the data
Exercises: On the Train online course
Module 5: Comparing genes and genomes with Ensembl Compara
Outline
• Comparative genomics: applications
• Protein alignments
• Gene trees
• Homology predictions
• Whole genome alignments
  • pairwise
  • multiple
• Shared synteny
Applications of comparative genomics
Comparative genomics allows us to understand:
• vertebrate evolution
• differences between species at the genome level
• gene function based on homology
• the distribution of highly conserved regions
Comparative Genomics in Ensembl
Gene Level
● Protein alignment
● Protein/Gene Trees
● Homologues: Orthologues and Paralogues
● Pan-compara
● Gene families
Whole Genome
● Whole genome alignments
● Syntenic regions
Gene Trees & Homologues
• Based on protein alignments
• Representative protein of each Ensembl gene
• Blast+
• Multiple protein alignment with M-coffee
• Build phylogenetic tree with TreeBeST
• Reconciliation with species tree (to infer ancestral nodes)
• Orthologue/Paralogue inference
http://www.ensembl.org/info/docs/compara/homology_method.html
all-vs-all blastp + hcluster_sg
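The clustering step listed above groups proteins that share BLAST hits before any alignment or tree is built. As a rough illustration only, the sketch below groups proteins into candidate families by taking connected components of the hit graph with a union-find structure. This is a deliberately simpler stand-in for hcluster_sg, not its algorithm, and the protein names and hits are made up.

```python
from collections import defaultdict


def cluster_hits(hits):
    """Group protein IDs so that any two proteins linked by a hit share a cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for protein in list(parent):
        clusters[find(protein)].add(protein)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical (query, target) hits that passed some score cut-off.
example_hits = [("humA", "mouseA"), ("mouseA", "fishA"), ("humB", "mouseB")]
families = cluster_hits(example_hits)
print(families)
```

Each resulting group would then go forward to multiple alignment and tree building; a real pipeline would also weight hits by score rather than treating every hit as equal.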
Orthologues and Paralogues
Homology relationships
[Figure: gene tree with speciation and duplication nodes; leaves c1, h1, c2, h2 and m]
Paralogues: genes that emerged through a duplication event
• c1 and c2
• h1 and h2
Orthologues: genes that emerged through a speciation event
• c1 and h1
• h2 and m
• c2 and m
One-to-one, One-to-many
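The rule on this slide can be sketched in a few lines: two genes are orthologues if the tree node where their lineages split is a speciation, and paralogues if it is a duplication. The tree below is one reading of the slide's diagram (an assumption): m branches off first by speciation, a duplication follows, and each copy then splits by speciation into a c and an h gene.

```python
# Toy reconciled gene tree. Internal nodes are (event, left, right) tuples;
# leaves are gene names. The topology is an assumed reading of the slide.
TREE = (
    "speciation",
    "m",
    ("duplication",
     ("speciation", "c1", "h1"),
     ("speciation", "c2", "h2")),
)


def leaves(node):
    """Return the gene names under a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, result=None):
    """Label each gene pair by the event at the node where the two genes diverge."""
    if result is None:
        result = {}
    if isinstance(node, str):
        return result
    event, left, right = node
    label = "orthologues" if event == "speciation" else "paralogues"
    for a in leaves(left):
        for b in leaves(right):
            result[frozenset((a, b))] = label
    classify_pairs(left, result)
    classify_pairs(right, result)
    return result


relationships = classify_pairs(TREE)
for pair, label in sorted((sorted(p), l) for p, l in relationships.items()):
    print(pair, label)
```

Under this topology m is orthologous to all four of c1, c2, h1 and h2, which is the one-to-many case, while c1 and h1 are one-to-one orthologues.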
Pan-taxonomic compara
● Gene trees and homologous genes across a wider taxonomic range of species
● An extended analysis including several vertebrates, protists, plants, bacteria, fungi, and invertebrate metazoa
10 Ensembl Vertebrates
9 Ensembl Plants
7 Ensembl Fungi
18 Ensembl Metazoa
14 Ensembl Protists
137 Ensembl Bacteria
http://ensemblgenomes.org/info/genomes?pan_compara=1
Homologues in BioMart
Dataset: Genes
Filters: Has homologues in species
Attributes: Homologue ID, type, ancestor
Results: table
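The same dataset, filter and attribute choices can also be written as a BioMart XML query instead of being clicked through in the web interface. The sketch below only builds the query text; it does not contact any server. The dataset, filter and attribute names (hsapiens_gene_ensembl, with_mmusculus_homolog, and the mmusculus_homolog_* attributes) are assumptions for a human-to-mouse example and should be checked against the names shown in the BioMart interface.

```python
import xml.etree.ElementTree as ET


def build_homologue_query(dataset, homologue_filter, attributes):
    """Return a BioMart-style XML query string: one dataset, one boolean filter."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # excluded="0" keeps only genes for which the filter is true.
    ET.SubElement(ds, "Filter", {"name": homologue_filter, "excluded": "0"})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# ASSUMPTION: these names are illustrative and may differ between releases.
query_xml = build_homologue_query(
    dataset="hsapiens_gene_ensembl",
    homologue_filter="with_mmusculus_homolog",
    attributes=[
        "ensembl_gene_id",
        "mmusculus_homolog_ensembl_gene",   # homologue ID
        "mmusculus_homolog_orthology_type",  # type
        "mmusculus_homolog_subtype",         # ancestor
    ],
)
print(query_xml)
```

A query like this is typically submitted to a BioMart martservice URL; the web interface can show the XML for any query built by hand, which is a safer source for the exact names.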
Gene families
In the gene tab:
● We analyse gene families using every Ensembl isoform
● We import additional UniProt metazoa sequences
● Families are defined by an HMM library based on the PANTHER database
Hands on
• We’re going to look at the human BRCA2 gene to find homologues
• Search the ensembl.org homepage for BRCA2 and go to the gene tab
Whole genome alignments
• To identify highly conserved regions
  • sequences that evolve slowly
  • regions likely to be functional
  • both coding and non-coding sequences
• To spot problematic gene predictions
• To define syntenic regions
• Types: pairwise and multiple (specified groups)
Whole Genome Alignments
Pairwise alignments
• LASTZ-net
Multi-species Groups
• Pre-selected sets
• EPO (Enredo-Pecan-Ortheus) analysis
  • (11 fish, 7 sauropsids, 39 eutherian, 8 primates)
• Mercator-Pecan analysis
  • For 23 amniota vertebrates (mammals+birds)
http://www.ensembl.org/info/genome/compara/analyses.html#pecan
Constrained Elements
• GERP scores: for every nucleotide in a multi-species alignment we calculate how conserved it is
• Peaks show high sequence conservation
• Constrained elements - blocks of high sequence conservation
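To make the idea of "blocks of high sequence conservation" concrete, the sketch below finds runs of consecutive positions whose score stays at or above a cut-off. It is a simplified thresholding illustration, not the GERP method itself, and the scores, threshold and minimum length are made-up values.

```python
def constrained_blocks(scores, threshold, min_length):
    """Return (start, end) index pairs, end exclusive, for runs scoring >= threshold."""
    blocks = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_length:
                blocks.append((start, i))
            start = None
    # Close a run that reaches the end of the alignment.
    if start is not None and len(scores) - start >= min_length:
        blocks.append((start, len(scores)))
    return blocks


# Made-up per-base conservation scores (not real GERP output).
example_scores = [0.1, 2.5, 3.0, 2.8, 0.2, 2.9, 0.0, 3.1, 3.3, 3.0, 2.7]
blocks = constrained_blocks(example_scores, threshold=2.0, min_length=3)
print(blocks)
```

With these values the single high-scoring base at index 5 is ignored because it is shorter than the minimum block length, which mirrors the distinction between an isolated peak and a constrained element.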
Shared synteny
http://www.ensembl.org/info/docs/compara/analyses.html
100 kb regions with high sequence conservation and gene order
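Conserved gene order can be illustrated with a toy example: walk along one species' chromosome and group neighbouring genes whose counterparts are also neighbours on a single chromosome in the other species. This is only a sketch of the concept using made-up gene ranks; the synteny shown in Ensembl is derived from whole genome alignments, and the function name and parameters here are hypothetical.

```python
def synteny_blocks(pairs, max_gap=1, min_genes=3):
    """Group gene pairs into blocks of conserved order.

    pairs: list of (gene, other_chrom, other_rank), sorted by position in the
    reference species. other_rank is the gene's ordinal position on other_chrom.
    """
    blocks, current = [], []
    for pair in pairs:
        if current:
            _, prev_chrom, prev_rank = current[-1]
            _, chrom, rank = pair
            # A change of chromosome or a jump in rank ends the current block.
            if chrom != prev_chrom or abs(rank - prev_rank) > max_gap:
                if len(current) >= min_genes:
                    blocks.append(current)
                current = []
        current.append(pair)
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Made-up data: g1-g3 map to chromosome 11, g5-g7 map to chromosome 2 in reverse order.
example_pairs = [
    ("g1", "11", 5), ("g2", "11", 6), ("g3", "11", 7),
    ("g4", "4", 20),
    ("g5", "2", 40), ("g6", "2", 39), ("g7", "2", 38),
]
block_genes = [[gene for gene, _, _ in block] for block in synteny_blocks(example_pairs)]
print(block_genes)
```

Using the absolute difference in rank means an inverted segment, such as g5 to g7 here, still counts as one block.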
Hands on
• We will look at the human genomic region 2:176087000-176202000, which contains the HoxD cluster, to find alignments and conserved regions.
• The HoxD cluster is involved in limb development and is highly conserved between species.
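For readers who prefer to script this step, the sketch below parses the region string used in the exercise and builds a URL for fetching alignments programmatically. Nothing is downloaded. The endpoint path and the method and species_set_group parameters are assumptions based on the Ensembl REST "alignment/region" endpoint and should be checked against the REST documentation; the function names are hypothetical.

```python
import re
from urllib.parse import urlencode

REGION_PATTERN = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region):
    """Split 'chrom:start-end' into (chrom, start, end) with integer coordinates."""
    match = REGION_PATTERN.match(region)
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError("start must not exceed end")
    return match["chrom"], start, end


def alignment_url(region, species="homo_sapiens", method="EPO", group="mammals"):
    # ASSUMPTION: path and parameter names follow the Ensembl REST
    # alignment/region endpoint; verify them before relying on this URL.
    params = urlencode({
        "method": method,
        "species_set_group": group,
        "content-type": "application/json",
    })
    return f"https://rest.ensembl.org/alignment/region/{species}/{region}?{params}"


chrom, start, end = parse_region("2:176087000-176202000")
length = end - start + 1  # Ensembl coordinates are 1-based and inclusive
print(chrom, start, end, length)
print(alignment_url("2:176087000-176202000"))
```

The region in the exercise spans 115,001 bases, so a scripted request may need to be split into smaller windows if the service limits region size.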
Next webinar: Finding features that regulate genes – the Ensembl Regulatory Build
28th April, 4pm BST
The Ensembl Regulatory Build incorporates data from sources
including ENCODE, Roadmap Epigenomics and Blueprint to predict
the positions of features involved in regulating gene expression,
such as promoters and enhancers. Learn about how the build
works and how to find regulatory features on the genome.
Note that these data are currently only available for human and
mouse.
Course exercises
http://www.ebi.ac.uk/training/online/course/ensembl-browser-webinar-series-2016
This text will be replaced by a YouTube (link to Youku too) video of the webinar and a PDF of the slides.
The “next page” will be the exercises.
A link to exercises and their solutions will appear in the page hierarchy.
Get help with the exercises
• Use the exercise solutions in the online course
• Join our Facebook group and discuss the exercises with everybody (see the online course for the link)
• Email us: [email protected]
Help and documentation
Course online: http://www.ebi.ac.uk/training/online/subjects/11
Tutorials: www.ensembl.org/info/website/tutorials
Flash animations: www.youtube.com/user/EnsemblHelpdesk, http://u.youku.com/Ensemblhelpdesk
Email us: [email protected]
Ensembl public mailing lists: [email protected], [email protected]
Publications
Yates, A. et al. Ensembl 2016. Nucleic Acids Research. http://europepmc.org/articles/4702834
Xosé M. Fernández-Suárez and Michael K. Schuster. Using the Ensembl Genome Server to Browse Genomic Sequence Data. Current Protocols in Bioinformatics 1.15.1-1.15.48 (2010). www.ncbi.nlm.nih.gov/pubmed/20521244
Giulietta M Spudich and Xosé M Fernández-Suárez. Touring Ensembl: A practical guide to genome browsing. BMC Genomics 11:295 (2010). www.biomedcentral.com/1471-2164/11/295
http://www.ensembl.org/info/about/publications.html