Ensembl: A Genomic Toolset for pigs, poultry, plants, pests, pathogens and pollinators

Paul Kersey

pkersey@ebi.ac.uk

18.03.2015

A brief history of genome sequencing

• 1995 Haemophilus influenzae 1.8 Mb

• 1996 Saccharomyces cerevisiae 12 Mb

• 1999 Drosophila melanogaster 140 Mb

• 2001 Homo sapiens 3.1 Gb

• Sequencing technology is continuously improving, but (massively parallel) “next generation” techniques really were game-changers

Cost of Sequencing a Human Genome 2001-2013

A brief history of genome sequencing

• 2008-2015 1000 Genomes Project (2,500 human genomes)

• 2008-2015 1001 Genomes Project (1,001 Arabidopsis genomes)

• 2015-2019 Genomics England (100,000 human genomes)

What can we do with thousands of genome sequences?

• Statistical association of traits with markers

• Increased marker resolution to find causative variants

• Understand population structure and evolutionary processes

• Track epidemics

• Assay for known variation

• Environmental distribution

• Tool for managing crosses

• More genomes…

• More statistical power, find rarer causative alleles

Thousands of genomes – a tool for breeding

• Characterize germplasm of land races and wild relatives

• Understand what’s actually present in an existing line

• Find alleles associated with traits

• Combine genotyping with various (laboratory, greenhouse, field) phenotyping mechanisms, themselves increasingly automated and high-throughput

• Manage crosses

Everyone can do their own experiments, but…

• EMBL-EBI would like to maintain a catalogue of reference genomes and variants for all widely studied species

• Selected lines can be re-phenotyped and analysed against the same reference data

• One major challenge: organising the pan-genome

• No single genome is enough to serve as a reference for many species

• Variants, functional elements present in some strains but not in the reference

• Reference is still a useful concept, but needs to be extended: “choose your own reference” according to need

Phenotyping data

• Immensely varied

• Dependent on an environment (GxPxE)

• Anything you can measure – from molecular assays to in-field imaging

• Increasing use of structured controlled vocabularies for human-readable, interoperable data summaries

• Metadata is critical

• What has been assayed?

• Where was it assayed?

• How has it been assayed?

The EBI mission

• EMBL-EBI provides freely available data from life science experiments, performs basic research in computational biology and offers an extensive user training programme, supporting researchers in academia and industry.

• We also coordinated the ELIXIR pilot phase and are hosting the ELIXIR hub

EBI provides…

• Structured archives (and associated submission services) for most major types of molecular biological data

• e.g. European Nucleotide Archive (part of the ENA-GenBank-DDBJ International Nucleotide Sequence Database Collaboration)

• European Variation Archive, now accepting submissions in VCF format

• ArrayExpress, PRIDE, MetaboLights

• Integrative, interpreted services providing access to that data in a biologically meaningful context

• e.g. Ensembl

ELIXIR Innovation and SME Forum, Wageningen, 18th-19th March 2015

Ensembl

• A modular suite of software for genome analysis and visualisation developed jointly by the Wellcome Trust Sanger Institute and the European Bioinformatics Institute

• Now used for genomes from across the taxonomic space

• Offers a standard set of interfaces to a wide range of genome-scale data, including:

• Web-based GUI

• Public MySQL server

• Perl and RESTful APIs

• FTP

• Data mining tool (constructed using the BioMart framework) with its own set of interfaces: web GUI, web services, command line and local client

[Figure: Ensembl divisions: vertebrates, metazoa, plants, protists, fungi, bacteria]

Agriculturally relevant species in Ensembl

• Farm animals

• Crop plants

• Pests

• Vectors

• Pollinators

• Pathogens

• Symbionts and commensals

Gene tree pipeline

Orthologs & Paralogs

Take canonical protein for each gene belonging to one Ensembl Genomes clade

Cluster: WU-BLASTP + Smith-Waterman all-versus-all, hcluster_sg

Align: multiple aligners consensified by M-Coffee

Build trees: PhyML-WAG + PhyML-HKY + NJ-p + NJ-dN + NJ-dS + species tree → TreeBeST-merge

Infer orthologues and paralogues

Orthologues and paralogues

Paralogues:

Any pairwise gene relationship where the ancestor node is a duplication event

Orthologues:

Any pairwise gene relationship where the ancestor node is a speciation event
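The two definitions can be sketched in code. This is a minimal illustration rather than the Ensembl pipeline itself: the `Node` class, the toy tree and the gene names are all hypothetical, and it assumes every internal node of the gene tree has already been labelled as a speciation or a duplication event.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional


@dataclass
class Node:
    """A gene-tree node; leaves carry a gene name, internal nodes an event label."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" on internal nodes
    children: List["Node"] = field(default_factory=list)


def leaf_paths(node, path=()):
    """Yield (leaf name, tuple of internal nodes from the root down to the leaf)."""
    if not node.children:
        yield node.name, path
        return
    for child in node.children:
        yield from leaf_paths(child, path + (node,))


def classify_pairs(root):
    """Label each gene pair by the event at its most recent common ancestor."""
    paths = dict(leaf_paths(root))
    result = {}
    for a, b in combinations(sorted(paths), 2):
        # The last node shared by both root-to-leaf paths is the common ancestor.
        shared = [n for n, m in zip(paths[a], paths[b]) if n is m]
        ancestor = shared[-1]
        result[(a, b)] = "orthologue" if ancestor.event == "speciation" else "paralogue"
    return result


# Hypothetical tree: one ancient duplication, then a pig/chicken speciation in each copy.
tree = Node("root", "duplication", [
    Node("copy_A", "speciation", [Node("pig_A"), Node("chicken_A")]),
    Node("copy_B", "speciation", [Node("pig_B"), Node("chicken_B")]),
])
pairs = classify_pairs(tree)
```

In this toy tree, `pig_A` and `chicken_A` come out as orthologues, while `pig_A` and `pig_B` (and also `chicken_A` and `pig_B`) come out as paralogues, because their common ancestor is the duplication at the root.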

Orthology / paralogy types

• ortholog_one2one

• ortholog_one2many

• ortholog_many2many

• apparent_ortholog_one2one

• possible_ortholog (weakly supported duplication node)

• within_species_paralog

• other_paralog (too distant to be in the same tree)

• contiguous_gene_split (artefact)

• putative_gene_split (artefact)
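One way to read the one2one / one2many / many2many labels is as a count of how many orthologues each gene has in the other species. The sketch below works under that assumption only; the function name and the gene identifiers are hypothetical, and the real pipeline derives its labels from the gene tree rather than from a flat list of pairs.

```python
from collections import defaultdict


def orthology_type(pairs, gene_a, gene_b):
    """Return a cardinality label for one orthologue pair between two species.

    `pairs` is an iterable of (gene_in_species_1, gene_in_species_2) tuples.
    """
    partners_1 = defaultdict(set)  # species-1 gene -> its species-2 orthologues
    partners_2 = defaultdict(set)  # species-2 gene -> its species-1 orthologues
    for g1, g2 in pairs:
        partners_1[g1].add(g2)
        partners_2[g2].add(g1)

    if gene_b not in partners_1[gene_a]:
        raise ValueError(f"{gene_a} and {gene_b} are not listed as orthologues")

    n_for_a = len(partners_1[gene_a])
    n_for_b = len(partners_2[gene_b])
    if n_for_a == 1 and n_for_b == 1:
        return "ortholog_one2one"
    if n_for_a > 1 and n_for_b > 1:
        return "ortholog_many2many"
    return "ortholog_one2many"


# Hypothetical pig/chicken orthologue pairs.
example_pairs = [
    ("pigA", "chickA"),
    ("pigB", "chickB1"),
    ("pigB", "chickB2"),
    ("pigC", "chickC1"),
    ("pigC", "chickC2"),
    ("pigD", "chickC1"),
]
```

With these pairs, `orthology_type(example_pairs, "pigA", "chickA")` gives `ortholog_one2one`, the `pigB` pairs give `ortholog_one2many`, and `("pigC", "chickC1")` gives `ortholog_many2many`.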

Pairwise whole genome alignments & synteny

• Generated using (B)LASTz-net

• Only for certain combinations of species

Synteny

• Organisms of relatively recent divergence show similar blocks of genes in the same relative positions in the genome

• Shows how the genome is “cut and pasted” in the course of evolution

• Calculated using pairwise whole genome alignments

• Only for certain combinations of species

Ensembl

• Ensembl supports many livestock species

• Ensembl provides automatic gene annotation for these species

• Ensembl works with Havana to support manual annotation in pigs

• Ensembl provides variation databases and functional annotation where the data exist

• Ensembl is playing an active role in FAANG and will integrate the functional data generated as it becomes available

FAANG

• Functional Annotation of Animal Genomes

• High quality transcriptomic and regulatory annotation of Animal Genomes

• Open Data released pre-publication

• Common data and analysis standards

• EBI leading establishment of infrastructure for data sharing and standards

• http://www.faang.org

The bread wheat genome

• Large – haploid genome size is > 5 Gb

• But in fact, the genome is an allohexaploid (triploid genome size ~ 16 Gb)

• Each diploid genome is quite homozygous

Evolution of hexaploid bread wheat

The bread wheat genome

• Genome has been sequenced using Illumina technology after chromosome sorting

• Assembly is fragmented, but gene models are broadly comparable to other grasses

• Chromosome 3B has been sequenced BAC-by-BAC

Wheat in Ensembl Plants

• We represent the IWGSC chromosome survey sequence with the addition of the “finished” 3B sequence.

• We also use PopSeq data (from IPK, Gatersleben) to group scaffolds into bins based on genetic locations

1:1 orthology calls over 19 cereals including the three sub-genomes of bread wheat

Polymorphism data for bread wheat

• ~900,000 SNPs provided by CerealsDB, as follows:

• The Axiom 820K SNP Array contains 820,000 SNPs of which ~684,000 have been mapped.

• The iSelect 80K Array contains over 80,000 SNP loci of which ~58,000 have been mapped.

• The KASP probeset contains ~3,900 SNP loci of which ~3,100 have been loaded in Ensembl Plants

Inter-homoeologous variants

Genome combination   Mismatch length (in reference genome), bp   Alignment length, bp   % mismatch
B on A               2,881,969                                   41,739,915             6.90
D on A               2,665,562                                   43,228,044             6.17
A on B               2,892,005                                   41,749,951             6.93
D on B               2,739,967                                   44,238,039             6.34
A on D               2,689,840                                   43,252,322             6.22
B on D               2,745,993                                   43,244,065             6.35

Mismatch is defined as the length on the reference not matched in the non-reference genome
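The % mismatch column appears to be the mismatch length divided by the alignment length. The short check below tests that reading against the figures as transcribed; the function name is hypothetical, and this is a sanity check rather than the method used to produce the table.

```python
def percent_mismatch(mismatch_bp: int, alignment_bp: int) -> float:
    """Mismatch length as a percentage of alignment length, to two decimals."""
    return round(100.0 * mismatch_bp / alignment_bp, 2)


# (genome combination, mismatch length in bp, alignment length in bp)
rows = [
    ("B on A", 2_881_969, 41_739_915),
    ("D on A", 2_665_562, 43_228_044),
    ("A on B", 2_892_005, 41_749_951),
    ("A on D", 2_689_840, 43_252_322),
    ("B on D", 2_745_993, 43_244_065),
]

recomputed = {name: percent_mismatch(m, a) for name, m, a in rows}
for name, value in recomputed.items():
    print(name, value)
```

These five rows reproduce the listed percentages. The D on B row is left out because its transcribed figures recompute to roughly 6.19 rather than the listed 6.34, so one of the numbers in that row may have been garbled in transcription.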

Bread wheat whole genome alignment

• DNA-DNA pairwise alignments with LASTZ

• Brachypodium distachyon: 617,996,145 bp (14% of bread wheat) in 1,310,922 blocks

• Hordeum vulgare: 423,284,874 bp (9% of bread wheat) in 2,902,234 blocks

• Oryza sativa Japonica: 312,857,683 bp out of 4,460,951,632 (7% of bread wheat) in 718,036 blocks
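The percentages on this slide follow if the aligned lengths are read as base-pair counts and divided by the 4,460,951,632 figure quoted on the rice line. Both readings are assumptions, as is applying the same denominator to all three species; the variable names are hypothetical.

```python
# Denominator quoted on the rice line; assumed here to apply to all three species.
WHEAT_BP = 4_460_951_632

# Aligned lengths from the slide, read as base pairs.
aligned_bp = {
    "Brachypodium distachyon": 617_996_145,
    "Hordeum vulgare": 423_284_874,
    "Oryza sativa Japonica": 312_857_683,
}

# Share of the wheat sequence covered by each pairwise alignment, in whole percent.
coverage_pct = {
    species: round(100 * bp / WHEAT_BP) for species, bp in aligned_bp.items()
}
for species, pct in coverage_pct.items():
    print(f"{species}: {pct}%")
```

Under these assumptions the computed values match the 14%, 9% and 7% given on the slide.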

Additional alignment data for bread wheat

• Repbase repeats

• Triticeae repeats from TREP

• Wheat RNA-Seq, ESTs, and UniGene datasets have been aligned to the Triticum aestivum genome:

• 454 RNA-seq data for the following INSDC studies: SRP02455 (Akhunova et al.), ERP001415 (Brenchley et al.), SRP004502

• Sequences from TriFLDB

• Transcriptome assembly from diploid einkorn wheat Triticum monococcum (Fox et al.)

Diploid progenitors of bread wheat

• Aegilops tauschii (DD) and Triticum urartu (AA) are also included in Ensembl Plants

• In addition, we have RNA-seq data from Triticum monococcum (AA)

• These genomes have been aligned to rice and barley

• Relevant RNA-seq reads have also been aligned

Accessing Ensembl Data Programmatically

5 easy methods

Access method 1: FTP downloads

• ftp://ftp.ensemblgenomes.org/pub/

• http://plants.ensembl.org/info/data/ftp/index.html

• Genomic, cDNA and protein sequence (FASTA)

• Annotated sequence (EMBL / GenBank)

• Gene sets (GTF)

• Resequencing alignments individuals / strains (EMF)

• Whole-genome multiple alignments (EMF)

• Gene-based multiple alignments (EMF)

• Constrained elements (BED)

• Database dumps (MySQL)

Access method 2: MySQL

• MySQL: an open-source relational database management system (RDBMS)

• Used as the back end to support most Ensembl pipelines and applications

• You get the database from http://mysql.com and install locally

• On the Ensembl Genomes FTP site, you can download the Ensembl schema as a .sql file.

• You can also download the data files

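# Start the MySQL client and create an empty database
/data/mysql/bin/mysql -u mysqldba

create database zea_mays_core_24_77_6;

exit;

# Load the schema from the downloaded .sql file
/data/mysql/bin/mysql -u mysqldba zea_mays_core_24_77_6 < zea_mays_core_24_77_6.sql

# Import the downloaded data files (one .txt file per table)
/data/mysql/bin/mysqlimport -u mysqldba --fields_escaped_by=\\ zea_mays_core_24_77_6 -L *.txt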

Access method 3: Ensembl Perl API

• Mature, fully featured Perl API (Applications Programming Interface) for Ensembl resources

• Perl: a commonly used programming language in bioinformatics, designed to make “easy things easy and hard things possible”

• Provides access to:

• Genomic sequence

• Genome features e.g. genes, translations

• Annotation e.g. cross-references

• http://www.ensembl.org/info/docs/api/index.html

Access method 4: REST API

• REpresentational State Transfer (REST) is an abstraction of the architecture of the World Wide Web; more precisely, REST is an architectural style consisting of a coordinated set of architectural constraints applied to components, connectors, and data elements, within a distributed hypermedia system. REST ignores the details of component implementation and protocol syntax in order to focus on the roles of components, the constraints upon their interaction with other components, and their interpretation of significant data elements (Wikipedia)

• A style for structuring URLs (i.e. web addresses) according to the content they contain

• RESTful web service or RESTful web API

• Allows users to access data simply by invoking the URL

• Often returns a data structure defined in a simple grammar (e.g. JSON) which can be imported into an object in any programming language
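As an illustration of “invoking the URL” and importing the JSON into an object, here is a minimal Python sketch. The server address and the `lookup/id` endpoint are those of the public Ensembl REST service and are assumptions here, since the slides do not name a host; the function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: the public Ensembl REST server; the slides do not name a host.
SERVER = "https://rest.ensembl.org"


def lookup_url(stable_id: str, server: str = SERVER) -> str:
    """Build the URL of the `lookup/id` endpoint for one stable ID, asking for JSON."""
    return f"{server}/lookup/id/{quote(stable_id)}?content-type=application/json"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Invoke the URL and decode the JSON response into a Python dict."""
    request = Request(url, headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With network access, `fetch_json(lookup_url("ENSG00000157764"))` should return a dictionary describing that gene (a human gene identifier used purely as an example). The URL alone identifies the resource, which is the point the slide makes about REST.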

Access method 5: BioMart

• A generic tool to facilitate the design and query of data warehouses

• Data warehouses are databases designed to optimise the performance of certain commonly performed queries

• May be less flexible than normalised schema

• Less suitable for maintaining primary data (harder to automatically define constraints due to form of data model)

• Nonetheless, can still be implemented within RDBMS

• BioMart uses MySQL

• We have gene-centric and variant-centric BioMarts for all Ensembl divisions

• BioMart comes with its own web interface
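The trade-off between a normalised schema and a warehouse-style table can be shown with a toy example. This sketch uses SQLite (not MySQL) only so that it is self-contained; the table names, columns and rows are hypothetical and are not the BioMart or Ensembl schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalised schema: each fact is stored once and linked by keys.
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT);
    CREATE TABLE transcript (
        transcript_id INTEGER PRIMARY KEY,
        gene_id INTEGER REFERENCES gene(gene_id),
        biotype TEXT
    );
    INSERT INTO gene VALUES (1, 'geneA', '3B'), (2, 'geneB', '1A');
    INSERT INTO transcript VALUES
        (10, 1, 'protein_coding'),
        (11, 1, 'retained_intron'),
        (12, 2, 'protein_coding');

    -- Warehouse-style table: the join is done once, up front, and stored.
    CREATE TABLE gene_mart AS
        SELECT g.gene_id, g.name, g.chromosome, t.transcript_id, t.biotype
        FROM gene g JOIN transcript t ON g.gene_id = t.gene_id;
    """
)

# A common query now needs no join, at the cost of repeating gene columns per row.
rows = conn.execute(
    "SELECT name, transcript_id FROM gene_mart "
    "WHERE chromosome = '3B' ORDER BY transcript_id"
).fetchall()
print(rows)
```

The precomputed table makes the frequent query simple and fast, but the gene name and chromosome are now repeated on every transcript row, which is why such a table is less suitable for maintaining primary data.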

BioMart Web UI

Access Method 6: Virtual Machines

• Download an environment containing all of Ensembl to run on your machine

• In effect, you are downloading/running a model of a computer

• As long as your computer can support running the VM, there should be no problem with library incompatibilities etc. - all the resources Ensembl needs are within the VM

• Increasingly, a model of choice for running web-based services (e.g. in cloud environments) – you don’t deploy into a platform, you deploy a whole platform

• We use VirtualBox, an open source virtualisation platform

• http://ensemblgenomes.org/info/access/virtual_machine

Funding

• Ensembl Genomes funded by

• EMBL

• EU (INFRAVEC, Microme, transPLANT, AllBio)

• BBSRC (PhytoPath, wheat/barley/midge sequencing, UK-US collaboration, RNAcentral)

• Wellcome Trust (PomBase)

• NIH/NIAID (VectorBase)

• NSF (Gramene collaboration)

• Bill and Melinda Gates Foundation (wheat rust)

People

• James Allen, Irina Armean, Dan Bolser, Bruce Bolt, Mikkel Christensen, Paul Davis, Thomas Down, Christoph Grabmueller, Kevin Howe, Arnaud Kerhornou, Julia Khobdova, Eugene Kulesha, Nick Langridge, Dan Lawson, Mark McDowall, Uma Maheswari, Gareth Maslen, Michael Nuhn, Chuang Kee Ong, Michael Paulini, Helder Pedro, Anton Petrov, Dan Staines, Brandon Walts, Gary Williams

• The vertebrate genomics team @ EBI (Paul Flicek)
