introduction to gene ontology annotation resources

Post on 24-Feb-2016


Introduction to Gene Ontology annotation resources

Rachael Huntley, UniProt-GOA

A Practical Overview of Biomedical Ontologies

17-18 April 2013

Talk Overview

• Intro to GO and GO terms

• Exercise

• Annotating to GO

• Accessing GO annotations

• Exercise

• Practical use of GO

• Exercise

• Precautions

What is GO?

• A way to capture biological knowledge for individual gene products in a written and computable form

The Gene Ontology

• A set of concepts and their relationships to each other arranged as a hierarchy

www.ebi.ac.uk/QuickGO

Less specific concepts

More specific concepts

The Concepts in GO

1. Molecular Function: an elemental activity or task or job

• protein kinase activity

• insulin receptor activity

2. Biological Process: a commonly recognised series of events

• cell division

3. Cellular Component: where a gene product is located

• mitochondrion

• mitochondrial matrix

• mitochondrial inner membrane

Anatomy of a GO term

Unique identifier

Term name

Definition

Synonyms

Cross-references

Ontology structure

• Directed acyclic graph

Terms can have more than one parent

• Terms are linked by relationships

is_a

part_of

regulates (and positively/negatively regulates)

occurs_in

has_part

www.ebi.ac.uk/QuickGO

These relationships allow for complex analysis of large datasets
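The graph structure described above can be sketched in a few lines of Python. This is only an illustration: the term names come from the earlier slides, but the edges in `PARENTS` are a simplified assumption and do not reproduce the real ontology. It shows a term with two parents and how following relationships upward collects every ancestor.

```python
# Toy fragment of a directed acyclic graph. Each term maps to a list of
# (relationship, parent) pairs. The edges are simplified assumptions made
# for illustration; they are not copied from the real Gene Ontology.
PARENTS = {
    "mitochondrial matrix": [("part_of", "mitochondrion")],
    "mitochondrial inner membrane": [
        ("part_of", "mitochondrion"),          # first parent
        ("is_a", "organelle inner membrane"),  # second parent
    ],
    "mitochondrion": [("is_a", "intracellular organelle")],
    "organelle inner membrane": [("is_a", "membrane")],
    "intracellular organelle": [("is_a", "cellular component")],
    "membrane": [("is_a", "cellular component")],
    "cellular component": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("mitochondrial inner membrane")))
```

Because a term can have more than one parent, the traversal keeps a `seen` set so that shared ancestors are visited only once.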

Searching for GO terms

http://www.ebi.ac.uk/QuickGO

Search GO terms or proteins

Exercise

Search for a GO term Exercise 1 (pg.15)


Why do we need GO?

Reasons for the Gene Ontology

www.geneontology.org

• Inconsistency in English language

Inconsistency in English language

• Same name for different concepts

[Figure: two different things that share the name "Cell"]

Comparison is difficult – in particular across species or across databases

• Different names for the same concept

Eggplant

Aubergine

Brinjal

Melongene

Same for biological concepts


• Increasing amounts of biological data available

• Increasing amounts of biological data to come

Search on ‘DNA repair’...get over 70,000 results

Increasing amounts of biological data available

Expansion of sequence information


• Large datasets need to be interpreted quickly


Aims of the GO project

• Compile the ontologies

- currently over 38,000 terms

- constantly increasing and improving

• Annotate gene products using ontology terms

- around 30 groups provide annotations

• Provide a public resource of data and tools

- regular releases of annotations

- tools for browsing/querying annotations and editing the ontology

Reactome

GO Annotation

UniProt-Gene Ontology Annotation (UniProt-GOA) project at the EBI

• Largest open-source contributor of annotations to GO

• Provides annotation for more than 390,000 species

• Our priority is to annotate the human proteome

A GO annotation is a statement that a gene product:

1. has a particular molecular function, or is involved in a particular biological process, or is located within a certain cellular component

2. as determined by a particular method

3. as described in a particular reference

Accession: P00505

Name: GOT2

GO ID: GO:0004069

GO term name: aspartate transaminase activity

Reference: PMID:2731362

Evidence code: IDA
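As a rough illustration of the three-part statement above, an annotation can be modelled as a small record. The class and field names below are hypothetical, chosen to mirror the column headings on the slide; the values are the ones shown for P00505.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    """One GO annotation; field names are illustrative, not an official schema."""
    accession: str      # gene product identifier
    name: str           # gene product symbol
    go_id: str          # GO term identifier
    go_term_name: str   # what the gene product does, or where it is
    reference: str      # where the statement is described
    evidence_code: str  # how it was determined


def describe(annotation):
    """Render the annotation as the sentence it stands for."""
    return (
        f"{annotation.name} ({annotation.accession}) has "
        f"{annotation.go_term_name} [{annotation.go_id}], "
        f"as determined by {annotation.evidence_code}, "
        f"as described in {annotation.reference}"
    )


example = GOAnnotation(
    accession="P00505",
    name="GOT2",
    go_id="GO:0004069",
    go_term_name="aspartate transaminase activity",
    reference="PMID:2731362",
    evidence_code="IDA",
)

print(describe(example))
```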

UniProt-GOA incorporates annotations made using two methods

Electronic Annotation

• Quick way of producing large numbers of annotations

• Annotations use less-specific GO terms

• Only source of annotation for many non-model organism species

Manual Annotation

• Time-consuming process producing lower numbers of annotations

• Annotations tend to use very specific GO terms

Electronic annotation methods

1. Mapping of external concepts to GO terms

e.g. GO:0004707: MAP kinase activity

GO:0005634: Nucleus

GO:0009734: Auxin mediated signaling pathway

Annotations are high-quality and have an explanation of the method (GO_REF)
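The mapping approach can be pictured as a lookup table from an external concept to a GO term. The sketch below is an assumption-laden toy: the GO IDs are the three on the slide, but the external identifiers on the left-hand side and the protein names are illustrative placeholders, and the real mapping files are far larger. The `IEA` code (inferred from electronic annotation) marks the results as electronic.

```python
# External concept -> (GO ID, GO term name). The GO IDs come from the slide;
# the external identifiers used as keys are illustrative assumptions.
CONCEPT_TO_GO = {
    "EC:2.7.11.24": ("GO:0004707", "MAP kinase activity"),
    "UniProtKB-KW:Nucleus": ("GO:0005634", "nucleus"),
    "UniProtKB-KW:Auxin signaling pathway": (
        "GO:0009734",
        "auxin mediated signaling pathway",
    ),
}


def electronic_annotations(protein_concepts, mapping=CONCEPT_TO_GO):
    """Turn each protein's external concepts into (protein, GO ID, 'IEA') rows."""
    rows = []
    for protein, concepts in protein_concepts.items():
        for concept in concepts:
            if concept in mapping:  # concepts without a mapping are skipped
                go_id, _term_name = mapping[concept]
                rows.append((protein, go_id, "IEA"))
    return sorted(rows)


# Hypothetical proteins carrying external concepts.
proteins = {
    "protein_A": ["EC:2.7.11.24", "UniProtKB-KW:Nucleus"],
    "protein_B": ["UniProtKB-KW:Unmapped concept"],
}
print(electronic_annotations(proteins))
```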

2. Automatic transfer of manual annotations to orthologs (Ensembl compara)

e.g. from Human to: Macaque, Mouse, Dog, Cow, Guinea Pig, Chimpanzee, Rat, Chicken ...and more

from Arabidopsis to: Rice, Brachypodium, Maize, Poplar, Grape …and more
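A minimal sketch of ortholog-based transfer, assuming a precomputed table of ortholog pairs. The gene identifiers and the ortholog table are hypothetical, and the real pipeline applies further filters that are not modelled here.

```python
# Manual annotations on source-species genes (hypothetical identifiers).
MANUAL_ANNOTATIONS = {"human_GOT2": {"GO:0004069"}}

# Source gene -> orthologous genes in other species (illustrative only).
ORTHOLOGS = {"human_GOT2": ["mouse_Got2", "rat_Got2"]}


def project_to_orthologs(manual, orthologs):
    """Copy each source gene's GO IDs onto every listed ortholog."""
    projected = {}
    for source_gene, go_ids in manual.items():
        for target_gene in orthologs.get(source_gene, []):
            projected.setdefault(target_gene, set()).update(go_ids)
    return projected


print(project_to_orthologs(MANUAL_ANNOTATIONS, ORTHOLOGS))
```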

Electronic annotation methods

http://www.geneontology.org/cgi-bin/references.cgi

Manual annotation by UniProt-GOA

High–quality, specific annotations made using:

• Full text peer-reviewed papers

• A range of evidence codes to categorise the types of evidence found in a paper, e.g. IDA, IMP, IPI

http://www.ebi.ac.uk/GOA

Number of annotations in UniProt-GOA database (April 2013 statistics)

Electronic annotations: 139,757,414

Manual annotations*: 1,259,994

* Includes manual annotations integrated from external model organism and specialist groups

How to access and use GO annotation data

Where can you find annotations?

UniProtKB

Ensembl

Entrez gene

UniProt vs. QuickGO annotation display

QuickGO

UniProt

GO Consortium website

Gene Association Files

17-column files containing all information for each annotation

http://www.ebi.ac.uk/GOA/downloads.html

Numerous species-specific files
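A gene association file can be read with a few lines of Python. The sketch below assumes the tab-separated GAF 2.x layout, in which lines starting with `!` are comments. The sample line reuses the P00505 example from earlier; fields not shown on the slides (object name, taxon, date, assigned-by) are placeholders, not values copied from a real file.

```python
GAF_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
    "DB_Reference", "Evidence_Code", "With_From", "Aspect",
    "DB_Object_Name", "DB_Object_Synonym", "DB_Object_Type", "Taxon",
    "Date", "Assigned_By", "Annotation_Extension", "Gene_Product_Form_ID",
]


def parse_gaf(lines):
    """Yield one dict per annotation line, skipping comments and blanks."""
    for line in lines:
        line = line.rstrip("\n")  # keep trailing tabs: empty columns matter
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        if len(fields) != len(GAF_COLUMNS):
            raise ValueError(f"expected 17 columns, got {len(fields)}")
        yield dict(zip(GAF_COLUMNS, fields))


# Sample input. Values not shown on the slides are placeholders.
sample = [
    "!gaf-version: 2.0\n",
    "\t".join([
        "UniProtKB", "P00505", "GOT2", "", "GO:0004069", "PMID:2731362",
        "IDA", "", "F", "placeholder name", "", "protein",
        "taxon:9606", "20130401", "UniProt", "", "",
    ]) + "\n",
]

for annotation in parse_gaf(sample):
    print(annotation["DB_Object_ID"], annotation["GO_ID"],
          annotation["Evidence_Code"])
```

For the real species-specific files, the same generator can be fed an open file handle instead of the `sample` list.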

UniProt-GOA website

GO browsers

http://www.ebi.ac.uk/QuickGO

The EBI's QuickGO browser

Search GO terms or proteins

Find sets of GO annotations

Exercise

Find annotations to a protein Exercise 2 (pg.15)

Find annotations to a list of proteins Exercise 1 and 2 (pg.20)

Find protein list at:

ftp://ftp.ebi.ac.uk/pub/contrib/goa/Tutorial_Data

How scientists use the GO

• Access gene product functional information

• Analyse high-throughput genomic or proteomic datasets

• Validation of experimental techniques

• Get a broad overview of a proteome

• Obtain functional information for novel gene products

Some examples…

Term enrichment

• Most popular type of GO analysis

• Determines which GO terms are more often associated with a specified list of genes/proteins compared with a control list or rest of genome

• Many tools available to do this analysis

• User must decide which is best for their analysis
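One common way to test whether a GO term is over-represented in a gene list is a one-sided hypergeometric test. The sketch below is a bare-bones version using only the standard library; it is not the method of any particular tool, the function name and the numbers are hypothetical, and a real analysis would also correct for testing many terms at once.

```python
from math import comb


def enrichment_p_value(study_hits, study_size, population_hits, population_size):
    """Probability of seeing at least `study_hits` annotated genes in a
    study list of `study_size` genes drawn from a population in which
    `population_hits` of `population_size` genes carry the term."""
    total = comb(population_size, study_size)
    upper = min(study_size, population_hits)
    tail = sum(
        comb(population_hits, k)
        * comb(population_size - population_hits, study_size - k)
        for k in range(study_hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 5 of 20 genes in the genome carry the term,
# and 3 of the 5 genes in the study list carry it.
p = enrichment_p_value(study_hits=3, study_size=5,
                       population_hits=5, population_size=20)
print(round(p, 4))
```

A small p-value suggests the term appears in the study list more often than expected by chance relative to the control list or the rest of the genome.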

Analysis of high-throughput genomic datasets

MicroArray data analysis

[Figure: microarray gene tree of all genes (14010), attacked vs. control over time, with clusters labelled: Puparial adhesion, Molting cycle, Hemocyanin; Defense response, Immune response, Response to stimulus, Toll regulated genes, JAK-STAT regulated genes; Immune response, Toll regulated genes; Amino acid catabolism, Lipid metabolism; Peptidase activity, Protein catabolism, Immune response]

Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.

Biological Process GO enrichment of 88 human peroxisome proteins before focused annotation…

Mutowo-Meullenet, Huntley, et al. DATABASE 2013

…and after focused annotation

• More terms

• Greater specificity

• New processes

Analysis using GO annotations

GO Galaxy http://galaxy.berkeleybop.org/

go-helpdesk@ebi.ac.uk

Analysis using GO annotations

Many more listed at:

http://neurolex.org/wiki/Category:Resource:Gene_Ontology_Tools

Annotating novel sequences

• Can use BLAST queries to find similar sequences with GO annotation which can be transferred to the new sequence

• Two tools currently available:

AmiGO BLAST – searches the GO Consortium database

BLAST2GO – searches the NCBI database

Annotating novel sequences

• Can use InterProScan to find GO annotation that is attributed to protein signatures in a submitted protein sequence

Using the GO to provide a functional overview for a large dataset

• Many GO analysis tools use GO slims to give a broad overview of the dataset

• GO slims are cut-down versions of the GO and contain a subset of the terms in the whole GO

• GO slims usually contain less-specialised GO terms

Slimming the GO using the ‘true path rule’

Many gene products are associated with a large number of descriptive, leaf GO nodes...

…however annotations can be mapped up to a smaller set of parent GO terms.
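Mapping annotations up to a slim can be sketched as follows: for each annotated term, collect the term and all its ancestors, then keep only those that belong to the slim. The parent table, gene names and slim set below are simplified assumptions for illustration, not real ontology data.

```python
# Child term -> parent terms. Simplified, illustrative edges only.
PARENTS = {
    "mitochondrial matrix": ["mitochondrion"],
    "mitochondrial inner membrane": ["mitochondrion", "membrane"],
    "mitochondrion": ["cellular component"],
    "membrane": ["cellular component"],
    "cellular component": [],
}


def term_and_ancestors(term, parents=PARENTS):
    """Return the term itself plus everything above it in the graph."""
    found = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def map_to_slim(annotations, slim_terms):
    """Replace each gene's specific terms with the slim terms above them."""
    return {
        gene: set().union(*(term_and_ancestors(t) for t in terms)) & slim_terms
        for gene, terms in annotations.items()
    }


# Hypothetical genes annotated to specific (leaf) terms.
annotations = {
    "gene_1": {"mitochondrial matrix"},
    "gene_2": {"mitochondrial inner membrane"},
}
slim = {"mitochondrion", "membrane"}
print(map_to_slim(annotations, slim))
```

This relies on the true path rule: an annotation to a term is also valid for every parent of that term, so moving it upward loses detail but not correctness.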

GO slims

Custom slims are available for download:

http://www.geneontology.org/GO.slims.shtml

or you can make your own using:

• AmiGO's GO slimmer

http://amigo.geneontology.org/cgi-bin/amigo/slimmer

• QuickGO

http://www.ebi.ac.uk/QuickGO

www.ebi.ac.uk/QuickGO

Map-up annotations with GO slims

Exercise

Using GO slims in QuickGO Exercise 1 (pg.27)

Find protein list at:

ftp://ftp.ebi.ac.uk/pub/contrib/goa/Tutorial_Data

Precautions when using GO annotations for analysis

• It is recommended that ‘NOT’ annotations are removed before analysis

- only ~7000 out of 141 million annotations are ‘NOT’

- they can confuse the analysis

• The Gene Ontology is always changing and GO annotations are continually being created

- always use a current version of both

- if publishing your analyses please report the versions/dates you used

http://www.geneontology.org/GO.cite.shtml
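Removing ‘NOT’ annotations before analysis might look like the sketch below. It assumes each annotation is a dict with a `Qualifier` field in which multiple qualifiers are separated by `|`, as in gene association files; the second accession and the qualifier values are made up for illustration.

```python
def is_not_annotation(qualifier):
    """True when 'NOT' is one of the pipe-separated qualifiers."""
    return "NOT" in qualifier.split("|")


def drop_not_annotations(annotations):
    """Keep only annotations that are not negated."""
    return [a for a in annotations if not is_not_annotation(a["Qualifier"])]


# Hypothetical rows; only the fields needed here are included.
rows = [
    {"DB_Object_ID": "P00505", "GO_ID": "GO:0004069", "Qualifier": ""},
    {"DB_Object_ID": "X00001", "GO_ID": "GO:0005634", "Qualifier": "NOT"},
    {"DB_Object_ID": "X00001", "GO_ID": "GO:0004707",
     "Qualifier": "NOT|contributes_to"},
]

kept = drop_not_annotations(rows)
print([row["DB_Object_ID"] for row in kept])
```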

Precautions when using GO annotations for analysis

• Unannotated is not unknown

- where there is no evidence in the literature for a process, function or location, the gene product is annotated to the appropriate ontology’s root node with an ‘ND’ evidence code (no biological data), thereby distinguishing between unannotated and unknown

• Pay attention to under-represented GO terms

- a strong under-representation of a pathway may mean that normal functioning of that pathway is necessary for the given condition
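The distinction between unannotated and unknown can be made explicit in code. This sketch assumes annotations are given as (GO ID, evidence code) pairs and uses the three ontology root term IDs; the function name and status labels are hypothetical.

```python
# Root terms of the three ontologies.
ROOT_TERMS = {
    "GO:0003674",  # molecular_function
    "GO:0008150",  # biological_process
    "GO:0005575",  # cellular_component
}


def knowledge_status(annotations):
    """Classify a gene product from its (GO ID, evidence code) pairs."""
    if not annotations:
        return "unannotated"  # nobody has looked yet
    if all(go_id in ROOT_TERMS and code == "ND" for go_id, code in annotations):
        return "unknown"      # curators looked and found no data
    return "annotated"


print(knowledge_status([]))
print(knowledge_status([("GO:0008150", "ND")]))
print(knowledge_status([("GO:0004069", "IDA")]))
```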

The UniProt-GOA group

Curator:

Software developer:

Team leaders:

Rachael Huntley

Email: goa@ebi.ac.uk

http://www.ebi.ac.uk/GOA

Claire O’Donovan

Tony Sawford

Prudence Mutowo

Project leader:

Maria Martin
