
Functional Associations of Protein in Entire Genomes Sequences

2002. 1. 21. Bioinformatics Center of Shanghai Institutes for Biological Sciences

Bingding Huang


Contents

Introduction

Methods of prediction

Results and Discussion

How About Next Work?


Introduction

Motivation: Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized

Using experimental methods is tedious, labour-intensive and inaccurate


Introduction

Key idea: correlation of sequence similarity with function similarity

A basis for transferring functional knowledge from a characterized protein to a homologous, but uncharacterized one

Functional linkage and protein interaction

So many programs to do this ...


Introduction

Protein function linkage

Proteins that participate in a common structural complex or metabolic pathway

During evolution, all such functionally linked proteins tend to be either preserved or eliminated together in a new species.


Introduction

Protein-protein interaction (gene fusion)

Some interacting proteins, such as the Gyr A and Gyr B subunits of E. coli DNA gyrase, are fused into a single protein in another organism, in this case the topoisomerase of yeast.

Thus the similarity of the sequences of Gyr A (804 amino acid residues) and Gyr B (875) to different segments of the topoisomerase (1429) might be used to predict that Gyr A and Gyr B interact in E. coli


Methods to predict function of protein

Traditional Homology search

Phylogenetic Profiles

Rosetta Stone Method

Gene Neighbor Method

Gene Fusion Method

Machine Learning

Structure Prediction


Methods to predict function of protein

Homology method

The function of a query protein can be deduced by comparing its amino-acid sequence with those of homologous proteins of known function

However, homology search has notable limitations in predicting function. By its initial assumption, it cannot assign "novel" function(s) to the query protein, or "any" function at all if no sequence homolog of known function can be found in the database. In addition, sequence identity does not always correspond to functional resemblance
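The transfer step and its first limitation can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular tool: the `Hit` record, the `transfer_function` helper and the example hits are hypothetical, and the `1e-10` threshold is simply borrowed from the BLAST cut-off quoted later in this talk.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """One database match for a query protein (hypothetical record layout)."""
    subject_id: str             # database protein matched by the query
    e_value: float              # alignment E-value; smaller is more significant
    annotation: Optional[str]   # known function of the subject, if any


def transfer_function(hits, max_e_value=1e-10):
    """Return the annotation of the most significant annotated hit, or None."""
    usable = [h for h in hits if h.annotation and h.e_value <= max_e_value]
    if not usable:
        # No characterized homolog in the database: no function can be assigned.
        return None
    return min(usable, key=lambda h: h.e_value).annotation


# Made-up hits, for illustration only.
hits = [
    Hit("refA", 1e-40, "DNA gyrase subunit A"),
    Hit("refB", 1e-12, None),                   # significant but uncharacterized
    Hit("refC", 1e-3, "hypothetical kinase"),   # annotated but not significant
]
best = transfer_function(hits)
orphan = transfer_function([Hit("refB", 1e-12, None)])
print(best, orphan)
```

The function can only ever return an annotation that already exists in the database, which is the "no novel function" limitation in code form.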


Methods to predict function of protein

Phylogenetic profiles (Marcotte)

Based on the hypothesis that functionally linked proteins evolve in a correlated fashion and therefore have homologs in the same subset of organisms.

A phylogenetic profile describes the pattern of presence or absence of a particular protein across a set of sequenced organisms. If two proteins have the same phylogenetic profile in all surveyed genomes, it is inferred that the two proteins are functionally linked.

Pairs of functionally linked proteins may have no amino acid sequence similarity with each other and then cannot be linked by conventional sequence-alignment techniques
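As a rough sketch of the idea, the Python below builds one presence/absence vector per protein and groups proteins whose vectors are identical. The genome names, protein names and presence data are invented for illustration, and exact profile identity is the simplest possible matching rule, assumed here for clarity.

```python
from collections import defaultdict


def build_profiles(presence, genomes):
    """Map each protein to a tuple of 0/1 flags, one per genome, in a fixed order."""
    return {
        protein: tuple(int(g in found_in) for g in genomes)
        for protein, found_in in presence.items()
    }


def group_identical_profiles(profiles):
    """Return groups of two or more proteins that share exactly the same profile."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Hypothetical data: the genomes in which a homolog of each protein was found.
genomes = ["G1", "G2", "G3", "G4"]
presence = {
    "P1": {"G1", "G3"},
    "P2": {"G1", "G3"},
    "P3": {"G2", "G4"},
    "P4": {"G1", "G2", "G3", "G4"},
}
profiles = build_profiles(presence, genomes)
linked = group_identical_profiles(profiles)
print(profiles["P1"], linked)
```

P1 and P2 are grouped even though nothing in the code compares their sequences, which is the point made on the slide.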


Methods to predict function of protein


Methods to predict function of protein

Table 1. Phylogenetic profiles link proteins with similar keywords


Methods to predict function of protein

Table 2. Phylogenetic profiles link proteins in EcoCyc classes


Methods to predict function of protein


Methods to predict function of protein

Gene fusion method (Enright)


Methods to predict function of protein

Domain-fusion analysis

Supported by the observation that a single protein chain in one organism shows homology with separate interacting proteins in another organism, in such a way that the interacting proteins are fused into a single peptide chain.

The detection of gene fusions in one genome (defined as 'composite' proteins) allows the prediction of functional associations between homologous genes that remain separate in another genome (defined as 'component' proteins).
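The composite/component test can be sketched as an interval check: two component proteins are linked when both match the same composite protein in non-overlapping regions and are not similar to each other. This is a simplified illustration under those assumptions, not the published algorithm; the function names, the `similar_pairs` argument and all coordinates are hypothetical.

```python
from itertools import combinations


def overlap(a, b):
    """Length of the overlap between two closed intervals given as (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def find_fusion_links(hits, similar_pairs=frozenset(), max_overlap=0):
    """Predict component pairs linked through a composite protein.

    hits: {composite_id: {component_id: (start, end) of the match on the composite}}
    similar_pairs: frozensets of component ids that are similar to each other;
                   such pairs are skipped as likely paralogs rather than fusions.
    """
    links = []
    for composite, regions in hits.items():
        for a, b in combinations(sorted(regions), 2):
            if frozenset((a, b)) in similar_pairs:
                continue
            if overlap(regions[a], regions[b]) <= max_overlap:
                links.append((a, b, composite))
    return links


# Illustrative coordinates only; they are not taken from a real alignment.
hits = {
    "TOP2": {"GyrB": (1, 650), "GyrA": (700, 1429)},  # distinct segments
    "X": {"p1": (1, 100), "p2": (50, 150)},           # overlapping segments
}
links = find_fusion_links(hits)
print(links)
```

In this toy input only the first composite yields a link, because the two matches on "X" cover overlapping segments.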


Methods to predict function of protein

Flowchart of the DifFuse algorithm

[Figure: flowchart. Boxes: query genome BLAST vs. reference genome; query genome BLAST vs. query genome; Smith-Waterman (twice); Matrix T; Matrix Y; symmetrification and sequence clustering algorithm; fusion detection algorithm]


Methods to predict function of protein

Results of detection


Methods to predict function of protein

Materials and methods

Genome sequences

• Complete genome sequences for the 24 species were obtained from their original sources

Genome comparison

1. The 24 genomes were filtered using the CAST compositional bias filtering algorithm

2. Each genome was compared against itself and against each of the other 23 genomes using BLASTP with a cut-off E-value of 1e-10

3. The DifFuse algorithm was then applied to each genome in turn as a query against the other 23 (reference) genomes

4. Using other protein databases as reference yields fewer composite cases


Methods to predict function of protein

Result

Yielded 132,812 component and 66,406 composite proteins in an all-against-all genome comparison, representing multiple occurrences of the same proteins across species

Of these, there are 7,224 component and 2,365 composite unique proteins across the 24 genomes

On average, 9% of genes in a given genome appear to code for single-domain component proteins predicted to be functionally associated. These proteins are detected by an additional 4% of genes that code for fused, composite proteins


Methods to predict function of protein

Discussion

This approach to predicting functional associations of proteins gives robust predictions for physical interaction, pathway involvement, complex formation and other types of functional association between protein molecules.

The landscape of gene fusions appears to be a complex one, affected by paralogy, genome size and phylogenetic distance


Methods to predict function of protein

Gene neighbor Method

If two genes (blue and yellow in the figure) are found to be neighbors in several genomes, a functional linkage may be inferred between the proteins they encode.
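A minimal sketch of the neighbor count, assuming each genome is given as a list of gene family identifiers in chromosomal order. The gene orders, the `max_gap` window and the `min_genomes` threshold are assumptions made for this example; chromosome circularity and strand are ignored.

```python
from collections import Counter


def neighbor_counts(genomes, max_gap=1):
    """Count, for each gene pair, the number of genomes in which they are neighbors.

    genomes: {genome_name: [gene family ids in chromosomal order]}
    max_gap: how many positions apart two genes may be and still count as neighbors.
    """
    counts = Counter()
    for order in genomes.values():
        seen = set()
        for i, gene in enumerate(order):
            for other in order[i + 1 : i + 1 + max_gap]:
                if other != gene:
                    seen.add(frozenset((gene, other)))
        counts.update(seen)  # each pair is counted at most once per genome
    return counts


def conserved_neighbors(genomes, min_genomes=2, max_gap=1):
    """Return the gene pairs that are neighbors in at least min_genomes genomes."""
    counts = neighbor_counts(genomes, max_gap)
    return sorted(tuple(sorted(p)) for p, n in counts.items() if n >= min_genomes)


# Hypothetical gene orders; "a" and "b" stay adjacent in all three genomes.
genomes = {
    "genome1": ["a", "b", "c", "d"],
    "genome2": ["c", "x", "b", "a"],
    "genome3": ["d", "a", "b", "y"],
}
pairs = conserved_neighbors(genomes)
print(pairs)
```

Pairs are stored as unordered sets, so "a" next to "b" and "b" next to "a" count as the same neighborhood.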


Methods to predict function of protein

Discussion

This method is most robust for microbial genomes but may work to some extent even for human genes where operon-like clusters are observed

This method can be powerful in uncovering functional linkages in prokaryotes, where operons are common, but also shows promise for analyzing interacting proteins in eukaryotes.


Methods to predict function of protein

Finding functional features of proteins using machine learning techniques

Hypothesis: A protein's function arises from its physical structure. Since protein structures are built from physico-chemical interactions among amino acids, there might exist some features of amino-acid sequences that reflect these interactions. These features are called 'functional features'


Methods to predict function of protein

Overview of the method


Methods to predict function of protein

The procedure of machine learning

Analogical reasoning: to make assumptions about functional features

Inductive reasoning: to generalize the hypothesis made by analogical reasoning, and to decide which localization pattern is most useful for classifying protein functions

Deductive reasoning: to refine the localization pattern into classification rules. Knowledge about protein functions and structures is used to make a logical description of the classification rules


Methods to predict function of protein

Result and discussion

These features can discriminate between different functions of proteins that have similar amino-acid sequences

Furthermore, the features can recognize proteins with the same function that do not have similar sequences.

More to do:

Refine the classification rules and integrate the three machine learning techniques.


Methods to predict function of protein

How to predict protein function more precisely?

By three-dimensional structure:

Because a protein's function is determined more directly by its structure and dynamics than by its sequence


Methods to predict function of protein

Two disadvantages of this method

First, three-dimensional structures are available for only a fraction of proteins. But this limitation should be reduced by structural genomics within a few years.

Second, functional details that can be extracted from structure but not from sequence often depend on the protein's environment, as well as on its dynamics and energetics, all of which are difficult to obtain by existing experimental and theoretical techniques


Results and Discussion

It is conceivable that prediction of protein functions will be more precise when the above methods are combined

Prediction methods need to be evaluated rigorously and made accessible over the internet.

Varied experimental data and theoretical predictions must be integrated because no single experimental or computational approach is likely to result in accurate and complete models of protein assemblies and pathways.
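One simple way to combine methods, sketched here as an assumption rather than a recommendation from the talk, is to count how many independent methods predict each protein pair and rank pairs by that count. The method names and predicted pairs below are hypothetical, and a real system would probably weight each method by its measured accuracy instead of giving every method one vote.

```python
from collections import defaultdict


def combine_predictions(predictions):
    """Rank protein pairs by the number of methods that predict them.

    predictions: {method_name: iterable of (protein_a, protein_b) pairs}
    Returns a list of ((protein_a, protein_b), [supporting methods]),
    best-supported pairs first.
    """
    support = defaultdict(set)
    for method, pairs in predictions.items():
        for a, b in pairs:
            support[frozenset((a, b))].add(method)  # pair order does not matter
    ranked = sorted(support.items(), key=lambda kv: (-len(kv[1]), sorted(kv[0])))
    return [(tuple(sorted(pair)), sorted(methods)) for pair, methods in ranked]


# Hypothetical predictions from three of the methods discussed above.
predictions = {
    "phylogenetic_profile": [("P1", "P2"), ("P3", "P4")],
    "gene_fusion": [("P2", "P1")],
    "gene_neighbor": [("P1", "P2"), ("P5", "P6")],
}
combined = combine_predictions(predictions)
for pair, methods in combined:
    print(pair, methods)
```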


Results and Discussion

System limitations

Several types of error are not currently addressed in GeneQuiz:

False positives: a transfer is made on the basis of a wrongly inferred homology

Inaccurate transfer: the wrong information is transferred although the homology is correct

Misleading database information: the database source is itself misleading