Functional Associations of Proteins in Entire Genome Sequences
2002. 1. 21.
Bioinformatics Center of Shanghai Institutes for Biological Sciences
Bingding Huang
Contents
Introduction
Methods of prediction
Results and Discussion
How About Next Work?
Introduction
Motivation: Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized.
Characterizing them by experimental methods is tedious, labour-intensive and inaccurate.
Introduction
Key idea: sequence similarity correlates with functional similarity.
This is the basis for transferring functional knowledge from a characterized protein to a homologous but uncharacterized one.
Functional linkage and protein interaction
Many programs exist to do this ...
Introduction
Protein functional linkage
Proteins that participate in a common structural complex or metabolic pathway are functionally linked.
During evolution, all such functionally linked proteins tend to be either preserved or eliminated together in a new species.
Introduction
Protein-protein interaction (gene fusion)
Some interacting proteins, such as the Gyr A and Gyr B subunits of E. coli DNA gyrase, are fused into a single protein in another organism, in this case the topoisomerase of yeast.
Thus the similarity of Gyr A (804 amino acid residues) and Gyr B (875) to different segments of the topoisomerase (1429) might be used to predict that Gyr A and Gyr B interact in E. coli.
Methods to predict function of protein
Traditional Homology search
Phylogenetic Profiles
Rosetta Stone Method
Gene Neighbor Method
Gene Fusion Method
Machine Learning
Structure Prediction
Methods to predict function of protein
Homology method
The function of a query protein can be deduced by comparing its amino-acid sequence with those of homologous proteins of known function.
However, homology search has limitations in predicting function. By its initial assumption, it cannot assign a "novel" function to the query protein, nor "any" function if no sequence homolog of known function can be found in the database. In addition, sequence identity does not always correspond to functional resemblance.
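The transfer step described above can be sketched in a few lines. This is a minimal illustration, not a method taken from the slides: the `Hit` record, the `transfer_annotation` function and the E-value threshold of 1e-10 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One database hit for the query protein (hypothetical record layout)."""
    subject_id: str
    evalue: float
    annotation: Optional[str]  # None if the hit itself is uncharacterized


def transfer_annotation(hits: List[Hit], max_evalue: float = 1e-10) -> Optional[str]:
    """Return the annotation of the most significant characterized hit.

    The threshold is an assumed value; a real pipeline would tune it.
    """
    usable = [h for h in hits if h.annotation is not None and h.evalue <= max_evalue]
    if not usable:
        # No homolog of known function: nothing can be transferred.
        return None
    return min(usable, key=lambda h: h.evalue).annotation


hits = [
    Hit("P1", 1e-50, None),                    # strong hit, but uncharacterized
    Hit("P2", 1e-30, "DNA gyrase subunit A"),  # strong hit with known function
    Hit("P3", 1e-3, "kinase"),                 # too weak to trust
]
print(transfer_annotation(hits))
```

The two limitations named on the slide are visible here: the function can only return an annotation that already exists in the database, and it returns nothing when no characterized homolog passes the threshold.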
Methods to predict function of protein
Phylogenetic profiles (Marcotte)
Based on the hypothesis that functionally linked proteins evolve in a correlated fashion and therefore have homologs in the same subset of organisms.
A phylogenetic profile describes the pattern of presence or absence of a particular protein across a set of sequenced organisms. If two proteins have the same phylogenetic profile in all surveyed genomes, it is inferred that the two proteins are functionally linked.
Pairs of functionally linked proteins generally have no amino acid sequence similarity with each other, so they cannot be linked by conventional sequence-alignment techniques.
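A phylogenetic profile can be represented as a vector of 0/1 values, one per genome. The sketch below builds such profiles and groups proteins whose profiles are identical; the function names and the toy presence data are hypothetical, and a real analysis would derive presence or absence from homology searches.

```python
from collections import defaultdict


def build_profiles(presence, genomes):
    """Map each protein to a tuple of 0/1 flags, one per genome in order.

    presence: dict of protein -> set of genomes in which a homolog was found.
    """
    return {
        protein: tuple(int(g in found) for g in genomes)
        for protein, found in presence.items()
    }


def link_identical_profiles(profiles):
    """Group proteins that share exactly the same profile."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    # Only groups with at least two members suggest a functional link.
    return [sorted(members) for members in groups.values() if len(members) > 1]


genomes = ["G1", "G2", "G3", "G4"]  # toy genome labels
presence = {
    "P1": {"G1", "G3"},
    "P2": {"G1", "G3"},
    "P3": {"G2"},
    "P4": {"G1", "G2", "G3", "G4"},
}
profiles = build_profiles(presence, genomes)
print(profiles["P1"])                      # (1, 0, 1, 0)
print(link_identical_profiles(profiles))   # [['P1', 'P2']]
```

Note that no sequence comparison between P1 and P2 is needed: the link comes only from their matching pattern across genomes.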
Methods to predict function of protein
Table 1. Phylogenetic profiles link proteins with similar keywords
Methods to predict function of protein
Table 2. Phylogenetic profiles link proteins in EcoCyc classes
Methods to predict function of protein
Gene fusion method (Enright)
Methods to predict function of protein
Domain-fusion analysis
Supported by the observation that a single protein chain in one organism can show homology with separate interacting proteins in another organism, as if the interacting proteins had been fused into a single peptide chain.
The detection of gene fusions in one genome (defined as 'composite' proteins) allows the prediction of functional associations between homologous genes that remain separate in another genome (defined as 'component' proteins).
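The composite/component idea can be illustrated with a small check: two different query proteins that match non-overlapping regions of the same reference protein, and that do not resemble each other, are candidate components of a fusion. This is a simplified sketch and not the published algorithm; the `Match` record, `find_fusions`, and the alignment coordinates in the example are made up for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Match:
    """A significant alignment of a query-genome protein to a reference protein."""
    component: str  # candidate component protein (query genome)
    composite: str  # candidate composite protein (reference genome)
    start: int      # matched region on the composite, 1-based inclusive
    end: int


def find_fusions(matches, similar_pairs=frozenset(), max_overlap=0):
    """Return (composite, component_a, component_b) triples.

    similar_pairs: frozensets of query proteins known to resemble each other;
    such pairs are skipped because they are more likely paralogs than
    separate halves of a fused protein.
    """
    by_composite = {}
    for m in matches:
        by_composite.setdefault(m.composite, []).append(m)

    fusions = []
    for composite, group in by_composite.items():
        for a, b in combinations(group, 2):
            if a.component == b.component:
                continue
            if frozenset((a.component, b.component)) in similar_pairs:
                continue
            # Length of the shared region on the composite (negative if disjoint).
            overlap = min(a.end, b.end) - max(a.start, b.start) + 1
            if overlap <= max_overlap:
                fusions.append((composite, *sorted((a.component, b.component))))
    return fusions


# Illustrative coordinates only; they are not real alignment positions.
matches = [
    Match("GyrB", "TOP2", 1, 650),
    Match("GyrA", "TOP2", 700, 1400),
]
print(find_fusions(matches))  # [('TOP2', 'GyrA', 'GyrB')]
```

The example mirrors the Gyr A / Gyr B case from the introduction: each subunit matches a different segment of the longer topoisomerase, so the pair is reported as functionally associated.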
Methods to predict function of protein
Flowchart of the DifFuse algorithm
Figure labels: query genome BLAST vs. reference genome; query genome BLAST vs. query genome; Matrix T; Matrix Y; Smith-Waterman (twice); symmetrification and sequence clustering algorithm; fusion detection algorithm.
Methods to predict function of protein
Results of detection
Methods to predict function of protein
Materials and methods
Genome sequences
• Complete genome sequences for the 24 species were obtained from their original sources
Genome comparison
1. The 24 genomes were filtered using the CAST compositional bias filtering algorithm
2. Each genome was compared against itself and against each of the other 23 genomes using BLASTP with a cut-off E-value of 1e-10
3. The DifFuse algorithm was then applied to each genome in turn as a query against the other 23 (reference) genomes
4. Using other protein databases as the reference yields fewer composite cases
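The E-value cut-off in the comparison step can be applied with a short filter. The sketch assumes the hits are stored in the standard 12-column tab-separated BLAST table (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); the slides do not say which output format was used, and `significant_hits` and the sample rows are hypothetical.

```python
import csv
import io


def significant_hits(tabular_text, max_evalue=1e-10):
    """Keep (query, subject, evalue) for hits at or below the E-value cut-off.

    Self-hits are dropped, which is an assumption that makes sense when a
    genome is compared against itself.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if query != subject and evalue <= max_evalue:
            hits.append((query, subject, evalue))
    return hits


# Made-up rows in the assumed 12-column layout.
rows = [
    ["a1", "b7", "45.0", "300", "150", "5", "1", "300", "10", "310", "1e-50", "200"],
    ["a1", "b9", "30.0", "80", "50", "3", "5", "85", "1", "80", "0.002", "35.5"],
    ["a2", "a2", "100.0", "250", "0", "0", "1", "250", "1", "250", "1e-140", "500"],
]
sample = "\n".join("\t".join(r) for r in rows) + "\n"
print(significant_hits(sample))  # [('a1', 'b7', 1e-50)]
```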
Methods to predict function of protein
Result
The all-against-all genome comparison yielded 132,812 component and 66,406 composite proteins, counting multiple occurrences of the same proteins across species.
Of these, there are 7,224 component and 2,365 composite unique proteins across the 24 genomes.
On average, 9% of genes in a given genome appear to code for single-domain component proteins predicted to be functionally associated. These proteins are detected by an additional 4% of genes that code for fused, composite proteins.
Methods to predict function of protein
Discussion
This approach to predicting functional associations of proteins gives robust predictions of physical interaction, pathway involvement, complex formation and other types of functional association between protein molecules.
The landscape of gene fusions appears to be complex, affected by paralogy, genome size and phylogenetic distance.
Methods to predict function of protein
Gene neighbor method
If two genes (blue and yellow in the figure) are found to be neighbors in several genomes, a functional linkage may be inferred between the proteins they encode.
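The neighbor rule can be sketched as a count of how many genomes place two genes next to each other. This is a simplified illustration under stated assumptions: each genome is given as an ordered list of ortholog labels, strand and chromosome circularity are ignored, and the `min_genomes` threshold of 3 is an arbitrary choice for the example.

```python
from collections import Counter


def neighbor_counts(genomes, max_gap=1):
    """Count, for each gene pair, the genomes in which the pair is adjacent.

    genomes: dict of genome name -> list of ortholog labels in chromosome order.
    max_gap: how many positions apart two genes may be and still count.
    """
    counts = Counter()
    for order in genomes.values():
        seen = set()  # count each pair at most once per genome
        for i, gene in enumerate(order):
            for other in order[i + 1 : i + 1 + max_gap]:
                if other != gene:
                    seen.add(frozenset((gene, other)))
        counts.update(seen)
    return counts


def linked_pairs(genomes, min_genomes=3):
    """Return gene pairs that are neighbors in at least min_genomes genomes."""
    counts = neighbor_counts(genomes)
    return sorted(tuple(sorted(pair)) for pair, n in counts.items() if n >= min_genomes)


# Toy gene orders; g1 and g2 are adjacent in genomes A, B and C but not D.
genomes = {
    "A": ["g1", "g2", "g3", "g4"],
    "B": ["g4", "g2", "g1"],
    "C": ["g1", "g2", "g5"],
    "D": ["g1", "g6", "g2"],
}
print(linked_pairs(genomes))  # [('g1', 'g2')]
```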
Methods to predict function of protein
Discussion
This method is most robust for microbial genomes but may work to some extent even for human genes where operon-like clusters are observed.
It can be powerful in uncovering functional linkages in prokaryotes, where operons are common, and also shows promise for analysing interacting proteins in eukaryotes.
Methods to predict function of protein
Finding functional features of proteins using machine learning techniques
Hypothesis: a protein's function arises from its physical structure. Since protein structures are built from physico-chemical interactions among amino acids, amino-acid sequences may contain features that reflect those interactions. These features are called 'functional features'.
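As one concrete picture of a physico-chemical sequence feature, the sketch below computes a sliding-window average of the Kyte-Doolittle hydropathy scale. The slides do not say which properties the method actually uses, so hydropathy is only a stand-in chosen for illustration, and `windowed_hydropathy` is a hypothetical helper.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_hydropathy(sequence, window=5):
    """Return the mean hydropathy of every window of the given length.

    Raises KeyError on non-standard residue codes (for example 'X').
    Returns an empty list if the sequence is shorter than the window.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if len(values) < window:
        return []
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# A hydrophobic stretch followed by charged residues (made-up sequence).
profile = windowed_hydropathy("ILVLAKDEK", window=3)
print([round(v, 2) for v in profile])
```

A vector like this could serve as input to a classifier; two proteins with dissimilar sequences can still share a similar hydropathy pattern, which is the kind of signal the hypothesis points to.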
Methods to predict function of protein
Overview of the method
Methods to predict function of protein
The procedure of machine learning
Analogical reasoning: to make assumptions about functional features
Inductive reasoning: to generalize the hypothesis made by analogical reasoning, and to decide which localization pattern is most useful for classifying protein functions
Deductive reasoning: to refine the localization pattern into classification rules; knowledge about protein functions and structures is used to make a logical description of the classification rules
Methods to predict function of protein
Result and discussion
These features can discriminate between different functions of proteins that have similar amino-acid sequences.
Furthermore, the features can recognize proteins with the same function that do not have similar sequences.
Further work needed:
Refine the classification rules and integrate the three machine learning techniques.
Methods to predict function of protein
How can protein function be predicted more precisely?
By three-dimensional structure:
Because a protein's function is determined more directly by its structure and dynamics than by its sequence.
Methods to predict function of protein
Two disadvantages of this method
First, three-dimensional structures are available for only a fraction of proteins. This limitation should be reduced by structural genomics within a few years.
Second, functional details that can be extracted from structure but not from sequence often depend on the protein's environment, as well as on its dynamics and energetics, all of which are difficult to obtain by existing experimental and theoretical techniques.
Results and Discussion
It is conceivable that prediction of protein functions will be more precise when the above methods are combined.
Prediction methods need to be evaluated rigorously and made accessible over the internet.
Varied experimental data and theoretical predictions must be integrated because no single experimental or computational approach is likely to result in accurate and complete models of protein assemblies and pathways.
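One simple way to combine methods is to keep only the links that several independent methods agree on. The sketch below is an assumed voting scheme, not something proposed in the slides; `combine_predictions`, the method names and the protein labels are hypothetical.

```python
from collections import defaultdict


def combine_predictions(predictions_by_method, min_methods=2):
    """Return links supported by at least min_methods different methods.

    predictions_by_method: dict of method name -> iterable of (protein, protein).
    The result maps each sorted pair to the sorted list of supporting methods.
    """
    support = defaultdict(set)
    for method, pairs in predictions_by_method.items():
        for a, b in pairs:
            if a == b:
                continue  # a protein linked to itself carries no information
            support[frozenset((a, b))].add(method)
    return {
        tuple(sorted(pair)): sorted(methods)
        for pair, methods in support.items()
        if len(methods) >= min_methods
    }


predictions = {
    "phylogenetic_profile": [("P1", "P2"), ("P3", "P4")],
    "gene_fusion": [("P2", "P1")],
    "gene_neighbor": [("P3", "P5")],
}
print(combine_predictions(predictions))
# {('P1', 'P2'): ['gene_fusion', 'phylogenetic_profile']}
```

Requiring agreement trades coverage for precision: the P3 links are dropped because each is supported by only one method.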
Results and Discussion
System limitations
Several kinds of error are not currently addressed in GeneQuiz:
False positives: a transfer is made on the basis of a wrongly inferred homology
Inaccurate transfer: the wrong information is transferred although the homology is correct
Misleading database information: the database source is itself misleading