“the new world”

Post on 22-Feb-2016


DESCRIPTION

“The new world”. As presented, global interaction-detection methods have been invented in the last few years: yeast two-hybrid arrays, mass spectrometry, correlated mRNA expression profiles, genetic lethal mutations, in silico predictions, and more.

TRANSCRIPT

“The new world”

As presented, global interaction-detection methods have been invented in the last few years:
1. Yeast two-hybrid arrays.
2. Mass spectrometry.
3. Correlated mRNA expression profiles.
4. Genetic lethal mutations.
5. In silico predictions.
6. And more…

Having understood these methods, our goals are now:
1. Compare the outputs of these methods.
2. Use these outputs to extract biological information.

Vast amounts of interaction data have emerged: for each method, a PPI database was created.

Our first goal is to compare these databases:
1. Accuracy
2. Biases
3. Overlaps
4. Complementarities

We’ll present this based on the following article:
Comparative assessment of large-scale data sets of protein-protein interactions. von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P.

Method evaluation

Comparing interaction data is difficult.
However, there is only difficult in bread.
To overcome these difficulties, a few decisions are made:
A. The common unit of analysis for this study: binary interactions.
B. We will focus on the yeast proteome.
C. The reference sets: manually curated catalogues of known protein complexes:
1. YPD
2. MIPS

Overlaps and complementarities

About 80,000 yeast PPIs are currently available from all the latest databases combined.
Surprisingly, only about 2,400 (~3%) are supported by more than one method.
Possible explanations:
1. The methods have not reached saturation.
2. A significant number of false positives.
3. Complementarities: the strengths and weaknesses of each method.
To illustrate this, look at the following graph…

Interaction data by each method
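
The overlap figure quoted above (about 2,400 of 80,000 interactions, or 3%, supported by more than one method) is a count over unordered protein pairs. A minimal sketch of that count follows. The method names, protein names, and the helper `support_counts` are hypothetical and chosen only for illustration; the toy data is not taken from any real database.

```python
from collections import defaultdict

def support_counts(datasets):
    """Map each unordered protein pair to the set of methods reporting it."""
    support = defaultdict(set)
    for method, pairs in datasets.items():
        for a, b in pairs:
            # frozenset makes (A, B) and (B, A) the same interaction
            support[frozenset((a, b))].add(method)
    return support

# Hypothetical toy data: one list of binary interactions per method
datasets = {
    "yeast_two_hybrid": [("A", "B"), ("B", "C"), ("C", "D")],
    "mass_spectrometry": [("B", "A"), ("D", "E")],
    "mrna_coexpression": [("E", "F"), ("C", "B")],
}

support = support_counts(datasets)
total = len(support)
multi = sum(1 for methods in support.values() if len(methods) > 1)
fraction = multi / total
print(f"{multi} of {total} interactions ({fraction:.0%}) have multi-method support")
```

Treating pairs as unordered matters here: without it, the same interaction reported as (A, B) by one method and (B, A) by another would be counted as two unsupported interactions.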

Quality evaluation

The quality of a method consists of:
1. Coverage
2. Accuracy
Comparing the data with a reference set allows evaluation of these methods.
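
The slides do not spell out how coverage and accuracy are computed, so the sketch below rests on an assumption: coverage is taken as the fraction of reference interactions that a method recovers, and accuracy as the fraction of a method's reported interactions that appear in the reference set. The function name `coverage_and_accuracy` and the toy pairs are hypothetical.

```python
def normalize(pairs):
    """Turn (a, b) tuples into unordered pairs so direction is ignored."""
    return {frozenset(p) for p in pairs}

def coverage_and_accuracy(predicted, reference):
    """Assumed definitions (not stated in the slides):
    coverage = confirmed / size of reference set
    accuracy = confirmed / size of predicted set
    """
    predicted = normalize(predicted)
    reference = normalize(reference)
    confirmed = predicted & reference
    coverage = len(confirmed) / len(reference) if reference else 0.0
    accuracy = len(confirmed) / len(predicted) if predicted else 0.0
    return coverage, accuracy

# Hypothetical reference set and one method's output
reference = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
predicted = [("B", "A"), ("C", "B"), ("A", "E"), ("E", "F"), ("F", "G")]

coverage, accuracy = coverage_and_accuracy(predicted, reference)
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

Under these definitions a method can score well on one axis and poorly on the other, which is why the two are plotted against each other.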

Accuracy vs. coverage

Quality evaluation

An independent measure of quality:
To what degree do the methods describe PPIs between proteins within the same functional group?
This is well shown in the first graph:

Interaction data by each method

Biases in interaction coverage

None of the methods covers more than 60% of the proteins in the yeast genome.
Are there common biases as to which proteins are covered?
Yes! There are areas in the databases where biases are found:
1. “Democracy”: common, abundant proteins are “preferred”.
2. “Oligarchy”: proteins from specific cellular locations are “preferred”.
3. “Monarchy”: ancient, conserved proteins are “preferred” over proteins that emerged later in evolution.

Bias towards various cellular locations

Protein-protein interaction networks

Having evaluated our methods, our next goal is to use their outputs: PPI databases.
How can we organize this data in order to extract valuable information from it?
Networks!
Two general kinds of networks:
1. Simple PPI network.
2. Category-divided PPI network.
Why networks?
1. Simple networks visualize the number and type of interactions that occur for each protein.
2. Category-divided networks reveal a lot more: to what extent do proteins of different cell locations or different functions interact?
3. Characterizing proteins according to the proteins they interact with.

Now that we’re convinced, we’ll consult:
A network of protein-protein interactions in yeast. Schwikowski B, Uetz P, Fields S.

Protein-protein interaction networks

2,709 PPIs were analyzed, involving 2,039 yeast proteins.
A surprising result was discovered:

Number of networks    Number of proteins in network
1                     1,548
1                     19
9                     5-11
193                   1-4
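
A table like the one above is a size distribution of connected components: proteins are nodes, interactions are edges, and each "network" is a set of proteins reachable from one another. The sketch below shows one way to compute it with a depth-first search. The function `network_sizes` and the toy interactions are hypothetical; this is not the procedure documented by the authors.

```python
from collections import Counter, defaultdict

def network_sizes(interactions):
    """Return the number of proteins in each connected network."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    sizes = []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first search collects everything reachable from `start`
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        sizes.append(size)
    return sizes

# Hypothetical toy data; ("G", "G") is a self-interaction (a homodimer)
interactions = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"),
                ("E", "F"), ("G", "G")]

sizes = network_sizes(interactions)
histogram = Counter(sizes)  # network size -> number of networks of that size
for size in sorted(histogram, reverse=True):
    print(f"{histogram[size]} network(s) with {size} protein(s)")
```

A self-interaction yields a network of a single protein, which is one way a size range starting at 1 can arise.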

Creating the network

Proteins have been assigned 42 cellular roles, for example cell structure, mitosis, etc.
1,485 have been categorized, 39% with more than one role.
“Cluster”: any 3 or more proteins of the same function, separated by no more than 2 other proteins.
For example, 89% of chromatin proteins are within clusters.

PPI network

Assessing the quality of the data

In order to assess the quality of the network, we use the following algorithm:
For each characterized protein with at least one characterized partner:
1. A list of the functions of its neighbors is made.
2. If the function of the protein is among the 3 most common functions in the list, we say it is a correct classification.
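
The two steps above can be sketched as follows. Some details are assumptions, since the slides leave them open: a protein with several roles counts as correct if any one of its roles is in the top 3, and ties among equally common functions are broken arbitrarily by `Counter.most_common`. The names `is_correct`, `neighbors`, and `functions`, and the toy data, are hypothetical.

```python
from collections import Counter

def is_correct(protein, neighbors, functions, top_n=3):
    """True or False for an assessable protein, None if it cannot be assessed."""
    own = functions.get(protein)
    if not own:
        return None  # the protein itself is uncharacterized
    # Step 1: list the functions of all neighbors
    counts = Counter(
        f for partner in neighbors.get(protein, ())
        for f in functions.get(partner, ())
    )
    if not counts:
        return None  # no characterized partner
    # Step 2: is one of the protein's own functions among the top_n?
    top = {f for f, _ in counts.most_common(top_n)}
    return bool(own & top)

# Hypothetical toy network; P6 has no annotation
neighbors = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"}, "P3": {"P1"}, "P4": {"P1"},
    "P5": {"P6"}, "P6": {"P5"},
}
functions = {
    "P1": {"mitosis"},
    "P2": {"mitosis"},
    "P3": {"mitosis", "cell structure"},
    "P4": {"transport"},
    "P5": {"transport"},
}

results = {}
for protein in neighbors:
    verdict = is_correct(protein, neighbors, functions)
    if verdict is not None:
        results[protein] = verdict

rate = sum(results.values()) / len(results)
print(f"{rate:.0%} of assessable proteins were marked correct")
```

Running the same check on a network with randomly rewired links gives the baseline against which the real network's score is judged.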

Example of assessment

Results of assessment

72% were marked correct.
On random links only 12% were marked correct: the network seems valid.
The remaining 28% might be due to:
1. False positives.
2. Incomplete annotations.
3. Cross-talk.
4. Unknown biological connections.

Crosstalk between and within functional groups

Relationships between functional groups might be biologically meaningful.
65% of the interactions occur between proteins with a common function.
But it is the minority which is interesting…
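
A figure such as "65% of the interactions occur between proteins with a common function" can be obtained by classifying each annotated interaction as within-group or between-group. The sketch below is one way to do that, under the assumption that interactions involving an uncharacterized protein are left out of the denominator. The function `crosstalk` and the toy data are hypothetical; the same approach would apply to subcellular compartments by swapping the annotation table.

```python
from collections import Counter
from itertools import product

def crosstalk(interactions, functions):
    """Count within-group interactions and tally between-group function pairs."""
    within = 0
    annotated = 0
    between = Counter()
    for a, b in interactions:
        fa, fb = functions.get(a), functions.get(b)
        if not fa or not fb:
            continue  # assumption: skip pairs with an uncharacterized protein
        annotated += 1
        if fa & fb:
            within += 1  # the two proteins share at least one function
        else:
            # Record every cross-group combination, in a canonical order
            for pair in product(sorted(fa), sorted(fb)):
                between[tuple(sorted(pair))] += 1
    return within, annotated, between

# Hypothetical toy data; P6 has no annotation
functions = {
    "P1": {"mitosis"},
    "P2": {"mitosis"},
    "P3": {"mitosis", "cell structure"},
    "P4": {"transport"},
    "P5": {"transport"},
}
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P4"),
                ("P4", "P5"), ("P5", "P6")]

within, annotated, between = crosstalk(interactions, functions)
print(f"{within / annotated:.0%} of annotated interactions share a function")
print(dict(between))
```

The `between` tally is the part that corresponds to the "interesting minority": it shows which pairs of functional groups are linked and how often.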

Crosstalk between functional groups

Crosstalk between and within subcellular compartments

It is probable that proteins from the same cellular area interact (as with the same function).
78% of the PPIs involving proteins with known localization occur between proteins of the same cellular compartment.
Interactions between groups from different areas are meaningful here as well:

Interactions and localizations

Prediction of function

Of the 2,039 proteins in the data set, 554 have no annotation for “functional role”.
We would like to predict their role. How?
Obvious method: interacting partners. But…
Solution: use the benefits of the network: second-degree neighbors, and so on.
For example, if:
- uncharacterized

Prediction of function: example
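
The slides only outline the prediction idea, so the sketch below is one possible reading, not the authors' method: take the most common functions among the direct partners of an uncharacterized protein, and if none of those partners is annotated, fall back to second-degree neighbors. The function `predict_functions`, the `max_depth` and `top_n` parameters, and the toy network are all hypothetical.

```python
from collections import Counter

def predict_functions(protein, neighbors, functions, max_depth=2, top_n=3):
    """Return up to top_n functions from the nearest annotated neighborhood."""
    frontier = {protein}
    visited = {protein}
    for _ in range(max_depth):
        # Move one step outward, ignoring proteins already examined
        frontier = {n for p in frontier for n in neighbors.get(p, ())} - visited
        visited |= frontier
        counts = Counter(f for n in frontier for f in functions.get(n, ()))
        if counts:
            return [f for f, _ in counts.most_common(top_n)]
    return []  # nothing annotated within max_depth steps

# Hypothetical toy network: U1 and U2 are uncharacterized
neighbors = {
    "U1": {"U2"},
    "U2": {"U1", "P1", "P2"},
    "P1": {"U2"},
    "P2": {"U2"},
}
functions = {
    "P1": {"mitosis"},
    "P2": {"mitosis", "transport"},
}

print(predict_functions("U2", neighbors, functions))  # uses direct partners
print(predict_functions("U1", neighbors, functions))  # falls back to depth 2
```

In this toy case U1's only partner is itself uncharacterized, which is the situation where direct partners alone give no prediction and the wider network helps.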

Summary

Evaluating PPI detection methods reveals unique accuracy, coverage & biases for each method.

There are typical overlaps and complementarities between methods.

PPI networks reveal important information about interaction between protein groups.

PPI networks assist in predicting protein functions.
