
sbv Improver Sub-Challenge 4: Species-Specific Network Inference

Gyan Bhanot, Rutgers U.
Michael Biehl, U. Groningen
Sahand Hormoz, KITP/UCSB
Adel Dayarian, KITP/UCSB

Outline

1)  Defining the problem

2)  Data processing

3)  Modeling the network, voting algorithm for the edges

4)  Addition and removal of the edges

Overall Challenge Goals

Given phosphoprotein, gene expression and cytokine data, infer species-specific rat and human networks

Noise Curve and Saturation effect in Genes

[Figure: noise curve, Std GenEx versus Mean GenEx]

Linearizing the signal: to remove the saturation effect, apply F^-1 to the gene expression signal

[Figure: F(g) plotted against g]
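The slides do not give F in closed form, so the following is only a minimal sketch of how an inverse could be applied if F were available as a tabulated, strictly increasing curve. The names `make_inverse` and `example_F`, and the exponential shape of `example_F`, are assumptions made for illustration, not the curve used by the team.

```python
import math
from bisect import bisect_left


def make_inverse(g_grid, f_grid):
    """Build an approximate F^-1 from tabulated points (g_grid[i], f_grid[i]).

    Both lists must be strictly increasing. Values outside the tabulated
    range are clamped to the ends of g_grid.
    """
    def inverse(y):
        if y <= f_grid[0]:
            return g_grid[0]
        if y >= f_grid[-1]:
            return g_grid[-1]
        i = bisect_left(f_grid, y)  # f_grid[i - 1] < y <= f_grid[i]
        f0, f1 = f_grid[i - 1], f_grid[i]
        g0, g1 = g_grid[i - 1], g_grid[i]
        # Linear interpolation between the two neighbouring table entries.
        return g0 + (g1 - g0) * (y - f0) / (f1 - f0)

    return inverse


def example_F(g):
    """Hypothetical saturating curve, used only to exercise the sketch."""
    return 20.0 * (1.0 - math.exp(-g / 15.0))


g_grid = [0.5 * k for k in range(81)]      # 0.0, 0.5, ..., 40.0
f_grid = [example_F(g) for g in g_grid]
linearize = make_inverse(g_grid, f_grid)

recovered = linearize(example_F(12.3))     # should be close to 12.3
```

In practice the table would come from the measured saturation curve rather than from an analytic function.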

Designate genes, proteins and cytokines as on/off

Gene Exp:

•  Chose a sharp threshold of p < 0.01
•  Binarized the data into on = 1, off = 0

Phos. Protein:

•  Used the recommended threshold of 3.0
•  Binarize: if signal > 3, then phos = yes; if < 3, then phos = no

Cytokine:

•  Similar to proteins, but binarize: if signal > 1.5, then cyto = yes; if < 1.5, then cyto = no
•  Because noise value amongst replicas was almost half of that for proteins
•  If either the gene or the corresponding cytokine is ON, turn both nodes ON
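The three ON/OFF rules can be sketched as follows. The thresholds (p < 0.01, 3.0, 1.5) are taken from the slide; the function names are hypothetical, and treating a signal exactly equal to the threshold as OFF is an assumption, since the slide does not cover that case.

```python
GENE_P_THRESHOLD = 0.01   # gene expression: significance threshold from the slide
PHOS_THRESHOLD = 3.0      # phosphoprotein: recommended threshold from the slide
CYTO_THRESHOLD = 1.5      # cytokine: lower threshold, replicate noise is about half


def binarize_gene(p_value):
    """Gene is ON (1) when its change is significant at p < 0.01."""
    return 1 if p_value < GENE_P_THRESHOLD else 0


def binarize_phos(signal):
    """Phosphoprotein is ON (1) when its signal exceeds 3.0."""
    return 1 if signal > PHOS_THRESHOLD else 0


def binarize_cyto(signal):
    """Cytokine is ON (1) when its signal exceeds 1.5."""
    return 1 if signal > CYTO_THRESHOLD else 0


def merge_gene_cytokine(gene_on, cyto_on):
    """If either the gene or its cytokine is ON, turn both nodes ON."""
    both = 1 if (gene_on or cyto_on) else 0
    return both, both


gene_state = binarize_gene(0.005)     # significant, so ON
cyto_state = binarize_cyto(1.2)       # below threshold, so OFF
merged = merge_gene_cytokine(gene_state, cyto_state)
```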

General strategy

Similar to previous sub-challenges

1)  Analyze the sub-tree of each stimulus separately

2)  For each stimulus, determine whether a node has changed compared to the control (assign ON/OFF)

3)  Edge removal: voting algorithm based on connectivity of the ON and the OFF nodes

4)  Add edges such that all the ON nodes are included in the sub-tree

[Figure: network layers from top to bottom: Stimuli; Receptors; Adaptors, Signals 1/5; Trans. Factors; Target/Cytokines]

Analyzing the sub-tree of each stimulus separately

•  Start with the stimulus at the root, go down layer by layer to form the corresponding sub-tree
•  This divides the nodes into two groups: inside the sub-tree (colored) or outside it (grey)
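Forming the sub-tree of a stimulus amounts to collecting every node reachable from it. A minimal sketch using breadth-first search, assuming the reference network is given as a list of directed (upstream, downstream) edges; the node names and the function `subtree_nodes` are hypothetical.

```python
from collections import deque


def subtree_nodes(edges, stimulus):
    """Return the set of nodes reachable from `stimulus` along directed edges."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)

    seen = {stimulus}
    queue = deque([stimulus])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Toy network with two stimuli sharing an adaptor (illustrative only).
edges = [
    ("S1", "R1"), ("R1", "A1"), ("A1", "TF1"), ("TF1", "C1"),
    ("S2", "R2"), ("R2", "A1"),
]
all_nodes = {n for edge in edges for n in edge}

inside = subtree_nodes(edges, "S1")   # nodes in the sub-tree of S1
outside = all_nodes - inside          # nodes outside it
```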


ON/OFF nodes and their relation to the sub-tree

[Figure: layered network, ON nodes filled with color]

Assumption:

•  All the ON nodes should be in the sub-tree
•  OFF nodes should not be in the network

Goal:

Modify the network such that the above assumptions become valid

Edge removal: Voting algorithm across stimuli

•  Positive vote for an edge if it connects two ON nodes
•  Negative vote for an edge if it connects an ON node to an OFF node
•  ON node's layer >= OFF node's layer. e.g. a signaling protein can affect another signaling protein or a transcription factor but it cannot affect a receptor.

Final voting score for an edge: Sum all the positive and negative votes from all 26 stimuli
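The voting rules can be sketched as a tally over stimuli. This is an illustration under stated assumptions, not the team's code: edges are taken to be directed from the upstream node to the downstream node, so the layer condition is treated as satisfied by the edge direction; nodes missing from a stimulus's state dictionary are treated as unmeasured and cast no vote. The names `edge_votes` and `states_by_stimulus` are hypothetical.

```python
def edge_votes(edges, states_by_stimulus):
    """Count positive and negative votes per edge across all stimuli.

    states_by_stimulus maps stimulus -> {node: 1 (ON) or 0 (OFF)}.
    """
    votes = {edge: {"pos": 0, "neg": 0} for edge in edges}
    for state in states_by_stimulus.values():
        for edge in edges:
            src, dst = edge
            if src not in state or dst not in state:
                continue  # unmeasured end: no vote (assumption)
            if state[src] == 1 and state[dst] == 1:
                votes[edge]["pos"] += 1   # edge connects two ON nodes
            elif state[src] == 1 and state[dst] == 0:
                votes[edge]["neg"] += 1   # ON upstream node, OFF downstream node
    return votes


def voting_score(vote):
    """Final score: positive votes count +1, negative votes count -1."""
    return vote["pos"] - vote["neg"]


edges = [("R1", "A1"), ("A1", "TF1")]
states_by_stimulus = {
    "stim_a": {"R1": 1, "A1": 1, "TF1": 0},
    "stim_b": {"R1": 1, "A1": 0, "TF1": 0},
}
votes = edge_votes(edges, states_by_stimulus)
scores = {edge: voting_score(v) for edge, v in votes.items()}
```

Keeping the positive and negative counts separate, rather than only the summed score, makes it possible to apply a rule that distinguishes "no positive votes at all" from "a net negative score".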

Edge removal strategy

Want to be conservative since:

•  Real gene/protein networks are complex (non-linearity, cooperativity, time delay, etc.), but our voting algorithm does not address this complexity.

•  Provided data is in the form of snapshots, which is probably insensitive to important time-dependent effects. e.g. For some stimuli, a cytokine level does not show a significant change, whereas the expression level for the corresponding encoding gene shows a significant change.

•  There are multiple possible solutions and there is insufficient information to pick a unique solution

Edge removal strategy

Bearing all the above limitations in mind, we chose to…

Remove an edge if:
- It has only negative votes and no positive vote from 26 stimuli
- It is connected on both ends to measured nodes (phosphorylation, gene expression or cytokine)
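A minimal sketch of this conservative removal rule, assuming both conditions must hold together and that per-edge vote counts are already available as positive and negative tallies. The function name `edges_to_remove`, the data layout, and the toy values are hypothetical.

```python
def edges_to_remove(votes, measured_nodes):
    """Select edges with at least one negative vote, no positive vote,
    and both ends among the measured nodes."""
    removed = []
    for (src, dst), vote in votes.items():
        only_negative = vote["neg"] > 0 and vote["pos"] == 0
        both_measured = src in measured_nodes and dst in measured_nodes
        if only_negative and both_measured:
            removed.append((src, dst))
    return removed


# Toy vote tallies (illustrative only).
votes = {
    ("R1", "A1"): {"pos": 1, "neg": 1},   # has a positive vote: keep
    ("A1", "TF1"): {"pos": 0, "neg": 2},  # only negative, both ends measured: remove
    ("TF1", "X"): {"pos": 0, "neg": 3},   # X is not measured: keep
}
measured_nodes = {"R1", "A1", "TF1"}

removed = edges_to_remove(votes, measured_nodes)
```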

Edge addition strategy

Add new edge such that an ON node that does not lie in the sub-tree of the stimulus becomes connected to an ON node that is already included in the sub-tree

•  There are multiple solutions: Can connect the node to any other ON node in the sub-tree that is in the same layer or in a higher layer. e.g. can connect a target to a TF, or to a signaling molecule, or to a receptor…
•  Calculate the mutual information (like previous sub-challenges) between ON node outside sub-tree and all ON nodes already in the sub-tree. Add an edge to connect to the node with the highest mutual information
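Choosing the attachment point by mutual information can be sketched as below. The slides do not say over which samples the mutual information is computed; this sketch assumes each node has a binary ON/OFF profile across stimuli, and that the caller has already restricted the candidates to ON nodes in the sub-tree at an allowed layer. The names `mutual_information` and `best_parent` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def best_parent(profile, candidate_profiles):
    """Return the candidate whose profile shares the most information with `profile`."""
    return max(
        candidate_profiles,
        key=lambda name: mutual_information(profile, candidate_profiles[name]),
    )


# ON/OFF profile of the node outside the sub-tree, one entry per stimulus (toy data).
orphan = [1, 0, 1, 0, 1, 1, 0, 0]
candidates = {
    "TF1": [1, 0, 1, 0, 1, 1, 0, 0],   # identical profile
    "R1":  [1, 1, 1, 1, 0, 0, 0, 0],   # statistically independent of the orphan
}

parent = best_parent(orphan, candidates)            # node to connect the orphan to
mi_tf1 = mutual_information(orphan, candidates["TF1"])
mi_r1 = mutual_information(orphan, candidates["R1"])
```

The new edge would then run from `parent` to the node outside the sub-tree.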

What we might have done to improve the results

•  Compute correlation between connected nodes. Use simulated annealing to add/remove nodes to generate alternates

•  Use Mutual Information instead of correlation in the above and further prune.

•  Use LOO on all of the above to estimate the FP and FN rates.

•  Use “Gene Centrality” from Bilal et al, Genes Cancer. 2010 Oct;1(10):1063-73.

Team members

Sahand Hormoz
Michael Biehl
Adel Dayarian
