sbv IMPROVER Sub-Challenge 4: Species-Specific Network Inference
Gyan Bhanot, Rutgers U.; Michael Biehl, U. Groningen; Sahand Hormoz, KITP/UCSB; Adel Dayarian, KITP/UCSB
Outline
1) Defining the problem
2) Data processing
3) Modeling the network, voting algorithm for the edges
4) Addition and removal of the edges
Overall Challenge Goals
Given phosphoprotein, gene expression and cytokine data,
infer species-specific rat and human networks
Noise Curve and Saturation effect in Genes
[Figure: standard deviation of gene expression (Std GenEx) versus mean gene expression (Mean GenEx)]
Linearizing the signal: To remove the saturation effect, apply F^-1 to the gene expression signal
[Figure: F(g) versus g]
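The slides do not give the functional form of F, so the following is only a sketch of how a measured signal could be mapped back through F^-1 numerically. The curve `saturating_response`, its parameter `g_sat`, and the search interval are hypothetical stand-ins chosen for illustration; the only assumption that matters is that F is increasing.

```python
import math


def saturating_response(g, g_sat=20.0):
    """Hypothetical saturation curve, NOT the F used in the slides."""
    return g_sat * (1.0 - math.exp(-g / g_sat))


def invert_monotone(f, y, lo=0.0, hi=1000.0, tol=1e-9):
    """Solve f(g) = y for g by bisection, assuming f is increasing on [lo, hi]."""
    if not (f(lo) <= y <= f(hi)):
        raise ValueError("y is outside the range of f on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def linearize(measured, f=saturating_response):
    """Apply F^-1 to every measured expression value."""
    return [invert_monotone(f, y) for y in measured]
```

With a fitted F in place of the stand-in curve, `linearize` would return signals on a scale where the saturation has been undone.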
Designate genes, proteins and cytokines as on/off
Gene Exp:
• Chose a sharp threshold of p < 0.01
• Binarized the data into on = 1, off = 0
Phos. Protein:
• Used the recommended threshold of 3.0
• Binarize: If > 3, then phos = yes, if < 3 then phos = no
Cytokine:
• Similar to proteins, but Binarize: If signal > 1.5, then cyto = yes, if < 1.5 then cyto = no
• The lower threshold was used because the noise among replicates was almost half of that for proteins
• If either the gene or the corresponding cytokine is ON, turn both nodes ON
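A minimal sketch of the on/off rules above, using the thresholds quoted on the slide. The function names, the dictionary layout, and the gene-to-cytokine mapping are assumptions made for illustration. The slide does not say how a value exactly equal to a threshold is treated; here it counts as OFF.

```python
GENE_P_THRESHOLD = 0.01  # gene expression: significant if p < 0.01
PHOS_THRESHOLD = 3.0     # phosphoprotein: ON if signal > 3
CYTO_THRESHOLD = 1.5     # cytokine: ON if signal > 1.5


def binarize_gene(p_value):
    return p_value < GENE_P_THRESHOLD


def binarize_phos(signal):
    return signal > PHOS_THRESHOLD


def binarize_cyto(signal):
    return signal > CYTO_THRESHOLD


def couple_gene_and_cytokine(gene_on, cyto_on, gene_to_cytokine):
    """If either a gene or the cytokine it encodes is ON, turn both ON.

    gene_on / cyto_on map node names to booleans; gene_to_cytokine maps a
    gene name to the name of its cytokine (hypothetical layout).
    """
    gene_out = dict(gene_on)
    cyto_out = dict(cyto_on)
    for gene, cyto in gene_to_cytokine.items():
        if gene in gene_out and cyto in cyto_out:
            either = gene_out[gene] or cyto_out[cyto]
            gene_out[gene] = either
            cyto_out[cyto] = either
    return gene_out, cyto_out
```

The coupling step is applied after binarization, so a pair is switched ON as soon as one of its two measurements passes its own threshold.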
General strategy
Similar to previous sub-challenges
1) Analyze the sub-tree of each stimulus separately
2) For each stimulus, determine whether a node has changed compared to the control (assign ON/OFF)
3) Edge removal: voting algorithm based on connectivity of the ON and the OFF nodes.
4) Add edges such that all the ON nodes are included in the sub-tree
[Figure: layered network, top to bottom: Stimuli; Receptors; Adaptors, Signals 1/5; Trans. Factors; Target/Cytokines]
Analyzing the sub-tree of each stimulus separately
• Start with the stimulus at the root, go down layer by layer to form the corresponding sub-tree
• This divides the nodes into two groups: inside the sub-tree (colored) or outside it (grey)
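The layer-by-layer descent described above amounts to collecting every node reachable from the stimulus. A minimal sketch, assuming the reference network is given as a list of directed (parent, child) edges; the function name and the toy node names in the usage are hypothetical.

```python
from collections import deque


def stimulus_subtree(edges, stimulus):
    """Return the set of nodes reachable from `stimulus` along directed edges."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    reached = {stimulus}
    queue = deque([stimulus])
    while queue:  # breadth-first: one layer of descendants at a time
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in reached:
                reached.add(child)
                queue.append(child)
    return reached


toy_edges = [("S1", "R1"), ("R1", "A1"), ("A1", "TF1"),
             ("TF1", "T1"), ("S2", "R2"), ("R2", "A1")]
inside = stimulus_subtree(toy_edges, "S2")
outside = {n for edge in toy_edges for n in edge} - inside
```

Here `inside` corresponds to the colored nodes and `outside` to the grey ones.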
ON/OFF nodes and their relation to the sub-tree
[Figure: ON nodes filled with color]
Assumption:
• All the ON nodes should be in the sub-tree
• OFF nodes should not be in the network
Goal:
Modify the network such that the above assumptions become valid
Edge removal: Voting algorithm across stimuli
• Positive vote for an edge if it connects two ON nodes
• Negative vote for an edge if it connects an ON node to an OFF node
• ON node's layer >= OFF node's layer. e.g. a signaling protein can affect another signaling protein or a transcription factor but it cannot affect a receptor.
Final voting score for an edge: Sum all the positive and negative votes from all 26 stimuli
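A sketch of how the votes could be tallied. It assumes edges are stored as directed (source, target) pairs pointing from the upstream layer to the same or a downstream layer, so the layer condition is already encoded in the edge direction. It also assumes that edges touching an unmeasured node receive no vote, which the slides do not state. All names are hypothetical.

```python
def tally_votes(edges, status_by_stimulus):
    """Count positive and negative votes per edge.

    edges: list of directed (source, target) pairs.
    status_by_stimulus: dict stimulus -> dict of measured node -> bool (ON/OFF).
    Returns dict edge -> [positive_votes, negative_votes].
    """
    votes = {edge: [0, 0] for edge in edges}
    for status in status_by_stimulus.values():
        for source, target in edges:
            if source not in status or target not in status:
                continue  # assumption: unmeasured endpoints cast no vote
            if status[source] and status[target]:
                votes[(source, target)][0] += 1  # ON -> ON
            elif status[source] and not status[target]:
                votes[(source, target)][1] += 1  # ON -> OFF
    return votes


def voting_score(votes):
    """Final score per edge: positive votes minus negative votes."""
    return {edge: pos - neg for edge, (pos, neg) in votes.items()}
```

Keeping the positive and negative counts separate, rather than only their sum, makes it possible to ask later whether an edge received any positive vote at all.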
Edge removal strategy
Want to be conservative since:
• Real gene/protein networks are complex (non-linearity, cooperativity, time delay, etc.), but our voting algorithm does not address this complexity.
• Provided data is in the form of snapshots, which are probably insensitive to important time-dependent effects. e.g. For some stimuli, a cytokine level does not show a significant change, whereas the expression level for the corresponding encoding gene shows a significant change.
• There are multiple possible solutions and there is insufficient information to pick a unique solution
Edge removal strategy
Bearing all the above limitations in mind, we chose to…
Remove an edge if:
- It has only negative votes and no positive vote from 26 stimuli
- It is connected on both ends to measured nodes (phosphorylation, gene expression or cytokine)
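The two removal conditions can be written as a single filter. This is a sketch only: it assumes per-edge vote counts are available as (positive, negative) pairs and that the set of measured nodes is known; the names are hypothetical.

```python
def edges_to_remove(votes, measured_nodes):
    """Select edges with only negative votes whose two ends are both measured.

    votes: dict (source, target) -> (positive_votes, negative_votes).
    measured_nodes: set of nodes with phosphorylation, gene expression
    or cytokine measurements.
    """
    removed = []
    for (source, target), (positive, negative) in votes.items():
        only_negative = negative > 0 and positive == 0
        both_measured = source in measured_nodes and target in measured_nodes
        if only_negative and both_measured:
            removed.append((source, target))
    return removed
```

An edge with no votes at all is kept, in line with the conservative stance: absence of evidence is not treated as evidence against the edge.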
Edge addition strategy
Add new edge such that an ON node that does not lie in the sub-tree of the stimulus becomes connected to an ON node that is already included in the sub-tree
• There are multiple solutions: Can connect the node to any other ON node in the sub-tree that is in the same layer or in a higher layer. e.g. can connect a target to a TF, or to a signaling molecule, or to a receptor…
• Calculate the mutual information (like previous sub-challenges) between ON node outside sub-tree and all ON nodes already in the sub-tree. Add an edge to connect to the node with the highest mutual information
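A sketch of the mutual-information choice. The slides do not say over which samples the mutual information is computed; this example assumes each node has a binary ON/OFF profile across conditions (for instance, across stimuli) and that the candidate list has already been restricted to ON nodes in the sub-tree at the same or a higher layer. Function and node names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[a] * py[b]))
    return mi


def best_parent(orphan, candidates, profiles):
    """Pick the candidate whose profile shares the most information with `orphan`."""
    return max(candidates,
               key=lambda node: mutual_information(profiles[node], profiles[orphan]))


profiles = {
    "T": [1, 0, 1, 0],    # ON node outside the sub-tree
    "TF1": [1, 0, 1, 0],  # candidate inside the sub-tree
    "TF2": [1, 1, 0, 0],  # candidate inside the sub-tree
}
parent = best_parent("T", ["TF1", "TF2"], profiles)
```

In this toy case the new edge would run from `parent` to "T". Ties are broken by candidate order, which is an arbitrary choice of this sketch.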
What we might have done to improve the results
• Compute correlation between connected nodes. Use simulated annealing to add/remove nodes to generate alternates
• Use Mutual Information instead of correlation in the above and further prune.
• Use LOO on all of the above to estimate the FP and FN rates.
• Use "Gene Centrality" from Bilal et al, Genes Cancer. 2010 Oct;1(10):1063-73.