CONTRIBUTED PAPER

Doubly Penalized LASSO for Reconstruction of Biological Networks

This paper presents a new method for the reconstruction of dynamic biological networks and illustrates it with two case studies.
By Behrang Asadi, Mano Ram Maurya, Daniel M. Tartakovsky, and
Shankar Subramaniam
ABSTRACT | Reconstruction of biological and biochemical
networks is a crucial step in extracting knowledge and causal
information from large biological data sets. This task is partic-
ularly challenging when dealing with time-series data from
dynamic networks. We have developed a new method, called
doubly penalized least absolute shrinkage and selection oper-
ator (DPLASSO), for the reconstruction of dynamic biological
networks. Our method consists of two components: statistical
significance testing of model coefficients and penalized/
constrained optimization. Partial least squares (PLS) with statistical significance testing acts as a supervisory-level filter to extract the most informative components of the network from a data set. Then, LASSO with extra weights on the
smaller parameters identified in the first layer is employed to
retain the main predictors and to set the smallest coefficients
to zero. We present two case studies to compare the relative
performance of DPLASSO and LASSO in terms of several
metrics, such as sensitivity, specificity, and accuracy. The first
case study employs a synthetic data set; it demonstrates that
DPLASSO substantially improves network reconstruction in
terms of accuracy and specificity. The second case study relies
on simulated data sets for cell division cycle of fission yeast
and shows that DPLASSO overall outperforms LASSO and PLS
in terms of sensitivity, specificity, and accuracy. Combining
PLS with statistical significance testing and LASSO provides
good performance with respect to several metrics. DPLASSO
is well suited for reconstruction of networks with low and
moderate density. For high-density networks, high specificity
is desired, making PLS a more suitable approach.
KEYWORDS | Cell cycle; data-driven network reconstruction;
inference; least absolute shrinkage and selection operator
(LASSO); partial least squares (PLS); predictive dynamic model;
sensitivity; specificity; statistical significance testing
I. INTRODUCTION
Inference of network topology from experimental data is
a central endeavor in modern biology, since detailed
knowledge of the contextual pathways is required for un-
derstanding the mechanisms associated with normal and
disease function [1]. Modern high-throughput techniques
facilitate collection of different types of data related to
signaling, gene regulatory, and metabolic networks. Such data sets (e.g., cellular readouts obtained through different experimental techniques) have a large dynamic range, distinct observational scales (e.g., high versus low intensity in microarray data sets), and varying accuracy. Integration of different types of data and information can lead
to remarkable improvements in our ability to identify the
connectivity of different segments of networks and to
Manuscript received September 8, 2015; accepted February 20, 2016. Date of publication November 22, 2016; date of current version January 18, 2017. This work was supported by the National Science Foundation (NSF) under Collaborative Grants DBI-0835541 and STC-0939370, and by the National Institutes of Health (NIH) under Collaborative Grants U54GM69338, R01HL106579, HL087375-02, and R01HL108735. (Behrang Asadi and Mano Ram Maurya are co-first authors.) (Corresponding author: Shankar Subramaniam.)
B. Asadi was with the Departments of Bioengineering and Mechanical and Aerospace Engineering, University of California San Diego, La Jolla, CA 92093 USA. He is now with Danaher Labs, Santa Clara, CA 95054 USA (e-mail: [email protected]).
M. R. Maurya is with the San Diego Supercomputer Center and the Department of Bioengineering, University of California San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).
D. M. Tartakovsky is with the Department of Mechanical and Aerospace Engineering, University of California San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).
S. Subramaniam is with the Department of Bioengineering, the Departments of Computer Science and Engineering and Cellular and Molecular Medicine, and the Graduate Program in Bioinformatics, University of California San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).
Digital Object Identifier: 10.1109/JPROC.2016.2535282
0018-9219 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Vol. 105, No. 2, February 2017 | Proceedings of the IEEE 319
provide insights about the cellular system. Several recent studies have used integrative data analytics to reconstruct biochemical networks and to build predictive models from large-scale data sets. New algorithms were developed to reconstruct parsimonious regulatory networks [2], networks with a large number of intracellular signaling events [3], and signaling networks using phosphoproteomic data for cytokine release in RAW 264.7 cells [4] and for human hepatocellular liver carcinoma HepG2 cells [5]. A recent review of the methods used to reconstruct biochemical networks and models when the network topology is either known (e.g., metabolic networks) or (partially) unknown (e.g., gene regulatory networks) can be found in [6].
Network reconstruction techniques include information-theoretic approaches, such as those based on mutual information [7], [8] and the notion of Granger causality implemented through vector autoregressive models and their variants [9], [10]; optimization-based approaches, e.g., least squares and mixed-integer nonlinear optimization methods [11]; dimensionality-reduction methods, e.g., statistical significance tests combined with either principal components regression (PCR) or partial least squares (PLS) [4], [12]; partial-correlation-related analyses [13]; Bayesian networks [3], [14]; hybrid methods, e.g., linear matrix inequalities (LMIs) [15], [16], the least absolute shrinkage and selection operator (LASSO) [2], [17], and the elastic net [18]; and matrix-based algorithms [19]–[21].
While most of the established techniques mentioned above reconstruct networks from static data, several recent approaches are designed to exploit the dynamic nature of networks using the concomitant time-series data. For example, Greenfield et al. [22] developed a framework based on ordinary differential equations to reconstruct biological networks from both static and dynamic data. Weber et al. [23] proposed a heuristic network inference algorithm (NetGenerator v2.0) to reconstruct dynamic networks from multiple-stimuli, multiple-experiment data and prior knowledge of the network, and Lo et al. [24] combined linear regression with Bayesian model averaging to reconstruct biological networks from time-series data and prior biological knowledge.
Since each algorithm by itself is better suited to one metric (e.g., accuracy or specificity) but not to multiple metrics simultaneously [25], integration of distinct features of two or more algorithms is likely to improve performance. For example, optimization-based methods, such as LASSO, use an L1-penalty-based constraint on the model parameters to retain the coefficients corresponding to the most informative predictors. Regression-based approaches (least squares, PCR, or PLS) with statistical significance testing use the t-test to identify statistically significant predictors. One of the findings of our previous comparison of statistical and optimization-based methods with respect to different measures [25] is that reconstructing networks from noisy data sets with an optimization method (such as LASSO) provides good sensitivity but low specificity, whereas statistical methods (PCR/PLS with significance testing) provide good specificity albeit low sensitivity. Since sensitivity and specificity are two distinct but complementary properties, our proposed algorithm combines PLS (with statistical significance testing) and LASSO to potentially achieve both good sensitivity and good specificity. We present a new method, called doubly penalized least absolute shrinkage and selection operator (DPLASSO), to reconstruct dynamic biological networks. Our approach integrates statistical significance testing and constrained/penalized optimization. The method employs partial least squares with statistical significance testing as a supervisory-level filter to extract the most informative connections of the network from a data set. The lower reconstruction layer consists of LASSO with extra weights assigned to small parameters. These weights are determined via partial least squares in the first-layer filter so as to retain all significant inputs and set the remaining small coefficients to zero. The comparison criteria are the robustness in identifying the "true" network and the accuracy of the estimated model coefficients under a moderate amount of noise in the data.
In Section II, we describe DPLASSO for network reconstruction and introduce the metrics used to compare the relative performance of the LASSO and DPLASSO methods. In Section III, DPLASSO is used to reconstruct a synthetic network, and the network's performance is evaluated relative to the network reconstructed with LASSO. We also use DPLASSO to reconstruct a biological network from a data set obtained by simulating the biological system. Next, a discussion of the salient features of DPLASSO and the results is presented, followed by conclusions and directions for future work.
II. METHODOLOGY
DPLASSO includes two parameter-selection layers.
Layer 1 is a supervisory layer, in which the method of
PLS is used to build a linear model while accounting
for correlations among the inputs. Layer 2 applies
LASSO with extra weights on the less (statistically) sig-
nificant parameters computed in layer 1. Both PLS and
LASSO are briefly introduced below, followed by a
detailed description of DPLASSO.
A. Partial Least Squares

The objective of PLS is to find combinations of the predictors that have a large covariance with the response values. The method projects the input and the output of a linear model onto a space where a multidimensional direction in the input space accounts for the maximum
variance in the multidimensional direction in the output space. The PLS algorithm is described below.

Let X (m × n) denote the input data, consisting of n inputs sampled at m points and/or conditions, and let Y (m × p) denote the output data (p outputs) of a linear model Y = XB, with B being the model parameters. The goal of the PLS method is to find a linear decomposition of X and Y such that X = T Pᵀ + E and Y = U Qᵀ + F, where T (m × r) and U (m × r) are the input and output scores, respectively; P (n × r) and Q (p × r) are the input and output loadings, respectively; and E (m × n) and F (m × p) are residual matrices. This decomposition is made with the objective of maximizing the covariance between T and U [26]. Several iterative algorithms, such as PLS1 and SIMPLS [27], have been designed to compute the score and loading matrices. For the sake of simplicity, we perform the analysis for each output independently (p = 1).
B. LASSO

LASSO represents the network reconstruction problem as the constrained optimization problem

    b̂_j = argmin_{b_j} ‖e‖₂² = (y_j − X b_j)ᵀ (y_j − X b_j)
    subject to ‖b_j‖₁ ≤ q ‖b_j^LS‖₁    (1)

where the vector y_j represents output j, the matrix X represents the input data (Section II-A), and e = y_j − X b_j represents the fit error (residual) vector. ‖·‖₁ and ‖·‖₂ denote the ℓ1- and ℓ2-norms, respectively. The parameter q (0 ≤ q ≤ 1) controls the amount of "shrinkage" in the estimation of b_j (the linear model parameters for output j), and b_j^LS is the least squares solution. The constraint in (1) shrinks the absolute values of the parameters. For certain values of q, the algorithm shrinks some of the parameters to exactly zero, thus resulting in a parsimonious model. The constrained optimization problem (1) is solved by a quadratic programming approach (an interior-point method in this work) [17].
C. DPLASSO

As mentioned earlier, DPLASSO takes advantage of both PLS (linear regression) with statistical significance testing and LASSO. The DPLASSO procedure is shown schematically in Fig. 1 and is described below.

Layer 1 [Fig. 1(a) and (b)]: The PLS method is applied to a data set with the linear model given by (2) (for one output at a time, e.g., the jth output), with ε being the residual vector

    y_j = X b̂_j + ε.    (2)
The number of latent variables in PLS can be determined either by cross validation or by specifying the fraction of the cumulative variance (FCV) captured (say, 0.8 < FCV_k < 0.95). To test whether an element of the parameter vector b̂ is statistically different from zero, a two-tailed t-test is performed on the PLS regression coefficients. First, the standard deviation of the model parameters, σ_b, is obtained by computing the covariance matrix of the model parameters b̂ [21]

    cov(b̂) = σ̂² (∂b̂/∂y) (∂b̂/∂y)ᵀ.    (3)
We used the plsdof package in R [28] to compute σ_b. Second, for the ith input and the jth output with k latent vectors, the ratio r_ij,k = |b_ij,k|/σ_b,ij,k and its average r̄_ij over k such that 0.8 < FCV_k < 0.95 are computed. Third, a significance coefficient s_ij is assigned to the ith input with respect to the jth output as follows. Since all the calculations are performed for one output at a time, we suppress the superscript j for the sake of simplicity.

— We define a significance index ψ as ψ_i = r̄_i − tinv(1 − α/2, v), where v = DOF_PLS [21] and α = 1 − c is selected to achieve a confidence level c = 0.8 (80%).
— An appropriate mapping function, referred to as a scoring function s or significance score, is used to transform ψ into a score. The range of the score is [0, 1]. A good candidate function for this purpose is the logistic mapping function

    s(ψ_i) = 1 / (1 + e^(−γ ψ_i)).    (4)

This function ensures that the values of the significance score for the PLS-significant coefficients (ψ_i ≥ 0) fall between 0.5 and 1 (Fig. 2, continuous curve). The width of the transition between 0 and 1 is controlled by the parameter γ (the larger the value of γ, the smaller the width of the transition); after exploring the variation in γ, we used γ = 1.

Fig. 1. Flowchart of the DPLASSO approach. Starting with a full linear model with all coefficients nonzero [represented by a complete graph in (a); each edge corresponds to an element in the coefficient matrix B, see (1)], the algorithm reduces the network size in two sequential steps: linear regression (e.g., PLS) with statistical significance testing (b) and LASSO (c). The reconstructed network is shown in (d).
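The layer-1 scoring and weighting can be sketched as follows; the symbol names (psi, gamma) follow the reconstruction in this section, and the helper function and example ratios are ours, not the paper's:

```python
import numpy as np
from scipy import stats

# Sketch of the layer-1 significance scoring and LASSO weighting.
def significance_score(rbar, dof, c=0.8, gamma=1.0):
    """Logistic significance score s(psi) in (0, 1), as in eq. (4)."""
    alpha = 1.0 - c                               # significance level, alpha = 1 - c
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, dof)  # tinv(1 - alpha/2, v)
    psi = rbar - t_crit                           # significance index psi_i
    return 1.0 / (1.0 + np.exp(-gamma * psi))

rbar = np.array([5.0, 1.3, 0.2])  # |b|/sigma_b ratios, averaged over k (hypothetical)
s = significance_score(rbar, dof=20)
w = 1.0 - s                       # LASSO weights, eq. (6): w_i = 1 - s(psi_i)
```

Inputs whose ratio clearly exceeds the critical t-value get scores near 1 and hence small LASSO weights, while clearly insignificant inputs get scores near 0 and large weights.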
Layer 2 [Fig. 1(c)]: For the chosen output, the significant coefficients identified by layer 1 are used to modify the uniform weights in the LASSO module. LASSO is reformulated as (the subscript j is omitted for simplicity)

    b̂ = argmin_b ‖e‖₂² = (y − X b)ᵀ (y − X b)
    subject to Σ_{i=1,…,n} w_i |b_i| ≤ q Σ_{i=1,…,n} w_i |b_i^LS|    (5)

where w_i is the weight for the ith input, computed from the significance score s(ψ_i) as

    w_i = 1 − s(ψ_i) = s(−ψ_i).    (6)
For significant coefficients (predictors), ψ_i > 0, which implies that 0.5 < s(ψ_i) < 1 and 0 < w_i < 0.5. For insignificant coefficients, ψ_i < 0, so that 0 < s(ψ_i) < 0.5 and 0.5 < w_i < 1. Setting γ = 0 results in equal weights for all predictors, whereas γ → ∞ results in zero weight (no constraint) on the significant predictors. The tuning parameter q [to be determined by cross validation on the root-mean-squared error (RMSE) of the prediction] plays the same role as in the constrained optimization problem (1). Next, the modified LASSO is used to set the small coefficients in the matrix B to zero (equivalent to deleting the corresponding edges in the network) while retaining the significant predictors. The final model is composed of the inputs that are retained (i.e., are nonzero) at layer 2 [Fig. 1(d)].
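One common way to impose the per-coefficient weights of (5) with a standard LASSO solver is an adaptive-LASSO-style change of variables, b̃_i = w_i b_i: rescale column i of X by 1/w_i, fit an ordinary LASSO, and map the coefficients back. This is a sketch under that substitution, not the authors' interior-point implementation; it requires w_i > 0, which holds for finite γ in (6):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sketch: fold the penalty weights w_i of (5) into a standard LASSO via
# column rescaling, then recover b_i = b~_i / w_i. Requires all w_i > 0.
def weighted_lasso(X, y, w, alpha=0.1):
    w = np.asarray(w, dtype=float)
    X_scaled = X / w                 # column i of X divided by w_i (broadcasts)
    b_tilde = Lasso(alpha=alpha).fit(X_scaled, y).coef_
    return b_tilde / w               # map back to the original parameters

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.0]) + 0.01 * rng.standard_normal(100)
w = np.array([0.1, 0.9, 0.1, 0.9])   # small weights on "significant" inputs
b_hat = weighted_lasso(X, y, w)
```

Inputs with small weights (layer-1-significant predictors) are lightly penalized and survive, while heavily weighted inputs are driven to zero.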
The DPLASSO approach is applicable to both static
and dynamic systems. For static systems, X and Y corre-
spond to input and output data, respectively. For dy-namic systems, Y is constructed from the data on the
time derivatives of the state variables [see (10)].
D. Performance Metrics

We use two metrics to evaluate the performance of DPLASSO. Each metric assesses a different aspect of the reconstruction performance.

The first metric is an aggregate of several measures of model evaluation (i.e., accuracy, sensitivity, and specificity) that rely on the definition of positive and negative occurrences in the reconstructed network (with respect to one output at a time). These are defined as

    accuracy = (TN + TP) / n
    sensitivity = TP / (TP + FN)
    specificity = TN / (TN + FP)
    lift = [TP / (TP + FN)] · [n / (TP + FP)]    (7)

where TP, TN, FP, and FN denote the number of true positives, true negatives, false positives, and false negatives, respectively, and n = TP + FP + TN + FN is the number of inputs.
The second metric is the RMSE of the prediction, computed as

    RMSE = √[ (1/m) Σ_{i=1}^{m} (y_i − y_{i,p})² ]    (8)

where y_i and y_{i,p} denote the actual value and the predicted value, respectively, of the ith measurement of the chosen output variable.
Fig. 2. Variation of the scoring and weighting functions with respect to the significance index variable ψ. The plot shows that smaller weights are assigned to more significant coefficients.
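The edge-level measures in (7) and the RMSE in (8) are straightforward to compute from the four counts; a small sketch (function names and example counts are ours, not the paper's):

```python
import math

# Sketch of the evaluation metrics in (7) and (8).
def edge_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and lift from edge counts, eq. (7)."""
    n = tp + fp + tn + fn
    return {
        "accuracy": (tn + tp) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "lift": (tp / (tp + fn)) * (n / (tp + fp)),
    }

def rmse(y, y_pred):
    """Root-mean-squared prediction error, eq. (8)."""
    m = len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_pred)) / m)

metrics = edge_metrics(tp=6, tn=2, fp=1, fn=1)  # hypothetical counts, n = 10
```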
III. RESULTS

A. Case Study I: Synthetic Data

We apply DPLASSO to analyze a synthetic linear system. The relative performance of DPLASSO is evaluated with respect to the metrics defined in (7) and (8).
1) Generation of Synthetic Data: We generate a well-sampled (in the sense of Nyquist–Shannon) synthetic data set for a linear, stable dynamic system

    dx/dt = A x + e.    (9)

The state vector x consists of n = 9 state variables, A is the system/coefficient matrix, and e is a Gaussian noise vector with zero mean and equal standard deviation (0.01) for all state variables.
The system matrix A is generated as follows.
1) An n × n random matrix A_temp is generated.
2) A predefined percentage of the coefficients (off-diagonal elements of A_temp) are set to zero to represent negatives in the synthesized network.
3) The largest negative eigenvalue λ of A_temp is varied to generate stable systems A with different time scales. For instance, to move the location of the largest pole of an input-free system to zero, one may set A = A_temp − c·I_{n×n}, where c is the largest eigenvalue of the original matrix A_temp and I is the identity matrix.
Since the synthetic data set represents an input-free dynamic system, a set of initial conditions for the system states is required. For the system to be stable around the origin (x_ss = 0), the initial state x(t = 0) is chosen inside a small region around the origin. Initial conditions are drawn from a Gaussian distribution excluding 0, e.g., N(0, 20) \ {0}, to prevent the presence of stationary states.
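Steps 1)–3) above can be sketched as follows; the fraction of zeroed entries, the spectral shift, and the function name are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

# Sketch of steps 1)-3): random A_temp, zero a fraction of off-diagonal
# entries (missing edges), then shift the spectrum so the system is stable.
def make_stable_system(n=9, zero_frac=1 / 3, shift=0.5, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))               # step 1: random A_temp
    # step 2: zero out a predefined fraction of off-diagonal coefficients
    off = [(i, j) for i in range(n) for j in range(n) if i != j]
    idx = rng.choice(len(off), size=int(zero_frac * len(off)), replace=False)
    for k in idx:
        A[off[k]] = 0.0
    # step 3: A = A_temp - c I, with c chosen so all eigenvalues have
    # negative real part (here shifted an extra `shift` past the largest one)
    c = np.max(np.linalg.eigvals(A).real)
    return A - (c + shift) * np.eye(n)

A = make_stable_system()
```

Varying the extra shift moves the largest pole and hence the time scale of the synthesized system, as in step 3).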
For the network reconstruction, given the time-series data x(t), we discretize (9) as

    (x|_{t_k} − x|_{t_{k−1}}) / Δt = A x|_{t_{k−1}} + e,  k = 1, 2, 3, …, m    (10)

where m is the last time index used and e represents noise. Since the matrix A does not change with time, we transpose (10) and then stack the resulting rows on both sides to form y_j and X of (2). Δt is equal to the Nyquist sample time 1/(2λ).

In order to make our synthetic networks representative of experimental data sets, we consider several factors, such as noise variance, granularity, and sparse network topology. Synthetic data sets, which are generated with a linear stable model whose parameters are known, facilitate the assessment of the overall accuracy of DPLASSO. A portion (about a third) of the true parameters of the synthetic system is set to 0 to test the ability of the method to identify them as insignificant. In all of the Monte Carlo simulations, involving λ and γ, we use the following procedure to generate synthetic data sets and evaluate the performance of the methods.
for λ_i, i = 1, 2, 3, …
    for γ_j, j = 1, 2, 3, …
        for k = 1, 2, …, 10 (Monte Carlo runs)
            [y, X] ← generate data set
            Apply the chosen method on [y, X] to reconstruct the network.
            Evaluate the metrics.
        end
        Compute the average of the performance metrics over the 10 Monte Carlo runs.
    end
end
2) Simulation Results: Estimation of the LASSO Parameter q: An optimal value of the DPLASSO parameters is identified by a grid search with respect to the confidence level c in significance testing and the LASSO parameter q (cross validation). This procedure yields 0.8 < FCV < 0.95, a confidence level of 0.8 in the PLS layer (layer 1), and the LASSO tuning parameter q = 0.2 (layer 2). Synthetic data sets were also generated from a zero-input response of a stable linear system with nine state variables (nonzero initial states) and 100 time points (with a fixed sampling time of 0.1 s and a settling-time range of 8 s < t_s < 15 s). The noise level in (9) was set to 0.1 [≈10% on average over a time span from t = 0 until 3/λ_min (approximate settling time)]. Network reconstruction with LASSO was used as the baseline against which the performance of both DPLASSO and PLS with significance testing was compared.
The shrinkage parameter q was estimated using tenfold cross validation as follows. The magnitude of q is increased from 0 to 1 in steps. For each value of q, the tenfold cross-validation error is computed. In each fold, 90% of the data are used for training and 10% for prediction. This procedure is repeated ten times to cover all the data points. The mean of the RMSE over the ten folds and the variability (standard error of the mean) across the folds are shown in Fig. 3. Next, the minimum error is recorded, and the optimal value of q is chosen so that the corresponding prediction RMSE is no more than the minimum error plus the standard error of the mean. In Fig. 3, the optimal q value is between 0.2 and 0.3, which is close to the actual fractional connectivity/density of the network, 20%.
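The selection rule just described (minimum cross-validation error plus one standard error) can be sketched as follows; scikit-learn's penalty parameter alpha stands in for the paper's shrinkage fraction q, and the grid and data are hypothetical:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

# Sketch of the tenfold CV tuning: sweep a penalty grid, record mean RMSE and
# its standard error per value, then pick the largest penalty whose mean RMSE
# is within one standard error of the minimum.
rng = np.random.default_rng(3)
X = rng.standard_normal((100, 9))
b_true = np.zeros(9)
b_true[:2] = [1.0, -1.0]
y = X @ b_true + 0.1 * rng.standard_normal(100)

alphas = np.linspace(0.01, 1.0, 20)
mean_rmse, sem_rmse = [], []
for a in alphas:
    fold_rmse = []
    for tr, te in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        model = Lasso(alpha=a).fit(X[tr], y[tr])
        err = y[te] - model.predict(X[te])
        fold_rmse.append(np.sqrt(np.mean(err ** 2)))
    mean_rmse.append(np.mean(fold_rmse))
    sem_rmse.append(np.std(fold_rmse) / np.sqrt(len(fold_rmse)))

i_min = int(np.argmin(mean_rmse))
threshold = mean_rmse[i_min] + sem_rmse[i_min]
# largest penalty (most shrinkage) still within one SEM of the minimum error
i_opt = max(i for i, r in enumerate(mean_rmse) if r <= threshold)
alpha_opt = alphas[i_opt]
```

Taking the largest penalty within the one-standard-error band favors the sparser model among statistically indistinguishable fits, mirroring the choice of q in Fig. 3.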
Performance Comparison With Respect to Network Density: As the size of a biological network increases, its density (fractional connectivity) generally decreases [29]. We compare the performance of DPLASSO against that of both linear regression with significance testing (labeled as PLS) and LASSO for networks with different densities. The comparison is carried out in terms of accuracy, sensitivity, and specificity. Figs. 4–6 demonstrate that DPLASSO provides a good compromise between LASSO and PLS for different network densities. In Fig. 4, the accuracy of the three methods is plotted as a function of the time-scale parameter λ and the DPLASSO parameter γ. Since LASSO and PLS do not involve the parameter γ, the variation along the γ-axis is solely due to the small number of Monte Carlo runs [see (10)]. Fig. 4 reveals
Fig. 4. Accuracy of DPLASSO, LASSO, and PLS for different
network densities. The quantities plotted are the averages over
ten Monte Carlo runs. In each run, the network has 20 nodes and
5% noise is considered in the state–space model of (9). The four
panels correspond to different network densities.
Fig. 5. Sensitivity of DPLASSO, LASSO, and PLS methods for
different network densities. The quantities plotted are the
averages over ten Monte Carlo runs. In each run, the network has
20 nodes and 5% noise is considered in the state–space model of
(9). The four panels correspond to different network densities.
Fig. 6. Specificity of DPLASSO, LASSO, and PLS methods for
different network densities. The quantities plotted are the
averages over ten Monte Carlo runs. In each run, the network has
20 nodes and 5% noise is considered in the state–space model of
(9). The four panels correspond to different network densities.
Fig. 3. RMSE in validation versus shrinkage parameter q for a
network with 20% connection density and node size 9 (for
tenfold cross validation). RMSE was normalized by the standard
deviation of the noise (0.1). Plot shows that when q is between
0.2 and 0.3, the RMSE is within the standard deviation of the
minimum RMSE.
that the accuracy of DPLASSO is between those of PLS (best) and LASSO (worst), but closer to that of PLS. This behavior is consistent across network densities ranging between 5% [Fig. 4(d)] and 50% [Fig. 4(a)]. The performance of both DPLASSO and PLS is more stable than that of LASSO as the network density is increased. For example, the accuracy of LASSO increases from about 0.2 to about 0.5 when the network density is increased from 0.2 to 0.5, whereas the accuracy of PLS and DPLASSO stays in the range 0.4–0.6. This shows that DPLASSO retains the robustness of the PLS approach (regression with significance testing).
Fig. 5 shows the sensitivity of the three methods as a function of λ and γ. The performance of DPLASSO is again a compromise between that of PLS (worst) and LASSO (best). As the network density (fractional connectivity) increases from 5% to 50%, the sensitivity of DPLASSO becomes more similar to that of PLS, especially at lower values of γ. This means that at low density (representative of biological networks) DPLASSO retains the high sensitivity of LASSO, and its sensitivity degrades slowly as the network density increases. Finally, LASSO provides the best sensitivity for both low and high network densities and is stable around 0.8.
Fig. 6 shows the specificity of the three methods as a function of λ and γ. As with the sensitivity and accuracy, the specificity of DPLASSO falls between that of LASSO (the worst) and PLS (the best). The specificity of LASSO is especially poor for large |λ| (faster systems) and for low-density (5%–10%) networks. The specificity of DPLASSO is closer to that of PLS at the various network densities.
Finally, we compare the predictive capability of DPLASSO, LASSO, and PLS in terms of their prediction RMSE. Table 1 lists the RMSE in addition to the accuracy, sensitivity, specificity, and lift for DPLASSO, LASSO, and PLS. The ratio std(RMSE)/mean(RMSE) serves as a measure of the robustness of the RMSE. All the data reported in Table 1 represent averages over ten independent (Monte Carlo) realizations, each of which is generated with a new coefficient matrix A but the same fraction of nonzero coefficients (66%). Table 1 shows that, overall, DPLASSO outperforms both LASSO and PLS if all four performance metrics are considered important. Another observation is that the performance of DPLASSO is closer to that of PLS than to that of LASSO. This is because, during the LASSO stage, the statistically significant predictors (from the PLS stage) are restricted to a lesser degree (0 < w_i < 0.5) than the statistically insignificant predictors (0.5 < w_i < 1).
B. Case Study II: A Dynamic Biological Network

The results presented above demonstrate the ability of DPLASSO to reconstruct dynamic networks with a linear input-to-output mapping. In this section, we use DPLASSO to reconstruct a real biological network from data generated from a well-studied nonlinear model of the cell division cycle of fission yeast [30]. This model enables us to run virtual experiments and to evaluate the performance of different methods in a realistic nonlinear setting. We generated the data set by numerically integrating the system of nine differential and seven algebraic equations given in [30]. The noise-free system of differential equations is solved over a time span of 130 s, with a fixed sampling time of 1 s.

The results of implementing DPLASSO are highly sensitive to the time span (phase of the cell cycle) on which a model is run [31]. Accordingly, we chose a slice of the temporal response of the system corresponding to the S/G2 phases of the cell division cycle (20 s < t < 120 s), wherein most of the states are excited. The structure of the dynamical system for network reconstruction is identical to (9). These equations were solved subject to the initial conditions [0.59, 0.02, 0.59, 2.12, 2.12, 0.92, 0.04, 0.53, 1]ᵀ. The parameters of DPLASSO were set to achieve a 0.8 significance level in the PLS layer, and the amount of shrinkage in the LASSO layer was chosen to be 0.8 (by cross validation). The true and false connections identified by DPLASSO are listed in Table 2. We correctly identify several important connections, such as M → M and M → Cdc13. Our method was not able to identify the connection Cdc13 → Cdc13, which an LMI-based method was able to identify [31].
The performances of DPLASSO, LASSO, and PLS are
reported in Table 3 in terms of accuracy, sensitivity,
specificity, lift, and normalized variation in RMSE. The
Table 1. Performance of the Three Methods on Synthetic Data. Results Are Based on the Average of Ten Runs of Networks With Size 20, and Represent Average Values Over γ ∈ [0, 1.5] and λ ∈ [−4, 0]. Abbreviation: SD, standard deviation.
model's nonlinearity, in contrast to the linear models given by (2) and (9), precludes a direct comparison of the fractional error in estimating the model coefficients. Table 3 demonstrates that the performance of DPLASSO is qualitatively similar to that observed for the synthetic data set (Case Study I). DPLASSO is 17% more efficient and 33% more accurate than LASSO, and has the same SD(RMSE)/mean(RMSE) ratio as PLS but lower (by 12%) accuracy. DPLASSO has about the same sensitivity as LASSO but outperforms PLS by 12%. Finally, DPLASSO shows considerably better specificity than either LASSO or PLS (Table 3). The low specificity in the reconstruction of the networks by all three methods arises from the low number of missing connections identified (true negatives) in the network.
Fig. 7 provides a graphical representation of the yeast cell division cycle network reconstructed with DPLASSO, LASSO, and PLS. The true network, inferred from the structure of the model equations, is shown in Fig. 7(a). DPLASSO identified most of the true connections, with a relatively small number of false connections (e.g., the nonexistent connection from Cdc13_T to Rum1_T). However, DPLASSO also missed some true connections, e.g., the stimulation of Slp1 by preMPF. LASSO and PLS identified most of the true connections, but both have a relatively large number of false positives.
IV. DISCUSSION

DPLASSO provides an attractive compromise between LASSO and PLS when all of the commonly used performance metrics are considered equally important. Below, we analyze the three methods further in order to identify the method best suited to a particular goal and to improve the performance of DPLASSO.
A. Selection of a Methodology Based on the Desired Goals of Reconstruction
Table 1 suggests that the selection of an efficient network reconstruction method depends on the desired
Table 2. Connections in the Cell Division Network Reconstructed With DPLASSO. True Positives Are Denoted by TP, and False Positives by FP. An Edge Is Denoted by a Link From a Node in the Row to a Node in the Column.
Table 3 Performance of the Three Network Reconstruction Methods for
Cell Division Cycle Averaged Over �
Fig. 7. True network (a) and its reconstructions with DPLASSO
(b), LASSO (c) and PLS (d). The performance metrics are listed in
Table 3.
326 Proceedings of the IEEE | Vol. 105, No. 2, February 2017
Asadi et al. : Doubly Penalized LASSO for Reconstructionof Biological Networks
outcome (e.g., better sensitivity or better specificity) andthe properties of the network. Sensitivity represents a
method’s ability to detect true positives in the network,
and specificity provides an indicator of a method’s ability
to identify true negatives (absence of connections).
Table 1 suggests that LASSO is more effective in finding
the true positives (has high sensitivity), while PLS is
more reliable in identification of the true negatives (has
high specificity). DPLASSO provides a compromise between these two methods, which is attractive either in
the absence of sufficient information about the network
density or when both criteria (sensitivity and specificity)
are equally important. DPLASSO also ensures robust prediction of the behavior of the system.
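The sensitivity, specificity, and accuracy scores compared throughout this section can be computed directly from the true and inferred adjacency matrices. A minimal sketch follows; the function and variable names (`reconstruction_metrics`, `A_true`, `A_hat`) are ours for illustration, not from the paper.

```python
import numpy as np

def reconstruction_metrics(A_true, A_hat):
    """Compare binary adjacency matrices (1 = edge present, 0 = absent)."""
    t = A_true.astype(bool).ravel()
    p = A_hat.astype(bool).ravel()
    tp = np.sum(t & p)        # true edges recovered
    tn = np.sum(~t & ~p)      # absent edges correctly left out
    fp = np.sum(~t & p)       # spurious edges
    fn = np.sum(t & ~p)       # missed edges
    sensitivity = tp / (tp + fn)   # ability to detect true positives
    specificity = tn / (tn + fp)   # ability to identify true negatives
    accuracy = (tp + tn) / t.size
    return sensitivity, specificity, accuracy

# Toy 3-node directed network: one missed edge, one spurious edge.
A_true = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
A_hat  = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
sens, spec, acc = reconstruction_metrics(A_true, A_hat)
```

The low specificity noted above corresponds to `tn` being small relative to `tn + fp` when many spurious edges are reported.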
B. Adjustment of λ Based on Receiver Operating Characteristic (ROC) Curve
Performance of any network reconstruction method
depends on a system's dynamics (e.g., on the location of a dominant pole in a linear system). We generated ROC curves, which provide a graphical illustration of the performance of a classifier (here, identification of the existence of a link in a network) with respect to variation of an algorithmic parameter (in this case, λ).
The ROC curves for the three methods are shown in Fig. 8. The left panel represents the ROC curves generated by varying the pole parameter p (averaged over several values of λ). A smaller marker size indicates a faster system (larger absolute value of p). The performance of LASSO is the worst overall, whereas the performance of DPLASSO and PLS is comparable. As |p| increases (the system becomes faster), the specificity of PLS increases substantially more than its sensitivity decreases. A comparable increase in |p| reduces the specificity of DPLASSO at a slower rate than the gain in its sensitivity. LASSO is relatively insensitive to changes in p.
Fig. 8(b) exhibits the ROC curves for varying λ (averaged over several values of p). Changes in λ do not affect the ROC curves for PLS and LASSO, which is to be expected since the parameter λ is not used in these two methods. As λ increases, the specificity of DPLASSO decreases, while its sensitivity increases. These results imply that if one can identify the fastest time scale of the system (e.g., by time-series analysis), the parameter λ can be tuned to achieve better performance. In other words, if a system's dynamics is identifiable, one can select a value of λ that enables DPLASSO to outperform PLS and LASSO in terms of sensitivity and specificity without significantly degrading other metrics.
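The ROC sweep described above can be sketched as follows. As a simplified stand-in for the full DPLASSO procedure, we threshold ordinary least-squares coefficients of a synthetic linear network at increasing values of a tuning parameter; each value yields one (false-positive rate, true-positive rate) point. The thresholding shortcut and all names are our assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_samples = 10, 200

# Random sparse "true" network: ~20% of entries carry a nonzero weight.
W_true = (rng.random((n_nodes, n_nodes)) < 0.2) * rng.normal(0, 1, (n_nodes, n_nodes))
X = rng.normal(size=(n_samples, n_nodes))
Y = X @ W_true.T + 0.1 * rng.normal(size=(n_samples, n_nodes))

# Ordinary least squares estimate of the weight matrix.
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
W_hat = B.T

truth = W_true != 0
roc = []
for lam in np.linspace(0.0, 1.0, 21):        # sweep the tuning parameter
    pred = np.abs(W_hat) > lam               # edges surviving the penalty
    tp = np.sum(truth & pred); fn = np.sum(truth & ~pred)
    fp = np.sum(~truth & pred); tn = np.sum(~truth & ~pred)
    roc.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR) point
```

Plotting the `roc` list reproduces the qualitative behavior in Fig. 8: as the penalty tightens, the false-positive rate falls faster than the true-positive rate.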
C. Role of the Network Structure in the Selection of a Method
The specificity of DPLASSO and LASSO is lower than
that of PLS for both the synthetic (Table 1) and simu-
lated biological (Table 3) data sets. This behavior is due
to topological properties of the networks and the recon-
struction objective function. Specifically, we optimized
the shrinkage factor q by minimizing cross validation
RMSE in order to achieve a higher accuracy. These parameters (the significance level, the shrinkage factor, the level of variance captured, and the tuning parameter λ) can be adjusted to achieve higher scores in
terms of another metric, such as sensitivity, specificity,
and RMSE. Selection of the desired metric to be opti-
mized can vary depending on the structure of the network. In practice, to reconstruct sparse networks, it is desirable to optimize the sensitivity metric, whereas for networks with moderate density, accuracy is relatively more important. For example, in the cell cycle network, the number of true connections is 22 out of 45 (9 × 10/2) possible connections, which makes it a moderately dense network.
Hence, accuracy could serve as an appropriate measure
to evaluate the reconstruction performance. In other
words, since sensitivity and accuracy are more important
for networks with low and moderate density, respectively, DPLASSO is suitable for them despite its low
specificity. For high-density networks, PLS is likely to be
a better approach due to its higher specificity.
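The density-based rule of thumb above can be made concrete with a small helper. The density thresholds below are illustrative choices consistent with the discussion, not values specified in the paper.

```python
def suggest_metric(n_true_edges, n_nodes):
    """Map network density to the evaluation metric worth optimizing.
    Thresholds (0.2, 0.6) are illustrative, not from the paper."""
    possible = n_nodes * (n_nodes - 1) // 2   # undirected node pairs
    density = n_true_edges / possible
    if density < 0.2:
        return density, "sensitivity"         # sparse: favor DPLASSO/LASSO
    if density < 0.6:
        return density, "accuracy"            # moderate: favor DPLASSO
    return density, "specificity"             # dense: favor PLS

# Cell division cycle example from the text: 22 of 45 possible edges.
d, m = suggest_metric(22, 10)
```

For the cell cycle network this yields a density of about 0.49, so accuracy is the suggested metric, matching the argument above.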
V. SUMMARY AND CONCLUSION
We have developed a new method, DPLASSO, to recon-
struct static and dynamic biological networks. The primary basis of DPLASSO is the integration of statistical
significance testing and L1-penalty/optimization ap-
proaches to achieve a good performance with respect to
several metrics simultaneously. DPLASSO uses two
layers of analysis. Layer 1 employs linear regression (PLS
in this case) with statistical significance testing as a
supervisory-level filter to extract the statistically
Fig. 8. ROC performance of DPLASSO, LASSO, and PLS (for a network of size 50 with 20% density, averaged over ten Monte Carlo runs). (a) Comparison of DPLASSO with LASSO and PLS for different values of the pole parameter p in the range [−2, 0]. The results are averaged over several values of λ in the range [0, 2]. (b) Effect of changes in λ on the performance of the three methods (averaged over several p in the range [−2, 0]). In (a), a smaller marker size indicates a faster system (smaller time constant, i.e., larger absolute value of p).
significant components of a network. Layer 2 utilizes LASSO with extra weights on the statistically less signifi-
cant parameters obtained in layer 1 to retain the main
predictors and set the remaining small coefficients to
zero. Layers 1 and 2 are connected via a logistic function
with a tuning parameter λ. We applied this approach to
two case studies. In the first case study, a set of linear
synthetic networks was created to evaluate the relative
performance of DPLASSO, PLS with statistical significance testing, and LASSO. Our simulation results show
that DPLASSO provides a good compromise between
LASSO and PLS for different network densities and sizes.
In the second study, we applied DPLASSO to simulated
data sets for cell division cycle of fission yeast. The re-
sults of the performance evaluation of DPLASSO qualitatively confirm the findings for the synthetic system.
The ROC analysis showed that the performance of DPLASSO can be improved relative to that of PLS and LASSO by tuning the parameters λ and q.
From these results, we conclude that combining PLS
(regression) with statistical significance testing and
LASSO (optimization with L1-penalty) provides good
performance with respect to several metrics. We also conclude that DPLASSO is well suited for reconstruction of networks with low and moderate density. For high-density networks, higher specificity (high true negatives and low false positives) is desired, making PLS a more suitable approach.
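The two-layer coupling summarized above, in which layer-1 significance scores are mapped through a logistic function with tuning parameter λ into per-coefficient penalty weights for the layer-2 weighted LASSO, can be sketched as follows. The specific logistic form, the use of z-scores as a stand-in for the PLS significance test, and all names are our assumptions rather than the paper's exact formulation.

```python
import numpy as np

def penalty_weights(z_scores, lam, z0=2.0):
    """Logistic map from layer-1 significance to layer-2 penalty weights.
    Small |z| (statistically insignificant in layer 1) -> weight near 1
    (heavy extra L1 penalty); large |z| -> weight near 0 (lightly penalized).
    lam controls the sharpness of the transition; z0 is the crossover point."""
    return 1.0 / (1.0 + np.exp(lam * (np.abs(z_scores) - z0)))

z = np.array([0.1, 1.5, 3.0, 6.0])   # hypothetical layer-1 z-scores
w = penalty_weights(z, lam=1.0)       # weights decrease with significance
```

In a weighted LASSO, the resulting `w` would multiply the penalty terms, i.e., minimize ||y − Xβ||² + α Σ wᵢ|βᵢ|, so coefficients deemed insignificant in layer 1 are shrunk toward zero more aggressively.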
ABOUT THE AUTHORS
Behrang Asadi received the B.S. degree in me-
chanical engineering from Sharif University of
Technology, Tehran, Iran, the M.S. degree in con-
trol systems from Clemson University, Clemson,
SC, USA, and the Ph.D. degree from the Univer-
sity of California San Diego, La Jolla, CA, USA.
He is currently Head of Advanced Analytics at
Danaher Labs, Santa Clara, CA, USA. He has held
several leadership positions in data science in fi-
nancial services, internet, and consulting firms.
His research interests focus on the application of machine learning to ad hoc disruptive computing challenges.
Mano Ram Maurya received the B.Tech. degree
in chemical engineering from Indian Institute of
Technology Bombay, Bombay, India, in 1998, the
M.E. degree in chemical engineering from City
College of New York, New York, NY, USA, in
1999, and the Ph.D. degree in chemical engineer-
ing from Purdue University, West Lafayette, IN,
USA, in 2003.
From 2003 to 2006, he was a Postdoctoral
Researcher in the Department of Bioengineering
and the San Diego Supercomputer Center at the University of California
San Diego (UCSD), La Jolla, CA, USA. Then, he worked in the Depart-
ment of Bioengineering as an Assistant Scientist from October 2006 to
November 2010. Since then, he has been a Research Scientist at the
San Diego Supercomputer Center and the Department of Bioengineer-
ing at UCSD. He is the author of about 40 papers in peer-reviewed jour-
nals and conference proceedings. His current research interests include
the study of complex biochemical processes and pathways using sys-
tems engineering/biology, machine learning, information theory and
bioinformatics approaches, and their applications to biomedicine.
Dr. Maurya received the Best Paper Award in 2005 jointly with
his Ph.D. advisor Dr. Venkat Venkatasubramanian and coadvisor
Dr. Raghunathan Rengaswamy for their paper published in the Journal
of Engineering Applications of Artificial Intelligence in 2004. In August
2011, he joined the Editorial Board of ISRN Biophysics.
Daniel M. Tartakovsky received the M.S. degree
from Kazan State University, Kazan, Russia, in
1991 and the Ph.D. degree from the University of
Arizona, Tucson, AZ, USA, in 1996.
From 1998 to 2005, he was a Technical Staff
Member at Los Alamos National Laboratory, Los
Alamos, NM, USA, first, in the Scientific Comput-
ing Group (CCS-3), and then, in the Mathematical
Modeling and Analysis Group (T-7). In 2005, he
moved to the Mechanical and Aerospace Engi-
neering Department, University of California San Diego (UCSD), La Jolla,
CA, USA, where he is currently a Professor. He has published over
150 articles. His research interests include uncertainty quantification,
stochastic partial differential equations, inverse modeling, and mathe-
matical biology.
Prof. Tartakovsky is on the editorial board of several journals.
Shankar Subramaniam received the B.S. and
M.S. degrees from Osmania University, Hyderabad,
India, and the Ph.D. degree in chemistry from
Indian Institute of Technology Kanpur, Kanpur,
India, in 1982.
From 1991 to 1999, he was a Professor of Bio-
physics, Biochemistry, Molecular and Integrative
Physiology, Chemical Engineering and Electrical
and Computer Engineering at the University of
Illinois at Urbana-Champaign, Urbana, IL, USA,
prior to moving to the University of California San Diego (UCSD),
La Jolla, CA, USA. Currently, he is the Joan and Irwin Jacobs Endowed
Chair in Bioengineering and Systems Biology at UCSD and a Professor
of Bioengineering, Bioinformatics and Systems Biology, Cellular and
Molecular Medicine, Computer Science and Engineering, and Nanoengi-
neering. He was the Founding Director of the Bioinformatics and
Systems Biology Program at UCSD. He has published over 250 articles.
Research in his laboratory spans several areas of bioinformatics,
systems biology, and medicine.
Dr. Subramaniam is on the editorial board of several journals.