CONTRIBUTED PAPER

Doubly Penalized LASSO for Reconstruction of Biological Networks

This paper presents a new method for the reconstruction of dynamic biological networks and illustrates it with two case studies.
By Behrang Asadi, Mano Ram Maurya, Daniel M. Tartakovsky, and
Shankar Subramaniam
ABSTRACT | Reconstruction of biological and biochemical
networks is a crucial step in extracting knowledge and causal
information from large biological data sets. This task is partic-
ularly challenging when dealing with time-series data from
dynamic networks. We have developed a new method, called
doubly penalized least absolute shrinkage and selection oper-
ator (DPLASSO), for the reconstruction of dynamic biological
networks. Our method consists of two components: statistical
significance testing of model coefficients and penalized/
constrained optimization. Partial least squares (PLS) with statistical significance testing acts as a supervisory-level filter to extract the most informative components of the network from a data set. Then, LASSO with extra weights on the
smaller parameters identified in the first layer is employed to
retain the main predictors and to set the smallest coefficients
to zero. We present two case studies to compare the relative
performance of DPLASSO and LASSO in terms of several
metrics, such as sensitivity, specificity, and accuracy. The first
case study employs a synthetic data set; it demonstrates that
DPLASSO substantially improves network reconstruction in
terms of accuracy and specificity. The second case study relies
on simulated data sets for cell division cycle of fission yeast
and shows that DPLASSO overall outperforms LASSO and PLS
in terms of sensitivity, specificity, and accuracy. Combining
PLS with statistical significance testing and LASSO provides
good performance with respect to several metrics. DPLASSO
is well suited for reconstruction of networks with low and
moderate density. For high-density networks, high specificity
is desired, making PLS a more suitable approach.
KEYWORDS | Cell cycle; data-driven network reconstruction;
inference; least absolute shrinkage and selection operator
(LASSO); partial least squares (PLS); predictive dynamic model;
sensitivity; specificity; statistical significance testing
I. INTRODUCTION
Inference of network topology from experimental data is
a central endeavor in modern biology, since detailed
knowledge of the contextual pathways is required for un-
derstanding the mechanisms associated with normal and
disease function [1]. Modern high-throughput techniques
facilitate collection of different types of data related to
signaling, gene regulatory, and metabolic networks. Such data sets (e.g., cellular readouts obtained through different experimental techniques) have a large dynamic range, distinct observational scales (e.g., high versus low intensity in microarray data sets), and varying accuracy. Integration of different types of data and information can lead
to remarkable improvements in our ability to identify the
connectivity of different segments of networks and to
Manuscript received September 8, 2015; accepted February 20, 2016. Date of publication November 22, 2016; date of current version January 18, 2017. This work was supported by the National Science Foundation (NSF) under Collaborative Grants DBI-0835541 and STC-0939370, and by the National Institutes of Health (NIH) under Collaborative Grants U54GM69338, R01HL106579, HL087375-02, and R01HL108735. (Behrang Asadi and Mano Ram Maurya are co-first authors.) (Corresponding author: Shankar Subramaniam.)
B. Asadi was with the Departments of Bioengineering and Mechanical and Aerospace Engineering, University of California San Diego, La Jolla, CA 92093 USA. He is now with Danaher Labs, Santa Clara, CA 95054 USA (e-mail: [email protected]).
M. R. Maurya is with the San Diego Supercomputer Center and the Department of Bioengineering, University of California San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).
D. M. Tartakovsky is with the Department of Mechanical and Aerospace Engineering, University of California San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).
S. Subramaniam is with the Department of Bioengineering, the Departments of Computer Science and Engineering and Cellular and Molecular Medicine, and the Graduate Program in Bioinformatics, University of California San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).
Digital Object Identifier: 10.1109/JPROC.2016.2535282
0018-9219 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Vol. 105, No. 2, February 2017 | Proceedings of the IEEE 319
provide insights about the cellular system. Several recent studies have used integrative data analytics to reconstruct biochemical networks and to build predictive models from large-scale data sets. New algorithms were developed to reconstruct parsimonious regulatory networks [2], networks with a large number of intracellular signaling events [3], and signaling networks using phosphoproteomic data for cytokine release in RAW 264.7 cells [4] and for human hepatocellular liver carcinoma HepG2 cells [5]. A recent review of the methods used to reconstruct biochemical networks and models when the network topology is either known (e.g., metabolic networks) or (partially) unknown (e.g., gene regulatory networks) can be found in [6].
Network reconstruction techniques include information-theoretic approaches, such as those based on mutual information [7], [8] and the notion of Granger causality implemented through vector autoregressive models and their variants [9], [10]; optimization-based approaches, e.g., least squares and mixed-integer nonlinear optimization methods [11]; dimensionality-reduction methods, e.g., statistical significance tests combined with either principal components regression (PCR) or partial least squares (PLS) [4], [12]; partial-correlation-related analyses [13]; Bayesian networks [3], [14]; hybrid methods, e.g., linear matrix inequalities (LMIs) [15], [16], the least absolute shrinkage and selection operator (LASSO) [2], [17], and the elastic net [18]; and matrix-based algorithms [19]–[21].
While most of the established techniques mentioned above reconstruct networks from static data, several recent approaches are designed to exploit the dynamic nature of networks using the concomitant time-series data. For example, Greenfield et al. [22] developed a framework based on ordinary differential equations to reconstruct biological networks from both static and dynamic data. Weber et al. [23] proposed a heuristic network inference algorithm (NetGenerator v2.0) to reconstruct dynamic networks from multiple-stimuli, multiple-experiment data and prior knowledge of the network, and Lo et al. [24] combined linear regression with Bayesian model averaging to reconstruct biological networks from time-series data and prior biological knowledge.
Since each algorithm by itself is better suited to one metric (e.g., accuracy or specificity) but not to multiple metrics simultaneously [25], integration of distinct features of two or more algorithms is likely to improve performance. For example, optimization-based methods, such as LASSO, use an L1-penalty-based constraint on the model parameters to retain the coefficients corresponding to the most informative predictors. Regression-based approaches (least squares, PCR, or PLS) with statistical significance testing use the t-test to identify statistically significant predictors. One of the findings of our previous comparison of statistical and optimization-based methods with respect to different measures [25] is that reconstructing networks from noisy data sets with an optimization method (such as LASSO) provides good sensitivity but low specificity, whereas statistical methods (PCR/PLS with significance testing) provide good specificity albeit low sensitivity. Since sensitivity and specificity are two distinct but complementary properties, our proposed algorithm combines PLS (with statistical significance testing) and LASSO to potentially achieve both good sensitivity and good specificity. We present a new method, called doubly penalized least absolute shrinkage and selection operator (DPLASSO), to reconstruct dynamic biological networks. Our approach integrates statistical significance testing and constrained/penalized optimization. The method employs partial least squares with statistical significance testing as a supervisory-level filter to extract the most informative connections of the network from a data set. The lower reconstruction layer consists of LASSO with extra weights assigned to small parameters. These weights are determined via partial least squares in the first-layer filter so as to retain all significant inputs and set the remaining small coefficients to zero. The comparison criteria are the robustness in identifying the "true" network and the accuracy of the estimated model coefficients under a moderate amount of noise in the data.
In Section II, we describe DPLASSO for network reconstruction and introduce the metrics used to compare the relative performance of the LASSO and DPLASSO methods. In Section III, DPLASSO is used to reconstruct a synthetic network, and the network's performance is evaluated relative to the network reconstructed with LASSO. We also use DPLASSO to reconstruct a biological network from a data set obtained by simulating the biological system. Next, a discussion of the salient features of DPLASSO and the results is presented, followed by conclusions and directions for future work.
II. METHODOLOGY
DPLASSO includes two parameter-selection layers.
Layer 1 is a supervisory layer, in which the method of
PLS is used to build a linear model while accounting
for correlations among the inputs. Layer 2 applies
LASSO with extra weights on the less (statistically) sig-
nificant parameters computed in layer 1. Both PLS and
LASSO are briefly introduced below, followed by a
detailed description of DPLASSO.
A. Partial Least Squares

The objective of PLS is to find combinations of the predictors that have a large covariance with the response values. The method projects the input and the output of a linear model onto a space where a multidimensional direction in the input space accounts for the maximum
variance in the multidimensional direction in the output space. The PLS algorithm is described below.

Let X (m × n) denote the input data, consisting of n inputs sampled at m points and/or conditions, and let Y (m × p) denote the output data (p outputs) of a linear model Y = XB, with B being the model parameters. The goal of the PLS method is to find a linear decomposition of X and Y such that X = T Pᵀ + E and Y = U Qᵀ + F, where T (m × r) and U (m × r) are the input and output scores, respectively; P (n × r) and Q (p × r) are the input and output loadings, respectively; and E (m × n) and F (m × p) are residual matrices. This decomposition is made with the objective of maximizing the covariance between T and U [26]. Several iterative algorithms, such as PLS1 and SIMPLS [27], have been designed to compute the score and loading matrices. For the sake of simplicity, we perform the analysis for each output independently (p = 1).
B. LASSO

LASSO represents the network reconstruction problem as the constrained optimization problem

    b̂_j = argmin_{b_j} ‖e‖₂² = (y_j − X b_j)ᵀ (y_j − X b_j)
    subject to ‖b_j‖₁ ≤ q ‖b_j^LS‖₁    (1)

where the vector y_j represents output j, the matrix X represents the input data (Section II-A), and e = y_j − X b_j represents the fit error (residual) vector. ‖·‖₁ and ‖·‖₂ denote the ℓ1- and ℓ2-norms, respectively. The parameter q (0 ≤ q ≤ 1) controls the amount of "shrinkage" in the estimation of b_j (the linear model parameters for output j), and b_j^LS is the least squares solution. The constraint in (1) shrinks the absolute values of the parameters. For certain values of q, the algorithm shrinks some of the parameters to exactly zero, thus resulting in a parsimonious model. The constrained optimization problem (1) is solved by a quadratic programming approach (an interior-point method in this work) [17].
C. DPLASSO

As mentioned earlier, DPLASSO takes advantage of both PLS (linear regression) with statistical significance testing and LASSO. The DPLASSO procedure is shown schematically in Fig. 1 and is described below.

Layer 1 [Fig. 1(a) and (b)]: The PLS method is applied to a data set with the linear model given by (2) (for one output at a time, e.g., the jth output), with ε being the residual vector

    y_j = X b̂_j + ε.    (2)
The number of latent variables in PLS can be determined either by cross validation or by specifying the fraction of the cumulative variance (FCV) captured (say, 0.8 < FCV_k < 0.95). To test whether an element of the parameter vector b̂ is statistically different from zero, a two-tailed t-test is performed on the PLS regression coefficients. First, the standard deviation of the model parameters, σ_b, is obtained by computing the covariance matrix of the model parameters b̂ [21]

    cov(b̂) = σ̂² (∂b̂/∂y) (∂b̂/∂y)ᵀ.    (3)
We used the plsdof package in R [28] to compute σ_b. Second, for the ith input and the jth output with k latent vectors, the ratio r_ij,k = |b_ij,k|/σ_b,ij,k and its average r̄_ij over k such that 0.8 < FCV_k < 0.95 are computed. Third, a significance coefficient s_ij is assigned to the ith input with respect to the jth output as follows. Since all the calculations are performed for one output at a time, we suppress the superscript j for the sake of simplicity.

— We define a significance index ψ as ψ_i = r̄_i − tinv(1 − α/2, v), where v = DOF_PLS [21] and α = 1 − c is selected to achieve a confidence level c = 0.8 (80%).
— An appropriate mapping function, referred to as a scoring function s or significance score, is used to transform ψ into a score. The range of the score is [0, 1]. A good candidate function for this purpose is the logistic mapping function

    s(ψ_i) = 1 / (1 + e^(−γ ψ_i)).    (4)

This function ensures that the values of the significance score for the PLS-significant coefficients (ψ_i ≥ 0) fall between 0.5 and 1 (Fig. 2, continuous curve). The width of the transition between 0 and 1 is controlled by the parameter γ (the larger the value of γ, the smaller the width of the transition); after exploring the variation in γ, we used γ = 1.

Fig. 1. Flowchart of the DPLASSO approach. Starting with a full linear model with all coefficients nonzero [represented by a complete graph in (a); each edge corresponds to an element in the coefficient matrix B, see (1)], the algorithm reduces the network size in two sequential steps: linear regression (e.g., PLS) with statistical significance testing (b) and LASSO (c). The reconstructed network is shown in (d).
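The layer-1 scoring and weighting can be sketched as follows; the symbol names (psi, gamma) follow the reconstruction in this section, and the helper function and example ratios are ours, not the paper's:

```python
import numpy as np
from scipy import stats

# Sketch of the layer-1 significance scoring and LASSO weighting.
def significance_score(rbar, dof, c=0.8, gamma=1.0):
    """Logistic significance score s(psi) in (0, 1), as in eq. (4)."""
    alpha = 1.0 - c                               # significance level, alpha = 1 - c
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, dof)  # tinv(1 - alpha/2, v)
    psi = rbar - t_crit                           # significance index psi_i
    return 1.0 / (1.0 + np.exp(-gamma * psi))

rbar = np.array([5.0, 1.3, 0.2])  # |b|/sigma_b ratios, averaged over k (hypothetical)
s = significance_score(rbar, dof=20)
w = 1.0 - s                       # LASSO weights, eq. (6): w_i = 1 - s(psi_i)
```

Inputs whose ratio clearly exceeds the critical t-value get scores near 1 and hence small LASSO weights, while clearly insignificant inputs get scores near 0 and large weights.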
Layer 2 [Fig. 1(c)]: For the chosen output, the significant coefficients identified by layer 1 are used to modify the uniform weights in the LASSO module. LASSO is reformulated as (the subscript j is omitted for simplicity)

    b̂ = argmin_b ‖e‖₂² = (y − X b)ᵀ (y − X b)
    subject to Σ_{i=1,…,n} w_i |b_i| ≤ q Σ_{i=1,…,n} w_i |b_i^LS|    (5)

where w_i is the weight for the ith input, computed from the significance score s(ψ_i) as

    w_i = 1 − s(ψ_i) = s(−ψ_i).    (6)
For significant coefficients (predictors), ψ_i > 0, which implies that 0.5 < s(ψ_i) < 1 and 0 < w_i < 0.5. For insignificant coefficients, ψ_i < 0, so that 0 < s(ψ_i) < 0.5 and 0.5 < w_i < 1. Setting γ = 0 results in equal weights for all predictors, whereas γ → ∞ results in zero weight (no constraint) on the significant predictors. The tuning parameter q [to be determined by cross validation on the root-mean-squared error (RMSE) of the prediction] plays the same role as in the constrained optimization problem (1). Next, the modified LASSO is used to set the small coefficients in the matrix B to zero (equivalent to deleting the corresponding edges in the network) while retaining the significant predictors. The final model is composed of the inputs that are retained (i.e., are nonzero) at layer 2 [Fig. 1(d)].
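One common way to impose the per-coefficient weights of (5) with a standard LASSO solver is an adaptive-LASSO-style change of variables, b̃_i = w_i b_i: rescale column i of X by 1/w_i, fit an ordinary LASSO, and map the coefficients back. This is a sketch under that substitution, not the authors' interior-point implementation; it requires w_i > 0, which holds for finite γ in (6):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sketch: fold the penalty weights w_i of (5) into a standard LASSO via
# column rescaling, then recover b_i = b~_i / w_i. Requires all w_i > 0.
def weighted_lasso(X, y, w, alpha=0.1):
    w = np.asarray(w, dtype=float)
    X_scaled = X / w                 # column i of X divided by w_i (broadcasts)
    b_tilde = Lasso(alpha=alpha).fit(X_scaled, y).coef_
    return b_tilde / w               # map back to the original parameters

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.0]) + 0.01 * rng.standard_normal(100)
w = np.array([0.1, 0.9, 0.1, 0.9])   # small weights on "significant" inputs
b_hat = weighted_lasso(X, y, w)
```

Inputs with small weights (layer-1-significant predictors) are lightly penalized and survive, while heavily weighted inputs are driven to zero.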
The DPLASSO approach is applicable to both static
and dynamic systems. For static systems, X and Y corre-
spond to input and output data, respectively. For dy-namic systems, Y is constructed from the data on the
time derivatives of the state variables [see (10)].
D. Performance Metrics

We use two metrics to evaluate the performance of DPLASSO. Each metric assesses a different aspect of the reconstruction performance.

The first metric is an aggregate of several measures of model evaluation (i.e., accuracy, sensitivity, and specificity) that rely on the definition of positive and negative occurrences in the reconstructed network (with respect to one output at a time). These are defined as

    accuracy = (TN + TP) / n
    sensitivity = TP / (TP + FN)
    specificity = TN / (TN + FP)
    lift = [TP / (TP + FN)] · [n / (TP + FP)]    (7)

where TP, TN, FP, and FN denote the number of true positives, true negatives, false positives, and false negatives, respectively, and n = TP + FP + TN + FN is the number of inputs.
The second metric is the RMSE of the prediction, computed as

    RMSE = √[ (1/m) Σ_{i=1}^{m} (y_i − y_{i,p})² ]    (8)

where y_i and y_{i,p} denote the actual value and the predicted value, respectively, of the ith measurement of the chosen output variable.
Fig. 2. Variation of the scoring and weighting functions with respect to the significance index variable ψ. The plot shows that smaller weights are assigned to more significant coefficients.
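The edge-level measures in (7) and the RMSE in (8) are straightforward to compute from the four counts; a small sketch (function names and example counts are ours, not the paper's):

```python
import math

# Sketch of the evaluation metrics in (7) and (8).
def edge_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and lift from edge counts, eq. (7)."""
    n = tp + fp + tn + fn
    return {
        "accuracy": (tn + tp) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "lift": (tp / (tp + fn)) * (n / (tp + fp)),
    }

def rmse(y, y_pred):
    """Root-mean-squared prediction error, eq. (8)."""
    m = len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_pred)) / m)

metrics = edge_metrics(tp=6, tn=2, fp=1, fn=1)  # hypothetical counts, n = 10
```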
III. RESULTS

A. Case Study I: Synthetic Data

We apply DPLASSO to analyze a synthetic linear system. The relative performance of DPLASSO is evaluated with respect to the metrics defined in (7) and (8).
1) Generation of Synthetic Data: We generate a well-sampled (in the sense of Nyquist–Shannon) synthetic data set for a linear, stable dynamic system

    dx/dt = A x + e.    (9)

The state vector x consists of n = 9 state variables, A is the system/coefficient matrix, and e is a Gaussian noise vector with zero mean and equal standard deviation (0.01) for all state variables.
The system matrix A is generated as follows.
1) An n × n random matrix A_temp is generated.
2) A predefined percentage of the coefficients (off-diagonal elements of A_temp) are set to zero to represent negatives in the synthesized network.
3) The largest negative eigenvalue λ of A_temp is varied to generate stable systems A with different time scales. For instance, to move the location of the largest pole of an input-free system to zero, one may set A = A_temp − c·I_{n×n}, where c is the largest eigenvalue of the original matrix A_temp and I is the identity matrix.
Since the synthetic data set represents an input-free dynamic system, a set of initial conditions for the system states is required. For the system to be stable around the origin (x_ss = 0), the initial state x(t = 0) is chosen inside a small region around the origin. Initial conditions are drawn from a Gaussian distribution excluding 0, e.g., N(0, 20) \ {0}, to prevent the presence of stationary states.
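Steps 1)–3) above can be sketched as follows; the fraction of zeroed entries, the spectral shift, and the function name are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

# Sketch of steps 1)-3): random A_temp, zero a fraction of off-diagonal
# entries (missing edges), then shift the spectrum so the system is stable.
def make_stable_system(n=9, zero_frac=1 / 3, shift=0.5, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))               # step 1: random A_temp
    # step 2: zero out a predefined fraction of off-diagonal coefficients
    off = [(i, j) for i in range(n) for j in range(n) if i != j]
    idx = rng.choice(len(off), size=int(zero_frac * len(off)), replace=False)
    for k in idx:
        A[off[k]] = 0.0
    # step 3: A = A_temp - c I, with c chosen so all eigenvalues have
    # negative real part (here shifted an extra `shift` past the largest one)
    c = np.max(np.linalg.eigvals(A).real)
    return A - (c + shift) * np.eye(n)

A = make_stable_system()
```

Varying the extra shift moves the largest pole and hence the time scale of the synthesized system, as in step 3).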
For the network reconstruction, given the time-series data x(t), we discretize (9) as

    (x|_{t_k} − x|_{t_{k−1}}) / Δt = A x|_{t_{k−1}} + e,  k = 1, 2, 3, …, m    (10)

where m is the last time index used and e represents noise. Since the matrix A does not change with time, we transpose (10) and then stack the resulting rows on both sides to form y_j and X of (2). Δt is equal to the Nyquist sample time 1/(2λ).

In order to make our synthetic networks representative of experimental data sets, we consider several factors, such as noise variance, granularity, and sparse network topology. Synthetic data sets, which are generated with a linear stable model whose parameters are known, facilitate the assessment of the overall accuracy of DPLASSO. A portion (about a third) of the true parameters of the synthetic system is set to 0 to test the ability of the method to identify them as insignificant. In all of the Monte Carlo simulations, involving λ and γ, we use the following procedure to generate synthetic data sets and evaluate the performance of the methods.
for λ_i, i = 1, 2, 3, …
    for γ_j, j = 1, 2, 3, …
        for k = 1, 2, …, 10 (Monte Carlo runs)
            [y, X] ← generate data set
            Apply the chosen method on [y, X] to reconstruct the network.
            Evaluate the metrics.
        end
        Compute the average of the performance metrics over the 10 Monte Carlo runs.
    end
end
2) Simulation Results: Estimation of the LASSO Parameter q: An optimal value of the DPLASSO parameters is identified by a grid search with respect to the confidence level c in significance testing and the LASSO parameter q (cross validation). This procedure yields 0.8 < FCV < 0.95, a confidence level of 0.8 in the PLS layer (layer 1), and the LASSO tuning parameter q = 0.2 (layer 2). Synthetic data sets were also generated from a zero-input response of a stable linear system with nine state variables (nonzero initial states) and 100 time points (with a fixed sampling time of 0.1 s and a settling-time range of 8 s < t_s < 15 s). The noise level in (9) was set to 0.1 [≈10% on average over a time span from t = 0 until 3/λ_min (approximate settling time)]. Network reconstruction with LASSO was used as the baseline against which the performance of both DPLASSO and PLS with significance testing was compared.
The shrinkage parameter q was estimated using tenfold cross validation as follows. The magnitude of q is increased from 0 to 1 in steps. For each value of q, the tenfold cross-validation error is computed. In each fold, 90% of the data are used for training and 10% for prediction. This procedure is repeated ten times to cover all the data points. The mean of the RMSE over the ten folds and the variability (standard error of the mean) across the folds are shown in Fig. 3. Next, the minimum error is recorded, and the optimal value of q is chosen so that the corresponding prediction RMSE is no more than the minimum error plus the standard error of the mean. In Fig. 3, the optimal q value is between 0.2 and 0.3, which is close to the actual fractional connectivity/density of the network, 20%.
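The selection rule just described (minimum cross-validation error plus one standard error) can be sketched as follows; scikit-learn's penalty parameter alpha stands in for the paper's shrinkage fraction q, and the grid and data are hypothetical:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

# Sketch of the tenfold CV tuning: sweep a penalty grid, record mean RMSE and
# its standard error per value, then pick the largest penalty whose mean RMSE
# is within one standard error of the minimum.
rng = np.random.default_rng(3)
X = rng.standard_normal((100, 9))
b_true = np.zeros(9)
b_true[:2] = [1.0, -1.0]
y = X @ b_true + 0.1 * rng.standard_normal(100)

alphas = np.linspace(0.01, 1.0, 20)
mean_rmse, sem_rmse = [], []
for a in alphas:
    fold_rmse = []
    for tr, te in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        model = Lasso(alpha=a).fit(X[tr], y[tr])
        err = y[te] - model.predict(X[te])
        fold_rmse.append(np.sqrt(np.mean(err ** 2)))
    mean_rmse.append(np.mean(fold_rmse))
    sem_rmse.append(np.std(fold_rmse) / np.sqrt(len(fold_rmse)))

i_min = int(np.argmin(mean_rmse))
threshold = mean_rmse[i_min] + sem_rmse[i_min]
# largest penalty (most shrinkage) still within one SEM of the minimum error
i_opt = max(i for i, r in enumerate(mean_rmse) if r <= threshold)
alpha_opt = alphas[i_opt]
```

Taking the largest penalty within the one-standard-error band favors the sparser model among statistically indistinguishable fits, mirroring the choice of q in Fig. 3.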
Performance Comparison With Respect to Network Density: As the size of a biological network increases, its density (fractional connectivity) generally decreases [29]. We compare the performance of DPLASSO against that of both linear regression with significance testing (labeled as PLS) and LASSO for networks with different densities. The comparison is carried out in terms of accuracy, sensitivity, and specificity. Figs. 4–6 demonstrate that DPLASSO provides a good compromise between LASSO and PLS for different network densities. In Fig. 4, the accuracy of the three methods is plotted as a function of the time-scale parameter λ and the DPLASSO parameter γ. Since LASSO and PLS do not involve the parameter γ, the variation along the γ-axis is solely due to the small number of Monte Carlo runs [see (10)]. Fig. 4 reveals
Fig. 4. Accuracy of DPLASSO, LASSO, and PLS for different
network densities. The quantities plotted are the averages over
ten Monte Carlo runs. In each run, the network has 20 nodes and
5% noise is considered in the state–space model of (9). The four
panels correspond to different network densities.
Fig. 5. Sensitivity of DPLASSO, LASSO, and PLS methods for
different network densities. The quantities plotted are the
averages over ten Monte Carlo runs. In each run, the network has
20 nodes and 5% noise is considered in the state–space model of
(9). The four panels correspond to different network densities.
Fig. 6. Specificity of DPLASSO, LASSO, and PLS methods for
different network densities. The quantities plotted are the
averages over ten Monte Carlo runs. In each run, the network has
20 nodes and 5% noise is considered in the state–space model of
(9). The four panels correspond to different network densities.
Fig. 3. RMSE in validation versus shrinkage parameter q for a
network with 20% connection density and node size 9 (for
tenfold cross validation). RMSE was normalized by the standard
deviation of the noise (0.1). Plot shows that when q is between
0.2 and 0.3, the RMSE is within the standard deviation of the
minimum RMSE.
that the accuracy of DPLASSO is between those of PLS (best) and LASSO (worst), but closer to that of PLS. This behavior is consistent across network densities ranging between 5% [Fig. 4(d)] and 50% [Fig. 4(a)]. The performance of both DPLASSO and PLS is more stable than that of LASSO as the network density is increased. For example, the accuracy of LASSO increases from about 0.2 to about 0.5 when the network density is increased from 0.2 to 0.5, whereas the accuracy of PLS and DPLASSO stays in the range 0.4–0.6. This shows that DPLASSO retains the robustness of the PLS approach (regression with significance testing).
Fig. 5 shows the sensitivity of the three methods as a function of λ and γ. The performance of DPLASSO is again a compromise between that of PLS (worst) and LASSO (best). As the network density (fractional connectivity) increases from 5% to 50%, the sensitivity of DPLASSO becomes more similar to that of PLS, especially at lower values of γ. This means that at low density (representative of biological networks) DPLASSO retains the high sensitivity of LASSO, and its sensitivity degrades slowly as the network density increases. Finally, LASSO provides the best sensitivity for both low and high network densities and is stable around 0.8.
Fig. 6 shows the specificity of the three methods as a function of λ and γ. As with the sensitivity and accuracy, the specificity of DPLASSO falls between that of LASSO (the worst) and PLS (the best). The specificity of LASSO is especially poor for large |λ| (faster systems) and for low-density (5%–10%) networks. The specificity of DPLASSO is closer to that of PLS at the various network densities.
Finally, we compare the predictive capability of DPLASSO, LASSO, and PLS in terms of their prediction RMSE. Table 1 lists the RMSE in addition to the accuracy, sensitivity, specificity, and lift for DPLASSO, LASSO, and PLS. The ratio std(RMSE)/mean(RMSE) serves as a measure of the robustness of the RMSE. All the data reported in Table 1 represent averages over ten independent (Monte Carlo) realizations, each of which is generated with a new coefficient matrix A but the same fraction of nonzero coefficients (66%). Table 1 shows that, overall, DPLASSO outperforms both LASSO and PLS if all four performance metrics are considered important. Another observation is that the performance of DPLASSO is closer to that of PLS than to that of LASSO. This is because, during the LASSO stage, the statistically significant predictors (from the PLS stage) are restricted to a lesser degree (0 < w_i < 0.5) than the statistically insignificant predictors (0.5 < w_i < 1).
B. Case Study II: A Dynamic Biological Network

The results presented above demonstrate the ability of DPLASSO to reconstruct dynamic networks with a linear input-to-output mapping. In this section, we use DPLASSO to reconstruct a real biological network from data generated from a well-studied nonlinear model of the cell division cycle of fission yeast [30]. This model enables us to run virtual experiments and to evaluate the performance of different methods in a realistic nonlinear setting. We generated the data set by numerically integrating the system of nine differential and seven algebraic equations given in [30]. The noise-free system of differential equations is solved over a time span of 130 s, with a fixed sampling time of 1 s.

The results of implementing DPLASSO are highly sensitive to the time span (phase of the cell cycle) on which a model is run [31]. Accordingly, we chose a slice of the temporal response of the system corresponding to the S/G2 phases of the cell division cycle (20 s < t < 120 s), wherein most of the states are excited. The structure of the dynamical system for network reconstruction is identical to (9). These equations were solved subject to the initial conditions [0.59, 0.02, 0.59, 2.12, 2.12, 0.92, 0.04, 0.53, 1]ᵀ. The parameters of DPLASSO were set to achieve a 0.8 significance level in the PLS layer, and the amount of shrinkage in the LASSO layer was chosen to be 0.8 (by cross validation). The true and false connections identified by DPLASSO are listed in Table 2. We correctly identify several important connections, such as M → M and M → Cdc13. Our method was not able to identify the connection Cdc13 → Cdc13, which an LMI-based method was able to identify [31].
The performances of DPLASSO, LASSO, and PLS are
reported in Table 3 in terms of accuracy, sensitivity,
specificity, lift, and normalized variation in RMSE. The
Table 1. Performance of the Three Methods on Synthetic Data. Results Are Based on the Average of Ten Runs of Networks With Size 20, and Represent Average Values Over γ ∈ [0, 1.5] and λ ∈ [−4, 0]. Abbreviation: SD, standard deviation.
model's nonlinearity, in contrast to the linear models given by (2) and (9), precludes a direct comparison of the fractional error in estimating the model coefficients. Table 3 demonstrates that the performance of DPLASSO is qualitatively similar to that observed for the synthetic data set (Case Study I). DPLASSO is 17% more efficient and 33% more accurate than LASSO, and has the same SD(RMSE)/mean(RMSE) ratio as PLS but lower (by 12%) accuracy. DPLASSO has about the same sensitivity as LASSO but outperforms PLS by 12%. Finally, DPLASSO shows considerably better specificity than either LASSO or PLS (Table 3). The low specificity in the reconstruction of the networks by all three methods arises from the low number of missing connections identified (true negatives) in the network.
Fig. 7 provides a graphical representation of the yeast cell division cycle network reconstructed with DPLASSO, LASSO, and PLS. The true network, inferred from the structure of the model equations, is shown in Fig. 7(a). DPLASSO identified most of the true connections, with a relatively small number of false connections (e.g., the nonexistent connection from Cdc13_T to Rum1_T). However, DPLASSO also missed some true connections, e.g., the stimulation of Slp1 by preMPF. LASSO and PLS identified most of the true connections, but both have a relatively large number of false positives.
IV. DISCUSSION

DPLASSO provides an attractive compromise between LASSO and PLS when all of the commonly used performance metrics are considered equally important. Below, we analyze the three methods further in order to identify the method best suited to a particular goal and to improve the performance of DPLASSO.
A. Selection of a Methodology Based on the Desired Goals of Reconstruction
Table 1 suggests that the selection of an efficient network reconstruction method depends on the desired
Table 2. Connections in the Cell Division Network Reconstructed With DPLASSO. True Positives Are Denoted by TP, and False Positives by FP. An Edge Is Denoted by a Link From a Node in the Row to a Node in the Column.
Table 3 Performance of the Three Network Reconstruction Methods for
Cell Division Cycle Averaged Over �
Fig. 7. True network (a) and its reconstructions with DPLASSO
(b), LASSO (c) and PLS (d). The performance metrics are listed in
Table 3.
326 Proceedings of the IEEE | Vol. 105, No. 2, February 2017
Asadi et al. : Doubly Penalized LASSO for Reconstructionof Biological Networks
outcome (e.g., better sensitivity or better specificity) andthe properties of the network. Sensitivity represents a
method’s ability to detect true positives in the network,
and specificity provides an indicator of a method’s ability
to identify true negatives (absence of connections).
Table 1 suggests that LASSO is more effective in finding
the true positives (has high sensitivity), while PLS is
more reliable in identification of the true negatives (has
high specificity). DPLASSO provides a compromise between these two methods, which is attractive either in
the absence of sufficient information about the network
density or when both criteria (sensitivity and specificity)
are equally important. DPLASSO also ensures robust prediction of the behavior of the system.
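The sensitivity, specificity, and accuracy scores compared throughout this section can be computed directly from the true and inferred adjacency matrices. A minimal sketch follows; the function and variable names (`reconstruction_metrics`, `A_true`, `A_hat`) are ours for illustration, not from the paper.

```python
import numpy as np

def reconstruction_metrics(A_true, A_hat):
    """Compare binary adjacency matrices (1 = edge present, 0 = absent)."""
    t = A_true.astype(bool).ravel()
    p = A_hat.astype(bool).ravel()
    tp = np.sum(t & p)        # true edges recovered
    tn = np.sum(~t & ~p)      # absent edges correctly left out
    fp = np.sum(~t & p)       # spurious edges
    fn = np.sum(t & ~p)       # missed edges
    sensitivity = tp / (tp + fn)   # ability to detect true positives
    specificity = tn / (tn + fp)   # ability to identify true negatives
    accuracy = (tp + tn) / t.size
    return sensitivity, specificity, accuracy

# Toy 3-node directed network: one missed edge, one spurious edge.
A_true = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
A_hat  = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
sens, spec, acc = reconstruction_metrics(A_true, A_hat)
```

The low specificity noted above corresponds to `tn` being small relative to `tn + fp` when many spurious edges are reported.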
B. Adjustment of λ Based on Receiver Operating Characteristic (ROC) Curve
Performance of any network reconstruction method
depends on a system's dynamics (e.g., on the location of a dominant pole in a linear system). We generated ROC curves, which provide a graphical illustration of the performance of a classifier (here, identification of the existence of a link in a network) with respect to variation of an algorithmic parameter (in this case, λ).
The ROC curves for the three methods are shown in Fig. 8. The left panel represents the ROC curves generated by varying the pole parameter p (averaged over several values of λ). A smaller marker size indicates a faster system (larger absolute value of p). The performance of LASSO is the worst overall, whereas the performance of DPLASSO and PLS is comparable. As |p| increases (the system becomes faster), the specificity of PLS increases substantially more than its sensitivity decreases. A comparable increase in |p| reduces the specificity of DPLASSO at a slower rate than the gain in its sensitivity. LASSO is relatively insensitive to changes in p.
Fig. 8(b) exhibits the ROC curves for varying λ (averaged over several values of p). Changes in λ do not affect the ROC curves for PLS and LASSO, which is to be expected since the parameter λ is not used in these two methods. As λ increases, the specificity of DPLASSO decreases, while its sensitivity increases. These results imply that if one can identify the fastest time scale of the system (e.g., by time-series analysis), the parameter λ can be tuned to achieve better performance. In other words, if a system's dynamics is identifiable, one can select a value of λ that enables DPLASSO to outperform PLS and LASSO in terms of sensitivity and specificity without significantly degrading other metrics.
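The ROC sweep described above can be sketched as follows. As a simplified stand-in for the full DPLASSO procedure, we threshold ordinary least-squares coefficients of a synthetic linear network at increasing values of a tuning parameter; each value yields one (false-positive rate, true-positive rate) point. The thresholding shortcut and all names are our assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_samples = 10, 200

# Random sparse "true" network: ~20% of entries carry a nonzero weight.
W_true = (rng.random((n_nodes, n_nodes)) < 0.2) * rng.normal(0, 1, (n_nodes, n_nodes))
X = rng.normal(size=(n_samples, n_nodes))
Y = X @ W_true.T + 0.1 * rng.normal(size=(n_samples, n_nodes))

# Ordinary least squares estimate of the weight matrix.
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
W_hat = B.T

truth = W_true != 0
roc = []
for lam in np.linspace(0.0, 1.0, 21):        # sweep the tuning parameter
    pred = np.abs(W_hat) > lam               # edges surviving the penalty
    tp = np.sum(truth & pred); fn = np.sum(truth & ~pred)
    fp = np.sum(~truth & pred); tn = np.sum(~truth & ~pred)
    roc.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR) point
```

Plotting the `roc` list reproduces the qualitative behavior in Fig. 8: as the penalty tightens, the false-positive rate falls faster than the true-positive rate.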
C. Role of the Network Structure in the Selection of a Method
The specificity of DPLASSO and LASSO is lower than
that of PLS for both the synthetic (Table 1) and simu-
lated biological (Table 3) data sets. This behavior is due
to topological properties of the networks and the recon-
struction objective function. Specifically, we optimized
the shrinkage factor q by minimizing cross validation
RMSE in order to achieve a higher accuracy. These parameters (the significance level, the shrinkage factor, the level of variance captured, and the tuning parameter λ) can be adjusted to achieve higher scores in
terms of another metric, such as sensitivity, specificity,
and RMSE. Selection of the desired metric to be opti-
mized can vary depending on the structure of the network. In practice, to reconstruct sparse networks, it is desirable to optimize the sensitivity metric, whereas for networks with moderate density, accuracy is relatively more important. For example, in the cell cycle network, the number of true connections is 22 out of 45 (9 × 10/2) possible connections, which makes it a moderately dense network.
Hence, accuracy could serve as an appropriate measure
to evaluate the reconstruction performance. In other
words, since sensitivity and accuracy are more important
for networks with low and moderate density, respectively, DPLASSO is suitable for them despite its low
specificity. For high-density networks, PLS is likely to be
a better approach due to its higher specificity.
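The density-based rule of thumb above can be made concrete with a small helper. The density thresholds below are illustrative choices consistent with the discussion, not values specified in the paper.

```python
def suggest_metric(n_true_edges, n_nodes):
    """Map network density to the evaluation metric worth optimizing.
    Thresholds (0.2, 0.6) are illustrative, not from the paper."""
    possible = n_nodes * (n_nodes - 1) // 2   # undirected node pairs
    density = n_true_edges / possible
    if density < 0.2:
        return density, "sensitivity"         # sparse: favor DPLASSO/LASSO
    if density < 0.6:
        return density, "accuracy"            # moderate: favor DPLASSO
    return density, "specificity"             # dense: favor PLS

# Cell division cycle example from the text: 22 of 45 possible edges.
d, m = suggest_metric(22, 10)
```

For the cell cycle network this yields a density of about 0.49, so accuracy is the suggested metric, matching the argument above.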
V. SUMMARY AND CONCLUSION
We have developed a new method, DPLASSO, to recon-
struct static and dynamic biological networks. The primary basis of DPLASSO is the integration of statistical
significance testing and L1-penalty/optimization ap-
proaches to achieve a good performance with respect to
several metrics simultaneously. DPLASSO uses two
layers of analysis. Layer 1 employs linear regression (PLS
in this case) with statistical significance testing as a
supervisory-level filter to extract the statistically
Fig. 8. ROC performance of DPLASSO, LASSO, and PLS (for a network of size 50 with 20% density, averaged over ten Monte Carlo runs). (a) Comparison of DPLASSO with LASSO and PLS for different values of the pole parameter p in the range [−2, 0]. The results are averaged over several values of λ in the range [0, 2]. (b) Effect of changes in λ on the performance of the three methods (averaged over several p in the range [−2, 0]). In (a), a smaller marker size indicates a faster system (smaller time constant, i.e., larger absolute value of p).
significant components of a network. Layer 2 utilizes LASSO with extra weights on the statistically less signifi-
cant parameters obtained in layer 1 to retain the main
predictors and set the remaining small coefficients to
zero. Layers 1 and 2 are connected via a logistic function
with a tuning parameter λ. We applied this approach to
two case studies. In the first case study, a set of linear
synthetic networks was created to evaluate the relative
performance of DPLASSO, PLS with statistical significance testing, and LASSO. Our simulation results show
that DPLASSO provides a good compromise between
LASSO and PLS for different network densities and sizes.
In the second study, we applied DPLASSO to simulated
data sets for cell division cycle of fission yeast. The re-
sults of the performance evaluation of DPLASSO qualitatively confirm the findings for the synthetic system.
The ROC analysis showed that the performance of DPLASSO can be improved relative to that of PLS and LASSO by tuning the parameters λ and q.
From these results, we conclude that combining PLS
(regression) with statistical significance testing and
LASSO (optimization with L1-penalty) provides good
performance with respect to several metrics. We also conclude that DPLASSO is well suited for reconstruction of networks with low and moderate density. For high-density networks, higher specificity (high true negatives and low false positives) is desired, making PLS a more suitable approach.
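The two-layer coupling summarized above, in which layer-1 significance scores are mapped through a logistic function with tuning parameter λ into per-coefficient penalty weights for the layer-2 weighted LASSO, can be sketched as follows. The specific logistic form, the use of z-scores as a stand-in for the PLS significance test, and all names are our assumptions rather than the paper's exact formulation.

```python
import numpy as np

def penalty_weights(z_scores, lam, z0=2.0):
    """Logistic map from layer-1 significance to layer-2 penalty weights.
    Small |z| (statistically insignificant in layer 1) -> weight near 1
    (heavy extra L1 penalty); large |z| -> weight near 0 (lightly penalized).
    lam controls the sharpness of the transition; z0 is the crossover point."""
    return 1.0 / (1.0 + np.exp(lam * (np.abs(z_scores) - z0)))

z = np.array([0.1, 1.5, 3.0, 6.0])   # hypothetical layer-1 z-scores
w = penalty_weights(z, lam=1.0)       # weights decrease with significance
```

In a weighted LASSO, the resulting `w` would multiply the penalty terms, i.e., minimize ||y − Xβ||² + α Σ wᵢ|βᵢ|, so coefficients deemed insignificant in layer 1 are shrunk toward zero more aggressively.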
ABOUT THE AUTHORS
Behrang Asadi received the B.S. degree in me-
chanical engineering from Sharif University of
Technology, Tehran, Iran, the M.S. degree in con-
trol systems from Clemson University, Clemson,
SC, USA, and the Ph.D. degree from the Univer-
sity of California San Diego, La Jolla, CA, USA.
He is currently Head of Advanced Analytics at
Danaher Labs, Santa Clara, CA, USA. He has held
several leadership positions in data science in fi-
nancial services, internet, and consulting firms.
His research interests focus on the application of machine learning to ad hoc disruptive computing challenges.
Mano Ram Maurya received the B.Tech. degree
in chemical engineering from Indian Institute of
Technology Bombay, Bombay, India, in 1998, the
M.E. degree in chemical engineering from City
College of New York, New York, NY, USA, in
1999, and the Ph.D. degree in chemical engineer-
ing from Purdue University, West Lafayette, IN,
USA, in 2003.
From 2003 to 2006, he was a Postdoctoral
Researcher in the Department of Bioengineering
and the San Diego Supercomputer Center at the University of California
San Diego (UCSD), La Jolla, CA, USA. Then, he worked in the Depart-
ment of Bioengineering as an Assistant Scientist from October 2006 to
November 2010. Since then, he has been a Research Scientist at the
San Diego Supercomputer Center and the Department of Bioengineer-
ing at UCSD. He is the author of about 40 papers in peer-reviewed jour-
nals and conference proceedings. His current research interests include
the study of complex biochemical processes and pathways using sys-
tems engineering/biology, machine learning, information theory and
bioinformatics approaches, and their applications to biomedicine.
Dr. Maurya received the Best Paper Award in 2005 jointly with
his Ph.D. advisor Dr. Venkat Venkatasubramanian and coadvisor
Dr. Raghunathan Rengaswamy for their paper published in the Journal
of Engineering Applications of Artificial Intelligence in 2004. In August
2011, he joined the Editorial Board of ISRN Biophysics.
Daniel M. Tartakovsky received the M.S. degree
from Kazan State University, Kazan, Russia, in
1991 and the Ph.D. degree from the University of
Arizona, Tucson, AZ, USA, in 1996.
From 1998 to 2005, he was a Technical Staff
Member at Los Alamos National Laboratory, Los
Alamos, NM, USA, first, in the Scientific Comput-
ing Group (CCS-3), and then, in the Mathematical
Modeling and Analysis Group (T-7). In 2005, he
moved to the Mechanical and Aerospace Engi-
neering Department, University of California San Diego (UCSD), La Jolla,
CA, USA, where he is currently a Professor. He has published over
150 articles. His research interests include uncertainty quantification,
stochastic partial differential equations, inverse modeling, and mathe-
matical biology.
Prof. Tartakovsky is on the editorial board of several journals.
Shankar Subramaniam received the B.S. and
M.S. degrees from Osmania University, Hyderabad,
India, and the Ph.D. degree in chemistry from
Indian Institute of Technology Kanpur, Kanpur,
India, in 1982.
From 1991 to 1999, he was a Professor of Bio-
physics, Biochemistry, Molecular and Integrative
Physiology, Chemical Engineering and Electrical
and Computer Engineering at the University of
Illinois at Urbana-Champaign, Urbana, IL, USA,
prior to moving to the University of California San Diego (UCSD),
La Jolla, CA, USA. Currently, he is the Joan and Irwin Jacobs Endowed
Chair in Bioengineering and Systems Biology at UCSD and a Professor
of Bioengineering, Bioinformatics and Systems Biology, Cellular and
Molecular Medicine, Computer Science and Engineering, and Nanoengi-
neering. He was the Founding Director of the Bioinformatics and
Systems Biology Program at UCSD. He has published over 250 articles.
Research in his laboratory spans several areas of bioinformatics,
systems biology, and medicine.
Dr. Subramaniam is on the editorial board of several journals.