Energy-Efficient Ultra-Dense Network Using
LSTM-based Deep Neural Network
Seungnyun Kim, Junwon Son and Byonghyo Shim
Seoul National University, Seoul, Korea
Email: {snkim, jwson, bshim}@islab.snu.ac.kr
Abstract
As a means to achieve thousand-fold throughput improvements in future wireless communications,
the ultra-dense network (UDN), in which a large number of small cells are densely deployed on top of the
macro cells, has received a great deal of attention in recent years. Despite the variety of benefits that UDN
offers, intensive deployment of small cells may pose a serious concern in terms of energy consumption. Over
the years, to reduce the energy consumption of UDN, an approach that turns off the lightly loaded
base stations (BSs) has been proposed. However, determining the proper on/off modes of BSs is a
challenging problem due to the huge computational overhead and the inefficiency caused by delayed
decisions. The aim of this paper is to propose a deep neural network (DNN)-based framework that
reduces the energy consumption in UDN. By cascading a long short-term memory (LSTM) network, which extracts
the temporally correlated features from the channel information, and a feedforward network, which makes
the BS on/off mode decision, we can control the on/off modes of BSs, thereby achieving a considerable
reduction of the cumulative energy consumption. From extensive simulations, we demonstrate that
the proposed technique is effective in reducing the energy consumption of UDN.
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded
by the Korea government (MSIT) (No. 2018-0-01410, Development of Radio Transmission Technologies for High Capacity and
Low Cost in Ultra Dense Networks).
Parts of this paper will appear at WCNC 2020 [1].
Fig. 1: Illustration of UDN: the BSs are connected to the digital unit (DU) via the backhaul link.
(Figure labels: macro cell, micro cell, femto cell, mobile, digital unit.)
I. INTRODUCTION
In recent years, the ultra-dense network (UDN), in which a large number of small cells (i.e., pico
cells and femto cells) are densely deployed (more than $10^3$ cells/km$^2$) on top of the macro cells,
has received a great deal of attention as a means to improve the network capacity of future
wireless communication systems [2], [3]. UDN shortens the physical distance between the base
station (BS) and the mobile device, resulting in a reduction of the path loss caused by wireless
transmission, in particular for millimeter wave (mmWave) signal transmission. Further, it
enables an aggressive reuse of frequency resources, thereby achieving a significant improvement
in the quality of service (QoS) of wireless networks [4]. One potential problem of UDN is that
an intensive deployment of small cells will increase the energy consumption of the network [5].
In fact, the surge of energy consumption due to the use of a large number of small and macro cells
is a heavy burden for network operators since it increases the operational expense (OPEX) to
a large extent [6]. Pursuing an enhancement in energy efficiency is, therefore, an inevitable
option to ensure the sustainability of wireless networks [7], [8].
Among several factors contributing to the energy consumption of cellular systems, by
far the most dominant one is the BS (around 60% of the energy consumption [9]). Over the
years, the so-called sleep mode technique, which turns off the lightly loaded BSs, has been
proposed to reduce the energy consumption of BSs [10]–[12]. The main idea of this approach is to
selectively switch off the underutilized BSs. In [11], an approach to randomly shut off the BSs
while guaranteeing the coverage of a network has been proposed. In [12], an approach that
iteratively turns off the BSs one after another while satisfying the required mobile rate has been
proposed. A well-known drawback of this approach is that the computational complexity increases
exponentially with the number of BSs, limiting its effectiveness.
The primary purpose of this paper is to put forth an entirely new approach based on deep
neural network (DNN) to achieve a reduction of energy consumption in UDN. While the
conventional approaches aim to minimize the instantaneous energy consumption using a heuristic
approach and thus incur a waste of energy due to the frequent mode transition (on to off
and vice versa), the proposed scheme, henceforth referred to as deep learning-based energy-
efficient network (DLEEN), achieves a reduction of the total energy consumption over a long-
term operational period. Among various DNN techniques, we use the long short-term memory
(LSTM), a technique specialized for extracting features from the sequential data as a main
engine [13]. We use the LSTM to extract temporally correlated features (e.g., angle of departure
(AoD), delay spread, and path gain) from the channel state information (CSI). Using the extracted
features, DLEEN makes a fast yet accurate on/off decision of BSs minimizing the cumulative
energy consumption [14]. As a result, the baseband processing block as well as the analog block
(i.e., PA and RF filter) can be switched off, achieving a substantial energy saving in UDN.
From the simulation results in a realistic UDN environment, we demonstrate that the proposed
DLEEN technique satisfies the QoS of UDN with only a small portion of active BSs,
reducing the energy consumption substantially. For example, the DLEEN scheme saves around 30%
of energy compared to the full-association scenario where all the BSs are turned on. Even when
compared to the conventional approach based on the optimization technique, the energy saving of
DLEEN is substantial since it takes into account the transition power of the switching BSs.
The rest of this paper is organized as follows. In Section II, we briefly introduce the system
model for UDN. In Section III, we provide the basics of LSTM and the LSTM-based framework
for energy-efficient UDN. In Section IV, we provide the practical issues for the implementation
of DLEEN. In Section V, we present the simulation results of the proposed DLEEN and conclude
the paper in Section VI.
Notations: Lower and upper case symbols are used to denote vectors and matrices, respectively.
The superscripts $(\cdot)^T$ and $(\cdot)^H$ denote transpose and Hermitian transpose, respectively. $\otimes$ and
$\circ$ denote the Kronecker product and the Hadamard product, respectively. $\|x\|$ denotes the
Euclidean norm of a vector $x$. Also, $\mathrm{diag}(X_1, X_2)$ denotes a block diagonal matrix whose
diagonal elements are $X_1$ and $X_2$. $e_m = [0, \cdots, 1, \cdots, 0]^T$ is an $M \times 1$ vector whose $m$-th element is
one and whose other elements are zero. $1_+(x)$ is an indicator function whose value is 1 if $x > 0$ and 0
otherwise. In addition, $0_K$ and $1_K$ denote the $K \times 1$ zero vector and one vector, respectively.
II. ULTRA-DENSE NETWORK SYSTEM
A. System Model of Ultra-Dense Network
In this subsection, we discuss the UDN system model. In contrast to conventional
cellular networks, in which a single BS serves all the mobiles in a cell, a group of BSs cooperatively
serves the mobiles in UDN. We consider the downlink transmission where $M$ BSs, each equipped
with a single transmit antenna, cooperatively serve $K$ mobiles, each equipped with a single antenna (see footnote 1).
The sets of BSs and mobiles are denoted as $\mathcal{B} = \{1, \cdots, M\}$ and $\mathcal{U} = \{1, \cdots, K\}$, respectively.
Note that $\mathcal{B}$ consists of $\mathcal{B}_{on}$ (the set of turned-on BSs) and $\mathcal{B}_{off}$ (the set of turned-off BSs). To indicate
the on/off modes of BSs at time slot $l$, we use a binary vector $\alpha^l = [\alpha^l_1, \cdots, \alpha^l_M]^T$ defined as
$$
\alpha^l_m = \begin{cases} 1 & \text{if BS } m \text{ is active} \\ 0 & \text{otherwise} \end{cases}, \quad m \in \mathcal{B}. \tag{1}
$$
We use a block-fading channel model where the downlink channel vector $h^l_{m,k} \in \mathbb{C}$ between
the BS $m$ and the mobile $k$ at time slot $l$ is given by
$$
h^l_{m,k} = \sqrt{\beta^l_{m,k}}\, g^l_{m,k}, \tag{2}
$$
where $\beta^l_{m,k} \in \mathbb{R}$ is the large-scale fading coefficient reflecting the path loss and the shadowing
effect and $g^l_{m,k} \in \mathbb{C}$ is the small-scale fading coefficient vector. We assume that the small-scale
fading coefficients are independent and identically distributed (i.i.d.) complex Gaussian
random variables (i.e., $g^l_{m,k} \sim \mathcal{CN}(0, 1)$). Note that the sequence of large-scale fading coefficients
$\{\beta^l_{m,k}\}_{l=1}^{L}$ is temporally correlated since the change of $\beta^l_{m,k}$, determined primarily by the path loss
and the shadowing effect, is induced by the mobility of a mobile [16].
Footnote 1: Note that the proposed scheme can be easily extended to multi-antenna systems [15].
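In this setting, the transmit signal $x^l_m \in \mathbb{C}$ of the BS $m$ at time slot $l$ is
$$
x^l_m = \sum_{k=1}^{K} \sqrt{p^l_{m,k}}\, s^l_k, \tag{3}
$$
where $p^l_{m,k} \in \mathbb{R}^+ \cup \{0\}$ is the power weight between the BS $m$ and the mobile $k$ and $s^l_k \in \mathbb{C}$ is
the data symbol for the mobile $k$. The received signal $y^l_k \in \mathbb{C}$ of the mobile $k$ at time slot $l$ is
$$
y^l_k = \sum_{m=1}^{M} h^{l,H}_{m,k} x^l_m + n^l_k \tag{4}
$$
$$
= \sum_{m=1}^{M} \sqrt{p^l_{m,k}}\, h^{l,H}_{m,k} s^l_k + \sum_{j \neq k}^{K} \sum_{m=1}^{M} \sqrt{p^l_{m,j}}\, h^{l,H}_{m,k} s^l_j + n^l_k, \tag{5}
$$
where $n^l_k \sim \mathcal{CN}(0, \sigma_n^2)$ is the additive Gaussian noise at time slot $l$. The corresponding rate of
the mobile $k$ at time slot $l$ is
$$
R^l_k = \log_2\left(1 + \frac{E\left[\left|\sum_{m=1}^{M} \sqrt{p^l_{m,k}}\, h^{l,H}_{m,k}\right|^2\right]}{\sum_{j \neq k}^{K} E\left[\left|\sum_{m=1}^{M} \sqrt{p^l_{m,j}}\, h^{l,H}_{m,k}\right|^2\right] + \sigma_n^2}\right) \tag{6}
$$
$$
= \log_2\left(1 + \frac{\sum_{m=1}^{M} p^l_{m,k} E\left[\left|h^l_{m,k}\right|^2\right]}{\sum_{j \neq k}^{K} \sum_{m=1}^{M} p^l_{m,j} E\left[\left|h^l_{m,k}\right|^2\right] + \sigma_n^2}\right) \tag{7}
$$
$$
= \log_2\left(1 + \frac{\sum_{m=1}^{M} p^l_{m,k} \beta^l_{m,k}}{\sum_{j \neq k}^{K} \sum_{m=1}^{M} p^l_{m,j} \beta^l_{m,k} + \sigma_n^2}\right). \tag{8}
$$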
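As an illustration of the rate expression (8), the following is a minimal sketch in plain Python. The function name `achievable_rates`, the nested-list layout `p[m][k]` and `beta[m][k]`, and the toy numbers are assumptions made for this example only; they are not taken from the simulation setup.

```python
import math

def achievable_rates(p, beta, noise_power):
    """Rate of every mobile according to Eq. (8).

    p[m][k]    : power weight between BS m and mobile k (non-negative)
    beta[m][k] : large-scale fading coefficient between BS m and mobile k
    noise_power: sigma_n^2
    """
    M, K = len(p), len(p[0])
    rates = []
    for k in range(K):
        # Desired signal: sum over BSs of p_{m,k} * beta_{m,k}
        signal = sum(p[m][k] * beta[m][k] for m in range(M))
        # Interference: power intended for other mobiles j, seen through the channel of mobile k
        interference = sum(
            p[m][j] * beta[m][k]
            for j in range(K) if j != k
            for m in range(M)
        )
        rates.append(math.log2(1.0 + signal / (interference + noise_power)))
    return rates

# Hypothetical toy scenario: 2 BSs, 2 mobiles, each BS serves one mobile.
p = [[1.0, 0.0],
     [0.0, 1.0]]
beta = [[1.0, 0.0],
        [0.0, 3.0]]
rates = achievable_rates(p, beta, noise_power=1.0)
print(rates)
```

In this toy case there is no cross-interference, so each rate reduces to $\log_2(1 + p\beta/\sigma_n^2)$.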
B. Power Consumption Model and Energy Minimization Problem Formulation
In this subsection, we explain the BS power consumption model and then formulate the energy
minimization problem. The power consumption at the BS consists of three major elements: 1) the
transmission power $P^{tx,l}_m$ consumed by the power amplifier and the RF circuitry, 2) the maintenance
power $P^{on,l}_m$ consumed by the power supply and the air conditioning, and 3) the mode transition power
$P^{trans,l}_m$ consumed when the BS is switched on and off [17]. Thus, the total power consumption
of the BS $m$ at time slot $l$ is $P^l_m = P^{tx,l}_m + P^{on,l}_m + P^{trans,l}_m$.
To be specific, the transmission power $P^{tx,l}_m$ of the BS $m$ at time slot $l$ is
$$
P^{tx,l}_m = \frac{1}{\eta_m} E\left[\left|x^l_m\right|^2\right] \tag{9}
$$
$$
= \frac{1}{\eta_m} \sum_{k=1}^{K} p^l_{m,k}, \tag{10}
$$
where $\eta_m \in [0, 1]$ is the amplifier efficiency of the BS $m$. In general, depending on the amplifier
efficiency, only 40 to 50% of the total transmission power is transmitted and the rest is dissipated
as heat [18]. In a micro cell, for example, around 50% of the total power is consumed by
the transmission power [19].
Second, the maintenance power $P^{on,l}_m$ of the BS $m$ at time slot $l$, consumed for the air
conditioning and the power supply, is expressed as
$$
P^{on,l}_m = \begin{cases} \rho^{on}_m & \text{if BS } m \text{ is active} \\ \rho^{off}_m & \text{otherwise} \end{cases} \tag{11}
$$
$$
= \alpha^l_m \rho^{on}_m + (1 - \alpha^l_m) \rho^{off}_m, \tag{12}
$$
where $\rho^{on}_m$ and $\rho^{off}_m$ are the power consumption when the BS $m$ is turned on and off, respectively.
One might guess that $P^{on,l}_m$ is small, but in fact almost 35% of the total power is consumed at
the BS for the maintenance [10].
Lastly, the mode transition power $P^{trans,l}_m$ of the BS $m$ at time slot $l$, consumed to switch the BS on
and off, is
$$
P^{trans,l}_m = \begin{cases} \rho^{trans}_m & \text{if BS } m \text{ switches the mode (on to off and vice versa)} \\ 0 & \text{otherwise} \end{cases} \tag{13}
$$
$$
= \left(\alpha^l_m - \alpha^{l-1}_m\right)^2 \rho^{trans}_m, \tag{14}
$$
where $\rho^{trans}_m$ is the power consumed by the mode transition at the BS $m$. In general, about 15%
of the total power is consumed by the mode transition [20]. Combining (10), (12), and (14), the
total power consumption $P^l_m$ of the BS $m$ at time slot $l$ can be expressed as
$$
P^l_m = \frac{1}{\eta_m} \sum_{k=1}^{K} p^l_{m,k} + \left(\alpha^l_m \rho^{on}_m + (1 - \alpha^l_m) \rho^{off}_m\right) + \left(\alpha^l_m - \alpha^{l-1}_m\right)^2 \rho^{trans}_m. \tag{15}
$$
While many of the conventional BS sleep mode techniques focus on the reduction of
the instantaneous power consumption [10], we pursue a reduction of the cumulative energy
consumption over a long-term period. To this end, we minimize the sum of the power consumption
of all BSs over $L$ time slots:
$$
E_{total} = \sum_{l=1}^{L} \sum_{m=1}^{M} P^l_m. \tag{16}
$$
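The power model (15) and the cumulative energy (16) can be sketched as follows. This is a minimal illustration, not the simulation code: the function names, the list layout, and all numerical values are hypothetical, and the first time slot is assumed to incur no transition power (in line with the transition loss term, which starts from the second slot).

```python
def bs_power(p_row, alpha, alpha_prev, eta, rho_on, rho_off, rho_trans):
    """Power consumed by one BS in one time slot, following Eq. (15)."""
    p_tx = sum(p_row) / eta                           # Eq. (10)
    p_on = alpha * rho_on + (1 - alpha) * rho_off     # Eq. (12)
    p_trans = (alpha - alpha_prev) ** 2 * rho_trans   # Eq. (14)
    return p_tx + p_on + p_trans

def total_energy(p, alpha, eta, rho_on, rho_off, rho_trans):
    """Cumulative energy of Eq. (16).

    p[l][m]     : list of power weights of BS m at slot l (one entry per mobile)
    alpha[l][m] : on/off mode (1 or 0) of BS m at slot l
    """
    total = 0.0
    for l, alpha_l in enumerate(alpha):
        # Assumption: no mode transition is charged in the first slot.
        alpha_prev = alpha[l - 1] if l > 0 else alpha_l
        for m, a in enumerate(alpha_l):
            total += bs_power(p[l][m], a, alpha_prev[m],
                              eta[m], rho_on[m], rho_off[m], rho_trans[m])
    return total

# Hypothetical example: one BS, three slots, switched off after the first slot.
alpha = [[1], [0], [0]]
p = [[[2.0]], [[0.0]], [[0.0]]]
energy = total_energy(p, alpha, eta=[0.5], rho_on=[10.0],
                      rho_off=[1.0], rho_trans=[3.0])
print(energy)
```

The example makes the trade-off visible: switching the BS off saves maintenance power in later slots but costs a one-time transition power, which is why the decision has to be evaluated over the whole horizon rather than slot by slot.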
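The corresponding energy minimization problem is formulated as
$$
\begin{aligned}
\mathcal{P}_1: \quad \min_{\{\alpha^l_m,\, p^l_{m,k}\}} \quad & E_{total} && \text{(17a)} \\
\text{s.t.} \quad & R^l_k \geq R^l_{k,\min}, \quad k \in \mathcal{U},\ l \in \mathcal{L} && \text{(17b)} \\
& P^{tx,l}_m \leq \alpha^l_m P_{m,\max}, \quad m \in \mathcal{B},\ l \in \mathcal{L} && \text{(17c)} \\
& \alpha^l_m \in \{0, 1\}, \quad m \in \mathcal{B},\ l \in \mathcal{L} && \text{(17d)} \\
& p^l_{m,k} \geq 0, \quad m \in \mathcal{B},\ k \in \mathcal{U},\ l \in \mathcal{L} && \text{(17e)}
\end{aligned}
$$
where $\mathcal{L} = \{1, \cdots, L\}$, $R^l_{k,\min}$ is the rate requirement of the mobile $k$ at time slot $l$, and $P_{m,\max}$
is the maximum transmission power of the BS $m$. Note that $\alpha^l_m$ is applied to the maximum
transmission power in (17c) so that only the active BSs transmit the signal.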
It is worth mentioning that $\alpha^l$ is a binary vector and thus $\mathcal{P}_1$ is classified as a mixed-integer
programming problem. To solve the problem, we basically need to explore an enormous number of decision
combinations. For example, if there are 20 coordinated BSs, the size of the decision set would be
$2^{20} \approx 10^6$ so that the computational overhead to find the optimal on/off modes of BSs would
be prohibitive. Further, the optimal solution of $\mathcal{P}_1$ can be found only when the CSIs of all
$L$ time slots are available, which is not realistic due to the causality issue.
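To make the size of the decision set concrete, the short sketch below counts and enumerates the on/off patterns of a single time slot. The helper name `num_on_off_patterns` is hypothetical; the enumeration is only carried out for a tiny example, since it is exactly this exhaustive search that becomes prohibitive for realistic $M$.

```python
from itertools import product

def num_on_off_patterns(num_bs):
    """Number of binary on/off vectors for num_bs base stations in one time slot."""
    return 2 ** num_bs

# Exhaustive enumeration is only practical for very small networks.
patterns_for_3_bs = list(product((0, 1), repeat=3))
print(len(patterns_for_3_bs))      # 8 patterns for M = 3
print(num_on_off_patterns(20))     # 1048576 patterns for M = 20
```

Each of these patterns would additionally require solving a power allocation problem, and the count applies per time slot, so a joint search over $L$ slots grows as $2^{ML}$.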
III. DEEP LEARNING-BASED ENERGY-EFFICIENT NETWORK USING LONG SHORT-TERM
MEMORY
The primary goal of the proposed DLEEN scheme is to find the BS on/off modes $\{\alpha^l_m\}$ and
the power weights between active BSs and mobiles $\{p^l_{m,k}\}$ minimizing the energy consumption.
Since the transmission power is allocated only for the active BSs, we cannot determine the
on/off mode and the power weight concurrently. To address this issue, DLEEN uses a two-stage
process where the BS on/off mode is determined through the DNN first and then the power
weight for the active BSs is obtained by solving a convex optimization problem.
In the BS on/off mode decision, DLEEN exploits the combination of LSTM and a feedforward
neural network to learn the complicated nonlinear mapping between the CSI, the required
mobile rate, and the optimal BS on/off mode decision. To be specific, the input is the combination
of the CSI $h^l = [\beta^l_{1,1}, \cdots, \beta^l_{M,K}]^T \in \mathbb{R}^{MK}$ and the rate requirements of mobiles
$r^l_{\min} = [R^l_{1,\min}, \cdots, R^l_{K,\min}]^T \in \mathbb{R}^{K}$, and the output is the BS on/off mode decision vector
$\alpha^l \in \mathbb{R}^{M}$. Thus, a mapping function $f$ describing the input-output relationship is given by
$$
\alpha^l = f(h^1, \cdots, h^l, r^1_{\min}, \cdots, r^l_{\min}; \theta), \tag{18}
$$
where $\theta$ is the set of training parameters (weights and biases) of DLEEN. In the training process,
we find the optimal $f^*$, parameterized by $\theta^*$, minimizing the energy consumption while
satisfying the rate and power constraints (we will say more on this in Section III.D).
Fig. 2: System structure of DLEEN. The BSs collect the CSI and the rate requirement and use them as
the input data of DLEEN to determine the on/off modes of BSs and the transmission power.
(Figure blocks: LSTM-based BS on/off mode decision, followed by transmission power allocation. Input data:
CSI and required data rate. Control command: on/off mode and power weight.)
The overall process of DLEEN consists of the following steps: 1) the BSs send the collected
CSI and the rate requirement to the digital unit (DU) through the backhaul link, 2) in the DU, DLEEN
generates the BS on/off mode decision vector $\alpha^l$ from the input data, 3) DLEEN calculates the
power weight $p^l$ for the active BSs using the convex optimization technique, and 4) the DU sends
$\alpha^l$ and $p^l$ back to the BSs (see Fig. 2).
A. Basics of the LSTM
In this subsection, we briefly explain the LSTM network. The key ingredients of an LSTM block
are the cell state $c^l$ and three gates, viz., the forget gate $f^l$, the input gate $i^l$, and the output gate $o^l$ (see Fig.
3). The cell state, serving as a memory to store information of past inputs, sequentially passes
through the forget, input, and output gates. Based on the input vector $x^l$ and the previous output
vector $z^{l-1}$, each gate removes, writes, and reads the information in the cell state to generate
the output vector $z^l$.
Fig. 3: Block structure of LSTM network consisting of forget gate, input gate, and output gate.
To be specific, the forget gate $f^l$ determines the amount of the previous cell state $c^{l-1}$ to be
delivered to the current cell state $c^l$. After the forget gate $f^l$, the input gate $i^l$ decides whether
the information of the input vector $x^l$ and the previous output vector $z^{l-1}$ is transferred to the
cell state $c^l$ or not. Lastly, by using the cell state $c^l$, the output gate $o^l$ produces the output
vector $z^l$. The forget gate $f^l$, input gate $i^l$, and output gate $o^l$ are given, respectively, by
$$
f^l = \sigma_g\left(W_f x^l + U_f z^{l-1} + b_f\right) \tag{19}
$$
$$
i^l = \sigma_g\left(W_i x^l + U_i z^{l-1} + b_i\right) \tag{20}
$$
$$
o^l = \sigma_g\left(W_o x^l + U_o z^{l-1} + b_o\right), \tag{21}
$$
where $W_f$, $W_i$, $W_o$ and $U_f$, $U_i$, $U_o$ are the weights associated with $x^l$ and $z^{l-1}$, respectively.
Also, $b_f$, $b_i$, and $b_o$ are the biases and $\sigma_g(x) = f_{sig}(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid activation function.
Then, the cell state $c^l$ is determined by these gates as
$$
c^l = f^l \circ c^{l-1} + i^l \circ \tanh\left(W_c x^l + U_c z^{l-1} + b_c\right), \tag{22}
$$
where $W_c$ and $U_c$ are the weights and $b_c$ is the bias. Finally, the output vector $z^l$ obtained by
the cell state $c^l$ and the output gate vector $o^l$ is given by
$$
z^l = o^l \circ \tanh(c^l), \tag{23}
$$
where $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ is the hyperbolic tangent function.
The forget, input, and output gates are trained through the backpropagation process to extract
the correlation feature of the input data. For example, when the mobility of a mobile is high, the
temporal correlation of the input data would not be high, so it is desirable to focus on the latest
data rather than the old data. In this case, the forget gate vector $f^l$ would be trained to be close
to the zero vector (i.e., $f^l = [0, \cdots, 0]^T$) and the input gate vector $i^l$ would be trained to be close
to the one vector (i.e., $i^l = [1, \cdots, 1]^T$), meaning that the previous cell state information would not
be delivered to the current cell state. On the other hand, when the mobility of a mobile is low, the
temporal correlation of the input data is high and thus the previous data will be used to identify
the correlation structure. In this case, the forget gate vector would be trained to be close to the one vector and the
input gate vector would be trained to be close to the zero vector.
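The gate equations (19)-(23) can be sketched as a single LSTM step in plain Python. This is a minimal illustration under stated assumptions: the names `lstm_step` and `affine`, the dictionary layout of the parameters, and the all-zero toy weights are hypothetical and chosen only so that the result can be checked by hand.

```python
import math

def sigmoid(v):
    """Element-wise sigmoid, the gate activation sigma_g."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def affine(W, x, U, z, b):
    """Compute W x + U z + b for list-of-lists matrices W, U and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * zi for u, zi in zip(U_row, z))
        + bi
        for W_row, U_row, bi in zip(W, U, b)
    ]

def lstm_step(x, z_prev, c_prev, params):
    """One LSTM step. params maps 'f', 'i', 'o', 'c' to a tuple (W, U, b)."""
    def pre_activation(gate):
        W, U, b = params[gate]
        return affine(W, x, U, z_prev, b)

    f = sigmoid(pre_activation("f"))                          # forget gate, Eq. (19)
    i = sigmoid(pre_activation("i"))                          # input gate,  Eq. (20)
    o = sigmoid(pre_activation("o"))                          # output gate, Eq. (21)
    candidate = [math.tanh(v) for v in pre_activation("c")]
    c = [fj * cj + ij * gj                                    # cell state,  Eq. (22)
         for fj, cj, ij, gj in zip(f, c_prev, i, candidate)]
    z = [oj * math.tanh(cj) for oj, cj in zip(o, c)]          # output,      Eq. (23)
    return z, c

# Sanity check with all-zero parameters (input size 2, hidden size 1):
# every gate equals 0.5 and the candidate is tanh(0) = 0.
zero = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {"f": zero, "i": zero, "o": zero, "c": zero}
z, c = lstm_step(x=[0.3, -0.7], z_prev=[0.0], c_prev=[1.0], params=params)
print(z, c)
```

With zero parameters the forget gate passes half of the previous cell state, so `c` is 0.5 and `z` is `0.5 * tanh(0.5)`; trained weights would move the gates toward 0 or 1 as described above.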
B. DLEEN Architecture
Since the LSTM network is effective in capturing the temporal correlation of the input dataset
(i.e., CSI and mobile rate requirement), we use it as the main ingredient of DLEEN. After passing
through the LSTM network, the extracted feature is transformed to the BS on/off mode decision
vector via the feedforward network. Once the on/off mode of BSs is decided, we check the
feasibility of the acquired on/off mode to prevent the scenario where the rate requirement of a
mobile is violated. Once the on/off mode passes the feasibility test, meaning that the on/off
mode can satisfy the rate requirement, we allocate the transmission power for the active BSs by
using linear programming (LP). The detailed architecture of DLEEN is described in Fig. 4.
In the LSTM network, the input vector $x^l$ is a composite of the CSI $h^l$ and the rate requirement of
mobiles $r^l_{\min}$:
$$
x^l = \begin{bmatrix} h^l \\ r^l_{\min} \end{bmatrix}. \tag{24}
$$
Fig. 4: Detailed architecture of DLEEN
A consecutive input dataset $x^1, \cdots, x^l$ passes through the LSTM cells to generate the output
vector $z^l$ (see Section III.A). Then, the output vector $z^l$ passes through the feedforward network.
In each component (i.e., fully connected (FC) layer) of this network, the input-output relationship
is
$$
\tilde{z}^l = W_d z^l + b_d, \tag{25}
$$
where $W_d$ and $b_d$ are the weight and bias, respectively. After the FC layer, a nonlinear activation
function is applied to $\tilde{z}^l$ to determine whether the information is delivered to the next layer or
not. To this end, we use the rectified linear unit (ReLU) function $f_{ReLU}(x) = \max(0, x)$. The
output $\hat{z}^l$ of the ReLU function for $\tilde{z}^l$ is
$$
\hat{z}^l = f_{ReLU}(\tilde{z}^l). \tag{26}
$$
After that, $\hat{z}^l$ passes through the sigmoid layer, generating the output vector $\alpha^l$:
$$
\alpha^l = [\alpha^l_1, \cdots, \alpha^l_M]^T = \left[f_{sig}(\hat{z}^l_1), \cdots, f_{sig}(\hat{z}^l_M)\right]^T, \tag{27}
$$
where $f_{sig}(x) = \frac{1}{1 + e^{-x}}$. Since $\alpha^l_m$ is the output of the sigmoid function for $\hat{z}^l_m$ (i.e., $\alpha^l_m =
f_{sig}(\hat{z}^l_m)$), the range of $\alpha^l_m$ is from 0 to 1. If $\alpha^l_m$ is greater than the pre-defined threshold $\tau$, we
set $\alpha^l_m = 1$ and turn on the BS $m$. Otherwise, we set $\alpha^l_m = 0$ and turn off the BS $m$.
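The decision stage described by (25)-(27) and the thresholding rule can be sketched as follows. This is an illustrative sketch only: the function names, the single FC layer, the weights, and the threshold value $\tau = 0.6$ are assumptions made for this example, whereas the network in the paper stacks several FC and ReLU layers.

```python
import math

def fully_connected(W, b, z):
    """FC layer of Eq. (25): W z + b."""
    return [sum(w * v for w, v in zip(row, z)) + bi for row, bi in zip(W, b)]

def relu(v):
    """ReLU of Eq. (26)."""
    return [max(0.0, x) for x in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def on_off_decision(z, W_d, b_d, tau):
    """Map an LSTM output z to soft outputs and thresholded on/off modes."""
    z_fc = fully_connected(W_d, b_d, z)
    z_relu = relu(z_fc)
    alpha_soft = sigmoid(z_relu)                          # Eq. (27)
    # Note: ReLU outputs are non-negative, so in this literal single-layer
    # sketch every soft output lies in [0.5, 1).
    alpha = [1 if a > tau else 0 for a in alpha_soft]     # compare with threshold tau
    return alpha_soft, alpha

# Hypothetical example with M = 2 BSs.
W_d = [[2.0, 0.0],
       [0.0, 1.0]]
b_d = [0.0, 0.0]
alpha_soft, alpha = on_off_decision([1.0, -1.0], W_d, b_d, tau=0.6)
print(alpha_soft, alpha)
```

Here the first BS receives a soft output of about 0.88 and is turned on, while the second is clipped to zero by the ReLU, receives 0.5, and is turned off.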
C. Feasibility Test and Transmission Power Allocation via Linear Programming
Once the on/off mode vector $\alpha^l$ is determined, we allocate the transmission power for the
active BSs using the convex optimization approach. This procedure consists of two main steps:
1) a feasibility test to check whether $\alpha^l$ satisfies the rate requirements of the mobiles and 2) transmission
power allocation. In this subsection, we skip the time slot index $l$ for notational simplicity.
1) Feasibility Test: In this step, we check whether the rate requirement of a mobile is satisfied
for a given $\alpha$. In case the rate requirement is not satisfied, we measure the degree of infeasibility
$d_{fb}$, the maximum violation of the rate constraint:
$$
d_{fb} = \max_{k \in \mathcal{U}} \left(R_{k,\min} - R_k\right). \tag{28}
$$
Since $d_{fb}$ is the maximum rate constraint violation over all mobiles, one can see that $\alpha$ is feasible
if $d_{fb} \leq 0$ and infeasible otherwise. When the rate requirement is violated for a given $\alpha$, we
update $\alpha$ in the direction of reducing $d_{fb}$.
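The feasibility test problem to minimize $d_{fb}$ for a given $\alpha$ is formulated as
$$
\begin{aligned}
\mathcal{P}_{feas}: \quad d^*_{fb} = \min_{\{p_{m,k}\}} \ & \max_{k \in \mathcal{U}} \left(R_{k,\min} - R_k\right) && \text{(29a)} \\
\text{s.t.} \ & P^{tx}_m \leq \alpha_m P_{m,\max}, \quad m \in \mathcal{B} && \text{(29b)} \\
& p_{m,k} \geq 0, \quad m \in \mathcal{B},\ k \in \mathcal{U}. && \text{(29c)}
\end{aligned}
$$
Note that by concatenating the variables into a vector form (i.e., $p_k = [p_{1,k}, \cdots, p_{M,k}]^T$, $p =
[p_1^T, \cdots, p_K^T]^T$), the rate constraint can be transformed to
$$
\begin{aligned}
R_k \geq R_{k,\min} &\iff \log_2\left(1 + \frac{\sum_{m=1}^{M} p_{m,k} \beta_{m,k}}{\sum_{j \neq k}^{K} \sum_{m=1}^{M} p_{m,j} \beta_{m,k} + \sigma_n^2}\right) \geq R_{k,\min} && \text{(30)} \\
&\iff \beta_k^T \left(p_k - \left(2^{R_{k,\min}} - 1\right) \sum_{j \neq k}^{K} p_j\right) \geq \sigma_n^2 \left(2^{R_{k,\min}} - 1\right) && \text{(31)} \\
&\iff \beta_k^T (d_k^T \otimes I_M) p \geq \sigma_n^2 \left(2^{R_{k,\min}} - 1\right), && \text{(32)}
\end{aligned}
$$
where $\beta_k = [\beta_{1,k}, \cdots, \beta_{M,k}]^T$ and $d_k$ is a $K \times 1$ vector whose $k$-th element is one and whose other
elements are $-(2^{R_{k,\min}} - 1)$. Using this, $\mathcal{P}_{feas}$ is reformulated as
$$
\begin{aligned}
\mathcal{P}_{feas}: \quad d^*_{fb} = \min_{p} \ & \max_{k \in \mathcal{U}} \left(\sigma_n^2 \left(2^{R_{k,\min}} - 1\right) - \beta_k^T (d_k^T \otimes I_M) p\right) && \text{(33a)} \\
\text{s.t.} \ & (1_K^T \otimes e_m^T) p \leq \alpha_m \eta_m P_{m,\max}, \quad m \in \mathcal{B} && \text{(33b)} \\
& p \succeq 0_{MK}, && \text{(33c)}
\end{aligned}
$$
where (33b) is a vector form expression of the power constraint. Letting $d_{fb} = \max_{k \in \mathcal{U}}
\left(\sigma_n^2 \left(2^{R_{k,\min}} - 1\right) - \beta_k^T (d_k^T \otimes I_M) p\right)$, we have
$$
\begin{aligned}
\mathcal{P}_{feas}: \quad d^*_{fb} = \min_{p,\, d_{fb}} \ & d_{fb} && \text{(34a)} \\
\text{s.t.} \ & \sigma_n^2 \left(2^{R_{k,\min}} - 1\right) - \beta_k^T (d_k^T \otimes I_M) p \leq d_{fb}, \quad k \in \mathcal{U} && \text{(34b)} \\
& (1_K^T \otimes e_m^T) p \leq \alpha_m \eta_m P_{m,\max}, \quad m \in \mathcal{B} && \text{(34c)} \\
& p \succeq 0_{MK}. && \text{(34d)}
\end{aligned}
$$
Since the objective function and the constraints are all linear functions of $p$ and $d_{fb}$, $\mathcal{P}_{feas}$ is an LP and thus
can be easily solved by a convex optimization tool (e.g., CVX [21]).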
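The vectorized rate constraint (32) can be checked numerically against the direct form (31). The sketch below builds the matrix $d_k^T \otimes I_M$ explicitly in plain Python; the helper names and the toy values are hypothetical, and no LP solver is invoked here (only the constraint for a given $p$ is evaluated).

```python
def dT_kron_identity(d, M):
    """Dense M x (K*M) matrix d^T (Kronecker product) I_M, i.e. [d_1 I_M, ..., d_K I_M]."""
    return [[d[col // M] if col % M == row else 0.0
             for col in range(len(d) * M)]
            for row in range(M)]

def vectorized_lhs(beta_k, d_k, p):
    """Left-hand side of Eq. (32): beta_k^T (d_k^T kron I_M) p."""
    A = dT_kron_identity(d_k, len(beta_k))
    Ap = [sum(a * x for a, x in zip(row, p)) for row in A]
    return sum(b * v for b, v in zip(beta_k, Ap))

def direct_lhs(beta_k, k, c, p_blocks):
    """Left-hand side of Eq. (31): beta_k^T (p_k - c * sum_{j != k} p_j)."""
    M = len(beta_k)
    other = [sum(p_blocks[j][m] for j in range(len(p_blocks)) if j != k)
             for m in range(M)]
    return sum(beta_k[m] * (p_blocks[k][m] - c * other[m]) for m in range(M))

# Hypothetical example: M = 2 BSs, K = 2 mobiles, constraint of the first mobile.
r_min, noise_power = 1.0, 1.0
c = 2 ** r_min - 1                      # 2^{R_min} - 1
p_blocks = [[3.0, 1.0], [1.0, 0.5]]     # p_1 and p_2
p = p_blocks[0] + p_blocks[1]           # stacked vector p
beta_1 = [2.0, 4.0]
d_1 = [1.0, -c]                         # k-th entry one, others -(2^{R_min} - 1)

lhs_vec = vectorized_lhs(beta_1, d_1, p)
lhs_dir = direct_lhs(beta_1, 0, c, p_blocks)
violation = noise_power * c - lhs_vec   # per-mobile term inside the max of (33a)
print(lhs_vec, lhs_dir, violation)
```

Both forms give the same value, and the negative violation indicates that the rate requirement of this mobile is met for the chosen power weights.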
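2) Transmission Power Allocation: If $\alpha$ passes the feasibility test (i.e., $d^*_{fb} \leq 0$), we next
find the optimal power weight $\{p_{m,k}\}$ for the active BSs. The transmission power allocation
problem $\mathcal{P}_{pa}$ is given by
$$
\begin{aligned}
\mathcal{P}_{pa}: \quad \min_{\{p_{m,k}\}} \ & \sum_{m=1}^{M} P^{tx}_m && \text{(35a)} \\
\text{s.t.} \ & R_k \geq R_{k,\min}, \quad k \in \mathcal{U} && \text{(35b)} \\
& P^{tx}_m \leq \alpha_m P_{m,\max}, \quad m \in \mathcal{B} && \text{(35c)} \\
& p_{m,k} \geq 0, \quad m \in \mathcal{B},\ k \in \mathcal{U}. && \text{(35d)}
\end{aligned}
$$
Since the maintenance power $P^{on}_m$ and the mode transition power $P^{trans}_m$ are functions of $\alpha$, they
are constant once the on/off mode vector $\alpha$ is decided. Thus, what we need to minimize in $\mathcal{P}_{pa}$
is the transmission power. Using the vectorized expressions (34b) and (34c), we have
$$
\begin{aligned}
\mathcal{P}_{pa}: \quad \min_{p} \ & (1_K \otimes \eta)^T p && \text{(36a)} \\
\text{s.t.} \ & \beta_k^T (d_k^T \otimes I_M) p \geq \sigma_n^2 \left(2^{R_{k,\min}} - 1\right), \quad k \in \mathcal{U} && \text{(36b)} \\
& (1_K^T \otimes e_m^T) p \leq \alpha_m \eta_m P_{m,\max}, \quad m \in \mathcal{B} && \text{(36c)} \\
& p \succeq 0_{MK}, && \text{(36d)}
\end{aligned}
$$
where $\eta = [\frac{1}{\eta_1}, \cdots, \frac{1}{\eta_M}]^T$. Since $\mathcal{P}_{pa}$ is an LP, we can easily find the optimal power weight
$p^*$.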
D. Training DLEEN using Unsupervised Learning
An essential part of DLEEN is the training process that optimizes the network
parameters $\theta = \{W, U, b\}$. In the training phase, the network parameters are updated iteratively
to minimize the loss function $J(\theta)$ (i.e., $\theta^* = \arg\min_\theta J(\theta)$). When $J(\theta)$ is differentiable,
the network parameters can be updated by the stochastic gradient descent (SGD) method. The update
equation of SGD is
$$
\theta_{t+1} = \theta_t - \epsilon \nabla J(\theta_t), \tag{37}
$$
where ε > 0 is the learning rate determining the step size at each iteration. While computing
the gradients of a large number of parameters of multiple layers is very difficult, thanks to the
backpropagation scheme that sequentially computes the gradient of loss function using the chain
rule, the gradient computation process can be greatly simplified [22]. In this work, we employ
the backpropagation through time (BPTT), a scheme specialized for the gradient computation of
sequential dataset [23]. While the network parameters of the current time slot are updated in the
conventional backpropagation scheme, those of the current and the past time slots are updated
simultaneously in the BPTT. This is because the LSTM cell takes not only the current input
vector xl but also the output vector of the past LSTM cell zl−1 as the input data. By exploiting
the chain rule in the gradient computation, the network parameters of the past and the current
time slots can be updated simultaneously.
1) Loss Function Design: An intriguing feature of the proposed scheme is the use of unsupervised
learning to train the network parameters. Note that supervised learning requires a large
input dataset (CSI and required mobile rate) and a labelled output dataset (in our case, the optimal
BS on/off mode). Unfortunately, obtaining such a large dataset is very difficult since it requires
a huge amount of data transmission (pilot signal transmission) for the training and also an exhaustive search to
find the optimal BS on/off mode. We can avoid this hassle by using unsupervised learning,
but we need to design the loss function and the weight update mechanism carefully.
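In this work, we set up the loss function $J(\theta)$ as a weighted sum of the loss terms for energy
consumption $J^{on}$, $J^{trans}$, $J^{tx}$, degree of infeasibility $J^{fb}$, and integer property $J^{int}$ as
$$
J(\theta) = \underbrace{J^{on} + J^{trans} + J^{tx}}_{\text{loss term for energy consumption}} + \underbrace{\lambda_{fb} J^{fb} + \lambda_{int} J^{int}}_{\text{loss term for constraints}}, \tag{38}
$$
where $\lambda_{fb}$ and $\lambda_{int}$ are the regularization weights. Details are as follows:
• $J^{on}$ is the loss term for the maintenance power given by
$$
J^{on} = \sum_{l=1}^{L} \sum_{m=1}^{M} P^{on,l}_m \tag{39}
$$
$$
= \sum_{l=1}^{L} \sum_{m=1}^{M} \left(\alpha^l_m \rho^{on}_m + (1 - \alpha^l_m) \rho^{off}_m\right). \tag{40}
$$
• $J^{trans}$ is the loss term for the mode transition power given by
$$
J^{trans} = \sum_{l=2}^{L} \sum_{m=1}^{M} P^{trans,l}_m \tag{41}
$$
$$
= \sum_{l=2}^{L} \sum_{m=1}^{M} \left(\alpha^l_m - \alpha^{l-1}_m\right)^2 \rho^{trans}_m. \tag{42}
$$
• $J^{tx}$ is the loss term for the transmission power given by
$$
J^{tx} = \sum_{l=1}^{L} \sum_{m=1}^{M} P^{tx,l}_m \tag{43}
$$
$$
= \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{k=1}^{K} \frac{1}{\eta_m} p^l_{m,k}. \tag{44}
$$
• $J^{fb}$ is the loss term for the degree of infeasibility given by
$$
J^{fb} = \sum_{l=1}^{L} 1_+(d^l_{fb})\, d^l_{fb}, \tag{45}
$$
where $d^l_{fb}$ is the degree of infeasibility corresponding to $\alpha^l$. Since the degree of infeasibility
is a measure of the rate constraint violation, it should be considered only when $\alpha^l$ is infeasible
(i.e., $d^l_{fb} > 0$). To this end, we use the indicator function $1_+(d^l_{fb})$, which has a non-zero
value only if $d^l_{fb} > 0$.
• $J^{int}$ is the loss term to enforce $\alpha^l$ to be an integer-valued vector (i.e., $\alpha^l_m \in \{0, 1\}$):
$$
J^{int} = \sum_{l=1}^{L} \sum_{m=1}^{M} \left(\alpha^l_m - \hat{\alpha}^l_m\right)^2, \tag{46}
$$
where $\hat{\alpha}^l_m \in \{0, 1\}$ denotes the discretized value of $\alpha^l_m$.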
It is worth mentioning that each term in the loss function plays a complementary yet crucial
role in accomplishing the mission. For example, if the rate requirement of a mobile is not satisfied
for the current on/off mode decision, J fb will enforce DLEEN to turn on more BSs to satisfy
the rate requirements. In contrast, Jon, J trans, and J tx will enforce DLEEN to turn off more BSs
to minimize the energy consumption. Also, J int will enforce αl to be an integer vector. At the
end of the training process, the harmonized loss terms will pursue an optimization of the energy
consumption while satisfying the rate requirements of mobiles.
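How the individual loss terms combine into (38) can be sketched as follows. The sketch is illustrative only: the function names, the use of a separate list `alpha_hat` for the discretized modes, and all numbers (including the regularization weights) are assumptions, and the transmission power term is passed in as a number instead of being obtained from the LP.

```python
def energy_and_integer_losses(alpha, alpha_hat, rho_on, rho_off, rho_trans):
    """Compute J_on (40), J_trans (42), and J_int (46).

    alpha[l][m]     : soft network output in [0, 1]
    alpha_hat[l][m] : discretized on/off mode in {0, 1}
    """
    L, M = len(alpha), len(alpha[0])
    j_on = sum(alpha[l][m] * rho_on[m] + (1 - alpha[l][m]) * rho_off[m]
               for l in range(L) for m in range(M))
    j_trans = sum((alpha[l][m] - alpha[l - 1][m]) ** 2 * rho_trans[m]
                  for l in range(1, L) for m in range(M))   # starts at the second slot
    j_int = sum((alpha[l][m] - alpha_hat[l][m]) ** 2
                for l in range(L) for m in range(M))
    return j_on, j_trans, j_int

def infeasibility_loss(d_fb):
    """J_fb of Eq. (45): only positive degrees of infeasibility contribute."""
    return sum(d for d in d_fb if d > 0)

def total_loss(j_on, j_trans, j_tx, j_fb, j_int, lam_fb, lam_int):
    """Weighted sum of Eq. (38)."""
    return j_on + j_trans + j_tx + lam_fb * j_fb + lam_int * j_int

# Hypothetical example: one BS, three time slots.
alpha = [[1.0], [0.5], [0.0]]
alpha_hat = [[1], [0], [0]]
j_on, j_trans, j_int = energy_and_integer_losses(
    alpha, alpha_hat, rho_on=[10.0], rho_off=[2.0], rho_trans=[4.0])
j_fb = infeasibility_loss([-1.0, 0.5, 2.0])
loss = total_loss(j_on, j_trans, 3.0, j_fb, j_int, lam_fb=2.0, lam_int=4.0)
print(j_on, j_trans, j_int, j_fb, loss)
```

In this example the ambiguous output 0.5 in the second slot is the only contributor to the integer loss, and the feasible slot (negative degree of infeasibility) adds nothing to the infeasibility loss.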
2) Network Parameter Training: In order to use the SGD method, we need to ensure that the
loss function is a differentiable function of the training parameters. While the update equations for
$J^{on}$, $J^{trans}$, and $J^{int}$ can be easily obtained, such is not the case for $J^{tx}$ and $J^{fb}$ since they are
constructed from the outputs of the LP (i.e., $p^l$ and $d^l_{fb}$). Note that the optimal solution (and the
corresponding cost) of an LP is not a differentiable function of the constraint parameters [24].
To address this issue, we use the notion of subgradient, a generalization of the gradient to
convex nonsmooth functions [25].
Definition 1. Let $f : \mathcal{X} \to \mathbb{R}$ be a real-valued convex function defined on a convex open set
$\mathcal{X} \subseteq \mathbb{R}^N$. Then a vector $v \in \mathbb{R}^N$ is called a subgradient at $x_0 \in \mathcal{X}$ if
$$
\forall x \in \mathcal{X}, \quad f(x) - f(x_0) \geq v^T (x - x_0). \tag{47}
$$
Also, the set of all subgradients at $x_0$ is called the subdifferential at $x_0$ and is denoted as $\partial f(x_0)$.
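As a simple illustration of Definition 1, the sketch below checks the inequality (47) for the scalar nonsmooth convex function $f(x) = |x - 1|$ and runs a few subgradient descent steps. The function, the grid, the step size, and the starting point are hypothetical choices for this example and are unrelated to the DLEEN loss.

```python
def f(x):
    """A convex function that is not differentiable at x = 1."""
    return abs(x - 1.0)

def subgradient(x):
    """One valid subgradient of f at x (any value in [-1, 1] is valid at x = 1)."""
    if x > 1.0:
        return 1.0
    if x < 1.0:
        return -1.0
    return 0.0

# Check inequality (47) at the kink x0 = 1 with v = 0.3, which lies in [-1, 1].
grid = [-3.0 + 0.5 * i for i in range(17)]   # -3.0, -2.5, ..., 5.0
x0, v = 1.0, 0.3
inequality_holds = all(f(x) - f(x0) >= v * (x - x0) for x in grid)

# Subgradient descent with a constant step size, in the spirit of the SGD update.
x, step = 3.0, 0.5
for _ in range(10):
    x = x - step * subgradient(x)
print(inequality_holds, x)
```

The iterate moves toward the minimizer in steps of 0.5 and stays there once the subgradient 0 is selected, mirroring how a subgradient replaces the gradient in the parameter update when the loss is not differentiable.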
When the loss function $J$ is not differentiable, we can use the subgradient instead of the gradient
to minimize $J$ in a similar way to the gradient descent method [26]. In the following theorem,
we present the subdifferentials of $J^{tx}$ and $J^{fb}$ with respect to $\theta$.
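Theorem 1. The subdifferentials of $J^{tx}$ and $J^{fb}$ with respect to $\theta$ are given by
$$
\partial J^{tx}(\theta) = \left\{ -\sum_{l=1}^{L} \lambda^T \Pi P \frac{\partial \alpha^l}{\partial \theta} \ \middle|\ \lambda \text{ is the dual solution of } \mathcal{P}_{pa} \text{ associated with (36c)} \right\} \tag{48}
$$
$$
\partial J^{fb}(\theta) = \left\{ -\sum_{l=1}^{L} 1_+(d^l_{fb})\, \mu^T \Pi P \frac{\partial \alpha^l}{\partial \theta} \ \middle|\ \mu \text{ is the dual solution of } \mathcal{P}_{feas} \text{ associated with (34c)} \right\}, \tag{49}
$$
where $\Pi = \mathrm{diag}(\eta_1, \cdots, \eta_M)$ and $P = \mathrm{diag}(P_{1,\max}, \cdots, P_{M,\max})$.
Proof. See Appendix A.
Once the gradient of the total loss function $\nabla J(\theta)$ is obtained, we can update $\theta$ by using the
SGD method (see (37)). The training process of DLEEN is summarized in Table I.
TABLE I: Training process of DLEEN
Input: CSI $\{h^l\}$, required mobile rate $\{r^l_{\min}\}$, LSTM-based BS on/off decision network $f$,
learning rate $\epsilon$, on/off threshold $\tau$, number of time slots $L$
Initialization: $t = 0$, $\theta_t = \theta_{ini}$
Iteration:
1: while $\theta_t$ does not converge do
2:   for $l = 1, \cdots, L$ do
3:     $x^l = [h^{l,T}, r^{l,T}_{\min}]^T$
4:     Obtain $\alpha^l$ by passing $x^1, \cdots, x^l$ through $f$ parameterized by $\theta_t$
5:     Discretize $\alpha^l$ into $\hat{\alpha}^l$ as $\hat{\alpha}^l = \frac{1 + \mathrm{sgn}(\alpha^l - \tau)}{2}$
6:     Compute $\frac{\partial \alpha^l}{\partial \theta}\big|_{\theta = \theta_t}$ by using the backpropagation algorithm
7:     Solve the feasibility test $\mathcal{P}_{feas}$ for $\hat{\alpha}^l$ to obtain the dual solution $\mu$
8:     Solve the feasibility test $\mathcal{P}_{feas}$ for $\hat{\alpha}^l$ to obtain the degree of infeasibility $d^l_{fb}$
9:     Solve the power allocation problem $\mathcal{P}_{pa}$ to obtain the optimal power weight $p^l$ and
       the dual solution $\lambda$
10:   end for
11:   Compute $\frac{\partial J^{on}}{\partial \theta}$, $\frac{\partial J^{trans}}{\partial \theta}$, and $\frac{\partial J^{int}}{\partial \theta}$
12:   $\frac{\partial J^{tx}}{\partial \theta} = -\sum_{l=1}^{L} \lambda^T \Pi P \frac{\partial \alpha^l}{\partial \theta}$
13:   $\frac{\partial J^{fb}}{\partial \theta} = -\sum_{l=1}^{L} 1_+(d^l_{fb})\, \mu^T \Pi P \frac{\partial \alpha^l}{\partial \theta}$
14:   $\nabla J(\theta_t) = \frac{\partial J^{on}}{\partial \theta}\big|_{\theta = \theta_t} + \frac{\partial J^{trans}}{\partial \theta}\big|_{\theta = \theta_t} + \frac{\partial J^{tx}}{\partial \theta}\big|_{\theta = \theta_t} + \frac{\partial J^{fb}}{\partial \theta}\big|_{\theta = \theta_t} + \frac{\partial J^{int}}{\partial \theta}\big|_{\theta = \theta_t}$
15:   $\theta_{t+1} = \theta_t - \epsilon \nabla J(\theta_t)$
16:   $t = t + 1$
17: end while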
IV. PRACTICAL ISSUES FOR THE DLEEN IMPLEMENTATION
In this section, we discuss two practical issues that arise when using the DLEEN scheme. We first
discuss the training data acquisition issue and then describe the offline training issue.
A. Training Data Acquisition
In order to find an optimal on/off mode decision of BSs minimizing the energy consumption
of UDN, a huge amount of training data (i.e., CSI and mobile rate requirement) is needed. Note that
a network trained with an insufficient amount of data might not converge or can be overfitted
to the specific dataset, severely limiting the energy saving gain of UDN. In practice, however,
the acquisition of the training dataset is very difficult since the data (channel estimates extracted
from the received pilot signal) should be collected from a large number of small cells over a long
period of time to cover a wide variety of wireless environments.
To circumvent this issue, we use a synthetically generated training dataset in this work.
Specifically, we build a UDN simulator in which the small cells and mobiles are randomly
distributed. The channel between each BS and mobile is generated using this simulator. One
might be concerned that the synthetically generated channels differ from the real channels.
While this is true, the issue is fortunately not so critical since the large-scale fading coefficient,
a function of path loss and shadowing, depends heavily on the communication distance. By
applying a realistic path loss model for the synthetic data generation (we will say more on this
in Section V), we can mitigate the mismatch between the real and the synthetically generated
datasets.
B. Offline Training Process
Another important issue when applying DNN techniques to wireless systems is the huge
computational overhead of the training process. Note that the DNN includes multiple
hidden layers consisting of weight matrices and bias vectors. Since this large number of network
parameters is updated simultaneously through backpropagation, the computation time and cost
of the training process are considerable (see footnote 2). This issue is even more serious in the
proposed DLEEN scheme since the network parameters of the previous and current time slots are
updated at the same time through BPTT. To deal with this issue, we use an offline training scheme
together with the synthetically generated dataset. Basically, we train multiple DLEEN instances,
each of which is designed for a distinct setting in terms of the number of BSs, the number of
mobiles, and the noise power. In doing so, we obtain several DLEEN instances, each optimized for
one communication scenario. In real applications, we simply choose the pre-trained network that
matches the real UDN environment.
Footnote 2: Even when using a deep learning server for the network training, the training process takes several hours. Since most of
the decisions in wireless systems should be made in a few milliseconds, an online training process is not suitable for deep
learning-based wireless systems.
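The selection step can be pictured as a simple lookup. The sketch below is illustrative only: the registry layout, the model handles, and the rule of picking the nearest trained noise power when there is no exact match are all assumptions, not details given in the paper.

```python
def select_instance(registry, num_bs, num_mobiles, noise_power_dbm):
    """Pick a pre-trained instance for the observed deployment.

    registry maps (num_bs, num_mobiles, noise_power_dbm) to a model handle.
    Assumption: the BS and mobile counts must match exactly, while the
    noise power is matched to the closest value seen during training.
    """
    candidates = [key for key in registry
                  if key[0] == num_bs and key[1] == num_mobiles]
    if not candidates:
        raise KeyError("no pre-trained instance for this (M, K) setting")
    best = min(candidates, key=lambda key: abs(key[2] - noise_power_dbm))
    return registry[best]

# Hypothetical registry; the handles stand in for loaded networks.
registry = {
    (8, 4, -90.0): "dleen_M8_K4_n90",
    (8, 4, -100.0): "dleen_M8_K4_n100",
    (10, 4, -90.0): "dleen_M10_K4_n90",
}
chosen = select_instance(registry, 8, 4, -97.0)
```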
V. SIMULATIONS AND DISCUSSIONS
A. Simulation Setup
We consider a UDN scenario where M small cells cooperatively serve K mobiles. The small
cells are randomly distributed in a square area of $D \times D$ km$^2$ and the mobiles move at a
constant speed v. In the fading channel model, the small-scale fading coefficient $g_{m,k}$ is generated
according to the complex Gaussian distribution (i.e., $g_{m,k} \sim \mathcal{CN}(0, 1)$) and the large-scale fading
coefficient $\beta^{l}_{m,k}$ accounting for the path loss and the shadow fading is given by
\[
\beta^{l}_{m,k} = \mathrm{PL}^{l}_{m,k} \times 10^{\frac{\sigma_{sh} z_{m,k}}{10}}, \tag{50}
\]
where $\mathrm{PL}^{l}_{m,k}$ represents the path loss at time slot l and $10^{\frac{\sigma_{sh} z_{m,k}}{10}}$ represents the shadow
fading, where $\sigma_{sh}$ is the standard deviation and $z_{m,k} \sim \mathcal{N}(0, 1)$. As for the path loss, we use a
three-slope path loss model given by [27]
\[
\mathrm{PL}^{l}_{m,k} =
\begin{cases}
-L - 35\log_{10}(d^{l}_{m,k}) & \text{if } d^{l}_{m,k} > d_1 \\
-L - 15\log_{10}(d_1) - 20\log_{10}(d^{l}_{m,k}) & \text{if } d_0 < d^{l}_{m,k} \le d_1 \\
-L - 15\log_{10}(d_1) - 20\log_{10}(d_0) & \text{if } d^{l}_{m,k} \le d_0
\end{cases}
\tag{51}
\]
where $d^{l}_{m,k}$ is the distance between BS m and mobile k at time slot l and
\[
L = 46.3 + 33.9\log_{10}(f) - 13.82\log_{10}(h_B) - (1.1\log_{10}(f) - 0.7)h_U + (1.56\log_{10}(f) - 0.8), \tag{52}
\]
where f is the carrier frequency (in MHz), and $h_B$ and $h_U$ are the heights of the BS and the mobile,
respectively. The system parameters are summarized in Table II [28].
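To make (50)-(52) concrete, the following sketch evaluates them for a single BS-mobile pair. Two points are assumptions rather than statements from the paper: the value given by (51) is treated as a quantity in dB and converted to linear scale before the product in (50), and the distances d, d0, and d1 only need to share one unit, since the unit is not restated in this section. All function names are hypothetical.

```python
import math
import random

def offset_L_db(f_mhz, h_bs, h_ue):
    """Constant L of (52); f in MHz, antenna heights in metres."""
    return (46.3 + 33.9 * math.log10(f_mhz) - 13.82 * math.log10(h_bs)
            - (1.1 * math.log10(f_mhz) - 0.7) * h_ue
            + (1.56 * math.log10(f_mhz) - 0.8))

def path_loss_db(d, d0, d1, L_db):
    """Three-slope path loss of (51), assumed here to be in dB."""
    if d > d1:
        return -L_db - 35.0 * math.log10(d)
    if d > d0:
        return -L_db - 15.0 * math.log10(d1) - 20.0 * math.log10(d)
    return -L_db - 15.0 * math.log10(d1) - 20.0 * math.log10(d0)

def large_scale_fading(d, d0, d1, L_db, sigma_sh_db, rng):
    """Linear-scale coefficient of (50) with log-normal shadowing."""
    z = rng.gauss(0.0, 1.0)  # z ~ N(0, 1)
    beta_db = path_loss_db(d, d0, d1, L_db) + sigma_sh_db * z
    return 10.0 ** (beta_db / 10.0)

# Table II values: f = 1.9 GHz, h_B = 15 m, h_U = 1.65 m.
L_db = offset_L_db(1900.0, 15.0, 1.65)
beta = large_scale_fading(100.0, 10.0, 50.0, L_db, 3.0, random.Random(0))
```

With the Table II settings, L evaluates to roughly 140.7. The three branches of (51) agree at d0 and d1, so the path loss is continuous in the distance.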
TABLE II: System parameters
Parameter                                Value
Carrier frequency (f)                    1.9 GHz
BS height (h_B)                          15 m
Mobile height (h_U)                      1.65 m
Service area radius (D)                  200 m
Path loss variable (d_0)                 10 m
Path loss variable (d_1)                 50 m
Shadow fading deviation (σ_sh)           3 dB
Number of small cells (M)                8
Number of mobiles (K)                    4
Number of time slots (L)                 50
Amplifier efficiency (η)                 0.25
On mode BS power (ρ_on)                  6.8 W
Off mode BS power (ρ_off)                4.3 W
Maximum transmission power (P_max)       1 W
Mode transition power (ρ_trans)          3 W
Mobile mobility (v)                      0 ~ 6 m/s

The proposed DLEEN consists of 3 LSTM layers followed by a feedforward network
consisting of 5 stacks of FC and ReLU layers (see Fig. 4). In each hidden layer, we set the width
W = 512, and we set the sequence length to 5. Also, when discretizing the output of the neural
network into the BS on/off mode vector $\alpha^{l}$, we use the threshold value τ = 0.5. For the network
parameter training, we use the Adam optimizer, a widely used optimization tool known for the
robustness of its learning process [29]. We compare the proposed DLEEN scheme with four
conventional BS on/off strategies: 1) the full association strategy where all the BSs are turned on,
2) the traffic load-based on/off strategy where the BSs with low traffic load are turned off [30],
3) the sequential on/off strategy where the BS having the minimum impact on the energy
consumption is turned off one after another until the point where the mobile's rate requirement
would be violated [12], and 4) the mixed-integer linear programming (MILP)-based on/off strategy
where the BS on/off mode and the corresponding transmission power are optimized simultaneously
at each time slot. In each point of the simulation figures, we plot the power consumption averaged
over L = 50 time slots.
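The discretization step with τ = 0.5 amounts to an element-wise threshold. A minimal sketch follows; the name `soft_output` and the example scores are hypothetical, and treating a score exactly equal to τ as "on" is an assumption, since the text does not specify the tie rule.

```python
def discretize_modes(soft_output, tau=0.5):
    """Map per-BS scores in [0, 1] to binary on/off modes (1 = on, 0 = off)."""
    # Assumption: a score equal to tau counts as "on".
    return [1 if score >= tau else 0 for score in soft_output]

# Hypothetical network output for M = 8 small cells at one time slot.
soft = [0.91, 0.12, 0.50, 0.49, 0.77, 0.03, 0.64, 0.20]
modes = discretize_modes(soft)
```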
[Figure 5 plot. x-axis: training epoch (×10^4); left y-axis: value of loss function; right y-axis: average power consumption; curves: total loss function, average power consumption.]
Fig. 5: Average power consumption, degree of infeasibility, and total loss function as a function
of training epoch (M = 8, K = 4, L = 50, SNR = 10 dB, and Rmin = 0.5 bps/Hz).
B. Simulation Results
In Fig. 5, we plot the average power consumption and the loss function value of DLEEN as
a function of the training epoch (see footnote 3) when SNR = 10 dB and Rmin = 0.5 bps/Hz. We
observe that as the training epoch increases, both the loss function and the average power
consumption decrease, which implies that the proposed training process works properly
and is effective in reducing the energy consumption of the network.
In Fig. 6, we plot the average power consumption as a function of the mobile's rate requirement
when SNR = 10 dB. We observe that the proposed DLEEN achieves a significant energy saving
over the conventional on/off schemes. For example, when Rmin = 0.3 bps/Hz, DLEEN saves
more than 24% of energy over the full association scheme, 27% over the traffic load-based on/off
scheme, and 12.4% over the sequential on/off scheme. Even when compared to the MILP-based
on/off scheme, DLEEN saves around 11% of energy. This is because the conventional on/off
schemes control only the instantaneous power (i.e., the maintenance power $P^{\mathrm{on}}_{m}$, $P^{\mathrm{off}}_{m}$ and the
transmission power $P^{\mathrm{tx}}_{m}$), whereas the proposed DLEEN scheme controls the transition power
$P^{\mathrm{trans}}_{m}$ as well as the instantaneous power. Note that the traffic load-based on/off scheme consumes
even more energy than the full association scheme since its frequent mode transitions incur a
considerable amount of transition energy.
Footnote 3: An epoch is a measure of training time in DNN. One epoch means that the network has been trained once using the given
dataset.
[Figure 6 plot. x-axis: rate requirement (bps/Hz), 0.2 to 0.6; y-axis: average power consumption (W/time slot); curves: proposed DLEEN, MILP-based on/off, sequential on/off, traffic load-based on/off, full association.]
Fig. 6: Average power consumption as a function
of rate requirement Rmin (M = 8, K = 4, L =
50, and SNR = 10 dB)
[Figure 7 plot. x-axis: SNR (dB), 6 to 14; y-axis: average power consumption (W/time slot); curves: proposed DLEEN, MILP-based on/off, sequential on/off, traffic load-based on/off, full association.]
Fig. 7: Average power consumption as a function
of SNR (M = 8, K = 4, L = 50, and Rmin =
0.3 bps/Hz)
In Fig. 7, we plot the average power consumption as a function of the signal-to-noise ratio (SNR).
In this figure, we set the rate requirement of the mobiles to Rmin = 0.3 bps/Hz. We observe that
the energy saving of the proposed DLEEN scheme increases with the SNR. For instance, the
energy saving of DLEEN over the full association scheme is around 6% when SNR = 6 dB but
it increases to 30% when SNR = 14 dB. This is because only a small number of BSs is required
to serve the mobiles in the high SNR regime (see Fig. 8), and thus the energy saved
by turning off the unnecessary BSs grows at high SNR. Interestingly, the energy saving
of DLEEN over the conventional on/off schemes also increases with the SNR. For example, the
energy saving of DLEEN over the MILP-based on/off scheme increases from 4% to 8% when
the SNR increases from 6 dB to 14 dB. Note that when the SNR is high, the conventional on/off
schemes usually choose active BSs near the mobile. Hence, as the mobile moves and
changes its location, the active BS set also changes, causing an increase in the transition
power.
[Figure 8 plot. x-axis: rate requirement (bps/Hz), 0.2 to 0.6; y-axis: percentage of active base stations, 0 to 1; curves: proposed DLEEN, MILP-based on/off, sequential on/off, traffic load-based on/off, full association.]
Fig. 8: Percentage of active BSs as a function
of the rate requirement Rmin (M = 8, K = 4,
L = 50, and SNR = 10 dB)
[Figure 9 plot. x-axis: number of base stations, 6 to 10; y-axis: average power consumption (W/time slot); curves: proposed DLEEN, MILP-based on/off, sequential on/off, traffic load-based on/off, full association.]
Fig. 9: Average power consumption as a function
of number of BSs M (K = 4, L = 50, SNR =
10 dB, and Rmin = 0.3 bps/Hz).
In Fig. 8, we plot the percentage of active BSs as a function of the rate requirement when
SNR = 10 dB. Note that the proposed DLEEN scheme turns on more BSs than the conventional
on/off schemes. For instance, when the rate requirement is 0.5 bps/Hz, DLEEN turns on 40%
of the BSs while the MILP-based on/off scheme turns on only 30% of the BSs. Since the saving in
transition power outweighs the additional maintenance power of turning on slightly more BSs,
DLEEN consumes 11% less energy than the MILP-based on/off scheme (see Fig. 6). We also note that
even in the case where the number of active BSs is similar (i.e., Rmin = 0.3 bps/Hz), DLEEN
saves more than 7% of energy over the MILP-based on/off scheme and 9% of energy over the
sequential on/off scheme, which implies that DLEEN picks a better set of active BSs, one that
minimizes the cumulative energy consumption.
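The trade-off between maintenance power and transition power can be illustrated with a toy calculation. The sketch below uses the Table II values for ρ_on, ρ_off, ρ_trans, and η, but the energy model itself is a simplified assumption made here (maintenance power plus transmit power divided by η for active BSs, and a fixed ρ_trans charged for every mode change in either direction); the schedules and transmit powers are made-up numbers, not simulation results.

```python
RHO_ON, RHO_OFF, RHO_TRANS, ETA = 6.8, 4.3, 3.0, 0.25  # Table II values

def average_power(modes, tx_power):
    """Average power per time slot under a simplified energy model.

    modes[l][m] is 1 if BS m is on at slot l, else 0.
    tx_power[l][m] is the radiated power (W) of BS m at slot l.
    Assumption: every on/off change between consecutive slots costs
    RHO_TRANS, regardless of direction.
    """
    total = 0.0
    for l, (mode_l, tx_l) in enumerate(zip(modes, tx_power)):
        for m, on in enumerate(mode_l):
            total += (RHO_ON + tx_l[m] / ETA) if on else RHO_OFF
            if l > 0 and modes[l - 1][m] != on:
                total += RHO_TRANS
    return total / len(modes)

# Schedule A: one active BS per slot, but the active BS keeps alternating.
modes_a = [[1, 0], [0, 1], [1, 0], [0, 1]]
tx_a = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.0], [0.0, 0.1]]
# Schedule B: both BSs stay on, each radiating less power.
modes_b = [[1, 1]] * 4
tx_b = [[0.05, 0.05]] * 4

avg_a = average_power(modes_a, tx_a)
avg_b = average_power(modes_b, tx_b)
```

In this toy case, schedule B keeps more BSs active yet consumes less on average, because schedule A pays the transition cost at every slot boundary.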
In Fig. 9, we vary the number of BSs and plot the average power consumption when
SNR = 10 dB and Rmin = 0.3 bps/Hz. We observe that as the number of BSs increases, the
energy saving of DLEEN over the full association scheme increases. To be specific, when the
number of BSs increases from 6 to 14, the energy saving of DLEEN over the full association
scheme increases from 22% to 27%. Since only a few BSs need to be turned on to serve the
mobiles, the energy saving obtained from BS sleeping increases when the number of BSs is
large. Moreover, since the active BSs are chosen from a larger pool of BSs, we
obtain an additional gain from the cooperation diversity. In fact, the energy consumption of
DLEEN when the number of BSs is 10 is more or less similar to that of the full association scheme
when the number of BSs is 7, which demonstrates that the proposed DLEEN is effective in
reducing the energy consumption in UDN.
VI. CONCLUSION
In this paper, we proposed a DNN-based framework to reduce the energy consumption of UDN.
In the proposed deep learning-based energy-efficient network (DLEEN), the BS on/off mode is
first determined by the DNN and then the transmission power is allocated to the active
BSs via a convex optimization technique. A key ingredient of DLEEN is the LSTM network
that exploits the temporally correlated features of the CSI for the BS on/off mode decision. DLEEN
controls not only the instantaneous power but also the mode transition power, resulting in a
substantial reduction of the cumulative energy consumption. From the simulations on realistic UDN
environments, we observed that the proposed DLEEN scheme saves a considerable amount of
power over the conventional on/off techniques. In this paper, we restricted our attention to the
BS sleep mode technique, but we believe that there are many interesting research extensions of
the proposed scheme such as cognitive radio access, user scheduling, and resource allocation.
APPENDIX A
PROOF OF THEOREM 1
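Note that the total transmission power $P^{\mathrm{tx},l} = \sum_{m=1}^{M} P^{\mathrm{tx},l}_{m} = \sum_{m=1}^{M} \sum_{k=1}^{K} \frac{1}{\eta_m} p^{l}_{m,k}$ and the
degree of infeasibility $d^{l}_{\mathrm{fb}}$ are the optimal values of $P_{\mathrm{pa}}$ and $P_{\mathrm{feas}}$, respectively. Also, $\alpha^{l}$ is the
coefficients of right-hand side (RHS) vector of power constraints. It is well-known that for a
bounded feasible LP, the optimal value is a convex function of the RHS vector [31, Theorem 2].
Also, it is well-known that the subdifferential of optimal value of LP with respect to RHS vector
is equal to the set of dual solutions [32, Theorem 1]. Using these, we obtain the subgradients
of $P^{\mathrm{tx},l}$ and $d^{l}_{\mathrm{fb}}$ with respect to $\alpha^{l}$ as
\begin{align}
\frac{\partial P^{\mathrm{tx},l}}{\partial \alpha^{l}} &= \frac{\partial P^{\mathrm{tx},l}}{\partial b} \frac{\partial b}{\partial \alpha^{l}} = -\lambda^{T} \Pi P \tag{53} \\
\frac{\partial d^{l}_{\mathrm{fb}}}{\partial \alpha^{l}} &= \frac{\partial d^{l}_{\mathrm{fb}}}{\partial b} \frac{\partial b}{\partial \alpha^{l}} = -\mu^{T} \Pi P, \tag{54}
\end{align}
where $b = [\alpha^{l}_{1} \eta_1 P_{1,\max}, \cdots, \alpha^{l}_{M} \eta_M P_{M,\max}]$ is the common RHS vector of $P_{\mathrm{pa}}$ and $P_{\mathrm{feas}}$ related
to the power constraints. From (53), we obtain the subgradient of $J^{\mathrm{tx}}$ as
\begin{align}
\frac{\partial J^{\mathrm{tx}}}{\partial \theta} &= \frac{\partial}{\partial \theta} \sum_{l=1}^{L} \left( \sum_{m=1}^{M} \sum_{k=1}^{K} \frac{1}{\eta_m} p^{l}_{m,k} \right) \tag{55} \\
&= \sum_{l=1}^{L} \frac{\partial P^{\mathrm{tx},l}}{\partial \theta} \tag{56} \\
&= \sum_{l=1}^{L} \frac{\partial P^{\mathrm{tx},l}}{\partial \alpha^{l}} \frac{\partial \alpha^{l}}{\partial \theta} \tag{57} \\
&= -\sum_{l=1}^{L} \lambda^{T} \Pi P \frac{\partial \alpha^{l}}{\partial \theta}. \tag{58}
\end{align}
Also, from (54), the subgradient of $J^{\mathrm{fb}}$ is computed as
\begin{align}
\frac{\partial J^{\mathrm{fb}}}{\partial \theta} &= \frac{\partial}{\partial \theta} \sum_{l=1}^{L} \mathbb{1}_{+}(d^{l}_{\mathrm{fb}})\, d^{l}_{\mathrm{fb}} \tag{59} \\
&= \sum_{l=1}^{L} \mathbb{1}_{+}(d^{l}_{\mathrm{fb}}) \frac{\partial d^{l}_{\mathrm{fb}}}{\partial \theta} \tag{60} \\
&= \sum_{l=1}^{L} \mathbb{1}_{+}(d^{l}_{\mathrm{fb}}) \frac{\partial d^{l}_{\mathrm{fb}}}{\partial \alpha^{l}} \frac{\partial \alpha^{l}}{\partial \theta} \tag{61} \\
&= -\sum_{l=1}^{L} \mathbb{1}_{+}(d^{l}_{\mathrm{fb}})\, \mu^{T} \Pi P \frac{\partial \alpha^{l}}{\partial \theta}. \tag{62}
\end{align}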
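As a reading aid for the chain rule in (53) and (54), the inner derivative of b can be written out directly from its stated definition. This is only a sketch of that single step: each entry of b depends on one entry of the on/off vector, so the Jacobian is diagonal. How this factor relates to the ΠP term follows the notation of Theorem 1 and is not restated here.

```latex
\[
\frac{\partial b_m}{\partial \alpha^{l}_{m'}} =
\begin{cases}
\eta_m P_{m,\max} & \text{if } m' = m \\
0 & \text{if } m' \neq m
\end{cases}
\quad \Longrightarrow \quad
\frac{\partial b}{\partial \alpha^{l}} =
\mathrm{diag}\left( \eta_1 P_{1,\max}, \cdots, \eta_M P_{M,\max} \right).
\]
```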
REFERENCES
[1] J. Son, S. Kim, and B. Shim, “Energy efficient ultra-dense network using long short-term memory,” to appear in IEEE
Wireless Commun. and Networking Conf. (WCNC), Apr. 2020.
[2] X. Ge, S. Tu, G. Mao, C.-X. Wang, and T. Han, “5G ultra-dense cellular networks,” IEEE Wireless Commun., vol. 23,
no. 1, pp. 72–79, 2016.
[3] S. Kim, J. W. Choi, and B. Shim, “Downlink pilot precoding and compressed channel feedback for FDD-based cell-free
systems,” to appear in IEEE Trans. on Wireless Commun.
[4] M. Kamel, W. Hamouda, and A. Youssef, “Ultra-dense networks: A survey,” IEEE Commun. Surveys & Tutorials, vol. 18,
no. 4, pp. 2522–2545, 2016.
[5] I. Chih-Lin, C. Rowell, S. Han, Z. Xu, G. Li, and Z. Pan, “Toward green and soft: a 5G perspective,” IEEE Commun.
Mag., vol. 52, no. 2, pp. 66–73, 2014.
[6] A. P. Bianzino, C. Chaudet, D. Rossi, J.-L. Rougier et al., “A survey of green networking research,” IEEE Commun.
Surveys & Tutorials, vol. 14, no. 1, pp. 3–20, 2012.
[7] G. Y. Li, Z. Xu, C. Xiong, C. Yang, S. Zhang, Y. Chen, and S. Xu, “Energy-efficient wireless communications: tutorial,
survey, and open issues,” IEEE Wireless Commun., vol. 18, no. 6, pp. 28–35, 2011.
[8] D. Feng, C. Jiang, G. Lim, L. J. Cimini, G. Feng, and G. Y. Li, “A survey of energy-efficient wireless communications,”
IEEE Commun. Surveys & Tutorials, vol. 15, no. 1, pp. 167–178, 2013.
[9] C. Han, T. Harrold, S. Armour, I. Krikidis, S. Videv, P. M. Grant, H. Haas, J. S. Thompson, I. Ku, C.-X. Wang et al.,
“Green radio: radio techniques to enable energy-efficient wireless networks,” IEEE Commun. Mag., vol. 49, no. 6, 2011.
[10] J. Wu, Y. Zhang, M. Zukerman, and E. K.-N. Yung, “Energy-efficient base-stations sleep-mode techniques in green cellular
networks: A survey,” IEEE Commun. Surveys & Tutorials, vol. 17, no. 2, pp. 803–826, 2015.
[11] C. Liu, B. Natarajan, and H. Xia, “Small cell base station sleep strategies for energy efficiency,” IEEE Trans. Veh. Technol.,
vol. 65, no. 3, pp. 1652–1661, 2015.
[12] E. Oh and K. Son, “A unified base station switching framework considering both uplink and downlink traffic,” IEEE
Wireless Commun. Letters, vol. 6, no. 1, pp. 30–33, 2016.
[13] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[14] W. Kim, Y. Ahn and B. Shim, “Deep neural network-based active user detection for grant-free NOMA systems,” IEEE
Trans. on Commun., vol. 68, no. 4, pp. 2143–2155, 2020.
[15] B. Dai and W. Yu, “Energy efficiency of downlink transmission strategies for cloud radio access networks,” IEEE J. Sel.
Areas Commun., vol. 34, no. 4, pp. 1037–1050, 2016.
[16] N. Jalden, P. Zetterberg, B. Ottersten, A. Hong, and R. Thoma, “Correlation properties of large scale fading based on
indoor measurements,” in 2007 IEEE Wireless Commun. and Networking Conf., 2007, pp. 1894–1899.
[17] H. Pervaiz, O. Onireti, A. Mohamed, M. A. Imran, R. Tafazolli, and Q. Ni, “Energy-efficient and load-proportional eNodeB
for 5G user-centric networks: A multilevel sleep strategy mechanism,” IEEE Veh. Technol. Mag., vol. 13, no. 4, pp. 51–59,
2018.
[18] P. Asbeck and Z. Popovic, “ET comes of age: Envelope tracking for higher-efficiency power amplifiers,” IEEE Microwave
Mag., vol. 17, no. 3, pp. 16–25, 2016.
[19] A. Conte et al., “Power consumption of base stations,” in TREND Plenary Meeting, 2012.
[20] Z. Xu, Y. Wang, J. Tang, J. Wang, and M. C. Gursoy, “A deep reinforcement learning based framework for power-efficient
resource allocation in cloud RANs,” in Proc. IEEE Int. Conf. on Commun. (ICC), 2017, pp. 1–6.
[21] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex programming.”
[22] D. E. Rumelhart, G. E. Hinton, R. J. Williams et al., “Learning representations by back-propagating errors,” Cognitive
Modeling, vol. 5, no. 3, p. 1, 1988.
[23] P. J. Werbos, “Backpropagation through time: what it does and how to do it,” Proc. IEEE, vol. 78, no. 10, pp. 1550–1560,
1990.
[24] T. Gal and J. Nedoma, “Multiparametric linear programming,” Management Science, vol. 18, no. 7, pp. 406–422, 1972.
[25] R. T. Rockafellar, Convex analysis (No. 28). Princeton university press, 1970.
[26] S. Boyd, L. Xiao, and A. Mutapcic, “Subgradient methods,” lecture notes of EE392o, Stanford University, Autumn Quarter,
vol. 2004, pp. 2004–2005, 2003.
[27] A. Tang, J. Sun, and K. Gong, “Mobile propagation loss with a low base station antenna for NLOS street microcells in
urban area,” in Proc. IEEE Veh. Technol. Conf. (VTC), Sep. 2001, pp. 333–336.
[28] G. Auer, V. Giannini, C. Desset, I. Godor, P. Skillermark, M. Olsson, M. A. Imran, D. Sabella, M. J. Gonzalez, O. Blume
et al., “How much energy is needed to run a wireless network?” IEEE Wireless Commun., vol. 18, no. 5, pp. 40–49, 2011.
[29] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[30] Y. S. Soh, T. Q. Quek, M. Kountouris, and H. Shin, “Energy efficient heterogeneous cellular networks,” IEEE J. Sel. Areas
Commun., vol. 31, no. 5, pp. 840–850, 2013.
[31] J. E. Ward and R. E. Wendell, “Approaches to sensitivity analysis in linear programming,” Annals of Operations Research,
vol. 27, no. 1, pp. 3–38, 1990.
[32] M. Akgul, “A note on shadow prices in linear programming,” Journal of the Operational Research Society, vol. 35, no. 5,
pp. 425–431, 1984.