
Energy-Efficient Ultra-Dense Network Using LSTM-based Deep Neural Network

Seungnyun Kim, Junwon Son and Byonghyo Shim

Seoul National University, Seoul, Korea

Email: {snkim, jwson, bshim}@islab.snu.ac.kr

Abstract

As a means to achieve thousand-fold throughput improvements of future wireless communications,

ultra-dense network (UDN) where a large number of small cells are densely deployed on top of the

macro cells has received a great deal of attention in recent years. Despite a variety of benefits that UDN

offers, intensive deployment of small cells may pose a serious concern in the energy consumption. Over

the years, to reduce the energy consumption of UDN, an approach that turns off the lightly loaded

base stations (BSs) has been proposed. However, determining the proper on/off modes of BSs is a

challenging problem due to the huge computational overhead and inefficiency caused by the delayed

decision. The aim of this paper is to propose a deep neural network (DNN)-based framework to achieve
a reduction of energy consumption in UDN. By cascading the long short-term memory (LSTM) to extract

the temporally correlated features from the channel information and the feedforward network to make

BS on/off mode decision, we can control the on/off modes of BSs, thereby achieving a considerable

reduction of the cumulative energy consumption. From the extensive simulations, we demonstrate that

the proposed technique is effective in reducing the energy consumption of UDN.

This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded
by the Korea government (MSIT) (No. 2018-0-01410, Development of Radio Transmission Technologies for High Capacity and
Low Cost in Ultra Dense Networks).

Parts of this paper will appear at WCNC 2020 [1].


Fig. 1: Illustration of UDN (macro cell, micro cell, femto cell, mobile, and digital unit): the BSs are connected to the digital unit (DU) via backhaul link.

I. INTRODUCTION

In recent years, ultra-dense network (UDN) where a large number of small cells (i.e., pico

cell and femto cell) are densely deployed (more than $10^3$ cells/km$^2$) on top of the macro cells

has received a great deal of attention as a means to improve the network capacity of future

wireless communication systems [2], [3]. UDN shortens the physical distance between the base

station (BS) and the mobile device, resulting in a reduction of the path loss caused by wireless

transmission, in particular for the millimeter wave (mmWave) signal transmission. Further, it

enables an aggressive reuse of frequency resources, thereby achieving a significant improvement

in the quality of service (QoS) of wireless networks [4]. One potential problem of UDN is that

an intensive deployment of small cells will increase the energy consumption of the network [5].

In fact, the surge in energy consumption due to the use of a large number of small and macro cells

is a heavy burden for the network operators since it increases the operational expense (OPEX) to

a large extent [6]. Pursuing an enhancement in the energy efficiency is, therefore, an inevitable

option to ensure the sustainability of wireless network [7], [8].

Among several factors contributing to the energy consumption of cellular systems, by
far the most dominant one is the BS (around 60% of the energy consumption [9]). Over the
years, the so-called sleep mode technique, which turns off lightly loaded BSs, has been
proposed to reduce the energy consumption of BSs [10]–[12]. The main idea of this approach is to

selectively switch off the underutilized BSs. In [11], an approach to randomly shut off the BSs

while guaranteeing the coverage of a network has been proposed. In [12], an approach that

iteratively turns off the BSs one after another while satisfying the required mobile rate has been

proposed. A well-known drawback of this approach is that the computational complexity increases

exponentially with the number of BSs, limiting the effectiveness of this approach.

The primary purpose of this paper is to put forth an entirely new approach based on deep

neural network (DNN) to achieve a reduction of energy consumption in UDN. While the

conventional approaches aim to minimize the instantaneous energy consumption using a heuristic

approach and thus incur a waste of energy due to the frequent mode transition (on to off

and vice versa), the proposed scheme, henceforth referred to as deep learning-based energy-

efficient network (DLEEN), achieves a reduction of the total energy consumption over a long-

term operational period. Among various DNN techniques, we use the long short-term memory

(LSTM), a technique specialized for extracting features from the sequential data as a main

engine [13]. We use the LSTM to extract temporally correlated features (e.g., angle of departure

(AoD), delay spread, and path gain) from the channel state information (CSI). Using the extracted

features, DLEEN makes a fast yet accurate on/off decision of BSs minimizing the cumulative

energy consumption [14]. As a result, the baseband processing block as well as the analog block

(i.e., PA and RF filter) can be switched off, achieving a substantial energy saving in UDN.

From the simulation results in the realistic UDN environment, we demonstrate that the pro-

posed DLEEN technique satisfies the QoS of UDN with only a small portion of active BSs,

reducing the energy consumption substantially. For example, the DLEEN scheme saves around 30%
of energy compared with the full-association scenario where all the BSs are turned on. Even when
compared to the conventional approach based on the optimization technique, the energy saving of
DLEEN is substantial since it takes into account the transition power of the switching BSs.

The rest of this paper is organized as follows. In Section II, we briefly introduce the system

model for UDN. In Section III, we provide the basics of LSTM and the LSTM-based framework

for energy-efficient UDN. In Section IV, we provide the practical issues for the implementation

of DLEEN. In Section V, we present the simulation results of the proposed DLEEN and conclude

the paper in Section VI.


Notations: Lower and upper case symbols are used to denote vectors and matrices, respectively.

The superscripts $(\cdot)^T$ and $(\cdot)^H$ denote transpose and Hermitian transpose, respectively. $\otimes$ and
$\circ$ denote the Kronecker product and the Hadamard product, respectively. $\|\mathbf{x}\|$ is used as the
Euclidean norm of a vector $\mathbf{x}$. Also, $\mathrm{diag}(\mathbf{X}_1, \mathbf{X}_2)$ denotes a block diagonal matrix whose
diagonal elements are $\mathbf{X}_1$ and $\mathbf{X}_2$. $\mathbf{e}_m = [0, \cdots, 1, \cdots, 0]^T$ is an $M \times 1$ vector whose $m$-th element is
one and all others are zero. $1_+(x)$ is an indicator function whose value is 1 if $x > 0$ and 0
otherwise. In addition, $\mathbf{0}_K$ and $\mathbf{1}_K$ denote the $K \times 1$ zero vector and one vector, respectively.

II. ULTRA-DENSE NETWORK SYSTEM

A. System Model of Ultra-Dense Network

In this subsection, we discuss the UDN system model. When compared to the conventional

cellular networks in which a single BS serves whole mobiles in a cell, a group of BSs coopera-

tively serves mobiles in UDN. We consider the downlink transmission where M BSs equipped

with a single transmit antenna cooperatively serve $K$ mobiles equipped with a single antenna$^1$.
The sets of BSs and mobiles are denoted as $\mathcal{B} = \{1, \cdots, M\}$ and $\mathcal{U} = \{1, \cdots, K\}$, respectively.
Note that $\mathcal{B}$ consists of $\mathcal{B}_{\text{on}}$ (set of turned-on BSs) and $\mathcal{B}_{\text{off}}$ (set of turned-off BSs). To indicate
the on/off modes of BSs at time slot $l$, we use a binary vector $\boldsymbol{\alpha}^l = [\alpha_1^l, \cdots, \alpha_M^l]^T$ defined as

$$\alpha_m^l = \begin{cases} 1 & \text{if BS } m \text{ is active} \\ 0 & \text{otherwise,} \end{cases} \quad m \in \mathcal{B}. \tag{1}$$

We use a block-fading channel model where the downlink channel vector $h_{m,k}^l \in \mathbb{C}$ between
the BS $m$ and the mobile $k$ at time slot $l$ is given by

$$h_{m,k}^l = \sqrt{\beta_{m,k}^l}\, g_{m,k}^l, \tag{2}$$

where $\beta_{m,k}^l \in \mathbb{R}$ is the large-scale fading coefficient reflecting the path loss and the shadowing
effect and $g_{m,k}^l \in \mathbb{C}$ is the small-scale fading coefficient vector. We assume that the small-scale
fading coefficients are independent and identically distributed (i.i.d.) complex Gaussian
random variables (i.e., $g_{m,k}^l \sim \mathcal{CN}(0, 1)$). Note that the sequence of large-scale fading coefficients
$\{\beta_{m,k}^l\}_{l=1}^{L}$ is temporally correlated since the change of $\beta_{m,k}^l$, determined primarily by the path loss
and shadowing effect, is induced by the mobility of a mobile [16].
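As a rough illustration of the block-fading model in (2), the following sketch draws channel coefficients and checks empirically that the average channel power approaches the large-scale fading coefficient. The function names, the sample count, and the seed are illustrative assumptions, not part of the original work.

```python
import math
import random


def sample_channel(beta, rng):
    """Draw one coefficient h = sqrt(beta) * g with g ~ CN(0, 1), as in Eq. (2)."""
    # CN(0, 1): real and imaginary parts are independent N(0, 1/2).
    std = math.sqrt(0.5)
    g = complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
    return math.sqrt(beta) * g


def empirical_channel_power(beta, num_samples=20000, seed=0):
    """Monte Carlo estimate of E[|h|^2]; it should be close to beta."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    total = sum(abs(sample_channel(beta, rng)) ** 2 for _ in range(num_samples))
    return total / num_samples
```

This property, $E[|h_{m,k}^l|^2] = \beta_{m,k}^l$, is what allows the rate expression to depend only on the large-scale fading coefficients.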

$^1$ Note that the proposed scheme can be easily extended to multi-antenna systems [15].


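In this setting, the transmit signal $x_m^l \in \mathbb{C}$ of the BS $m$ at time slot $l$ is

$$x_m^l = \sum_{k=1}^{K} \sqrt{p_{m,k}^l}\, s_k^l, \tag{3}$$

where $p_{m,k}^l \in \mathbb{R}_+ \cup \{0\}$ is the power weight between the BS $m$ and the mobile $k$ and $s_k^l \in \mathbb{C}$ is
the data symbol for the mobile $k$. The received signal $y_k^l \in \mathbb{C}$ of the mobile $k$ at time slot $l$ is

$$\begin{aligned}
y_k^l &= \sum_{m=1}^{M} h_{m,k}^{l,H} x_m^l + n_k^l && \text{(4)} \\
&= \sum_{m=1}^{M} \sqrt{p_{m,k}^l}\, h_{m,k}^{l,H} s_k^l + \sum_{j \neq k}^{K} \sum_{m=1}^{M} \sqrt{p_{m,j}^l}\, h_{m,k}^{l,H} s_j^l + n_k^l, && \text{(5)}
\end{aligned}$$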

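where $n_k^l \sim \mathcal{CN}(0, \sigma_n^2)$ is the additive Gaussian noise at time slot $l$. The corresponding rate of
the mobile $k$ at time slot $l$ is

$$\begin{aligned}
R_k^l &= \log_2\left(1 + \frac{E\left[\left|\sum_{m=1}^{M} \sqrt{p_{m,k}^l}\, h_{m,k}^{l,H}\right|^2\right]}{\sum_{j \neq k}^{K} E\left[\left|\sum_{m=1}^{M} \sqrt{p_{m,j}^l}\, h_{m,k}^{l,H}\right|^2\right] + \sigma_n^2}\right) && \text{(6)} \\
&= \log_2\left(1 + \frac{\sum_{m=1}^{M} p_{m,k}^l\, E\left[\left|h_{m,k}^l\right|^2\right]}{\sum_{j \neq k}^{K} \sum_{m=1}^{M} p_{m,j}^l\, E\left[\left|h_{m,k}^l\right|^2\right] + \sigma_n^2}\right) && \text{(7)} \\
&= \log_2\left(1 + \frac{\sum_{m=1}^{M} p_{m,k}^l \beta_{m,k}^l}{\sum_{j \neq k}^{K} \sum_{m=1}^{M} p_{m,j}^l \beta_{m,k}^l + \sigma_n^2}\right). && \text{(8)}
\end{aligned}$$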
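The closed-form rate in (8) can be evaluated directly from the power weights and the large-scale fading coefficients. The following is a minimal sketch; the function name and the nested-list data layout (`p[m][k]`, `beta[m][k]`) are assumptions made for illustration.

```python
import math


def achievable_rates(p, beta, noise_power):
    """Per-mobile rates following Eq. (8).

    p[m][k]    : power weight from BS m to mobile k (non-negative)
    beta[m][k] : large-scale fading coefficient between BS m and mobile k
    """
    num_bs, num_mobiles = len(p), len(p[0])
    rates = []
    for k in range(num_mobiles):
        # Desired signal power collected from all BSs.
        signal = sum(p[m][k] * beta[m][k] for m in range(num_bs))
        # Power intended for the other mobiles, seen through mobile k's channel.
        interference = sum(
            p[m][j] * beta[m][k]
            for j in range(num_mobiles) if j != k
            for m in range(num_bs)
        )
        rates.append(math.log2(1.0 + signal / (interference + noise_power)))
    return rates
```

For instance, with one BS, one mobile, `p = [[1.0]]`, `beta = [[3.0]]`, and unit noise power, the SINR is 3 and the rate is $\log_2 4 = 2$.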

B. Power Consumption Model and Energy Minimization Problem Formulation

In this subsection, we explain the BS power consumption model and then formulate the energy

minimization problem. The power consumption at the BS consists of three major elements: 1)

transmission power $P_m^{\text{tx},l}$ consumed by the power amplifier and the RF circuitry, 2) maintenance
power $P_m^{\text{on},l}$ consumed by the power supply and the air conditioning, and 3) mode transition power
$P_m^{\text{trans},l}$ consumed when the BS is switched on and off [17]. Thus, the total power consumption
of the BS $m$ at time slot $l$ is $P_m^l = P_m^{\text{tx},l} + P_m^{\text{on},l} + P_m^{\text{trans},l}$.

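To be specific, the transmission power $P_m^{\text{tx},l}$ of the BS $m$ at time slot $l$ is

$$\begin{aligned}
P_m^{\text{tx},l} &= \frac{1}{\eta_m} E\left[\left|x_m^l\right|^2\right] && \text{(9)} \\
&= \frac{1}{\eta_m} \sum_{k=1}^{K} p_{m,k}^l, && \text{(10)}
\end{aligned}$$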


where $\eta_m \in [0, 1]$ is the amplifier efficiency of the BS $m$. In general, depending on the amplifier
efficiency, only 40% to 50% of the total transmission power is transmitted and the rest is dissipated
as heat [18]. In a micro cell, for example, around 50% of the total power is consumed by
the transmission power [19].

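Second, the maintenance power $P_m^{\text{on},l}$ of the BS $m$ at time slot $l$, consumed for the air
conditioning and the power supply, is expressed as

$$\begin{aligned}
P_m^{\text{on},l} &= \begin{cases} \rho_m^{\text{on}} & \text{if BS } m \text{ is active} \\ \rho_m^{\text{off}} & \text{otherwise} \end{cases} && \text{(11)} \\
&= \alpha_m^l \rho_m^{\text{on}} + (1 - \alpha_m^l) \rho_m^{\text{off}}, && \text{(12)}
\end{aligned}$$

where $\rho_m^{\text{on}}$ and $\rho_m^{\text{off}}$ are the power consumption when the BS $m$ is turned on and off, respectively.
One might guess that $P_m^{\text{on},l}$ is small, but in fact almost 35% of the total power is consumed at
the BS for the maintenance [10].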

Lastly, the mode transition power $P_m^{\text{trans},l}$ of the BS $m$ at time slot $l$, consumed to switch the BS on
and off, is

$$\begin{aligned}
P_m^{\text{trans},l} &= \begin{cases} \rho_m^{\text{trans}} & \text{if BS } m \text{ switches the mode (on to off and vice versa)} \\ 0 & \text{otherwise} \end{cases} && \text{(13)} \\
&= \left(\alpha_m^l - \alpha_m^{l-1}\right)^2 \rho_m^{\text{trans}}, && \text{(14)}
\end{aligned}$$

where $\rho_m^{\text{trans}}$ is the power consumed by the mode transition at the BS $m$. In general, about 15%
of the total power is consumed by the mode transition [20]. Combining (10), (12), and (14), the
total power consumption $P_m^l$ of the BS $m$ at time slot $l$ can be expressed as

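$$P_m^l = \frac{1}{\eta_m} \sum_{k=1}^{K} p_{m,k}^l + \left(\alpha_m^l \rho_m^{\text{on}} + (1 - \alpha_m^l) \rho_m^{\text{off}}\right) + \left(\alpha_m^l - \alpha_m^{l-1}\right)^2 \rho_m^{\text{trans}}. \tag{15}$$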

While many of the conventional BS sleep mode techniques focused on the reduction of

the instantaneous power consumption [10], we pursue a reduction of the cumulative energy

consumption over a long-term period. To this end, we minimize the sum of the power consumption
of all BSs over $L$ time slots:

$$E_{\text{total}} = \sum_{l=1}^{L} \sum_{m=1}^{M} P_m^l. \tag{16}$$
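The power model in (15) and the cumulative objective in (16) can be sketched as follows. The function names and the data layout are illustrative assumptions. In particular, the text does not specify the mode before the first slot, so the argument `alpha_init` is a hypothetical input that supplies it.

```python
def bs_power(alpha, alpha_prev, p_row, eta, rho_on, rho_off, rho_trans):
    """Total power of one BS in one time slot, following Eq. (15)."""
    p_tx = sum(p_row) / eta                            # Eq. (10)
    p_on = alpha * rho_on + (1 - alpha) * rho_off      # Eq. (12)
    p_trans = (alpha - alpha_prev) ** 2 * rho_trans    # Eq. (14)
    return p_tx + p_on + p_trans


def total_energy(alphas, powers, alpha_init, eta, rho_on, rho_off, rho_trans):
    """Cumulative consumption of Eq. (16).

    alphas[l][m]    : on/off mode of BS m in slot l (0 or 1)
    powers[l][m][k] : power weight from BS m to mobile k in slot l
    alpha_init[m]   : assumed mode before the first slot (not given in the text)
    """
    total = 0.0
    prev = alpha_init
    for alpha_l, p_l in zip(alphas, powers):
        for m, alpha in enumerate(alpha_l):
            total += bs_power(alpha, prev[m], p_l[m], eta[m],
                              rho_on[m], rho_off[m], rho_trans[m])
        prev = alpha_l
    return total
```

The transition term is what couples consecutive time slots: a decision that looks cheapest in one slot can be more expensive overall if it forces a mode switch.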


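The corresponding energy minimization problem is formulated as

$$\begin{aligned}
\mathcal{P}_1: \min_{\{\alpha_m^l,\, p_{m,k}^l\}} \quad & E_{\text{total}} && && \text{(17a)} \\
\text{s.t.} \quad & R_k^l \geq R_{k,\min}^l, && k \in \mathcal{U},\ l \in \mathcal{L} && \text{(17b)} \\
& P_m^{\text{tx},l} \leq \alpha_m^l P_{m,\max}, && m \in \mathcal{B},\ l \in \mathcal{L} && \text{(17c)} \\
& \alpha_m^l \in \{0, 1\}, && m \in \mathcal{B},\ l \in \mathcal{L} && \text{(17d)} \\
& p_{m,k}^l \geq 0, && m \in \mathcal{B},\ k \in \mathcal{U},\ l \in \mathcal{L} && \text{(17e)}
\end{aligned}$$

where $\mathcal{L} = \{1, \cdots, L\}$, $R_{k,\min}^l$ is the rate requirement of the mobile $k$ at time slot $l$, and $P_{m,\max}$
is the maximum transmission power of the BS $m$. Note that $\alpha_m^l$ is applied to the maximum
transmission power in (17c) so that only the active BSs transmit the signal.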

It is worth mentioning that $\boldsymbol{\alpha}^l$ is a binary vector and thus $\mathcal{P}_1$ is classified as a mixed-integer
programming problem. To solve the problem, we basically need to explore a huge number of decision
combinations. For example, if there are 20 coordinated BSs, the size of the decision set would be
$2^{20} \approx 10^6$ so that the computational overhead to find the optimal on/off modes of BSs would
be prohibitive. Further, the optimal solution of $\mathcal{P}_1$ can be found only when the CSI of all
$L$ time slots is available, which is not realistic due to the causality issue.
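To make the size of the search space concrete, the sketch below counts the on/off patterns and performs a brute-force search over them for a small number of BSs. The callables `cost` and `is_feasible` are hypothetical placeholders for an energy model and a rate-feasibility check; they are not defined in the original text.

```python
from itertools import product


def num_patterns(num_bs, num_slots=1):
    """Number of binary on/off patterns for num_bs BSs over num_slots slots."""
    return 2 ** (num_bs * num_slots)


def cheapest_pattern(num_bs, cost, is_feasible):
    """Exhaustive search over all 2^num_bs patterns of a single time slot.

    cost(pattern) and is_feasible(pattern) are caller-supplied (hypothetical).
    Returns the first feasible pattern with minimum cost, or None.
    """
    best = None
    for pattern in product((0, 1), repeat=num_bs):
        if is_feasible(pattern) and (best is None or cost(pattern) < cost(best)):
            best = pattern
    return best
```

With 20 BSs, `num_patterns(20)` gives 1,048,576 candidates for a single slot, and the count grows as $2^{ML}$ when the $L$ slots are optimized jointly, which is why exhaustive search does not scale.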

III. DEEP LEARNING-BASED ENERGY-EFFICIENT NETWORK USING LONG SHORT-TERM MEMORY

The primary goal of the proposed DLEEN scheme is to find the BS on/off modes $\{\alpha_m^l\}$ and
the power weights between active BSs and mobiles $\{p_{m,k}^l\}$ minimizing the energy consumption.

Since the transmission power is allocated only for the active BSs, we cannot determine the

on/off mode and the power weight concurrently. To address this issue, DLEEN uses a two-stage

processing where the BS on/off mode is determined through the DNN first and then the power

weight for the active BSs is obtained by solving the convex optimization problem.

Fig. 2: System structure of DLEEN (LSTM-based BS on/off mode decision followed by transmission power allocation). The BSs collect CSI and rate requirement and use them as the input data of DLEEN to determine the on/off modes of BSs and the transmission power.

In the BS on/off mode decision, DLEEN exploits the combination of LSTM and feedforward
neural network to learn the complicated nonlinear mapping between the CSI, the required
mobile rate, and the optimal BS on/off mode decision. To be specific, the input is the combination
of the CSI $\mathbf{h}^l = [\beta_{1,1}^l, \cdots, \beta_{M,K}^l]^T \in \mathbb{R}^{MK}$ and the rate requirements of mobiles
$\mathbf{r}_{\min}^l = [R_{1,\min}^l, \cdots, R_{K,\min}^l]^T \in \mathbb{R}^{K}$, and the output is the BS on/off mode decision vector
$\boldsymbol{\alpha}^l \in \mathbb{R}^{M}$. Thus, a mapping function $f$ describing the input-output relationship is given by

$$\boldsymbol{\alpha}^l = f(\mathbf{h}^1, \cdots, \mathbf{h}^l, \mathbf{r}_{\min}^1, \cdots, \mathbf{r}_{\min}^l; \boldsymbol{\theta}), \tag{18}$$

where $\boldsymbol{\theta}$ is the set of training parameters (weights and biases) of DLEEN. In the training process,
we find the optimal $f^*$, parameterized by $\boldsymbol{\theta}^*$, minimizing the energy consumption while
satisfying the rate and power constraints (we will say more on this in Section III.D).

The overall process of DLEEN consists of the following steps: 1) the BSs send the collected
CSI and the rate requirement to the digital unit (DU) through the backhaul link, 2) in the DU, DLEEN
generates the BS on/off mode decision vector $\boldsymbol{\alpha}^l$ from the input data, 3) DLEEN calculates the
power weight $\mathbf{p}^l$ for the active BSs using the convex optimization technique, and 4) the DU sends
$\boldsymbol{\alpha}^l$ and $\mathbf{p}^l$ back to the BSs (see Fig. 2).

A. Basics of the LSTM

In this subsection, we briefly explain the LSTM network. The key ingredients of LSTM block

are the cell state $\mathbf{c}^l$ and three gates, viz., forget gate $\mathbf{f}^l$, input gate $\mathbf{i}^l$, and output gate $\mathbf{o}^l$ (see Fig.
3). The cell state, serving as a memory to store information of past inputs, sequentially passes
through the forget, input, and output gates. Based on the input vector $\mathbf{x}^l$ and the previous output
vector $\mathbf{z}^{l-1}$, each gate removes, writes, and reads the information in the cell state to generate
the output vector $\mathbf{z}^l$.


Fig. 3: Block structure of LSTM network consisting of forget gate, input gate, and output gate.

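To be specific, the forget gate $\mathbf{f}^l$ determines the amount of the previous cell state $\mathbf{c}^{l-1}$ to be
delivered to the current cell state $\mathbf{c}^l$. After the forget gate $\mathbf{f}^l$, the input gate $\mathbf{i}^l$ decides whether
the information of the input vector $\mathbf{x}^l$ and the previous output vector $\mathbf{z}^{l-1}$ is transferred to the
cell state $\mathbf{c}^l$ or not. Lastly, by using the cell state $\mathbf{c}^l$, the output gate $\mathbf{o}^l$ produces the output
vector $\mathbf{z}^l$. The forget gate $\mathbf{f}^l$, input gate $\mathbf{i}^l$, and output gate $\mathbf{o}^l$ are given, respectively, by

$$\begin{aligned}
\mathbf{f}^l &= \sigma_g\left(\mathbf{W}_f \mathbf{x}^l + \mathbf{U}_f \mathbf{z}^{l-1} + \mathbf{b}_f\right) && \text{(19)} \\
\mathbf{i}^l &= \sigma_g\left(\mathbf{W}_i \mathbf{x}^l + \mathbf{U}_i \mathbf{z}^{l-1} + \mathbf{b}_i\right) && \text{(20)} \\
\mathbf{o}^l &= \sigma_g\left(\mathbf{W}_o \mathbf{x}^l + \mathbf{U}_o \mathbf{z}^{l-1} + \mathbf{b}_o\right), && \text{(21)}
\end{aligned}$$

where $\mathbf{W}_f$, $\mathbf{W}_i$, $\mathbf{W}_o$ and $\mathbf{U}_f$, $\mathbf{U}_i$, $\mathbf{U}_o$ are the weights associated with $\mathbf{x}^l$ and $\mathbf{z}^{l-1}$, respectively.
Also, $\mathbf{b}_f$, $\mathbf{b}_i$, and $\mathbf{b}_o$ are the biases and $\sigma_g(x) = f_{\text{sig}}(x) = \frac{1}{1 + e^{-x}}$
is the sigmoid activation function.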


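Then, the cell state $\mathbf{c}^l$ is determined by these gates as

$$\mathbf{c}^l = \mathbf{f}^l \circ \mathbf{c}^{l-1} + \mathbf{i}^l \circ \tanh\left(\mathbf{W}_c \mathbf{x}^l + \mathbf{U}_c \mathbf{z}^{l-1} + \mathbf{b}_c\right), \tag{22}$$

where $\mathbf{W}_c$ and $\mathbf{U}_c$ are the weights and $\mathbf{b}_c$ is the bias. Finally, the output vector $\mathbf{z}^l$ obtained by
the cell state $\mathbf{c}^l$ and the output gate vector $\mathbf{o}^l$ is given by

$$\mathbf{z}^l = \mathbf{o}^l \circ \tanh(\mathbf{c}^l), \tag{23}$$

where $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
is the hyperbolic tangent function.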

The forget, input, and output gates are trained through the backpropagation process to extract

the correlation feature of the input data. For example, when the mobility of a mobile is high, the
temporal correlation of the input data would not be high so that it is desirable to focus on the latest
data rather than the old one. In this case, the forget gate vector $\mathbf{f}^l$ would be trained to be close
to the zero vector (i.e., $\mathbf{f}^l = [0, \cdots, 0]^T$) and the input gate vector $\mathbf{i}^l$ would be trained to be close
to the one vector (i.e., $\mathbf{i}^l = [1, \cdots, 1]^T$), meaning that the previous cell state information would not
be delivered to the current cell state. On the other hand, when the mobility of a mobile is low, the
temporal correlation of the input dataset is high and thus the previous data will be used to identify
the correlation structure. In this case, the forget gate vector would be trained to be close to the one vector and the
input gate vector would be trained to be close to the zero vector.
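The LSTM recursion in (19) to (23) can be written as a single forward step. The following is a minimal sketch in plain Python using nested lists for matrices; the function names and the `params` dictionary layout are assumptions for illustration, and a practical implementation would rely on a deep learning framework instead.

```python
import math


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _affine(W, x, U, z, b):
    """Return W x + U z + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * zi for u, zi in zip(U_row, z))
        + bi
        for W_row, U_row, bi in zip(W, U, b)
    ]


def lstm_step(x, z_prev, c_prev, params):
    """One LSTM step following Eqs. (19)-(23).

    params maps each of "f", "i", "o", "c" to a tuple (W, U, b).
    Returns the new output vector z and the new cell state c.
    """
    def gate(name, activation):
        W, U, b = params[name]
        return [activation(v) for v in _affine(W, x, U, z_prev, b)]

    f = gate("f", _sigmoid)          # forget gate, Eq. (19)
    i = gate("i", _sigmoid)          # input gate, Eq. (20)
    o = gate("o", _sigmoid)          # output gate, Eq. (21)
    candidate = gate("c", math.tanh)
    # Cell state update, Eq. (22): keep part of the old state, write new content.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, candidate)]
    # Output, Eq. (23).
    z = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return z, c
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate to 0, so the new cell state is simply half of the previous one. A strongly negative forget-gate bias drives $\mathbf{f}^l$ toward zero, which corresponds to the high-mobility case described above.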

B. DLEEN Architecture

Since the LSTM network is effective in capturing the temporal correlation of input dataset

(i.e., CSI and mobile rate requirement), we use it as a main ingredient of DLEEN. After passing

through the LSTM network, the extracted feature is transformed to the BS on/off mode decision

vector via the feedforward network. Once the on/off mode of BSs is decided, we check the

feasibility of the acquired on/off mode to prevent the scenario where the rate requirement of

mobile is violated. Once the on/off mode passes the feasibility test, meaning that the on/off

mode can satisfy the rate requirement, we allocate the transmission power for the active BSs by

using linear programming (LP). The detailed architecture of DLEEN is described in Fig. 4.

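In the LSTM network, the input vector $\mathbf{x}^l$ is a composite of the CSI $\mathbf{h}^l$ and the rate requirement of
mobiles $\mathbf{r}_{\min}^l$:

$$\mathbf{x}^l = \begin{bmatrix} \mathbf{h}^l \\ \mathbf{r}_{\min}^l \end{bmatrix}. \tag{24}$$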


Fig. 4: Detailed architecture of DLEEN

A consecutive input dataset $\mathbf{x}^1, \cdots, \mathbf{x}^l$ passes through the LSTM cells to generate the output
vector $\mathbf{z}^l$ (see Section III.A). Then, the output vector $\mathbf{z}^l$ passes through the feedforward network.
In each component (i.e., fully connected (FC) layer) of this network, the input-output relationship
is

$$\tilde{\mathbf{z}}^l = \mathbf{W}_d \mathbf{z}^l + \mathbf{b}_d, \tag{25}$$

where $\mathbf{W}_d$ and $\mathbf{b}_d$ are the weight and bias, respectively. After the FC layer, a nonlinear activation
function is applied to $\tilde{\mathbf{z}}^l$ to determine whether the information is delivered to the next layer or
not. To this end, we use the rectified linear unit (ReLU) function $f_{\text{ReLU}}(x) = \max(0, x)$. The
output $\hat{\mathbf{z}}^l$ of the ReLU function is

$$\hat{\mathbf{z}}^l = f_{\text{ReLU}}(\tilde{\mathbf{z}}^l). \tag{26}$$

After that, $\hat{\mathbf{z}}^l$ passes through the sigmoid layer, generating the output vector $\boldsymbol{\alpha}^l$:

$$\boldsymbol{\alpha}^l = [\alpha_1^l, \cdots, \alpha_M^l]^T = \left[f_{\text{sig}}(\hat{z}_1^l), \cdots, f_{\text{sig}}(\hat{z}_M^l)\right]^T, \tag{27}$$

where $f_{\text{sig}}(x) = \frac{1}{1 + e^{-x}}$. Since $\alpha_m^l$ is the output of the sigmoid function for $\hat{z}_m^l$ (i.e., $\alpha_m^l =
f_{\text{sig}}(\hat{z}_m^l)$), the range of $\alpha_m^l$ is from 0 to 1. If $\alpha_m^l$ is greater than the pre-defined threshold $\tau$, we
set the discretized decision $\hat{\alpha}_m^l = 1$ and turn on the BS $m$. Otherwise, we set $\hat{\alpha}_m^l = 0$ and turn off the BS $m$.
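The decision head in (25) to (27), followed by the thresholding step, can be sketched as below. This follows the text literally with a single FC layer; the function name is an assumption, and the threshold `tau` is left as a required argument because its value is not stated here.

```python
import math


def decision_head(z, W_d, b_d, tau):
    """FC layer (25), ReLU (26), sigmoid (27), then thresholding by tau.

    z   : LSTM output vector
    W_d : weight matrix as a list of rows, one row per BS
    b_d : bias vector
    Returns the soft outputs and the discretized on/off decisions.
    """
    fc = [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(W_d, b_d)]
    relu = [max(0.0, v) for v in fc]
    alpha_soft = [1.0 / (1.0 + math.exp(-v)) for v in relu]
    # A BS is turned on only if its soft output exceeds the threshold.
    alpha_hard = [1 if a > tau else 0 for a in alpha_soft]
    return alpha_soft, alpha_hard
```

Under this literal single-layer reading, the ReLU output is non-negative, so each soft output is at least 0.5. A threshold below 0.5 would therefore keep every BS on, which suggests that `tau` should be chosen at or above 0.5 in such a configuration.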

C. Feasibility Test and Transmission Power Allocation via Linear Programming

Once the on/off mode vector $\boldsymbol{\alpha}^l$ is determined, we allocate the transmission power for the
active BSs using the convex optimization approach. This procedure consists of two main steps:
1) feasibility test to check whether $\boldsymbol{\alpha}^l$ satisfies the mobile's rate requirement and 2) transmission
power allocation. In this subsection, we skip the time slot index $l$ for notational simplicity.

1) Feasibility Test: In this step, we check whether the rate requirement of a mobile is satisfied
for a given $\boldsymbol{\alpha}$. In case the rate requirement is not satisfied, we measure the degree of infeasibility
$d_{\text{fb}}$, the maximum violation of the rate constraint:

$$d_{\text{fb}} = \max_{k \in \mathcal{U}} \left(R_{k,\min} - R_k\right). \tag{28}$$

Since $d_{\text{fb}}$ is the maximum rate constraint violation over all mobiles, one can see that $\boldsymbol{\alpha}$ is feasible
if $d_{\text{fb}} \leq 0$ and infeasible otherwise. When the rate requirement is violated for a given $\boldsymbol{\alpha}$, we
update $\boldsymbol{\alpha}$ in the direction of reducing $d_{\text{fb}}$.

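The feasibility test problem to minimize $d_{\text{fb}}$ for a given $\boldsymbol{\alpha}$ is formulated as

$$\begin{aligned}
\mathcal{P}_{\text{feas}}: \quad d_{\text{fb}}^* = \min_{\{p_{m,k}\}} \max_{k \in \mathcal{U}} \quad & \left(R_{k,\min} - R_k\right) && && \text{(29a)} \\
\text{s.t.} \quad & P_m^{\text{tx}} \leq \alpha_m P_{m,\max}, && m \in \mathcal{B} && \text{(29b)} \\
& p_{m,k} \geq 0, && m \in \mathcal{B},\ k \in \mathcal{U}. && \text{(29c)}
\end{aligned}$$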

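Note that by concatenating the variables into a vector form (i.e., $\mathbf{p}_k = [p_{1,k}, \cdots, p_{M,k}]^T$, $\mathbf{p} =
[\mathbf{p}_1^T, \cdots, \mathbf{p}_K^T]^T$), the rate constraint can be transformed to

$$\begin{aligned}
R_k \geq R_{k,\min} &\iff \log_2\left(1 + \frac{\sum_{m=1}^{M} p_{m,k} \beta_{m,k}}{\sum_{j \neq k}^{K} \sum_{m=1}^{M} p_{m,j} \beta_{m,k} + \sigma_n^2}\right) \geq R_{k,\min} && \text{(30)} \\
&\iff \boldsymbol{\beta}_k^T \left(\mathbf{p}_k - \left(2^{R_{k,\min}} - 1\right) \sum_{j \neq k}^{K} \mathbf{p}_j\right) \geq \sigma_n^2 \left(2^{R_{k,\min}} - 1\right) && \text{(31)} \\
&\iff \boldsymbol{\beta}_k^T (\mathbf{d}_k^T \otimes \mathbf{I}_M) \mathbf{p} \geq \sigma_n^2 \left(2^{R_{k,\min}} - 1\right), && \text{(32)}
\end{aligned}$$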


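where $\boldsymbol{\beta}_k = [\beta_{1,k}, \cdots, \beta_{M,k}]^T$ and $\mathbf{d}_k$ is a $K \times 1$ vector whose $k$-th element is one and others
are $-(2^{R_{k,\min}} - 1)$. Using this, $\mathcal{P}_{\text{feas}}$ is reformulated as

$$\begin{aligned}
\mathcal{P}_{\text{feas}}: \quad d_{\text{fb}}^* = \min_{\mathbf{p}} \max_{k \in \mathcal{U}} \quad & \left(\sigma_n^2 \left(2^{R_{k,\min}} - 1\right) - \boldsymbol{\beta}_k^T (\mathbf{d}_k^T \otimes \mathbf{I}_M) \mathbf{p}\right) && && \text{(33a)} \\
\text{s.t.} \quad & (\mathbf{1}_K^T \otimes \mathbf{e}_m^T) \mathbf{p} \leq \alpha_m \eta_m P_{m,\max}, && m \in \mathcal{B} && \text{(33b)} \\
& \mathbf{p} \succeq \mathbf{0}_{MK}, && && \text{(33c)}
\end{aligned}$$

where (33b) is a vector form expression of the power constraint. Let $d_{\text{fb}} = \max_{k \in \mathcal{U}} \left(\sigma_n^2 \left(2^{R_{k,\min}} - 1\right) - \boldsymbol{\beta}_k^T (\mathbf{d}_k^T \otimes \mathbf{I}_M) \mathbf{p}\right)$, then we have

$$\begin{aligned}
\mathcal{P}_{\text{feas}}: \quad d_{\text{fb}}^* = \min_{\mathbf{p},\, d_{\text{fb}}} \quad & d_{\text{fb}} && && \text{(34a)} \\
\text{s.t.} \quad & \sigma_n^2 \left(2^{R_{k,\min}} - 1\right) - \boldsymbol{\beta}_k^T (\mathbf{d}_k^T \otimes \mathbf{I}_M) \mathbf{p} \leq d_{\text{fb}}, && k \in \mathcal{U} && \text{(34b)} \\
& (\mathbf{1}_K^T \otimes \mathbf{e}_m^T) \mathbf{p} \leq \alpha_m \eta_m P_{m,\max}, && m \in \mathcal{B} && \text{(34c)} \\
& \mathbf{p} \succeq \mathbf{0}_{MK}. && && \text{(34d)}
\end{aligned}$$

Since the objective function and constraints are all linear functions of $\mathbf{p}$ and $d_{\text{fb}}$, $\mathcal{P}_{\text{feas}}$ is an LP and thus
can be easily solved by a convex optimization tool (e.g., CVX [21]).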
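The linearized rate constraint in (32) and the objective of (33a) can be evaluated for a given power allocation without an LP solver, which is useful as a sanity check on the reformulation. The sketch below does only that: it does not compute the minimum $d_{\text{fb}}^*$ over $\mathbf{p}$, which would require solving the LP. The function names and data layout are assumptions.

```python
def linear_rate_margin(p, beta, r_min, noise_power, k):
    """Left side minus right side of (32) for mobile k.

    Equals beta_k^T (p_k - (2^R - 1) * sum_{j != k} p_j) - sigma_n^2 (2^R - 1),
    which is non-negative exactly when the rate constraint of mobile k holds.
    p[m][k] and beta[m][k] are indexed by BS m and mobile k.
    """
    num_bs, num_mobiles = len(p), len(p[0])
    gamma = 2.0 ** r_min[k] - 1.0  # SINR target implied by the rate target
    lhs = sum(
        beta[m][k] * (p[m][k] - gamma * sum(p[m][j]
                                            for j in range(num_mobiles) if j != k))
        for m in range(num_bs)
    )
    return lhs - noise_power * gamma


def infeasibility_at(p, beta, r_min, noise_power):
    """Objective of (33a) evaluated at a fixed p (not minimized over p)."""
    return max(-linear_rate_margin(p, beta, r_min, noise_power, k)
               for k in range(len(p[0])))
```

A non-positive value of `infeasibility_at` means that the given `p` already satisfies every rate requirement, so the on/off pattern that permits this `p` is feasible.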

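2) Transmission Power Allocation: If $\boldsymbol{\alpha}$ passes the feasibility test (i.e., $d_{\text{fb}}^* \leq 0$), we next
find the optimal power weight $\{p_{m,k}\}$ for the active BSs. The transmission power allocation
problem $\mathcal{P}_{\text{pa}}$ is given by

$$\begin{aligned}
\mathcal{P}_{\text{pa}}: \quad \min_{\{p_{m,k}\}} \quad & \sum_{m=1}^{M} P_m^{\text{tx}} && && \text{(35a)} \\
\text{s.t.} \quad & R_k \geq R_{k,\min}, && k \in \mathcal{U} && \text{(35b)} \\
& P_m^{\text{tx}} \leq \alpha_m P_{m,\max}, && m \in \mathcal{B} && \text{(35c)} \\
& p_{m,k} \geq 0, && m \in \mathcal{B},\ k \in \mathcal{U}. && \text{(35d)}
\end{aligned}$$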

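Since the maintenance power $P_m^{\text{on}}$ and the mode transition power $P_m^{\text{trans}}$ are functions of $\boldsymbol{\alpha}$, they
are constant once the on/off mode vector $\boldsymbol{\alpha}$ is decided. Thus, what we need to minimize in $\mathcal{P}_{\text{pa}}$
is the transmission power. Using the vectorized expressions (34b) and (34c), we have

$$\begin{aligned}
\mathcal{P}_{\text{pa}}: \quad \min_{\mathbf{p}} \quad & (\mathbf{1}_K \otimes \boldsymbol{\eta})^T \mathbf{p} && && \text{(36a)} \\
\text{s.t.} \quad & \boldsymbol{\beta}_k^T (\mathbf{d}_k^T \otimes \mathbf{I}_M) \mathbf{p} \geq \sigma_n^2 \left(2^{R_{k,\min}} - 1\right), && k \in \mathcal{U} && \text{(36b)} \\
& (\mathbf{1}_K^T \otimes \mathbf{e}_m^T) \mathbf{p} \leq \alpha_m \eta_m P_{m,\max}, && m \in \mathcal{B} && \text{(36c)} \\
& \mathbf{p} \succeq \mathbf{0}_{MK}, && && \text{(36d)}
\end{aligned}$$

where $\boldsymbol{\eta} = \left[\frac{1}{\eta_1}, \cdots, \frac{1}{\eta_M}\right]^T$. Since $\mathcal{P}_{\text{pa}}$ is an LP, we can easily find the optimal power weight
$\mathbf{p}^*$.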
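For intuition only, the LP in (36) has a simple solution in the special case of a single mobile ($K = 1$), where there is no interference and (36b) reduces to one covering constraint. In that case, filling the BSs in order of increasing cost per unit of received power is optimal. The sketch below covers only this special case, assumes every $\beta_m > 0$, and is not the general method of the paper, which relies on an LP solver; the function name is an assumption.

```python
def allocate_power_single_mobile(alpha, beta, eta, p_max, r_min, noise_power):
    """Solve (36) for K = 1 by greedy filling; returns None if infeasible.

    alpha[m] : on/off mode of BS m (0 or 1)
    beta[m]  : large-scale fading coefficient to the single mobile (> 0)
    eta[m]   : amplifier efficiency, p_max[m] : maximum transmission power
    """
    remaining = noise_power * (2.0 ** r_min - 1.0)  # right side of (36b)
    p = [0.0] * len(alpha)
    # Each unit of p_m costs 1/eta_m and contributes beta_m of received power,
    # so the cost per unit of received power is 1 / (eta_m * beta_m).
    order = sorted(range(len(alpha)), key=lambda m: 1.0 / (eta[m] * beta[m]))
    for m in order:
        cap = alpha[m] * eta[m] * p_max[m]  # right side of (36c)
        need = remaining / beta[m]
        if need <= cap:
            p[m] = need
            remaining = 0.0
            break
        p[m] = cap
        remaining -= cap * beta[m]
    return p if remaining == 0.0 else None
```

In this special case a switched-off BS has a zero power budget, so it is skipped regardless of how favorable its channel is, which mirrors the role of $\alpha_m$ in (36c).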

D. Training DLEEN using Unsupervised Learning

An essential part of DLEEN in achieving our goal is the training process optimizing the network
parameters $\boldsymbol{\theta} = \{\mathbf{W}, \mathbf{U}, \mathbf{b}\}$. In the training phase, the network parameters are updated iteratively
to minimize the loss function $J(\boldsymbol{\theta})$ (i.e., $\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta})$). When $J(\boldsymbol{\theta})$ is differentiable,
the network parameters can be updated by the stochastic gradient descent (SGD) method. The update
equation of SGD is

$$\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \epsilon \nabla J(\boldsymbol{\theta}_t), \tag{37}$$

where $\epsilon > 0$ is the learning rate determining the step size at each iteration. While computing
the gradients of a large number of parameters of multiple layers is very difficult, thanks to the
backpropagation scheme that sequentially computes the gradient of the loss function using the chain
rule, the gradient computation process can be greatly simplified [22]. In this work, we employ
the backpropagation through time (BPTT), a scheme specialized for the gradient computation of
sequential datasets [23]. While the network parameters of the current time slot are updated in the
conventional backpropagation scheme, those of the current and the past time slots are updated
simultaneously in the BPTT. This is because the LSTM cell takes not only the current input
vector $\mathbf{x}^l$ but also the output vector of the past LSTM cell $\mathbf{z}^{l-1}$ as the input data. By exploiting
the chain rule in the gradient computation, the network parameters of the past and the current
time slots can be updated simultaneously.

1) Loss Function Design: An intriguing feature of the proposed scheme is the use of unsupervised
learning to train the network parameters. Note that supervised learning requires a large
input dataset (CSI and required mobile rate) and a labelled output dataset (in our case, the optimal
BS on/off mode). Unfortunately, obtaining such a large dataset is very difficult since it requires
huge data transmission (pilot signal transmission) for the training and also an exhaustive search to
find the optimal BS on/off mode. We can avoid this hassle by using unsupervised learning,
but we need to design the loss function and weight update mechanism carefully.

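In this work, we set up the loss function $J(\boldsymbol{\theta})$ as a weighted sum of loss terms for energy
consumption $J^{\text{on}}$, $J^{\text{trans}}$, $J^{\text{tx}}$, degree of infeasibility $J^{\text{fb}}$, and integer property $J^{\text{int}}$ as

$$J(\boldsymbol{\theta}) = \underbrace{J^{\text{on}} + J^{\text{trans}} + J^{\text{tx}}}_{\text{loss term for energy consumption}} + \underbrace{\lambda_{\text{fb}} J^{\text{fb}} + \lambda_{\text{int}} J^{\text{int}}}_{\text{loss term for constraints}}, \tag{38}$$

where $\lambda_{\text{fb}}$ and $\lambda_{\text{int}}$ are the regularization weights. Details are as follows: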

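• $J^{\text{on}}$ is the loss term for the maintenance power given by

$$\begin{aligned}
J^{\text{on}} &= \sum_{l=1}^{L} \sum_{m=1}^{M} P_m^{\text{on},l} && \text{(39)} \\
&= \sum_{l=1}^{L} \sum_{m=1}^{M} \left(\alpha_m^l \rho_m^{\text{on}} + (1 - \alpha_m^l) \rho_m^{\text{off}}\right). && \text{(40)}
\end{aligned}$$

• $J^{\text{trans}}$ is the loss term for the mode transition power given by

$$\begin{aligned}
J^{\text{trans}} &= \sum_{l=2}^{L} \sum_{m=1}^{M} P_m^{\text{trans},l} && \text{(41)} \\
&= \sum_{l=2}^{L} \sum_{m=1}^{M} \left(\alpha_m^l - \alpha_m^{l-1}\right)^2 \rho_m^{\text{trans}}. && \text{(42)}
\end{aligned}$$

• $J^{\text{tx}}$ is the loss term for the transmission power given by

$$\begin{aligned}
J^{\text{tx}} &= \sum_{l=1}^{L} \sum_{m=1}^{M} P_m^{\text{tx},l} && \text{(43)} \\
&= \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{k=1}^{K} \frac{1}{\eta_m} p_{m,k}^l. && \text{(44)}
\end{aligned}$$

• $J^{\text{fb}}$ is the loss term for the degree of infeasibility given by

$$J^{\text{fb}} = \sum_{l=1}^{L} 1_+(d_{\text{fb}}^l)\, d_{\text{fb}}^l, \tag{45}$$

where $d_{\text{fb}}^l$ is the degree of infeasibility corresponding to $\boldsymbol{\alpha}^l$. Since the degree of infeasibility
is a measure of rate constraint violation, it should be considered only when $\boldsymbol{\alpha}^l$ is infeasible
(i.e., $d_{\text{fb}}^l > 0$). To this end, we use the indicator function $1_+(d_{\text{fb}}^l)$ which has a non-zero
value only if $d_{\text{fb}}^l > 0$.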


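• $J^{\text{int}}$ is the loss term to enforce $\boldsymbol{\alpha}^l$ to be the integer value vector (i.e., $\alpha_m^l \in \{0, 1\}$):

$$J^{\text{int}} = \sum_{l=1}^{L} \sum_{m=1}^{M} \left(\alpha_m^l - \hat{\alpha}_m^l\right)^2, \tag{46}$$

where $\hat{\alpha}_m^l \in \{0, 1\}$ denotes the discretized value of $\alpha_m^l$.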

It is worth mentioning that each term in the loss function plays a complementary yet crucial

role in accomplishing the mission. For example, if the rate requirement of a mobile is not satisfied

for the current on/off mode decision, $J^{\text{fb}}$ will enforce DLEEN to turn on more BSs to satisfy
the rate requirements. In contrast, $J^{\text{on}}$, $J^{\text{trans}}$, and $J^{\text{tx}}$ will enforce DLEEN to turn off more BSs
to minimize the energy consumption. Also, $J^{\text{int}}$ will enforce $\boldsymbol{\alpha}^l$ to be an integer vector. At the

end of the training process, the harmonized loss terms will pursue an optimization of the energy

consumption while satisfying the rate requirements of mobiles.
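The loss terms in (38) to (46) that do not require an LP solution can be computed directly from the network outputs. The following sketch assumes that the energy terms are evaluated on the soft outputs and that the integer term compares soft and discretized outputs; both are interpretations, and the function names are illustrative. The transmission power term $J^{\text{tx}}$ is passed in as a number because it comes from the power allocation LP.

```python
def loss_terms(alpha, alpha_hat, d_fb, rho_on, rho_off, rho_trans):
    """Loss terms J_on (40), J_trans (42), J_fb (45), and J_int (46).

    alpha[l][m]     : soft on/off output of BS m in slot l (assumed in [0, 1])
    alpha_hat[l][m] : discretized on/off decision (0 or 1)
    d_fb[l]         : degree of infeasibility in slot l
    """
    num_slots, num_bs = len(alpha), len(alpha[0])
    j_on = sum(alpha[l][m] * rho_on[m] + (1 - alpha[l][m]) * rho_off[m]
               for l in range(num_slots) for m in range(num_bs))
    # The transition term starts from the second slot, as in Eq. (41).
    j_trans = sum((alpha[l][m] - alpha[l - 1][m]) ** 2 * rho_trans[m]
                  for l in range(1, num_slots) for m in range(num_bs))
    # Only infeasible slots (d_fb > 0) contribute, as in Eq. (45).
    j_fb = sum(d for d in d_fb if d > 0)
    j_int = sum((alpha[l][m] - alpha_hat[l][m]) ** 2
                for l in range(num_slots) for m in range(num_bs))
    return {"on": j_on, "trans": j_trans, "fb": j_fb, "int": j_int}


def total_loss(terms, j_tx, lambda_fb, lambda_int):
    """Weighted sum of Eq. (38); j_tx is supplied from the power allocation."""
    return (terms["on"] + terms["trans"] + j_tx
            + lambda_fb * terms["fb"] + lambda_int * terms["int"])
```

Increasing `lambda_fb` pushes the network toward patterns that satisfy the rate requirements, while the energy terms push in the opposite direction, so the two weights set the balance described above.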

2) Network Parameter Training: In order to use the SGD method, we need to ensure that the

loss function is a differentiable function of training parameters. While the update equation of

$J^{\text{on}}$, $J^{\text{trans}}$, and $J^{\text{int}}$ can be easily obtained, such is not the case for $J^{\text{tx}}$ and $J^{\text{fb}}$ since they are
constructed from the outputs of LP (i.e., $\mathbf{p}^l$ and $d_{\text{fb}}^l$). Note that the optimal solution (and the

corresponding cost) of LP is not a differentiable function of constraint parameters [24].

To address this issue, we use the notion of subgradient, a generalized concept of gradient for

convex nonsmooth function [25].

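Definition 1. Let $f: \mathcal{X} \to \mathbb{R}$ be a real-valued convex function defined on a convex open set
$\mathcal{X} \subseteq \mathbb{R}^N$. Then a vector $\mathbf{v} \in \mathbb{R}^N$ is called a subgradient at $\mathbf{x}_0 \in \mathcal{X}$ if

$$\forall \mathbf{x} \in \mathcal{X}, \quad f(\mathbf{x}) - f(\mathbf{x}_0) \geq \mathbf{v}^T (\mathbf{x} - \mathbf{x}_0). \tag{47}$$

Also, the set of all subgradients at $\mathbf{x}_0$ is called a subdifferential at $\mathbf{x}_0$ and is denoted as $\partial f(\mathbf{x}_0)$.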
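As a toy illustration of Definition 1, consider a scalar function given by the pointwise maximum of affine pieces, which is convex and nonsmooth at the points where pieces intersect. The sketch below picks a subgradient, checks inequality (47) on a grid, and runs an update of the form (37) with the gradient replaced by a subgradient. The function names, the example function, and the diminishing step size are assumptions for illustration.

```python
def max_affine(x, pieces):
    """f(x) = max_i (a_i * x + b_i), a convex piecewise-linear function."""
    return max(a * x + b for a, b in pieces)


def subgradient(x, pieces):
    """Slope of one active piece, which is a valid subgradient of the max."""
    a, _ = max(pieces, key=lambda ab: ab[0] * x + ab[1])
    return a


def satisfies_definition(v, x0, pieces, grid):
    """Check inequality (47) on a finite grid of test points."""
    f0 = max_affine(x0, pieces)
    return all(max_affine(x, pieces) - f0 >= v * (x - x0) - 1e-12 for x in grid)


def subgradient_descent(x, pieces, step, num_iters):
    """Update of the form (37) using a subgradient and a step of step/(t+1)."""
    for t in range(num_iters):
        x -= step / (t + 1) * subgradient(x, pieces)
    return x
```

For `pieces = [(-1.0, 0.0), (1.0, 0.0)]` the function is $|x|$, and any slope in $[-1, 1]$ is a subgradient at zero, while a slope such as 1.5 violates (47).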

When the loss function J is not differentiable, we can use the subgradient instead of gradient

to minimize J in a similar way to the gradient descent method [26]. In the following theorem,

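we present the subdifferentials of $J^{\text{tx}}$ and $J^{\text{fb}}$ with respect to $\boldsymbol{\theta}$.

Theorem 1. The subdifferentials of $J^{\text{tx}}$ and $J^{\text{fb}}$ with respect to $\boldsymbol{\theta}$ are given by

$$\partial J^{\text{tx}}(\boldsymbol{\theta}) = \left\{ -\sum_{l=1}^{L} \boldsymbol{\lambda}^T \boldsymbol{\Pi} \mathbf{P} \frac{\partial \boldsymbol{\alpha}^l}{\partial \boldsymbol{\theta}} \ \middle|\ \boldsymbol{\lambda} \text{ is the dual solution of } \mathcal{P}_{\text{pa}} \text{ associated with (36c)} \right\} \tag{48}$$

$$\partial J^{\text{fb}}(\boldsymbol{\theta}) = \left\{ -\sum_{l=1}^{L} 1_+(d_{\text{fb}}^l)\, \boldsymbol{\mu}^T \boldsymbol{\Pi} \mathbf{P} \frac{\partial \boldsymbol{\alpha}^l}{\partial \boldsymbol{\theta}} \ \middle|\ \boldsymbol{\mu} \text{ is the dual solution of } \mathcal{P}_{\text{feas}} \text{ associated with (34c)} \right\}, \tag{49}$$

where $\boldsymbol{\Pi} = \mathrm{diag}(\eta_1, \cdots, \eta_M)$ and $\mathbf{P} = \mathrm{diag}(P_{1,\max}, \cdots, P_{M,\max})$.

Proof. See Appendix A.

TABLE I: Training process of DLEEN

Input: CSI $\{\mathbf{h}^l\}$, required mobile rate $\{\mathbf{r}_{\min}^l\}$, LSTM-based BS on/off decision network $f$,
learning rate $\epsilon$, on/off threshold $\tau$, number of time slots $L$
Initialization: $t = 0$, $\boldsymbol{\theta}_t = \boldsymbol{\theta}_{\text{ini}}$
Iteration:
1: while $\boldsymbol{\theta}_t$ does not converge do
2:   for $l = 1, \cdots, L$ do
3:     $\mathbf{x}^l = [\mathbf{h}^{l,T}, \mathbf{r}_{\min}^{l,T}]^T$
4:     Obtain $\boldsymbol{\alpha}^l$ by passing $\mathbf{x}^1, \cdots, \mathbf{x}^l$ through $f$ parameterized by $\boldsymbol{\theta}_t$
5:     Discretize $\boldsymbol{\alpha}^l$ into $\hat{\boldsymbol{\alpha}}^l$ as $\hat{\boldsymbol{\alpha}}^l = \frac{1 + \mathrm{sgn}(\boldsymbol{\alpha}^l - \tau)}{2}$
6:     Compute $\left.\frac{\partial \boldsymbol{\alpha}^l}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta} = \boldsymbol{\theta}_t}$ by using the backpropagation algorithm
7:     Solve the feasibility test $\mathcal{P}_{\text{feas}}$ for $\boldsymbol{\alpha}^l$ to obtain the dual solution $\boldsymbol{\mu}$
8:     Solve the feasibility test $\mathcal{P}_{\text{feas}}$ for $\boldsymbol{\alpha}^l$ to obtain the degree of infeasibility $d_{\text{fb}}^l$
9:     Solve the power allocation problem $\mathcal{P}_{\text{pa}}$ to obtain the optimal power weight $\mathbf{p}^l$ and the dual solution $\boldsymbol{\lambda}$
10:  end for
11:  Compute $\frac{\partial J^{\text{on}}}{\partial \boldsymbol{\theta}}$, $\frac{\partial J^{\text{trans}}}{\partial \boldsymbol{\theta}}$, and $\frac{\partial J^{\text{int}}}{\partial \boldsymbol{\theta}}$
12:  $\frac{\partial J^{\text{tx}}}{\partial \boldsymbol{\theta}} = -\sum_{l=1}^{L} \boldsymbol{\lambda}^T \boldsymbol{\Pi} \mathbf{P} \frac{\partial \boldsymbol{\alpha}^l}{\partial \boldsymbol{\theta}}$
13:  $\frac{\partial J^{\text{fb}}}{\partial \boldsymbol{\theta}} = -\sum_{l=1}^{L} 1_+(d_{\text{fb}}^l)\, \boldsymbol{\mu}^T \boldsymbol{\Pi} \mathbf{P} \frac{\partial \boldsymbol{\alpha}^l}{\partial \boldsymbol{\theta}}$
14:  $\nabla J(\boldsymbol{\theta}_t) = \left.\frac{\partial J^{\text{on}}}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta} = \boldsymbol{\theta}_t} + \left.\frac{\partial J^{\text{trans}}}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta} = \boldsymbol{\theta}_t} + \left.\frac{\partial J^{\text{tx}}}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta} = \boldsymbol{\theta}_t} + \left.\frac{\partial J^{\text{fb}}}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta} = \boldsymbol{\theta}_t} + \left.\frac{\partial J^{\text{int}}}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta} = \boldsymbol{\theta}_t}$
15:  $\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \epsilon \nabla J(\boldsymbol{\theta}_t)$
16:  $t = t + 1$
17: end while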

Once the gradient of the total loss function $\nabla J(\boldsymbol{\theta})$ is obtained, we can update $\boldsymbol{\theta}$ by using the
SGD method (see (37)). The training process of DLEEN is summarized in Table I.


IV. PRACTICAL ISSUES FOR THE DLEEN IMPLEMENTATION

In this section, we discuss two practical issues when we use the DLEEN scheme. We first

discuss the training data acquisition issue and then describe the offline training issue.

A. Training Data Acquisition

In order to find an optimal on/off mode decision of BSs minimizing the energy consumption
of UDN, a huge amount of training data (i.e., CSI and mobile rate requirement) is needed. Note that
a network trained with an insufficient amount of data might not converge or can be overfitted
to the specific dataset, limiting the energy saving gain of UDN severely. In practice, however,
acquisition of the training dataset is very difficult since the data (channel estimate extracted
from the received pilot signal) should be collected from a large number of small cells for a long
period of time to cover a wide variety of wireless environments.

To circumvent this issue, we use the synthetically generated training dataset in this work.

Specifically, we build a UDN simulator where the small cells and mobiles are randomly
distributed. The channel between each BS and mobile is generated using this simulator. One
might be concerned that the synthetically generated channels might be different from the real channels.

While it is true, fortunately, this issue is not so critical since the large-scale fading coefficient,

a function of path loss and shadowing, depends heavily on the communication distance. By

applying a realistic path loss model for the synthetic data generation (we will say more on this

in Section V), we can mitigate the mismatch between the real and the synthetically generated

datasets.

B. Offline Training Process

Another important issue when applying the DNN technique to wireless systems is the huge
computational overhead of the training process. Note that the DNN includes multiple
hidden layers consisting of weight matrices and bias vectors. Since this large number of network
parameters is updated simultaneously through backpropagation, the computation time and cost
of the training process are considerable². This issue is even more serious in the proposed DLEEN
scheme since the network parameters of the previous and current time slots are updated at the
same time through BPTT. To deal with this issue, we use the offline training scheme together
with the synthetically generated dataset. Basically, we train multiple DLEEN instances, each
of which is designed for a distinct setting in terms of the number of BSs, the number of mobiles,
and the noise power. In doing so, we obtain several DLEEN instances, each optimized for its
communication scenario. In real applications, we simply choose the pre-trained network that
matches the real UDN environment.

² Even when using a deep learning server for the network training, the training process takes several hours. Since most of
the decisions in wireless systems should be made within a few milliseconds, an online training process is not suitable for
deep learning-based wireless systems.
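The selection of a pre-trained instance can be sketched as a simple lookup. In the Python example below, the `Scenario` tuple, the `select_instance` function, the model file names, and the rule of picking the closest noise power (in dBm) among instances with the same number of BSs and mobiles are all hypothetical choices made for illustration.

```python
from typing import Dict, NamedTuple


class Scenario(NamedTuple):
    """Setting for which one DLEEN instance was trained offline."""
    num_bs: int
    num_mobiles: int
    noise_power_dbm: float  # unit chosen for this sketch


def select_instance(bank: Dict[Scenario, str], num_bs: int, num_mobiles: int,
                    noise_power_dbm: float) -> str:
    """Return the stored model whose training scenario best matches the deployment."""
    candidates = [s for s in bank
                  if s.num_bs == num_bs and s.num_mobiles == num_mobiles]
    if not candidates:
        raise KeyError("no pre-trained instance for this number of BSs and mobiles")
    # Among matching topologies, take the closest noise power (assumption).
    best = min(candidates, key=lambda s: abs(s.noise_power_dbm - noise_power_dbm))
    return bank[best]


# Hypothetical bank of offline-trained instances.
bank = {
    Scenario(8, 4, -90.0): "dleen_m8_k4_n90.pt",
    Scenario(8, 4, -100.0): "dleen_m8_k4_n100.pt",
    Scenario(10, 4, -90.0): "dleen_m10_k4_n90.pt",
}
chosen = select_instance(bank, num_bs=8, num_mobiles=4, noise_power_dbm=-97.0)
```

Since the lookup involves no training, the run-time cost of adapting to a new scenario is negligible compared with the offline training time.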

V. SIMULATIONS AND DISCUSSIONS

A. Simulation Setup

We consider the UDN scenario where M small cells cooperatively serve K mobiles. The small
cells are randomly distributed in a square area of D × D km² and the mobiles are moving at a
constant speed v. In the fading channel model, the small-scale fading coefficient $g_{m,k}$ is generated
according to the complex Gaussian distribution (i.e., $g_{m,k} \sim \mathcal{CN}(0, 1)$) and the large-scale fading
coefficient $\beta_{m,k}$ accounting for the path loss and the shadow fading is given by

$$\beta^{l}_{m,k} = \mathrm{PL}^{l}_{m,k} \times 10^{\frac{\sigma_{sh} z_{m,k}}{10}}, \tag{50}$$

where $\mathrm{PL}^{l}_{m,k}$ represents the path loss at the time slot l and $10^{\frac{\sigma_{sh} z_{m,k}}{10}}$ represents the shadow
fading where $\sigma_{sh}$ is the standard deviation and $z_{m,k} \sim \mathcal{N}(0, 1)$. As for the path loss, we use a
three-slope path loss model given by [27]

$$\mathrm{PL}^{l}_{m,k} = \begin{cases} -L - 35\log_{10}(d^{l}_{m,k}), & \text{if } d^{l}_{m,k} > d_1 \\ -L - 15\log_{10}(d_1) - 20\log_{10}(d^{l}_{m,k}), & \text{if } d_0 < d^{l}_{m,k} \le d_1 \\ -L - 15\log_{10}(d_1) - 20\log_{10}(d_0), & \text{if } d^{l}_{m,k} \le d_0 \end{cases} \tag{51}$$

where $d^{l}_{m,k}$ is the distance between the BS m and the mobile k at time slot l and

$$L = 46.3 + 33.9\log_{10}(f) - 13.82\log_{10}(h_B) - (1.1\log_{10}(f) - 0.7)h_U + (1.56\log_{10}(f) - 0.8), \tag{52}$$

where f is the carrier frequency (in MHz), and $h_B$ and $h_U$ are the heights of the BS and the mobile,
respectively. The system parameters are summarized in Table II [28].
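To make (50)-(52) concrete, the following Python sketch evaluates the constant L, the three-slope path loss, and the large-scale fading coefficient. Two points are assumptions of this sketch rather than statements of the model above: the path loss of (51) is treated as a dB value and converted to linear scale before the shadowing factor is applied, and the distances are passed in km (so that d0 = 10 m and d1 = 50 m become 0.01 and 0.05).

```python
import math


def hata_offset_db(f_mhz, h_bs_m, h_ue_m):
    """Constant L of (52) for carrier frequency f (MHz) and antenna heights (m)."""
    lf = math.log10(f_mhz)
    return (46.3 + 33.9 * lf - 13.82 * math.log10(h_bs_m)
            - (1.1 * lf - 0.7) * h_ue_m + (1.56 * lf - 0.8))


def path_loss_db(d, d0, d1, offset_db):
    """Three-slope path loss of (51); d, d0, d1 must share the same unit."""
    if d > d1:
        return -offset_db - 35.0 * math.log10(d)
    if d > d0:
        return -offset_db - 15.0 * math.log10(d1) - 20.0 * math.log10(d)
    return -offset_db - 15.0 * math.log10(d1) - 20.0 * math.log10(d0)


def large_scale_fading(d, d0, d1, offset_db, sigma_sh_db, z):
    """Linear-scale beta of (50); z is a standard normal shadowing sample."""
    # Assumption: PL in (51) is in dB, so it is converted to linear scale first.
    pl_linear = 10.0 ** (path_loss_db(d, d0, d1, offset_db) / 10.0)
    return pl_linear * 10.0 ** (sigma_sh_db * z / 10.0)


# Parameters of Table II: f = 1.9 GHz, hB = 15 m, hU = 1.65 m, sigma_sh = 3 dB.
OFFSET_DB = hata_offset_db(1900.0, 15.0, 1.65)
beta = large_scale_fading(d=0.1, d0=0.01, d1=0.05, offset_db=OFFSET_DB,
                          sigma_sh_db=3.0, z=0.0)
```

Note that the three branches of (51) agree at d0 and d1, so the path loss is continuous in the distance.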

TABLE II: System parameters

Parameter                              Value
Carrier frequency (f)                  1.9 GHz
BS height (hB)                         15 m
Mobile height (hU)                     1.65 m
Service area radius (D)                200 m
Path loss variable (d0)                10 m
Path loss variable (d1)                50 m
Shadow fading deviation (σsh)          3 dB
Number of small cells (M)              8
Number of mobiles (K)                  4
Number of time slots (L)               50
Amplifier efficiency (η)               0.25
On mode BS power (ρon)                 6.8 W
Off mode BS power (ρoff)               4.3 W
Maximum transmission power (Pmax)      1 W
Mode transition power (ρtrans)         3 W
Mobile mobility (v)                    0 ∼ 6 m/s

The proposed DLEEN consists of 3 LSTM layers followed by a feedforward network
consisting of 5 stacks of FC and ReLU layers (see Fig. 4). In each hidden layer, we set the width
to W = 512 and the sequence length to 5. Also, when discretizing the continuous-valued output of the
neural network into the BS on/off mode vector αl, we use the threshold value τ = 0.5. For the network
parameter training, we use the Adam optimizer, a widely used optimization tool that improves the
robustness of the learning process [29]. We compare the proposed DLEEN scheme with four conventional BS
on/off strategies: 1) full association strategy where all the BSs are turned on, 2) traffic load-based
on/off strategy where the BSs with low traffic load are turned off [30], 3) sequential on/off
strategy where the BS having the minimum impact on the energy consumption is turned off one
after another until reaching the point where the mobile's rate requirement is violated [12], and 4)
mixed-integer linear programming (MILP)-based on/off strategy where the BS on/off mode and
the corresponding transmission power are optimized simultaneously at each time slot. In each
point of the simulation figures, we plot the average power consumption over L = 50 time slots.
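For readers unfamiliar with the Adam update [29], a minimal pure-Python sketch of one update step is given below. The step size and decay rates are the default values suggested in [29], not the settings used in our simulations, and the names `init_state` and `adam_step` are introduced here for illustration only.

```python
import math


def init_state(num_params):
    """Zero-initialised first and second moment estimates and step counter."""
    return {"t": 0, "m": [0.0] * num_params, "v": [0.0] * num_params}


def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to theta in place and return it."""
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grad):
        # Exponential moving averages of the gradient and its square.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialisation.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


theta = [1.0]
state = init_state(len(theta))
adam_step(theta, grad=[2.0], state=state)
```

In the first step the bias-corrected ratio equals the sign of the gradient, so each parameter moves by roughly the step size regardless of the gradient magnitude. This normalization is one reason the method is less sensitive to gradient scaling than plain SGD.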

[Figure: total loss function and average power consumption versus training epoch. Axes: training epoch (×10^4), value of loss function, average power consumption.]

Fig. 5: Average power consumption, degree of infeasibility, and total loss function as a function
of training epoch (M = 8, K = 4, L = 50, SNR = 10 dB, and Rmin = 0.5 bps/Hz).

B. Simulation Results

In Fig. 5, we plot the average power consumption and the loss function value of DLEEN as
a function of the training epoch³ when SNR = 10 dB and Rmin = 0.5 bps/Hz. We observe
that as the training epoch increases, both the loss function and the average power consumption
decrease, which implies that the proposed training process works properly
and is effective in reducing the energy consumption of the network.

In Fig. 6, we plot the average power consumption as a function of the mobile's rate requirement
when SNR = 10 dB. We observe that the proposed DLEEN achieves a significant energy saving
over the conventional on/off schemes. For example, when Rmin = 0.3 bps/Hz, DLEEN saves
more than 24% energy over the full association scheme, 27% over the traffic load-based on/off
scheme, and 12.4% over the sequential on/off scheme. Even when compared to the MILP-based
on/off scheme, DLEEN saves around 11% of energy. This is because the conventional on/off
schemes control only the instantaneous power (i.e., maintenance power $P^{\mathrm{on}}_m$, $P^{\mathrm{off}}_m$ and transmission
power $P^{\mathrm{tx}}_m$) whereas the proposed DLEEN scheme controls the instantaneous power as well as
the transition power $P^{\mathrm{trans}}_m$. Note that the traffic load-based on/off scheme consumes even more
energy than the full association scheme since it spends a considerable amount of energy on
frequent mode transitions.

³ Epoch is a measure of training time in DNN. One epoch means that the network has been trained once using the given
dataset.

[Figure: average power consumption (W/time slot) versus rate requirement (bps/Hz). Curves: Proposed DLEEN, MILP-based on/off, Sequential on/off, Traffic load-based on/off, Full association.]

Fig. 6: Average power consumption as a function of rate requirement Rmin (M = 8, K = 4, L = 50, and SNR = 10 dB).

[Figure: average power consumption (W/time slot) versus SNR (dB).]

Fig. 7: Average power consumption as a function of SNR (M = 8, K = 4, L = 50, and Rmin = 0.3 bps/Hz).

In Fig. 7, we plot the average power consumption as a function of the signal-to-noise ratio (SNR).
In this figure, we set the rate requirement of the mobiles to Rmin = 0.3 bps/Hz. We observe that
the energy saving of the proposed DLEEN scheme increases with the SNR. For instance, the
energy saving of DLEEN over the full association scheme is around 6% when SNR = 6 dB but
it increases to 30% when SNR = 14 dB. This is because only a small number of BSs are required
to serve the mobiles in the high SNR regime (see Fig. 8), and thus the energy saving obtained
by turning off the unnecessary BSs grows at high SNR. Interestingly, the energy saving
of DLEEN over the conventional on/off schemes also increases with the SNR. For example, the
energy saving of DLEEN over the MILP-based on/off scheme increases from 4% to 8% when
the SNR increases from 6 dB to 14 dB. Note that when the SNR is high, the conventional on/off
schemes usually choose active BSs near the mobile. Hence, when the mobile moves and
changes its location, the active BS set also changes, causing an increase in the transition
power.

[Figure: percentage of active base stations versus rate requirement (bps/Hz).]

Fig. 8: Percentage of active BSs as a function of the rate requirement Rmin (M = 8, K = 4, L = 50, and SNR = 10 dB).

[Figure: average power consumption (W/time slot) versus number of base stations.]

Fig. 9: Average power consumption as a function of number of BSs M (K = 4, L = 50, SNR = 10 dB, and Rmin = 0.3 bps/Hz).

In Fig. 8, we plot the percentage of active BSs as a function of the rate requirement when
SNR = 10 dB. Note that the proposed DLEEN scheme turns on more BSs than the conventional
on/off schemes. For instance, when the rate requirement is 0.5 bps/Hz, DLEEN turns on 40%
of BSs while the MILP-based on/off scheme turns on only 30% of BSs. Since the saving in
transition power outweighs the additional maintenance power of turning on slightly more BSs, DLEEN
consumes 11% less energy than the MILP-based on/off scheme (see Fig. 6). We also note that
even in the case where the number of active BSs is similar (i.e., Rmin = 0.3 bps/Hz), DLEEN
saves more than 7% of energy over the MILP-based on/off scheme and 9% of energy over the
sequential on/off scheme, which implies that DLEEN picks a better set of active BSs that minimizes
the cumulative energy consumption.
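The trade-off between maintenance power and transition power can be illustrated with a small Python calculation using the power values of Table II. The accounting below (maintenance power per BS, radiated power divided by the amplifier efficiency for active BSs, and ρtrans charged for every on-to-off or off-to-on change) is a simplified assumption made for this sketch, and the two schedules and the 0.2 W transmit power are invented for illustration; they are not simulation results.

```python
def average_power(schedule, tx_power, rho_on=6.8, rho_off=4.3,
                  rho_trans=3.0, eta=0.25):
    """Average power (W per time slot) of an on/off schedule.

    schedule[l][m] is 1 if BS m is on at slot l, else 0;
    tx_power[l][m] is the radiated power of BS m at slot l in W.
    """
    total = 0.0
    prev = schedule[0]  # assumption: no transition is charged in the first slot
    for modes, p_tx in zip(schedule, tx_power):
        for m, on in enumerate(modes):
            total += rho_on if on else rho_off       # maintenance power
            if on:
                total += p_tx[m] / eta               # amplifier input power
            total += rho_trans * abs(on - prev[m])   # mode transition power
        prev = modes
    return total / len(schedule)


NUM_BS, NUM_SLOTS, P_TX = 8, 50, 0.2


def modes_for(active):
    return [1 if m in active else 0 for m in range(NUM_BS)]


def tx_for(schedule):
    return [[P_TX * on for on in modes] for modes in schedule]


# Three active BSs, but the active set alternates in every slot.
lean = [modes_for({0, 1, 2} if l % 2 == 0 else {1, 2, 3}) for l in range(NUM_SLOTS)]
# Four active BSs that never change.
steady = [modes_for({0, 1, 2, 3}) for _ in range(NUM_SLOTS)]

lean_power = average_power(lean, tx_for(lean))
steady_power = average_power(steady, tx_for(steady))
```

Under these assumptions, the steady schedule with four active BSs averages about 47.6 W per slot, whereas the alternating schedule with only three active BSs averages about 50.2 W because of the two mode changes it incurs in every slot.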

In Fig. 9, we change the number of BSs and then plot the average power consumption when
SNR = 10 dB and Rmin = 0.3 bps/Hz. We observe that as the number of BSs increases, the
energy saving of DLEEN over the full association scheme increases. To be specific, when the
number of BSs increases from 6 to 14, the energy saving of DLEEN over the full association
scheme increases from 22% to 27%. Since we only need to turn on a few BSs to serve the
mobiles, the energy saving obtained from the BS sleeping increases when the number of BSs is
large. Moreover, since we choose the active BSs from an increased number of total BSs, we can
obtain an additional gain from the cooperation diversity. In fact, when the number of BSs is
10, the energy consumption of DLEEN is comparable to that of the full association scheme
with 7 BSs, which demonstrates that the proposed DLEEN is effective in
reducing the energy consumption in UDN.

VI. CONCLUSION

In this paper, we proposed a DNN-based framework to reduce the energy consumption in UDN.
In the proposed deep learning-based energy-efficient network (DLEEN), the BS on/off mode is
first determined through the DNN and then the transmission power is allocated to the active
BSs via the convex optimization technique. A key ingredient of DLEEN is the LSTM network
that exploits the temporally correlated features of CSIs for the BS on/off mode decision. DLEEN
controls not only the instantaneous power but also the mode transition power, resulting in a
substantial reduction of the cumulative energy consumption. From the simulations on realistic UDN
environments, we observed that the proposed DLEEN scheme saves a considerable amount of
power over the conventional on/off techniques. In this paper, we restricted our attention to the
BS sleep mode technique, but we believe that there are many interesting research extensions of
the proposed scheme such as cognitive radio access, user scheduling, and resource allocation.

APPENDIX A

PROOF OF THEOREM 1

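Note that the total transmission power $P^{\mathrm{tx},l} = \sum_{m=1}^{M} P^{\mathrm{tx},l}_{m} = \sum_{m=1}^{M}\sum_{k=1}^{K} \frac{1}{\eta_m} p^{l}_{m,k}$ and the
degree of infeasibility $d^{l}_{\mathrm{fb}}$ are the optimal values of Ppa and Pfeas, respectively. Also, $\alpha^{l}$ determines the
coefficients of the right-hand side (RHS) vector of the power constraints. It is well-known that for a
bounded feasible LP, the optimal value is a convex function of the RHS vector [31, Theorem 2].
Also, it is well-known that the subdifferential of the optimal value of an LP with respect to the RHS vector
is equal to the set of dual solutions [32, Theorem 1]. Using these, we obtain the subgradients
of $P^{\mathrm{tx},l}$ and $d^{l}_{\mathrm{fb}}$ with respect to $\alpha^{l}$ as

$$\frac{\partial P^{\mathrm{tx},l}}{\partial \alpha^{l}} = \frac{\partial P^{\mathrm{tx},l}}{\partial b}\frac{\partial b}{\partial \alpha^{l}} = -\lambda^{T}\Pi_{P} \tag{53}$$

$$\frac{\partial d^{l}_{\mathrm{fb}}}{\partial \alpha^{l}} = \frac{\partial d^{l}_{\mathrm{fb}}}{\partial b}\frac{\partial b}{\partial \alpha^{l}} = -\mu^{T}\Pi_{P}, \tag{54}$$

where $b = [\alpha^{l}_{1}\eta_{1}P_{1,\max}, \cdots, \alpha^{l}_{M}\eta_{M}P_{M,\max}]$ is the common RHS vector of Ppa and Pfeas related
to the power constraints. From (53), we obtain the subgradient of $J^{\mathrm{tx}}$ as

$$\frac{\partial J^{\mathrm{tx}}}{\partial \theta} = \frac{\partial}{\partial \theta}\sum_{l=1}^{L}\left(\sum_{m=1}^{M}\sum_{k=1}^{K}\frac{1}{\eta_m}p^{l}_{m,k}\right) \tag{55}$$

$$= \sum_{l=1}^{L}\frac{\partial P^{\mathrm{tx},l}}{\partial \theta} \tag{56}$$

$$= \sum_{l=1}^{L}\frac{\partial P^{\mathrm{tx},l}}{\partial \alpha^{l}}\frac{\partial \alpha^{l}}{\partial \theta} \tag{57}$$

$$= -\sum_{l=1}^{L}\lambda^{T}\Pi_{P}\frac{\partial \alpha^{l}}{\partial \theta}. \tag{58}$$

Also, from (54), the subgradient of $J^{\mathrm{fb}}$ is computed as

$$\frac{\partial J^{\mathrm{fb}}}{\partial \theta} = \frac{\partial}{\partial \theta}\sum_{l=1}^{L}\mathbb{1}_{+}(d^{l}_{\mathrm{fb}})\,d^{l}_{\mathrm{fb}} \tag{59}$$

$$= \sum_{l=1}^{L}\mathbb{1}_{+}(d^{l}_{\mathrm{fb}})\frac{\partial d^{l}_{\mathrm{fb}}}{\partial \theta} \tag{60}$$

$$= \sum_{l=1}^{L}\mathbb{1}_{+}(d^{l}_{\mathrm{fb}})\frac{\partial d^{l}_{\mathrm{fb}}}{\partial \alpha^{l}}\frac{\partial \alpha^{l}}{\partial \theta} \tag{61}$$

$$= -\sum_{l=1}^{L}\mathbb{1}_{+}(d^{l}_{\mathrm{fb}})\,\mu^{T}\Pi_{P}\frac{\partial \alpha^{l}}{\partial \theta}. \tag{62}$$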
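The sensitivity result used in (53) and (54) can be checked numerically on a one-variable toy problem. The Python sketch below considers the hypothetical LP "minimize d subject to gain·p + d ≥ rate, 0 ≤ p ≤ b, d ≥ 0" with b = αηPmax, whose optimal value and dual variable are known in closed form. This toy LP, with a rate that is linear in the power, is an assumption made purely for illustration and is not the problem Pfeas itself. In the scalar case, the matrix multiplying the dual solution reduces to ηPmax, which is also an assumption of this sketch.

```python
def feasibility_gap(b, gain, rate):
    """Optimal value of: min d  s.t.  gain*p + d >= rate, 0 <= p <= b, d >= 0."""
    # The best choice is p = b, leaving the unmet part of the rate as d.
    return max(0.0, rate - gain * b)


def power_dual(b, gain, rate):
    """Dual variable of the constraint p <= b (away from the kink rate == gain*b)."""
    return gain if rate > gain * b else 0.0


def subgradient_wrt_alpha(alpha, eta, p_max, gain, rate):
    """Chain rule as in (54): minus the dual variable times db/dalpha."""
    b = alpha * eta * p_max
    return -power_dual(b, gain, rate) * eta * p_max


# Compare with a central finite difference of the optimal value.
alpha, eta, p_max, gain, rate, h = 0.5, 0.25, 1.0, 2.0, 1.0, 1e-6
finite_diff = (feasibility_gap((alpha + h) * eta * p_max, gain, rate)
               - feasibility_gap((alpha - h) * eta * p_max, gain, rate)) / (2 * h)
analytic = subgradient_wrt_alpha(alpha, eta, p_max, gain, rate)
```

When the power constraint is inactive (i.e., the rate requirement is met with p < b), the dual variable is zero and so is the subgradient, which matches the intuition that relaxing a non-binding power budget does not change the optimal value.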

REFERENCES

[1] J. Son, S. Kim, and B. Shim, “Energy efficient ultra-dense network using long short-term memory,” to appear in IEEE

Wireless Commun. and Networking Conf. (WCNC), April. 2020.

[2] X. Ge, S. Tu, G. Mao, C.-X. Wang, and T. Han, “5G ultra-dense cellular networks,” IEEE Wireless Commun., vol. 23,

no. 1, pp. 72–79, 2016.

[3] S. Kim, J. W. Choi, and B. Shim, “Downlink pilot precoding and compressed channel feedback for FDD-based cell-free

systems,” to appear in IEEE Trans. on Wireless Commun.

[4] M. Kamel, W. Hamouda, and A. Youssef, “Ultra-dense networks: A survey,” IEEE Commun. Surveys & Tutorials, vol. 18,

no. 4, pp. 2522–2545, 2016.

[5] I. Chih-Lin, C. Rowell, S. Han, Z. Xu, G. Li, and Z. Pan, “Toward green and soft: a 5G perspective,” IEEE Commun.

Mag., vol. 52, no. 2, pp. 66–73, 2014.

[6] A. P. Bianzino, C. Chaudet, D. Rossi, J.-L. Rougier et al., “A survey of green networking research,” IEEE Commun.

Surveys & Tutorials, vol. 14, no. 1, pp. 3–20, 2012.


[7] G. Y. Li, Z. Xu, C. Xiong, C. Yang, S. Zhang, Y. Chen, and S. Xu, “Energy-efficient wireless communications: tutorial,

survey, and open issues,” IEEE Wireless Commun., vol. 18, no. 6, pp. 28–35, 2011.

[8] D. Feng, C. Jiang, G. Lim, L. J. Cimini, G. Feng, and G. Y. Li, “A survey of energy-efficient wireless communications,”

IEEE Commun. Surveys & Tutorials, vol. 15, no. 1, pp. 167–178, 2013.

[9] C. Han, T. Harrold, S. Armour, I. Krikidis, S. Videv, P. M. Grant, H. Haas, J. S. Thompson, I. Ku, C.-X. Wang et al.,

“Green radio: radio techniques to enable energy-efficient wireless networks,” IEEE Commun. Mag., vol. 49, no. 6, 2011.

[10] J. Wu, Y. Zhang, M. Zukerman, and E. K.-N. Yung, “Energy-efficient base-stations sleep-mode techniques in green cellular

networks: A survey,” IEEE Commun. Surveys & Tutorials, vol. 17, no. 2, pp. 803–826, 2015.

[11] C. Liu, B. Natarajan, and H. Xia, “Small cell base station sleep strategies for energy efficiency,” IEEE Trans. Veh. Technol.,

vol. 65, no. 3, pp. 1652–1661, 2015.

[12] E. Oh and K. Son, “A unified base station switching framework considering both uplink and downlink traffic,” IEEE

Wireless Commun. Letters, vol. 6, no. 1, pp. 30–33, 2016.

[13] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[14] W. Kim, Y. Ahn and B. Shim, “Deep neural network-based active user detection for grant-free NOMA systems,” IEEE

Trans. on Commun., vol. 68, no. 4, pp. 2143–2155, 2020.

[15] B. Dai and W. Yu, “Energy efficiency of downlink transmission strategies for cloud radio access networks,” IEEE J. Sel.

Topics Commun., vol. 34, no. 4, pp. 1037–1050, 2016.

[16] N. Jalden, P. Zetterberg, B. Ottersten, A. Hong, and R. Thoma, “Correlation properties of large scale fading based on

indoor measurements,” in 2007 IEEE Wireless Commun. and Networking Conf., 2007, pp. 1894–1899.

[17] H. Pervaiz, O. Onireti, A. Mohamed, M. A. Imran, R. Tafazolli, and Q. Ni, “Energy-efficient and load-proportional eNodeB

for 5G user-centric networks: A multilevel sleep strategy mechanism,” IEEE Veh. Technol. Mag., vol. 13, no. 4, pp. 51–59,

2018.

[18] P. Asbeck and Z. Popovic, “ET comes of age: Envelope tracking for higher-efficiency power amplifiers,” IEEE Microwave

Mag., vol. 17, no. 3, pp. 16–25, 2016.

[19] A. Conte et al., “Power consumption of base stations,” in TREND Plenary Meeting, 2012.

[20] Z. Xu, Y. Wang, J. Tang, J. Wang, and M. C. Gursoy, “A deep reinforcement learning based framework for power-efficient

resource allocation in cloud RANs,” in Proc. IEEE Int. Conf. on Commun. (ICC), 2017, pp. 1–6.

[21] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex programming.”

[22] D. E. Rumelhart, G. E. Hinton, R. J. Williams et al., “Learning representations by back-propagating errors,” Cognitive

Modeling, vol. 5, no. 3, p. 1, 1988.

[23] P. J. Werbos, “Backpropagation through time: what it does and how to do it,” Proc. IEEE, vol. 78, no. 10, pp. 1550–1560,

1990.

[24] T. Gal and J. Nedoma, “Multiparametric linear programming,” Management Science, vol. 18, no. 7, pp. 406–422, 1972.

[25] R. T. Rockafellar, Convex analysis (No. 28). Princeton university press, 1970.

[26] S. Boyd, L. Xiao, and A. Mutapcic, “Subgradient methods,” lecture notes of EE392o, Stanford University, Autumn Quarter,

vol. 2004, pp. 2004–2005, 2003.

[27] A. Tang, J. Sun, and K. Gong, “Mobile propagation loss with a low base station antenna for NLOS street microcells in

urban area,” in Proc. IEEE Veh. Technol. Conf. (VTC), Sep. 2001, pp. 333–336.

[28] G. Auer, V. Giannini, C. Desset, I. Godor, P. Skillermark, M. Olsson, M. A. Imran, D. Sabella, M. J. Gonzalez, O. Blume

et al., “How much energy is needed to run a wireless network?” IEEE Wireless Commun., vol. 18, no. 5, pp. 40–49, 2011.

[29] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.


[30] Y. S. Soh, T. Q. Quek, M. Kountouris, and H. Shin, “Energy efficient heterogeneous cellular networks,” IEEE J. Sel. Topics

Commun., vol. 31, no. 5, pp. 840–850, 2013.

[31] J. E. Ward and R. E. Wendell, “Approaches to sensitivity analysis in linear programming,” Annals of Operations Research,

vol. 27, no. 1, pp. 3–38, 1990.

[32] M. Akgul, “A note on shadow prices in linear programming,” Journal of the Operational Research Society, vol. 35, no. 5,

pp. 425–431, 1984.
