research article dynamic resource allocation with integrated reinforcement learning...

19
Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning for a D2D-Enabled LTE-A Network with Access to Unlicensed Band Alia Asheralieva and Yoshikazu Miyanaga Laboratory of Information Communication Networks, School of Information Science and Technology, Hokkaido University, Sapporo, Japan Correspondence should be addressed to Alia Asheralieva; [email protected] Received 30 May 2016; Revised 8 September 2016; Accepted 16 October 2016 Academic Editor: Juan C. Cano Copyright © 2016 A. Asheralieva and Y. Miyanaga. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. We propose a dynamic resource allocation algorithm for device-to-device (D2D) communication underlying a Long Term Evolution Advanced (LTE-A) network with reinforcement learning (RL) applied for unlicensed channel allocation. In a considered system, the inband and outband resources are assigned by the LTE evolved NodeB (eNB) to different device pairs to maximize the network utility subject to the target signal-to-interference-and-noise ratio (SINR) constraints. Because of the absence of an established control link between the unlicensed and cellular radio interfaces, the eNB cannot acquire any information about the quality and availability of unlicensed channels. As a result, a considered problem becomes a stochastic optimization problem that can be dealt with by deploying a learning theory (to estimate the random unlicensed channel environment). Consequently, we formulate the outband D2D access as a dynamic single-player game in which the player (eNB) estimates its possible strategy and expected utility for all of its actions based only on its own local observations using a joint utility and strategy estimation based reinforcement learning (JUSTE-RL) with regret algorithm. A proposed approach for resource allocation demonstrates near-optimal performance aſter a small number of RL iterations and surpasses the other comparable methods in terms of energy efficiency and throughput maximization. 1. Introduction D2D communication is a direct communication between the users transmitting over the cellular spectrum (inband) or operating on an unlicensed band (i.e., outband). e main advantages of inband D2D communication are the increased spectrum efficiency and possibility of quality of service (QoS) provisioning for different cellular/D2D users. e chief obstacles to the implementation of inband D2D access are (i) interference mitigation (between the users transmitting over the same frequency bands) and (ii) resource allocation [1]. Effective resource allocation and interference management strategies can significantly improve the per- formance of cellular networks. e objectives here could be different (such as improvement of spectrum efficiency, cellular coverage, network throughput, or user experience) but to achieve the optimal system performance, the problems of cellular/D2D mode selection, spectrum assignment, power allocation, and interference mitigation should be considered jointly in the algorithm design. Related contributions in this area are [2–10] studying the problem of interference mitigation for underlying D2D communication. It should be noted, however, that the majority of proposed formulations (except [2, 3]) does not deal with the issues of mode selection, spectrum assignment, and interference management in a joint fashion but rather by splitting the original problem into smaller subproblems (see e.g., [10]) or by separating the time scales of these subproblems (e.g., [9]). Hence, although the complexity of such methods is less than the complexity of a joint resource allocation, their efficiency in maximizing some certain optimality criterion is clearly downgraded. Outband D2D communication (carried over Wi-Fi Direct [11], ZigBee [12], or Bluetooth [13]) eliminates the need for interference mitigation but can be distorted by Hindawi Publishing Corporation Mobile Information Systems Volume 2016, Article ID 4565203, 18 pages http://dx.doi.org/10.1155/2016/4565203

Upload: others

Post on 06-Apr-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

Research ArticleDynamic Resource Allocation with Integrated ReinforcementLearning for a D2D-Enabled LTE-A Network with Access toUnlicensed Band

Alia Asheralieva and Yoshikazu Miyanaga

Laboratory of Information Communication Networks School of Information Science and Technology Hokkaido UniversitySapporo Japan

Correspondence should be addressed to Alia Asheralieva aasheralievagmailcom

Received 30 May 2016 Revised 8 September 2016 Accepted 16 October 2016

Academic Editor Juan C Cano

Copyright copy 2016 A Asheralieva and Y Miyanaga This is an open access article distributed under the Creative CommonsAttribution License which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited

We propose a dynamic resource allocation algorithm for device-to-device (D2D) communication underlying a Long TermEvolution Advanced (LTE-A) network with reinforcement learning (RL) applied for unlicensed channel allocation In a consideredsystem the inband and outband resources are assigned by the LTE evolved NodeB (eNB) to different device pairs to maximizethe network utility subject to the target signal-to-interference-and-noise ratio (SINR) constraints Because of the absence of anestablished control link between the unlicensed and cellular radio interfaces the eNB cannot acquire any information about thequality and availability of unlicensed channels As a result a considered problem becomes a stochastic optimization problem thatcan be dealt with by deploying a learning theory (to estimate the random unlicensed channel environment) Consequently weformulate the outband D2D access as a dynamic single-player game in which the player (eNB) estimates its possible strategy andexpected utility for all of its actions based only on its own local observations using a joint utility and strategy estimation basedreinforcement learning (JUSTE-RL)with regret algorithmA proposed approach for resource allocation demonstrates near-optimalperformance after a small number of RL iterations and surpasses the other comparable methods in terms of energy efficiency andthroughput maximization

1 Introduction

D2D communication is a direct communication betweenthe users transmitting over the cellular spectrum (inband)or operating on an unlicensed band (ie outband) Themain advantages of inband D2D communication are theincreased spectrum efficiency and possibility of quality ofservice (QoS) provisioning for different cellularD2D usersThe chief obstacles to the implementation of inband D2Daccess are (i) interference mitigation (between the userstransmitting over the same frequency bands) and (ii) resourceallocation [1] Effective resource allocation and interferencemanagement strategies can significantly improve the per-formance of cellular networks The objectives here couldbe different (such as improvement of spectrum efficiencycellular coverage network throughput or user experience)but to achieve the optimal system performance the problems

of cellularD2Dmode selection spectrum assignment powerallocation and interference mitigation should be consideredjointly in the algorithm design Related contributions inthis area are [2ndash10] studying the problem of interferencemitigation for underlying D2D communication It should benoted however that the majority of proposed formulations(except [2 3]) does not deal with the issues ofmode selectionspectrum assignment and interference management in ajoint fashion but rather by splitting the original probleminto smaller subproblems (see eg [10]) or by separatingthe time scales of these subproblems (eg [9]) Hencealthough the complexity of such methods is less than thecomplexity of a joint resource allocation their efficiencyin maximizing some certain optimality criterion is clearlydowngraded Outband D2D communication (carried overWi-Fi Direct [11] ZigBee [12] or Bluetooth [13]) eliminatesthe need for interference mitigation but can be distorted by

Hindawi Publishing CorporationMobile Information SystemsVolume 2016 Article ID 4565203 18 pageshttpdxdoiorg10115520164565203

2 Mobile Information Systems

the randomness of unlicensed channels Existing works onoutband D2D access focus on such issues as power con-sumption (eg [14ndash17]) and coordination between cellularand wireless interfaces ([18ndash21]) Some of these works ([1415 21]) suggest control of unlicensed band by the cellularnetwork (which requires a certain amount of cooperation andinformation exchange between different radio interfaces)Other works (eg [17 18 20]) imply autonomous operationof D2D devices (based on stochastic modeling of unlicensedchannels)

The main contributions of this work are as followsWe consider a network-controlled D2D communicationin which the licensed and unlicensed spectrum resourcesuser modes and transmission power levels are allocatedto different device pairs by the LTE eNB to maximizethe overall network utility We consider a general networkdeployment scenario where the unlicensed band is assumedto be provided by one or more radio access technologies(RATs) based on the orthogonal frequency division multipleaccess (OFDMA) carrier sense multiple access with collisionavoidance (CSMACA) frequency-hopping code divisionmultiple access (FH-CDMA) or any other multiple accessmethod It is assumed that all device pairs are equippedwith different wireless interfaces allowing them to connectto the appropriate RAT and use a CSMACA to avoid colli-sions when operating on the unlicensed band Hence eachunlicensed channel becomes available to a D2D pair onlywhen it is idle Unlike many previous works we jointly solvethe problems of inbandoutband access mode selection andspectrumpower assignment by combining these problemsinto one optimization problem which allows to allocate theinband network resources and offload the D2D traffic ina most effective way (in terms of maximizing the overallnetwork utility) Note that the formulated problem can besolved to optimality only if the global channel and networkknowledge (including the precise information on the oper-ating conditions of the licensed and unlicensed channels) isavailable to the eNB However because of the absence of anestablished control link between the unlicensed and cellularradio interfaces the eNB cannot get any information aboutthe quality and availability of the unlicensed channels As aresult a considered resource allocation problem becomes astochastic optimization problem that can be dealt with bydeploying a learning theory [22] (to estimate the randomunlicensed channel environment)

Consequently we formulate the outband D2D access asa dynamic single-player game in which the player (eNB)estimates its possible strategy and expected utility for all ofits actions based only on its own local observations usinga JUSTE-RL with regret (originally proposed in [23]) Themain idea behind RL is that the actions leading to the highernetwork utility at the current stage should be granted withhigher probabilities at the next stage [22] In the simplest formof RL (described eg in [24]) a learning agent estimates itsbest strategy based on its observed utility without any priorinformation about its operating environment This form ofRL requires only algebraic operations but its convergence tothe equilibrium state is not guaranteed [25] In Q-learning[22] a utility is estimated using some value-action function

This RL method converges to a Nash equilibrium (NE)state However it requires maximization of the action-valueat every stage which can be computationally demanding[22] In JUSTE-RL algorithm (described in detail in [23])a learning agent estimates not only its own strategy butalso the expected utility for all of its actions Unlike Q-learning JUSTE-RL does not need to perform optimizationof the action-value (since only algebraic operations arerequired to update the strategies) and hence it has a lowercomputational complexity On the other hand comparedto a basic RL algorithm JUSTE-RL converges to a 120576-NE[23 25]

It is worth mentioning that in wireless communicationsRL has been studied in the context of various spectrum accessproblems In [26 27] the learning has been employed tomin-imize the interference (created by adjacent nodes) in partiallyoverlapping channels This problem has been formulated asthe exact potential graphical game admitting a pure-strategyNE and therefore the proposed approach is not realizablein a broader range of problems A cognitive network withmultiple players has been analyzed in [28] In this workthe learning and channel selection have been separated intotwo different procedures which increased the complexity of aproposed resource allocation approach Besides the stabilityof a final solution was not verified A multi-player game forinbandD2D access where the players (D2Dusers) learn theiroptimal strategies based on the throughput performance ina stochastic environment has been studied in [29] It wasassumed that eachD2D user can transmit over the vacant cel-lular channels using a CSMACA implying that there are nochannels with interfering users (ie each orthogonal channelcan be occupied by at most one cellularD2D user) Althoughthe authors consider a scenariowith twoD2Dusers operatingon the same channel it is not clear how a D2D user cansense whether the user operating on the channel is cellular orD2D An autonomous D2D access in heterogeneous cellularnetworks comprising multiple low-power and high-powerBSs with (possibly) overlapping spectrum bands has beeninvestigated in [30] This problem has been modeled as astochastic noncooperative game with multiple players (D2Dpairs) admitting amixed-strategy NEThe goal of each playerwas to jointly select the wireless channel and power levelto maximize its reward defined as the difference betweenthe achieved throughput and the cost of power consumptionconstrained by the minimum tolerable SINR requirementsof this D2D pair To solve this problem a fully autonomousmultiagent Q-learning algorithm (which does not requireany information exchange andor cooperation among dif-ferent users) is developed and implemented in an LTE-Anetwork

The rest of the paper is organized as follows A generalnetwork model for inband and outband network opera-tion is described in Section 2 A general problem and thealgorithms for unlicensed and licensed resource allocationare formulated in Section 3 The algorithm implementationincluding the proposed resource allocation procedure in anLTE-A networks and performance evaluation is presented inSection 4 The paper is finalized in Conclusion

Mobile Information Systems 3

2 Network Model

In this paper the problem of resource allocation for D2Dcommunication is investigated for both the uplink (UL) anddownlink (DL) directions Similarly the discussion throughthe rest of the paper is applicable (if not stated otherwise) toeither direction Consider a basic LTE-A network consistingof one eNB and 119873 user pairs denoted PU1 PU119873 withN = 1 119873 being the set of user pairsrsquo indices It isassumed that a fixed licensed spectrum band of the eNBspans119870 resource blocks (RBs) numberedRB1 RB119870 withK = 1 119870 denoting the set of RBsrsquo indices comprisingthe bandwidth The network runs on a slotted-time basiswith the time axis partitioned into equal nonoverlapping timeintervals (slots) of the length 119879119904 with t denoting an integer-valued slot index Each pair of users can communicate witheach other either by the traditional cellular mode (CM) viathe eNB or in a D2D mode (DM) without traversing theeNB Let C sube N be the set of the indices of device pairsthat can operate only in CM and let D = N C denotethe set of the indices of potential D2D pairs (The indices inC and D can be determined based on eg user application(such as video sharing gaming and proximity-aware socialnetworking) in which the pair of devices could potentiallybe in range for the direct communication Such informationcan be acquired from a standard session initiation protocol(SIP) procedure (which handles the session setups and usersarrivals in LTE networks) Interested readers are referred to[31] for a comprehensive description of an SIP procedure andits use in the D2D access)

In our network any potential D2D pair can be allocatedwith cellular or D2D mode (based on the results of resourceallocation procedure) Consequently we define a binarymode allocation variable 119888119899(119905) 119899 isin N equaling 1 if PUn isallocated CM at slot t and 0 otherwise Note that 119888119899(119905) = 1for all 119899 isin C Further we consider the following models ofD2D access

(i) Inband D2D a D2D pair operates within the licensedLTE spectrum in an underlay to cellular communica-tion

(ii) Outband D2D a D2D pair transmits over the unli-censed band by exploiting other RATs such as Wi-Fi Direct [11] ZigBee [12] or Bluetooth [13] (It isassumed that all user devices are equipped withthe corresponding wireless interfaces to be able tocommunicate using a suitable RAT) We assume thatthere is no coordination andor information exchangebetween different wireless interfaces

To differentiate the pairs according to their D2D access wedefine a binary channel access variable 119887119899(119905) 119899 isin N equaling1 if PUn operates inband at slot t and 0 otherwise Note thatall cellular users can access only the LTEbandsHence 119887119899(119905) =1 for all 119899 isin C

21 Inband Network Operation In LTELTE-A RBs areallocated to cellular users by the eNBs using a standard packetscheduling procedure [32] The use of packet scheduling in aD2D-enabled LTE-A network is described in detail in [33]

In short a packet scheduling process can be explained asfollows In the UL direction at the beginning of any slott each user is required to collect and transmit its bufferstatus information After collecting this data a user sendsthe scheduling request (SR) with its buffer status informationto the eNB via a dedicated physical uplink control channel(PUCCH) After receiving all the SRs the eNB allocates theRBs to the users (according to a certain scheduling algorithm)and responds to all the SRs by sending the schedulinggrants (SGs) together with the allocation information to thecorresponding users via dedicated physical downlink controlchannels (PDCCHs) [33] In theDL the eNB readily finds outtheDL buffer status for each user allocates the RBs and sendsthe SGs with allocation information via PDCCHs [33] In theframework used in this paper the above scheduling processis applied for both the cellular and D2D communication withsome modifications (the corresponding resource allocationprocedure will be described in Section 4)

Let us further define a binary RB allocation variable 119886119896119899(119905)119899 isin N 119896 isin K equaling 1 if PUn is allocated with RBk at slott and 0 otherwise Each RB can be allocated to at most onecellular user Hence

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (1a)

The number of D2D users operating on the same RBs isunlimited Additionally to maximize the network utilizationwe enforce each RB to be allocated to at least one user Thatis

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (1b)

Note that both the OFDMA used for DL transmissionsand single carrier frequency division multiple access (SC-FDMA) applied in the UL direction provide orthogonality ofresource allocation to cellular communications This allowsachieving a minimal level of cochannel interference betweenthe transmitter-receiver pairs located within one cell [34]Thus when information is transmitted by cellularD2D userit will be distorted only by the users operating on the sameRB(s)

Let 119866119896119899119898 119899 isin N 119898 isin N and 119896 isin K denote the channelgain coefficient between the transmitter and receiver of PUnand PUm operating on RBk (for 119899 isin C 119866119896119899119899 indicates thechannel gain coefficient between PUn operating on RBk andthe eNB) In LTE system the instantaneous values of 119866119896119899119898can be obtained from the channel state information (CSI)through the use of special reference signals (RSs) [35] andhence they are known to the eNB and the usersThen for anyPUn operating on RBk the SINR at slot t in the UL directionis described by

SINR119896119899 (119905) = 119886119896119899 (119905) 119901119899 (119905) 119866119896119899119899sum119895isinN119899 119886119896119899 (119905) 119886119896119895 (119905) 119901119899 (119905) 119866119896119895119899 + 1198730 forall119899 isin N forall119896 isin K

(2)

where 1198730 is the variance of zero-mean additive white Gaus-sian noise (AWGN) power and 119901119899(119905) is the transmission

4 Mobile Information Systems

power allocated to PUn at slot t Clearly 119901119899(119905) is nonnegativeand cannot exceed some predefined maximal level 119875max

119899 thatis

0 le 119901119899 (119905) le 119875max119899 forall119899 isin N (3)

At any t the inband service rate of PUn depends on thenumber of RBs allocated to this device pair and the SINR ineach RB That is

119903119871119899 (119905) = 120596119871 sum119896isinK

119886119896119899 (119905) log (1 + SINR119896119899 (119905)) forall119899 isin N (4)

where 119903119871119899 (119905) is the service rate of PUn (in bits per slot or bps)over licensed (inband) spectrum and 120596119871 is the bandwidth ofone LTE RB (120596 = 180 kHz)

22 Outband Network Operation We consider M separateoutband wireless channels numbered for notation consis-tency as 119862119870+1 119862119870+119872 (In this paper we consider ageneral scenario when the unlicensed outband access can bebased on OFDMA CSMACA (in case of Wi-Fi Direct) FH-CDMA (in case of Bluetooth) or any other multiple accessmethod) We denote by M = 119870 + 1 119870 + 119872 the set ofchannel indices within the unlicensed band and use a binarychannel allocation variable 119886119898119899 (119905) 119899 isin N 119898 isin M to indicateif PUn is allocated with the unlicensed channel 119862119898 (in whichcase 119886119898119899 (119905) = 1) or not (119886119898119899 (119905) = 0) Note that 119886119898119899 (119905) = 0 forall 119899 isin C and 119898 isin M (since cellular users can access only theLTE bands) For 119899 isin D 119887119899(119905) equals 0 ifsum119898isinM 119886119898119899 (119905) ge 1 and1 otherwise (ie if sum119898isinM 119886119898119899 (119905) = 0) Hence

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (5a)

To avoid collisions the D2D pairs use a CSMACA methodwhen operating outband As a result each unlicensed channel119862119898 is available to D2D communication only when it is idleAdditionally to reduce the possibility of collisions betweenD2D users we assume that at any slot t at most one devicepair can transmit over each unlicensed channel 119862119898 That is

sum119899isinD

119886119898119899 (119905) le 1sum119899isinD

(1 minus 119887119899 (119905)) le 119872forall119898 isin M

(5b)

The transmission procedure for the pair of D2D usersoperating outband is described as follows At the beginning ofslot t one of the users starts sensing the allocated unlicensedchannel 119862119898 (for simplicity we assume perfect sensing) If thechannel is free the transmission phase (of the length119879119898119903 suchthat 0 le 119879119898119903 le 119879119904) begins Note that the duration of 119879119898119903is random It depends on the availability of the channel 119862119898and the applied CSMACA scheme The probability densityfunction (pdf) of 119879119898119903 is not calculated here (since it has noimpact on the further analysis in this paper) An example ofsuch calculations can be found in [36]

Let119866119898119899119899 for all119898 isin M denote the channel gain coefficientbetween the transmitter and receiver of PUn operating onunlicensed channel 119862119898 Then the SINR of PUn transmittingover the channel 119862119898 at slot t can be expressed by

SINR119898119899 (119905) = 119886119898119899 (119905) 119901119899 (119905) 1198661198981198991198991198730 forall119899 isin D forall119898 isin M (6a)

and the service rate of PUn over unlicensed (outband)spectrum is described by

119903119880119899 (119905) = sum119898isinM

119879119898119903 120596119880119898119879119904 119886119898119899 (119905) log (1 + SINR119898119899 (119905)) forall119899 isin D

(6b)

where 120596119880119898 is the bandwidth (in Hz) of unlicensed channel119862119898 Note that neither the eNB nor D2D users have priorinformation about quality and availability of unlicensedchannels Therefore the exact values of 119866119898119899119899 and 119879119898119903 areunknown to the eNB and the D2D users

3 Resource Allocation Problem

31 Problem Statement We define a binary 119873 times 119870-dimensional RB allocation matrix a119871 and a binary 119873 times 119872-dimensional unlicensed channel allocation matrix a119880 as

a119871119905 = [[[[[

11988611 (119905) sdot sdot sdot 1198861198701 (119905) 1198861119873 (119905) sdot sdot sdot 119886119870119873 (119905)

]]]]]

a119880119905 = [[[[[

119886119870+11 (119905) sdot sdot sdot 119886119870+1198721 (119905) 119886119870+1119873 (119905) sdot sdot sdot 119886119870+119872119873 (119905)

]]]]]

(7)

respectively We also define a binary N-dimensional D2Daccess allocation vector b119905 = (1198871(119905) 119887119873(119905)) a binary N-dimensional mode allocation vector c119905 = (1198881(119905) 119888119873(119905))and a real-valued N-dimensional power allocation vectorp119905 = (1199011(119905) 119901119872(119905)) Then the sets of all admissible valuesfor a119871119905 a

119880119905 b119905 c119905 and p119905 are described by

A119871 = a119871119905 | 119886119896119899 (119905) isin 0 1 forall119899 isin N forall119896 isin K (8a)

A119880 = a119880119905 | 119886119898119899 (119905) isin 0 1 119886119898119894 (119905) = 0 sum119895isinD

119886119898119895 (119905)le 1 forall119899 isin D forall119894 isin C forall119898 isin M

(8b)

B = b119905 | 119887119899 (119905) isin 0 1 119887119894 (119905) = 1 sum119895isinD

(1 minus 119887119895 (119905))

le 119872 forall119899 isin D forall119894 isin C

(8c)

Mobile Information Systems 5

eNB

D2D link

Cellular link

PU1

PU2

PU3 PU4

PU5PU6PU7

PU8

PU1

PU2

PU3

PU4

PU5

PU7

PU8

RB1 RB2 RB3 RB4 RB5 RB6 C7 C8 C9

aLt =

((((((((((((((

1 1 0 0 0 0

0 1 1 0 0 0

0 0 0 1 1 0

0 0 0 0 0 1

0 0 1 1 0 0

0 0 0 0 0 0

0 0 0 0 1 1

0 0 0 0 0 0

))))))))))))))

aUt =

((((((((((((((

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

1 1 0

0 0 0

0 0 1

))))))))))))))

bt =

((((((((((((((

1

1

1

1

1

0

1

0

))))))))))))))

ct =

((((((((((((((

0

1

0

0

1

0

1

0

))))))))))))))

Figure 1 A D2D-enabled cellular network with three cellular pairs (PU2 PU5 and PU7) and five D2D pairs (PU1 PU3 PU4 PU6 and PU8)Three of the D2D pairs (PU1 PU3 and PU4) use inband access and two D2D pairs (PU6 PU8) are allocated with the unlicensed channelsIn this example different cellular and D2D pairs interfere with each other when transmitting over RB2 (where PU2 interferes with PU1) RB3(PU1 interferes with PU5) RB4 (PU5 interferes with PU3) RB5 (PU7 interferes with PU3) and RB6 (PU4 interferes with PU7)

C = c119905 | 119888119899 (119905) isin 0 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (8d)

P = p119905 | 0 le 119901119899 (119905) le 119875max119899 forall119899 isin N (8e)

Example of a D2D-enabled network with all defined opti-mization variables is shown in Figure 1

Ideally at any slot t the eNB should distribute the net-work resources among the users tomaximize their aggregatedservice rate That is to maximize the sum

sum119899isinN

119903119899 (119905) = sum119899isinN

(119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905)) (9)

where 119903119899(119905) represents the service rate of PUn (operatingeither inband or outband) However when communicating

6 Mobile Information Systems

over the unlicensed spectrum eachD2Dpair should transmitat a maximal power level to achieve the high SINR regime(and consequently service rate) which in turn results inincreased power consumption of mobile terminals There-fore when formulating the utility of each device pair weshould also consider the cost of power consumption toquantify the trade-off between the achieved rate and powerlevel (as in [37]) Accordingly we can define a utility 119906119899(119905)

of PUn at slot 119905 as the difference between its instantaneousservice rate 119903119899(119905) and the cost of power consumption

119906119899 (119905) = 119903119899 (119905) minus 120592119899119901119899 (119905)= 119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905) (10)

where 120592119899 ge 0 is the cost per unit (W) level of power for PUnUsing the above definition we can express our resource

allocation problem as follows

maximize sum119899isinN 119906119899 (119905) = sum119899isinN [119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905)] (11a)

subject to a119871119905 isin A119871a119880119905 isin A119880b119905 isin Bc119905 isin Cp119905 isin P

(11b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (11c)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (11d)

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (11e)

SINR119896119899 (119905) le SINRtar119899 forall119899 isin N forall119896 isin K cup M (11f)

where the constraint (11f) is necessary to protect the usersfrom heavy interference (here SINRtar

119899 stands for theminimalSINR level acceptable by PUn) Note that information onthe sets C and D is readily available at the eNB The valuesof 119866119896119899119898 for 119899 isin N 119898 isin N and 119896 isin K are obtained bythe eNB from the CSI carried by the RSs The only missinginformation is related to 119903119880119899 (119905) that depends on the parameters119879119898119903 (representing the availability of the unlicensed channel119862119896 in our model) and 119866119898119899119899 (which defines the quality ofunlicensed channel 119862119896) for all 119898 isin M The latter parameteris determined by the unlicensed channel allocations andhence the eNB can adapt to the changes of 119866119898119899119899 in time andspace Since there is no coordination (and no informationexchange) between the LTE and outband RAT interfacessolving (11a)ndash(11f) to optimality might be impossible whichis a rather strong argument in favor of applying a well-knownreinforcement learning (RL) for resource allocation

The main idea behind RL is that the actions (unlicensedchannel allocations) leading to the higher network utility atslot 119905 should be granted with higher probabilities at slot 119905 + 1and vice versa [22] In the simplest form of RL (presentedin [24]) the learning agent estimates its possible strategiesbased on the locally observed utility without any prior infor-mation about the operating environment This form of RL

requires only algebraic operations but does not guarantee theconvergence to an equilibrium [25] In Q-learning [22] theagentrsquos utility is estimated using some value-action functionGiven the certain (easy to follow) conditions this algorithmconverges (with probability 1) to an NE state However itrequires maximization of the action-value at every slot t(which can be computationally demanding depending onthe structure of a chosen value-action function) [20] InJUSTE-RL algorithm [23] the learning agent estimates notonly its own strategy but also the expected utility for allof its actions Unlike Q-learning JUSTE-RL does not needto perform optimization of the action-value (since onlyalgebraic operations are required to update the strategies)and hence it has a lower computational complexity On theother hand compared to a basic RL algorithm JUSTE-RLconverges to a 120576-NE [23 25] We now show how a JUSTE-RLwith regret can be applied to our problem

32 Unlicensed Channel Allocation To apply JUSTE-RL withregret to our problem we represent it as a game with oneplayer (the eNB) having no information about the operatingenvironment A finite set of the eNBrsquos actions A119880 representsthe set of all admissible unlicensed channel allocation deci-sions The objective of the eNB is to select at any slot t an

Mobile Information Systems 7

action 119860 119905 = a119880119905 isin A119880 to maximize the eNBrsquos utility 119906119905 =sum119899isinN 119906119899(119905) In the following we use notation 119886119898119899 (119905) forall119898 isin Mto specify the eNBrsquos decision regarding the allocation of anunlicensed channel 119862119896 to a pair PUn and a119880119905 to describe allunlicensed channel allocations by the eNB when selectinga particular action 119860 119905 at slot t We also use b119905 to denote

the D2D access allocation vector and 119903119880119899 (119905) to indicate theoutband service rate achieved by playing the action 119860 119905 Aftertaking an action 119860 119905 = a119880119905 at slot t the eNB observes the(random) service rate 119903119880119899 (119905) and estimates the network utility119906119905 = 119906119905(119860 119905) by solving the following problem

maximize 119906119905 = sum119899isinN

[119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905)] (12a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(12b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (12c)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (12d)

SINR119899 (119905) le SINRtar119899 forall119899 isin N (12e)

where

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (12f)

and 119887119899(119905) = 0 forall119899 isin C Note that unlike problem (11a)ndash(11f)problem (12a)ndash(12e) can be solved to optimality (since 119903119880119899 (119905) isknown) It has three optimization variables a119871119905 c119905 and p119905 andhence its complexity is lower than the complexity of (11a)ndash(11f) (the method for solving (12a)ndash(12e) is presented in thenext subsection)

We also define a mixed-strategy probability 120587119905 of playingan action 119860 119905 = a119880119905 at slot t as

120587119905 (119860 119905) = 120587119905 (119861)119861isinA119880 120587119905 (119861) = Pr 119860 119905 = 119861

forall119860 119905 isin A119880(13a)

and a regret 120588119905(119860 119905) for not playing this action at slot t as

120588119905 (119860 119905) = max 0 119906119905minus1 minus 119906119905 forall119860 119905 isin A119880 (13b)

In JUSTE-RL the probability distribution of a regret over allpossible actions becomes the BoltzmannndashGibbs distribution(aka canonical ensemble) given by [22]

119866 120588119905 (119860 119905) = exp minus120588119905 (119860 119905) 119896119879119861sum119861isinA119880 exp minus120588119905 (119861) 119896119879119861 forall119860 119905 isin A119880

(14)

where k = 138064852 times 10minus23 JK is the Boltzmann constant119879119861 is the system temperature (in K) High temperatures make

all actions almost equiprobable and low temperatures resultin greedy action selection [22]

Using the above definitions the dynamics of a JUSTE-RLwith regret can be described as [23]

119906119905+1 (119861) = 119906119905 (119861) + 1205721199051119860119905=119861 (119906119905 minus 119906119905 (119861)) 120588119905+1 (119861) = 120588119905 (119861) + 120573119905 (119906119905 (119861) minus 119906119905 minus 120588119905 (119861)) 120587119905+1 (119861) = 120587119905 (119861) + 120574119905 (119866 120588119905 (119861) minus 120587119899 (119861))

(15)

for all 119861 isin A119880 where 120572119905 120573119905 and 120574119905 are the learning rates suchthat [23]

lim119905rarrinfin

119905sum120591=1

120572120591 = +infinlim119905rarrinfin

119905sum120591=1

1205722120591 lt +infinlim119905rarrinfin

119905sum120591=1

120573120591 = +infinlim119905rarrinfin

119905sum120591=1

1205732120591 lt +infinlim119905rarrinfin

119905sum120591=1

120574120591 = +infinlim119905rarrinfin

119905sum120591=1

1205742120591 lt +infin

8 Mobile Information Systems

Initialization(1) Input 120592120572 120592120573 120592120574 119879(2) For all 1198600 isin A119880 set 1199060(1198600) larr 0 1205880(1198600) larr 0 1205870(1198600) larr 0(3) For all 119899 isin N set 119903119880119899 (0) larr 0Main Loop(4)While (119905 le 119879) do(5) Select 119860 119905 larr argmax119861isinA119880(120587119905(119861)) and set a119880119905 larr 119860 119905(6) For all 119899 isin D set 119887119899(119905) = max0 1 minus sum119898isinM 119886119898119899 (119905)(7) For all 119899 isin C set 119887119899(119905) = 0(8) Execute 119860 119905 and observe 119903119880119899 (119905) for all 119899 isin N(9) Solve (12a)ndash(12e) to find an optimal 119906119905(119860 119905) larr 119906119905(10) For all 119861 isin A119880 update 119906119905(119861) 120588119905(119861) 120587119905(119861) using (15)

(11) End

Algorithm 1 JUSTE-RL with regret for unlicensed channel allocation RL algorithm for outband channel allocation

lim119905rarrinfin

120574119905120572119905 = 0lim119905rarrinfin

120573119905120574119905 = 0(16a)

Typically the learning rates are set equal [22]

120572119905 = 1(119905 + 1)120592120572 120573119905 = 1(119905 + 1)120592120573 120574119905 = 1(119905 + 1)120592120574

(16b)

where 120592120572 isin (05 1] 120592120573 isin (05 1] 120592120574 isin (05 1] and 120592120572 le120592120573 le 120592120574 The initializations 1199060(119860) ge 0 1205880(119860) ge 0 and1205870(119860) ge 0 should be sufficiently close to zero for all 119860 isin A119880The dynamics (15) converge to the 120576-Nash equilibrium Notethat a Nash equilibrium point for (15) is given by [23]

lim119905rarrinfin

120587119905 (119860) = 120587lowast (119860) forall119860 isin A119880 (17)

The corresponding learning algorithm for unlicensed chan-nel allocation is presented in Algorithm 1 (where T indicatesthe total simulation length in slots) Note that this algorithmconverges when 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880 Thecomplexity of JUSTE-RL with regret is mainly determinedby the size of an action set A119880 since at any slot t we have

to select an action 119860 119905 isin A119880 that maximizes 120587119905(119860 119905) (thedynamics in (15) require only algebraic operations and thusits computational complexity is negligible) Consequently theworst-case time complexity of Algorithm 1 is 119874(119899) where119899 = |A119880| le 119873 times 119872 is the size of our action set

33 Inband Resource Allocation Consider (12a)ndash(12e) thatrepresents a joint mode RB and power level allocationproblemThis problem has two binary optimization variablesa119871119905 and c119905 one real-valued variable p119905 nonlinear objective(12a) and nonlinear constraints (12c) and (12e) Hence itbelongs to a family of the mixed-integer nonlinear program-ming (MINLP) problems It has been well established inthe past (see eg [38]) that all MINLP problems involvingbinary variables (such as (12a)ndash(12e)) are NondeterministicPolynomial-time- (NP-) hard For immediate NP-hardnessproof for a considered problem note that given that a119871119905 canbe either 0 or 1 any feasible solution to (12a)ndash(12e) is a subsetof vertices The constraint (12d) also implies that at least oneend point of every edge is included in this subset Hence thesolution to this problem describes a vertex cover for whichfinding a minimum is NP-hard

Most of the MINLP solution techniques involve theconstruction of the following relaxations to the consideredproblem a nonlinear programming (NLP) relaxation (theoriginal problem without integer restrictions) and a mixed-integer linear-programming (MILP) relaxation (an originalproblem where the nonlinearities are replaced by supportinghyperplanes) To form the MILP and NLP relaxations to(12a)ndash(12e) let us first represent in equivalent form thefollowing

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (18a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(18b)

Mobile Information Systems 9

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (18c)

1198921119896 (a119871119905 c119905) = sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) minus 1 le 0 forall119896 isin K (18d)

1198922119899 (a119871119905 p119905) = SINR119899 (119905) minus SINRtar119899 le 0 forall119899 isin N (18e)

1198923119899119896 (a119871119905 p119905) = SINR119896119899 (119905) minus 2119889119896119899 + 1 le 0 forall119899 isin N forall119896 isin K (18f)

where objective (18a) and constraints (18b) and (18c) arelinear while constraints (18d)ndash(18f) are nonlinear The MILP

relaxation to (18a)ndash(18f) in a given point (a1198710119905 c0119905 p0119905 ) is given

by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (19a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(19b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (19c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (19d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (19e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (19f)

The NLP relaxation to (18a)ndash(18f) is given by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (20a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(20b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (20c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (20d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (20e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (20f)

10 Mobile Information Systems

where

A119871 = a119871119905 | 0 le 119886119896119899 (119905) le 1 forall119899 isin N forall119896 isin K (20g)

C

= c119905 | 0 le 119888119899 (119905) le 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (20h)

In general allMINLPproblems can be solved using eitherexact techniques (eg branch-and-bound [39]) or heuristicmethods (such as local branching [40] large neighborhoodsearch [41] and feasibility pump [42]) Since we are inter-ested in a reasonably simple and fast algorithm it is moreconvenient to use heuristics to solve (18a)ndash(18f) Amongnumerous heuristic techniques feasibility pump (FP) [43] isperhaps the most simple and effective method for producingmore and better solutions in a shorter average running

time (the local convergence properties of FP for nonconvexproblems have been proved in [44]) The fundamental ideaof an FP heuristic is to decompose the MINLP probleminto two parts integer feasibility and constraint feasibilityThe former is achieved by rounding (solving the MILPrelaxation to an original problem) the latter by projection(solving the NLP relaxation) The algorithm generates twosequences of integral and rounding pointsThe first sequenceof integral points (a119871119894119905 c119894119905 p119894119905)119868119894=1 119868 = 1 2 containsthe solutions that may violate the nonlinear constraints thesecond sequence (a119871119894119905 c119894119905 p119894119905)119868119894=1 comprises the roundingpoints that are feasible for the MILP relaxation but might notbe integral

Particularly with the input (a1198711119905 c1119905 p1119905 ) being a solution toan NLP relaxation (20a)ndash(20f) FP generates two sequencesby solving the following problems for 119894 = 1 119868

(a119871119894119905 c119894119905 p119894119905) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038171 (21a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(21b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (21c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (21d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (21e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (21f)

(a119871119894+1119905 c119894+1119905 p119894+1119905 ) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038172 (22a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(22b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (22c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (22d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (22e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (22f)

where sdot1 and sdot2 are l1-norm and l2-norm respectivelyTherounding is carried out by solving the problem (21a)ndash(21f)

and the projection is the solution to (22a)ndash(22f) Conse-quently an FP algorithmalternates between the rounding and

Mobile Information Systems 11

projection steps until (a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905) (which impliesfeasibility) or until the number of iterations i has reached itspredefined limit ITheworkflow of the algorithm is presentedin Algorithm 2 Note that to retain the local convergencethe problems (21a)ndash(21f) and (22a)ndash(22f) have to be solvedexactly in the FP algorithm The problem (22a)ndash(22f) (andhence (20a)ndash(20f)) can be solved using any standard NLPmethod In this paper an interior point algorithm (describedeg in [45]) which has a polynomial 119874(1198992) time complexityis applied to solve (20a)ndash(20f) and (22a)ndash(22f) The MILPproblem (21a)ndash(21f) is relatively simple and therefore it canbe solved to optimality by any technique from the family ofthe branch-and-bound methods (eg [46])

Note that in general finding an optimal solution to anyjoint resource allocation problem with integrality constraintsis NP-hard (which has been shown in [47]) Consequentlymost of the recent approaches to deal with such kindof problems focus on finding the high-quality suboptimalsolutions using for example relaxation (by removing allthe integer restrictions as it has been done in [47 48]) oriterative two-stage algorithms for determining the optimalintegral solutions given fixed power levels and then findingthe optimal power allocation with fixed integral points (eg[49]) In this paper instead of relaxation or iteration wedirectly apply a heuristic FP algorithm that has a polynomial119874(119899119888) time complexity in the size n of the problem (withc being some real constant) [43] (Note that in our casethe size n of the problem (18a)ndash(18f) is proportional to|A119871| times |C| times |P| = 1198733 times 119870 The numerical results showingthe complexity of a proposed algorithm will be presentedin Section 4) Hence the presented heuristic approach hasmoderate complexity compared to the previously proposedalgorithms for resource allocationwith integrality constraintswhose complexity ranges from linear [21 30 47 48 50] topolynomial [2 3 49 51ndash53]

4 Algorithm Implementation

41 Resource Allocation Procedure We now discuss theimplementation of the proposed algorithms (presented inSection 3) in an LTE-A network The following schedulingprocedure is repeated at the beginning of each slot t as follows

(i) All users send their SRs to the eNB via dedicatedPUCCHs Note that the SRs may contain someuseful control information such as updated targetSINR level SINRtar

119899 or observed throughput on theunlicensed channel 119903119880119899 (119905)

(ii) After receiving the SRs from all of the users the eNBperforms resource allocation (by assigning themodesRBs and unlicensed channels and power levels to userpairs according to Algorithms 1 and 2) and sends theSGs with optimal allocations to the correspondingusers via PDCCHs

(iii) After receiving the SGs the users start their datatransmissions over allocated RBsunlicensed chan-nels with assigned mode and power levels

As it was already beenmentioned we deploy a CSMACAfor outband D2D access using a procedure described in IEEE80211 [54] As dictated by [54] if a certain D2D pair PUn 119899 isinD is allocated with one or more unlicensed channels thenprior to transmission one of the users must first sense thechannel (to determine whether it is idle) for the duration ofa distributed coordination function interframe space (DIFS)DIFS (which is 34 120583s long) consists of a short interframespace (SIFS) equaling 16 120583s and 2 Wi-Fi slots (each equals9 120583s) After DIFS a user must typically defer its transmissionfor a random number of slots generated from 0 to CW-1(contentionwindow size) to allow the other devices to share achannel in a fair manner Given that the minimumCW valueis CWmin = 16 the device will on an average wait for about75Wi-Fi slots before transmissionThus the average channelaccess delay is 16120583s + 95 times 9 120583s = 1015 120583s (independent ofservice rate) Since the slot duration in LTE system (119879119904 =1ms) is much longer than the average channel access delay(1015120583s) it is expected that (in average) a D2D pair will beable to exchange the data within the scheduled period Inthis case each of the users in a D2D pair should observeachieved throughput 119903119880119899 (119905) and report this value to the eNBwhen sending its SR Otherwise (if a D2D pair is not able toexchange the data within one slot) the D2D users send thevalue 119903119880119899 (119905) = 0 to the eNB Note that a CSMACA does notallow two-way data transmission Hence the second devicein a D2D pair can start the data transmission only after thefirst user has finished its transmission

It is worth mentioning that at some point in time aJUSTE-RL will reach its equilibrium state However evenafter the equilibriumhas been reached the eNB continues thelearning process because the network environment (channelquality network traffic and the number of active users) islikely to change over time resulting in different optimalmodeRBunlicensed channel and power allocations

42 Simulation Model A simulation model of the networkhas been implemented upon a standard LTE-A platformusing the OPNET simulation and development package [55]The model consists of one eNB and N user pairs randomlypositioned inside a three-sector hexagonal cell (with theantenna pattern specified in [56]) It is assumed that theusers operate outdoors in a typical urban environment andare stationary throughout all simulation runs Each userdevice has its own traffic generator enabling a variety oftraffic patterns For simplicity in the examples below theuser traffic is modeled as a full buffer with load of 10packets per second and packet size of 1500 bytes In allsimulations |C| = 10 cellular pairs 120592120573 = 11120592120572 120592120574 = 11120592120573119879 = 1000 slots 119879119861 = 106 K and I = 1000 iterations thetarget SINR levels for each device pair are set as SINRtar

119899 =SINRtar = 0 dB for all 119899 isin N The licensed band of theeNB comprises K = 100 RBs (equivalent to 20MHz) Theunlicensed band comprises M = 4 nonoverlapping OFDMchannels with 120596119880119898 = 10MHz for 119898 isin M The main simu-lation parameters of our model are listed in Table 1 Otherparameters are set in accordance with 3GPP specifications[56]

12 Mobile Information Systems

Initialization(1) Input 119868 119879(2)While (119905 le 119879) do(3) Input 119903119880119899 (119905) a119880119905 b119905(4) Solve (20a)ndash(20f) to find the optimal (a1198711119905 c1119905 p1119905 )Main Loop(5)While (119894 le 119868) doRounding(6) Solve (21a)ndash(21f) to find the optimal (a119871119894119905 c119894119905 p119894119905)(7) If ((a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905)) then breakProjection(8) Solve (22a)ndash(22f) to find the optimal (a119871119894+1119905 c119894+1119905 p119894+1119905 )(9) Set 119894 larr 119894 + 1

(10) End

Algorithm 2 FP algorithm for inband resource allocation

Table 1 Simulation parameters of LTE-A Model

Parameter ValueCell radius 500mFrame structure Type 2 (time division duplex)Slot duration 1msTDD configuration 0eNodeB Tx power 46 dBmUE activeidle Tx power 232 dBmNoise power minus174 dBmHzPath loss and cellular link 1281 + 376 log(119889) 119889 [km]NLOS path loss and D2Dlink

40 log(119889) + 30 log(119891) + 49 119889 [km]119891 [Hz]LOS path loss and D2Dlink

169 log(119889) + 20 log(1198915) + 468 119889[m] 119891 [GHz]

Shadowing st dev 10 B (cell mode) 12 dB (D2D mode)

In this paper the evaluation of a proposed approachfor inband and outband resource allocation referred to asJRA (JUSTE-RL based resource allocation) is divided intotwo parts In the first part we analyze the performanceof JUSTE-RL with regret for unlicensed channel allocation(Algorithm 1) In the second part we examine the efficiencyof a proposed joint inbandoutband resource allocation(Algorithms 1 and 2) In the following the performanceof JRA is compared with the performance of the followingresource allocation techniques

(i) First is joint inbandoutband resource allocation with120576-greedy Q-learning (GQL) [57] based on formu-lations (12a)ndash(12e) and (18a)ndash(18f) where the unli-censed channels are allocated to the users by theLTE eNB In GQL at any slot t an action with thelargest Q-value is selected with probability 1 minus 120576 andthe other actions are selected uniformly at randomwith probability 120576 In all simulation experiments thevalue of 120576 is set in accordance with the most commonsuggestions (provided eg in [22]) as 120576 = 01

(ii) Second is centralized optimal strategy (COS) wherethe inband and outband network resources are allo-cated to the users by solving (11a)ndash(11f) directly basedon global channel and network knowledge Note thatCOS corresponds to the most efficient (in terms ofnetwork utility maximization) strategy although it isnot practically realizable (since in the real networkdeployment scenarios the precise information aboutquality and availability of unlicensed channels is notavailable) (In this paper we use an FP algorithm tofind the optimal solution to (11a)ndash(11f) or (18a)ndash(18f)in GQL and COS)

(iii) Third is social heuristic for multimode D2D com-munication (SMD) in an LTE-A network proposedin [21] to reduce the complexity of an original opti-mization problem for joint inbandoutband resourceallocation This algorithm assigns user modes andresources to maximize the social welfare based onthe global channel and network knowledge The eNBcreates a randomly ordered list of the D2D pairsThen it computes the aggregated network utility foreach mode of the first user in the list and assigns thisuser with amode that provides the highest aggregatedutility This process is repeated for all D2D pairs

(iv) Fourth is greedy heuristic for multimode D2D com-munication (GMD) in LTE-A networks [21] wherethe modes and inbandoutband network resourcesare allocated tomaximize the individual usersrsquo welfarebased on the global channel and network knowledgeSimilar to SMD the eNB creates a randomly orderedlist of the D2D pairs After this it computes the utilityfor each mode of the first user in the list and assignsthis user a mode assuring the highest individualutility This process is repeated for all D2D pairs

(v) Fifth is ranked heuristic for multimode D2D com-munication (RMD) in LTE-A networks [21] Here theeNB evaluates the utility of each user in each mode(based on the global channel andnetwork knowledge)

Mobile Information Systems 13Av

erag

e num

ber o

f RL

itera

tions

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e str

ateg

y co

nver

genc

e

Figure 2 The average number of RL iterations (slots) necessary forconvergence of strategies in JRAwith different values of 120592120572 and fixed|C| = 10

and sorts the D2D pairs according to their utilities ina descending order Next the eNB allocates the firstuser in the list a mode that guarantees the highestaggregatednetwork utilityThis process is repeated forall D2D pairs

Note that all algorithms used in our performance evaluationare simulated with identical system parameters

43 Performance of a Learning Algorithm We start withthe performance evaluation of JUSTE-RL with regret forunlicensed channel allocation (outlined in Algorithm 1)Figures 2 and 3 demonstrate the learning speed of JRAFigure 2 shows the average number of RL iterations (slots)necessary for convergence of strategies in JRA (at the pointwhere 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880) with differentvalues of 120592120572 isin [055 08] 120592120573 = 11120592120572 isin [061 088] and120592120574 = 11120592120573 isin [067 097] and a varying number of user pairs119873 isin [50 500] The average number of RL iterations (slots)necessary for convergence of utilities in JRA (at the pointwhere 119906119905(119861) = 119906119905minus1(119861) for all 119861 isin A119880) with 120592120572 isin [055 08]120592120573 isin [061 088] and 120592120574 isin [067 097] and 119873 isin [50 500])is plotted in Figure 3 The accuracy of estimation in JRA ispresented in Figures 4 and 5 Figure 4 shows the absoluteerror of strategy estimation in JRA denoted as 120575120587 defined asa sum of the absolute differences between the actual optimalstrategies and the estimated strategies upon the algorithmtermination That is

120575120587 = sum119861isinA119880

1003816100381610038161003816120587119879 (119861) minus 120587lowast (119861)1003816100381610038161003816 (23)

where 120587119879(119861) is an optimal strategy estimated in JRA uponthe algorithm termination (at slot 119879) and 120587lowast(119861) is the actual

Aver

age n

umbe

r of R

L ite

ratio

ns

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e util

ity co

nver

genc

e

Figure 3 The average number of RL iterations (slots) necessary forconvergence of utilities in JRA with different values of 120592120572 and fixed|C| = 10

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of s

trat

egy

0

001

002

003

004

005

estim

atio

n120575120587

Figure 4 The absolute error of strategy estimation 120575120587 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

optimal strategy obtained by playing an action 119861 isin A119880Figure 5 demonstrates the absolute error of utility estimationin JRA denoted 120575119906 defined as a sum of the absolutedifferences between the actual and the estimated optimalnetwork utilities upon the algorithm termination that is

120575119906 = sum119861isinA119880

1003816100381610038161003816119906119879 (119861) minus 119906lowast (119861)1003816100381610038161003816 (24)

14 Mobile Information Systems

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of u

tility

0

001

002

003

004

005

estim

atio

n120575u

Figure 5 The absolute error of utility estimation 120575119906 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

where 119906119879(119861) is an optimal network utility estimated in JRAupon the algorithm termination (at slot 119879) and 119906lowast(119861) is theactual optimal network utility obtained by playing an action119861 isin A119880 The observations in Figures 2ndash5 show that the ratesof convergence of strategies and utilities and the accuracyof strategy and utility estimation are almost the sameFurthermore we find that the number of iterations necessaryfor the algorithm convergence and absolute estimation errorstrongly depend on the setting of the parameters 120592120572 120592120573 and120592120574 the worst performance is attainedwith 120592120572 = 08 120592120573 = 061and 120592120574 = 067 and the best with 120592120572 = 055 120592120573 = 088and 120592120574 = 097 Such results are rather predictable since theparameters 120592120572 120592120573 and 120592120574 are related to the parameters 120572119905120573119905 and 120574119905 (see (16b)) which have a direct influence on thelearning rate of JRA [22 23]

In Figures 6 and 7 the instantaneous network utility 119906119905 =sum119899isinN 119906119899(119905) is presented as a function of time in scenarioswith low network load (119873 = 100) and high network load(119873 = 500) and fixed Here a proposed JRA technique issimulated with the settings 120592120572 = 05 120592120573 = 061 and 120592120574 =067 The graphs in these figures show that the efficiency ofJRA and GQL improves gradually over time After about 300slots (which is the average time necessary for the convergenceof strategies and utilities in Algorithm 1) JRA demonstratesnear-optimal results GQL needs a little longer time (asymp400slots) to converge after which its performance also becomesvery close to the performance of COS Unlike JRA and GQLthe performance of COS SMDGMD and RMD is consistentover time (since these algorithms do not involve any learningprocess) We also observe that the network utility attained inSMD GMD and RMD is much smaller than that in COSTo understand such poor performance of SMD GMD andRMD note that in these algorithms the original resource

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

7

8

9

10

11

12

ut

(lowast10

6)

Figure 6 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and |C| = 10

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

40

42

44

46

48

50

52

54

56

58

60ut

(lowast10

6)

Figure 7 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 500 and |C| = 10allocation problem is divided into two separate problems(i) mode selection and (ii) packet scheduling After that themode selection problem is solved using very plain heuristics(social greedy or ranked) which reduces the complexityof an original optimization problem (from exponential tolinear) but has a negative impact on the performance of thesetechniques in terms of network utility maximization [21]

44 Performance of Joint InbandOutband Allocation Wenow evaluate the efficiency of a proposed inbandoutbandresource allocation (Algorithms 1 and 2) The graphs inFigures 8ndash10 demonstrate the computational complexitysolution time and solution accuracy of different resource

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 2: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

2 Mobile Information Systems

the randomness of unlicensed channels Existing works onoutband D2D access focus on such issues as power con-sumption (eg [14ndash17]) and coordination between cellularand wireless interfaces ([18ndash21]) Some of these works ([1415 21]) suggest control of unlicensed band by the cellularnetwork (which requires a certain amount of cooperation andinformation exchange between different radio interfaces)Other works (eg [17 18 20]) imply autonomous operationof D2D devices (based on stochastic modeling of unlicensedchannels)

The main contributions of this work are as followsWe consider a network-controlled D2D communicationin which the licensed and unlicensed spectrum resourcesuser modes and transmission power levels are allocatedto different device pairs by the LTE eNB to maximizethe overall network utility We consider a general networkdeployment scenario where the unlicensed band is assumedto be provided by one or more radio access technologies(RATs) based on the orthogonal frequency division multipleaccess (OFDMA) carrier sense multiple access with collisionavoidance (CSMACA) frequency-hopping code divisionmultiple access (FH-CDMA) or any other multiple accessmethod It is assumed that all device pairs are equippedwith different wireless interfaces allowing them to connectto the appropriate RAT and use a CSMACA to avoid colli-sions when operating on the unlicensed band Hence eachunlicensed channel becomes available to a D2D pair onlywhen it is idle Unlike many previous works we jointly solvethe problems of inbandoutband access mode selection andspectrumpower assignment by combining these problemsinto one optimization problem which allows to allocate theinband network resources and offload the D2D traffic ina most effective way (in terms of maximizing the overallnetwork utility) Note that the formulated problem can besolved to optimality only if the global channel and networkknowledge (including the precise information on the oper-ating conditions of the licensed and unlicensed channels) isavailable to the eNB However because of the absence of anestablished control link between the unlicensed and cellularradio interfaces the eNB cannot get any information aboutthe quality and availability of the unlicensed channels As aresult a considered resource allocation problem becomes astochastic optimization problem that can be dealt with bydeploying a learning theory [22] (to estimate the randomunlicensed channel environment)

Consequently we formulate the outband D2D access asa dynamic single-player game in which the player (eNB)estimates its possible strategy and expected utility for all ofits actions based only on its own local observations usinga JUSTE-RL with regret (originally proposed in [23]) Themain idea behind RL is that the actions leading to the highernetwork utility at the current stage should be granted withhigher probabilities at the next stage [22] In the simplest formof RL (described eg in [24]) a learning agent estimates itsbest strategy based on its observed utility without any priorinformation about its operating environment This form ofRL requires only algebraic operations but its convergence tothe equilibrium state is not guaranteed [25] In Q-learning[22] a utility is estimated using some value-action function

This RL method converges to a Nash equilibrium (NE)state However it requires maximization of the action-valueat every stage which can be computationally demanding[22] In JUSTE-RL algorithm (described in detail in [23])a learning agent estimates not only its own strategy butalso the expected utility for all of its actions Unlike Q-learning JUSTE-RL does not need to perform optimizationof the action-value (since only algebraic operations arerequired to update the strategies) and hence it has a lowercomputational complexity On the other hand comparedto a basic RL algorithm JUSTE-RL converges to a 120576-NE[23 25]

It is worth mentioning that in wireless communicationsRL has been studied in the context of various spectrum accessproblems In [26 27] the learning has been employed tomin-imize the interference (created by adjacent nodes) in partiallyoverlapping channels This problem has been formulated asthe exact potential graphical game admitting a pure-strategyNE and therefore the proposed approach is not realizablein a broader range of problems A cognitive network withmultiple players has been analyzed in [28] In this workthe learning and channel selection have been separated intotwo different procedures which increased the complexity of aproposed resource allocation approach Besides the stabilityof a final solution was not verified A multi-player game forinbandD2D access where the players (D2Dusers) learn theiroptimal strategies based on the throughput performance ina stochastic environment has been studied in [29] It wasassumed that eachD2D user can transmit over the vacant cel-lular channels using a CSMACA implying that there are nochannels with interfering users (ie each orthogonal channelcan be occupied by at most one cellularD2D user) Althoughthe authors consider a scenariowith twoD2Dusers operatingon the same channel it is not clear how a D2D user cansense whether the user operating on the channel is cellular orD2D An autonomous D2D access in heterogeneous cellularnetworks comprising multiple low-power and high-powerBSs with (possibly) overlapping spectrum bands has beeninvestigated in [30] This problem has been modeled as astochastic noncooperative game with multiple players (D2Dpairs) admitting amixed-strategy NEThe goal of each playerwas to jointly select the wireless channel and power levelto maximize its reward defined as the difference betweenthe achieved throughput and the cost of power consumptionconstrained by the minimum tolerable SINR requirementsof this D2D pair To solve this problem a fully autonomousmultiagent Q-learning algorithm (which does not requireany information exchange andor cooperation among dif-ferent users) is developed and implemented in an LTE-Anetwork

The rest of the paper is organized as follows A generalnetwork model for inband and outband network opera-tion is described in Section 2 A general problem and thealgorithms for unlicensed and licensed resource allocationare formulated in Section 3 The algorithm implementationincluding the proposed resource allocation procedure in anLTE-A networks and performance evaluation is presented inSection 4 The paper is finalized in Conclusion

Mobile Information Systems 3

2 Network Model

In this paper the problem of resource allocation for D2Dcommunication is investigated for both the uplink (UL) anddownlink (DL) directions Similarly the discussion throughthe rest of the paper is applicable (if not stated otherwise) toeither direction Consider a basic LTE-A network consistingof one eNB and 119873 user pairs denoted PU1 PU119873 withN = 1 119873 being the set of user pairsrsquo indices It isassumed that a fixed licensed spectrum band of the eNBspans119870 resource blocks (RBs) numberedRB1 RB119870 withK = 1 119870 denoting the set of RBsrsquo indices comprisingthe bandwidth The network runs on a slotted-time basiswith the time axis partitioned into equal nonoverlapping timeintervals (slots) of the length 119879119904 with t denoting an integer-valued slot index Each pair of users can communicate witheach other either by the traditional cellular mode (CM) viathe eNB or in a D2D mode (DM) without traversing theeNB Let C sube N be the set of the indices of device pairsthat can operate only in CM and let D = N C denotethe set of the indices of potential D2D pairs (The indices inC and D can be determined based on eg user application(such as video sharing gaming and proximity-aware socialnetworking) in which the pair of devices could potentiallybe in range for the direct communication Such informationcan be acquired from a standard session initiation protocol(SIP) procedure (which handles the session setups and usersarrivals in LTE networks) Interested readers are referred to[31] for a comprehensive description of an SIP procedure andits use in the D2D access)

In our network any potential D2D pair can be allocatedwith cellular or D2D mode (based on the results of resourceallocation procedure) Consequently we define a binarymode allocation variable 119888119899(119905) 119899 isin N equaling 1 if PUn isallocated CM at slot t and 0 otherwise Note that 119888119899(119905) = 1for all 119899 isin C Further we consider the following models ofD2D access

(i) Inband D2D a D2D pair operates within the licensedLTE spectrum in an underlay to cellular communica-tion

(ii) Outband D2D a D2D pair transmits over the unli-censed band by exploiting other RATs such as Wi-Fi Direct [11] ZigBee [12] or Bluetooth [13] (It isassumed that all user devices are equipped withthe corresponding wireless interfaces to be able tocommunicate using a suitable RAT) We assume thatthere is no coordination andor information exchangebetween different wireless interfaces

To differentiate the pairs according to their D2D access wedefine a binary channel access variable 119887119899(119905) 119899 isin N equaling1 if PUn operates inband at slot t and 0 otherwise Note thatall cellular users can access only the LTEbandsHence 119887119899(119905) =1 for all 119899 isin C

21 Inband Network Operation In LTELTE-A RBs areallocated to cellular users by the eNBs using a standard packetscheduling procedure [32] The use of packet scheduling in aD2D-enabled LTE-A network is described in detail in [33]

In short a packet scheduling process can be explained asfollows In the UL direction at the beginning of any slott each user is required to collect and transmit its bufferstatus information After collecting this data a user sendsthe scheduling request (SR) with its buffer status informationto the eNB via a dedicated physical uplink control channel(PUCCH) After receiving all the SRs the eNB allocates theRBs to the users (according to a certain scheduling algorithm)and responds to all the SRs by sending the schedulinggrants (SGs) together with the allocation information to thecorresponding users via dedicated physical downlink controlchannels (PDCCHs) [33] In theDL the eNB readily finds outtheDL buffer status for each user allocates the RBs and sendsthe SGs with allocation information via PDCCHs [33] In theframework used in this paper the above scheduling processis applied for both the cellular and D2D communication withsome modifications (the corresponding resource allocationprocedure will be described in Section 4)

Let us further define a binary RB allocation variable 119886119896119899(119905)119899 isin N 119896 isin K equaling 1 if PUn is allocated with RBk at slott and 0 otherwise Each RB can be allocated to at most onecellular user Hence

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (1a)

The number of D2D users operating on the same RBs isunlimited Additionally to maximize the network utilizationwe enforce each RB to be allocated to at least one user Thatis

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (1b)

Note that both the OFDMA used for DL transmissionsand single carrier frequency division multiple access (SC-FDMA) applied in the UL direction provide orthogonality ofresource allocation to cellular communications This allowsachieving a minimal level of cochannel interference betweenthe transmitter-receiver pairs located within one cell [34]Thus when information is transmitted by cellularD2D userit will be distorted only by the users operating on the sameRB(s)

Let 119866119896119899119898 119899 isin N 119898 isin N and 119896 isin K denote the channelgain coefficient between the transmitter and receiver of PUnand PUm operating on RBk (for 119899 isin C 119866119896119899119899 indicates thechannel gain coefficient between PUn operating on RBk andthe eNB) In LTE system the instantaneous values of 119866119896119899119898can be obtained from the channel state information (CSI)through the use of special reference signals (RSs) [35] andhence they are known to the eNB and the usersThen for anyPUn operating on RBk the SINR at slot t in the UL directionis described by

SINR119896119899 (119905) = 119886119896119899 (119905) 119901119899 (119905) 119866119896119899119899sum119895isinN119899 119886119896119899 (119905) 119886119896119895 (119905) 119901119899 (119905) 119866119896119895119899 + 1198730 forall119899 isin N forall119896 isin K

(2)

where 1198730 is the variance of zero-mean additive white Gaus-sian noise (AWGN) power and 119901119899(119905) is the transmission

4 Mobile Information Systems

power allocated to PUn at slot t Clearly 119901119899(119905) is nonnegativeand cannot exceed some predefined maximal level 119875max

119899 thatis

0 le 119901119899 (119905) le 119875max119899 forall119899 isin N (3)

At any t the inband service rate of PUn depends on thenumber of RBs allocated to this device pair and the SINR ineach RB That is

119903119871119899 (119905) = 120596119871 sum119896isinK

119886119896119899 (119905) log (1 + SINR119896119899 (119905)) forall119899 isin N (4)

where 119903119871119899 (119905) is the service rate of PUn (in bits per slot or bps)over licensed (inband) spectrum and 120596119871 is the bandwidth ofone LTE RB (120596 = 180 kHz)

22 Outband Network Operation We consider M separateoutband wireless channels numbered for notation consis-tency as 119862119870+1 119862119870+119872 (In this paper we consider ageneral scenario when the unlicensed outband access can bebased on OFDMA CSMACA (in case of Wi-Fi Direct) FH-CDMA (in case of Bluetooth) or any other multiple accessmethod) We denote by M = 119870 + 1 119870 + 119872 the set ofchannel indices within the unlicensed band and use a binarychannel allocation variable 119886119898119899 (119905) 119899 isin N 119898 isin M to indicateif PUn is allocated with the unlicensed channel 119862119898 (in whichcase 119886119898119899 (119905) = 1) or not (119886119898119899 (119905) = 0) Note that 119886119898119899 (119905) = 0 forall 119899 isin C and 119898 isin M (since cellular users can access only theLTE bands) For 119899 isin D 119887119899(119905) equals 0 ifsum119898isinM 119886119898119899 (119905) ge 1 and1 otherwise (ie if sum119898isinM 119886119898119899 (119905) = 0) Hence

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (5a)

To avoid collisions the D2D pairs use a CSMACA methodwhen operating outband As a result each unlicensed channel119862119898 is available to D2D communication only when it is idleAdditionally to reduce the possibility of collisions betweenD2D users we assume that at any slot t at most one devicepair can transmit over each unlicensed channel 119862119898 That is

sum119899isinD

119886119898119899 (119905) le 1sum119899isinD

(1 minus 119887119899 (119905)) le 119872forall119898 isin M

(5b)

The transmission procedure for the pair of D2D usersoperating outband is described as follows At the beginning ofslot t one of the users starts sensing the allocated unlicensedchannel 119862119898 (for simplicity we assume perfect sensing) If thechannel is free the transmission phase (of the length119879119898119903 suchthat 0 le 119879119898119903 le 119879119904) begins Note that the duration of 119879119898119903is random It depends on the availability of the channel 119862119898and the applied CSMACA scheme The probability densityfunction (pdf) of 119879119898119903 is not calculated here (since it has noimpact on the further analysis in this paper) An example ofsuch calculations can be found in [36]

Let119866119898119899119899 for all119898 isin M denote the channel gain coefficientbetween the transmitter and receiver of PUn operating onunlicensed channel 119862119898 Then the SINR of PUn transmittingover the channel 119862119898 at slot t can be expressed by

SINR119898119899 (119905) = 119886119898119899 (119905) 119901119899 (119905) 1198661198981198991198991198730 forall119899 isin D forall119898 isin M (6a)

and the service rate of PUn over unlicensed (outband)spectrum is described by

119903119880119899 (119905) = sum119898isinM

119879119898119903 120596119880119898119879119904 119886119898119899 (119905) log (1 + SINR119898119899 (119905)) forall119899 isin D

(6b)

where 120596119880119898 is the bandwidth (in Hz) of unlicensed channel119862119898 Note that neither the eNB nor D2D users have priorinformation about quality and availability of unlicensedchannels Therefore the exact values of 119866119898119899119899 and 119879119898119903 areunknown to the eNB and the D2D users

3 Resource Allocation Problem

31 Problem Statement We define a binary 119873 times 119870-dimensional RB allocation matrix a119871 and a binary 119873 times 119872-dimensional unlicensed channel allocation matrix a119880 as

a119871119905 = [[[[[

11988611 (119905) sdot sdot sdot 1198861198701 (119905) 1198861119873 (119905) sdot sdot sdot 119886119870119873 (119905)

]]]]]

a119880119905 = [[[[[

119886119870+11 (119905) sdot sdot sdot 119886119870+1198721 (119905) 119886119870+1119873 (119905) sdot sdot sdot 119886119870+119872119873 (119905)

]]]]]

(7)

respectively We also define a binary N-dimensional D2Daccess allocation vector b119905 = (1198871(119905) 119887119873(119905)) a binary N-dimensional mode allocation vector c119905 = (1198881(119905) 119888119873(119905))and a real-valued N-dimensional power allocation vectorp119905 = (1199011(119905) 119901119872(119905)) Then the sets of all admissible valuesfor a119871119905 a

119880119905 b119905 c119905 and p119905 are described by

A119871 = a119871119905 | 119886119896119899 (119905) isin 0 1 forall119899 isin N forall119896 isin K (8a)

A119880 = a119880119905 | 119886119898119899 (119905) isin 0 1 119886119898119894 (119905) = 0 sum119895isinD

119886119898119895 (119905)le 1 forall119899 isin D forall119894 isin C forall119898 isin M

(8b)

B = b119905 | 119887119899 (119905) isin 0 1 119887119894 (119905) = 1 sum119895isinD

(1 minus 119887119895 (119905))

le 119872 forall119899 isin D forall119894 isin C

(8c)

Mobile Information Systems 5

eNB

D2D link

Cellular link

PU1

PU2

PU3 PU4

PU5PU6PU7

PU8

PU1

PU2

PU3

PU4

PU5

PU7

PU8

RB1 RB2 RB3 RB4 RB5 RB6 C7 C8 C9

aLt =

((((((((((((((

1 1 0 0 0 0

0 1 1 0 0 0

0 0 0 1 1 0

0 0 0 0 0 1

0 0 1 1 0 0

0 0 0 0 0 0

0 0 0 0 1 1

0 0 0 0 0 0

))))))))))))))

aUt =

((((((((((((((

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

1 1 0

0 0 0

0 0 1

))))))))))))))

bt =

((((((((((((((

1

1

1

1

1

0

1

0

))))))))))))))

ct =

((((((((((((((

0

1

0

0

1

0

1

0

))))))))))))))

Figure 1 A D2D-enabled cellular network with three cellular pairs (PU2 PU5 and PU7) and five D2D pairs (PU1 PU3 PU4 PU6 and PU8)Three of the D2D pairs (PU1 PU3 and PU4) use inband access and two D2D pairs (PU6 PU8) are allocated with the unlicensed channelsIn this example different cellular and D2D pairs interfere with each other when transmitting over RB2 (where PU2 interferes with PU1) RB3(PU1 interferes with PU5) RB4 (PU5 interferes with PU3) RB5 (PU7 interferes with PU3) and RB6 (PU4 interferes with PU7)

C = c119905 | 119888119899 (119905) isin 0 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (8d)

P = p119905 | 0 le 119901119899 (119905) le 119875max119899 forall119899 isin N (8e)

Example of a D2D-enabled network with all defined opti-mization variables is shown in Figure 1

Ideally at any slot t the eNB should distribute the net-work resources among the users tomaximize their aggregatedservice rate That is to maximize the sum

sum119899isinN

119903119899 (119905) = sum119899isinN

(119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905)) (9)

where 119903119899(119905) represents the service rate of PUn (operatingeither inband or outband) However when communicating

6 Mobile Information Systems

over the unlicensed spectrum eachD2Dpair should transmitat a maximal power level to achieve the high SINR regime(and consequently service rate) which in turn results inincreased power consumption of mobile terminals There-fore when formulating the utility of each device pair weshould also consider the cost of power consumption toquantify the trade-off between the achieved rate and powerlevel (as in [37]) Accordingly we can define a utility 119906119899(119905)

of PUn at slot 119905 as the difference between its instantaneousservice rate 119903119899(119905) and the cost of power consumption

119906119899 (119905) = 119903119899 (119905) minus 120592119899119901119899 (119905)= 119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905) (10)

where 120592119899 ge 0 is the cost per unit (W) level of power for PUnUsing the above definition we can express our resource

allocation problem as follows

maximize sum119899isinN 119906119899 (119905) = sum119899isinN [119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905)] (11a)

subject to a119871119905 isin A119871a119880119905 isin A119880b119905 isin Bc119905 isin Cp119905 isin P

(11b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (11c)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (11d)

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (11e)

SINR119896119899 (119905) le SINRtar119899 forall119899 isin N forall119896 isin K cup M (11f)

where the constraint (11f) is necessary to protect the usersfrom heavy interference (here SINRtar

119899 stands for theminimalSINR level acceptable by PUn) Note that information onthe sets C and D is readily available at the eNB The valuesof 119866119896119899119898 for 119899 isin N 119898 isin N and 119896 isin K are obtained bythe eNB from the CSI carried by the RSs The only missinginformation is related to 119903119880119899 (119905) that depends on the parameters119879119898119903 (representing the availability of the unlicensed channel119862119896 in our model) and 119866119898119899119899 (which defines the quality ofunlicensed channel 119862119896) for all 119898 isin M The latter parameteris determined by the unlicensed channel allocations andhence the eNB can adapt to the changes of 119866119898119899119899 in time andspace Since there is no coordination (and no informationexchange) between the LTE and outband RAT interfacessolving (11a)ndash(11f) to optimality might be impossible whichis a rather strong argument in favor of applying a well-knownreinforcement learning (RL) for resource allocation

The main idea behind RL is that the actions (unlicensedchannel allocations) leading to the higher network utility atslot 119905 should be granted with higher probabilities at slot 119905 + 1and vice versa [22] In the simplest form of RL (presentedin [24]) the learning agent estimates its possible strategiesbased on the locally observed utility without any prior infor-mation about the operating environment This form of RL

requires only algebraic operations but does not guarantee theconvergence to an equilibrium [25] In Q-learning [22] theagentrsquos utility is estimated using some value-action functionGiven the certain (easy to follow) conditions this algorithmconverges (with probability 1) to an NE state However itrequires maximization of the action-value at every slot t(which can be computationally demanding depending onthe structure of a chosen value-action function) [20] InJUSTE-RL algorithm [23] the learning agent estimates notonly its own strategy but also the expected utility for allof its actions Unlike Q-learning JUSTE-RL does not needto perform optimization of the action-value (since onlyalgebraic operations are required to update the strategies)and hence it has a lower computational complexity On theother hand compared to a basic RL algorithm JUSTE-RLconverges to a 120576-NE [23 25] We now show how a JUSTE-RLwith regret can be applied to our problem

32 Unlicensed Channel Allocation To apply JUSTE-RL withregret to our problem we represent it as a game with oneplayer (the eNB) having no information about the operatingenvironment A finite set of the eNBrsquos actions A119880 representsthe set of all admissible unlicensed channel allocation deci-sions The objective of the eNB is to select at any slot t an

Mobile Information Systems 7

action 119860 119905 = a119880119905 isin A119880 to maximize the eNBrsquos utility 119906119905 =sum119899isinN 119906119899(119905) In the following we use notation 119886119898119899 (119905) forall119898 isin Mto specify the eNBrsquos decision regarding the allocation of anunlicensed channel 119862119896 to a pair PUn and a119880119905 to describe allunlicensed channel allocations by the eNB when selectinga particular action 119860 119905 at slot t We also use b119905 to denote

the D2D access allocation vector and 119903119880119899 (119905) to indicate theoutband service rate achieved by playing the action 119860 119905 Aftertaking an action 119860 119905 = a119880119905 at slot t the eNB observes the(random) service rate 119903119880119899 (119905) and estimates the network utility119906119905 = 119906119905(119860 119905) by solving the following problem

maximize 119906119905 = sum119899isinN

[119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905)] (12a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(12b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (12c)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (12d)

SINR119899 (119905) le SINRtar119899 forall119899 isin N (12e)

where

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (12f)

and 119887119899(119905) = 0 forall119899 isin C Note that unlike problem (11a)ndash(11f)problem (12a)ndash(12e) can be solved to optimality (since 119903119880119899 (119905) isknown) It has three optimization variables a119871119905 c119905 and p119905 andhence its complexity is lower than the complexity of (11a)ndash(11f) (the method for solving (12a)ndash(12e) is presented in thenext subsection)

We also define a mixed-strategy probability 120587119905 of playingan action 119860 119905 = a119880119905 at slot t as

120587119905 (119860 119905) = 120587119905 (119861)119861isinA119880 120587119905 (119861) = Pr 119860 119905 = 119861

forall119860 119905 isin A119880(13a)

and a regret 120588119905(119860 119905) for not playing this action at slot t as

120588119905 (119860 119905) = max 0 119906119905minus1 minus 119906119905 forall119860 119905 isin A119880 (13b)

In JUSTE-RL the probability distribution of a regret over allpossible actions becomes the BoltzmannndashGibbs distribution(aka canonical ensemble) given by [22]

119866 120588119905 (119860 119905) = exp minus120588119905 (119860 119905) 119896119879119861sum119861isinA119880 exp minus120588119905 (119861) 119896119879119861 forall119860 119905 isin A119880

(14)

where k = 138064852 times 10minus23 JK is the Boltzmann constant119879119861 is the system temperature (in K) High temperatures make

all actions almost equiprobable and low temperatures resultin greedy action selection [22]

Using the above definitions the dynamics of a JUSTE-RLwith regret can be described as [23]

119906119905+1 (119861) = 119906119905 (119861) + 1205721199051119860119905=119861 (119906119905 minus 119906119905 (119861)) 120588119905+1 (119861) = 120588119905 (119861) + 120573119905 (119906119905 (119861) minus 119906119905 minus 120588119905 (119861)) 120587119905+1 (119861) = 120587119905 (119861) + 120574119905 (119866 120588119905 (119861) minus 120587119899 (119861))

(15)

for all 119861 isin A119880 where 120572119905 120573119905 and 120574119905 are the learning rates suchthat [23]

lim119905rarrinfin

119905sum120591=1

120572120591 = +infinlim119905rarrinfin

119905sum120591=1

1205722120591 lt +infinlim119905rarrinfin

119905sum120591=1

120573120591 = +infinlim119905rarrinfin

119905sum120591=1

1205732120591 lt +infinlim119905rarrinfin

119905sum120591=1

120574120591 = +infinlim119905rarrinfin

119905sum120591=1

1205742120591 lt +infin

8 Mobile Information Systems

Initialization(1) Input 120592120572 120592120573 120592120574 119879(2) For all 1198600 isin A119880 set 1199060(1198600) larr 0 1205880(1198600) larr 0 1205870(1198600) larr 0(3) For all 119899 isin N set 119903119880119899 (0) larr 0Main Loop(4)While (119905 le 119879) do(5) Select 119860 119905 larr argmax119861isinA119880(120587119905(119861)) and set a119880119905 larr 119860 119905(6) For all 119899 isin D set 119887119899(119905) = max0 1 minus sum119898isinM 119886119898119899 (119905)(7) For all 119899 isin C set 119887119899(119905) = 0(8) Execute 119860 119905 and observe 119903119880119899 (119905) for all 119899 isin N(9) Solve (12a)ndash(12e) to find an optimal 119906119905(119860 119905) larr 119906119905(10) For all 119861 isin A119880 update 119906119905(119861) 120588119905(119861) 120587119905(119861) using (15)

(11) End

Algorithm 1 JUSTE-RL with regret for unlicensed channel allocation RL algorithm for outband channel allocation

lim119905rarrinfin

120574119905120572119905 = 0lim119905rarrinfin

120573119905120574119905 = 0(16a)

Typically the learning rates are set equal [22]

120572119905 = 1(119905 + 1)120592120572 120573119905 = 1(119905 + 1)120592120573 120574119905 = 1(119905 + 1)120592120574

(16b)

where 120592120572 isin (05 1] 120592120573 isin (05 1] 120592120574 isin (05 1] and 120592120572 le120592120573 le 120592120574 The initializations 1199060(119860) ge 0 1205880(119860) ge 0 and1205870(119860) ge 0 should be sufficiently close to zero for all 119860 isin A119880The dynamics (15) converge to the 120576-Nash equilibrium Notethat a Nash equilibrium point for (15) is given by [23]

lim119905rarrinfin

120587119905 (119860) = 120587lowast (119860) forall119860 isin A119880 (17)

The corresponding learning algorithm for unlicensed chan-nel allocation is presented in Algorithm 1 (where T indicatesthe total simulation length in slots) Note that this algorithmconverges when 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880 Thecomplexity of JUSTE-RL with regret is mainly determinedby the size of an action set A119880 since at any slot t we have

to select an action 119860 119905 isin A119880 that maximizes 120587119905(119860 119905) (thedynamics in (15) require only algebraic operations and thusits computational complexity is negligible) Consequently theworst-case time complexity of Algorithm 1 is 119874(119899) where119899 = |A119880| le 119873 times 119872 is the size of our action set

33 Inband Resource Allocation Consider (12a)ndash(12e) thatrepresents a joint mode RB and power level allocationproblemThis problem has two binary optimization variablesa119871119905 and c119905 one real-valued variable p119905 nonlinear objective(12a) and nonlinear constraints (12c) and (12e) Hence itbelongs to a family of the mixed-integer nonlinear program-ming (MINLP) problems It has been well established inthe past (see eg [38]) that all MINLP problems involvingbinary variables (such as (12a)ndash(12e)) are NondeterministicPolynomial-time- (NP-) hard For immediate NP-hardnessproof for a considered problem note that given that a119871119905 canbe either 0 or 1 any feasible solution to (12a)ndash(12e) is a subsetof vertices The constraint (12d) also implies that at least oneend point of every edge is included in this subset Hence thesolution to this problem describes a vertex cover for whichfinding a minimum is NP-hard

Most of the MINLP solution techniques involve theconstruction of the following relaxations to the consideredproblem a nonlinear programming (NLP) relaxation (theoriginal problem without integer restrictions) and a mixed-integer linear-programming (MILP) relaxation (an originalproblem where the nonlinearities are replaced by supportinghyperplanes) To form the MILP and NLP relaxations to(12a)ndash(12e) let us first represent in equivalent form thefollowing

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (18a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(18b)

Mobile Information Systems 9

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (18c)

1198921119896 (a119871119905 c119905) = sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) minus 1 le 0 forall119896 isin K (18d)

1198922119899 (a119871119905 p119905) = SINR119899 (119905) minus SINRtar119899 le 0 forall119899 isin N (18e)

1198923119899119896 (a119871119905 p119905) = SINR119896119899 (119905) minus 2119889119896119899 + 1 le 0 forall119899 isin N forall119896 isin K (18f)

where objective (18a) and constraints (18b) and (18c) arelinear while constraints (18d)ndash(18f) are nonlinear The MILP

relaxation to (18a)ndash(18f) in a given point (a1198710119905 c0119905 p0119905 ) is given

by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (19a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(19b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (19c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (19d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (19e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (19f)

The NLP relaxation to (18a)ndash(18f) is given by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (20a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(20b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (20c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (20d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (20e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (20f)

10 Mobile Information Systems

where

A119871 = a119871119905 | 0 le 119886119896119899 (119905) le 1 forall119899 isin N forall119896 isin K (20g)

C

= c119905 | 0 le 119888119899 (119905) le 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (20h)

In general allMINLPproblems can be solved using eitherexact techniques (eg branch-and-bound [39]) or heuristicmethods (such as local branching [40] large neighborhoodsearch [41] and feasibility pump [42]) Since we are inter-ested in a reasonably simple and fast algorithm it is moreconvenient to use heuristics to solve (18a)ndash(18f) Amongnumerous heuristic techniques feasibility pump (FP) [43] isperhaps the most simple and effective method for producingmore and better solutions in a shorter average running

time (the local convergence properties of FP for nonconvexproblems have been proved in [44]) The fundamental ideaof an FP heuristic is to decompose the MINLP probleminto two parts integer feasibility and constraint feasibilityThe former is achieved by rounding (solving the MILPrelaxation to an original problem) the latter by projection(solving the NLP relaxation) The algorithm generates twosequences of integral and rounding pointsThe first sequenceof integral points (a119871119894119905 c119894119905 p119894119905)119868119894=1 119868 = 1 2 containsthe solutions that may violate the nonlinear constraints thesecond sequence (a119871119894119905 c119894119905 p119894119905)119868119894=1 comprises the roundingpoints that are feasible for the MILP relaxation but might notbe integral

Particularly with the input (a1198711119905 c1119905 p1119905 ) being a solution toan NLP relaxation (20a)ndash(20f) FP generates two sequencesby solving the following problems for 119894 = 1 119868

(a119871119894119905 c119894119905 p119894119905) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038171 (21a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(21b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (21c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (21d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (21e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (21f)

(a119871119894+1119905 c119894+1119905 p119894+1119905 ) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038172 (22a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(22b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (22c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (22d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (22e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (22f)

where sdot1 and sdot2 are l1-norm and l2-norm respectivelyTherounding is carried out by solving the problem (21a)ndash(21f)

and the projection is the solution to (22a)ndash(22f) Conse-quently an FP algorithmalternates between the rounding and

Mobile Information Systems 11

projection steps until (a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905) (which impliesfeasibility) or until the number of iterations i has reached itspredefined limit ITheworkflow of the algorithm is presentedin Algorithm 2 Note that to retain the local convergencethe problems (21a)ndash(21f) and (22a)ndash(22f) have to be solvedexactly in the FP algorithm The problem (22a)ndash(22f) (andhence (20a)ndash(20f)) can be solved using any standard NLPmethod In this paper an interior point algorithm (describedeg in [45]) which has a polynomial 119874(1198992) time complexityis applied to solve (20a)ndash(20f) and (22a)ndash(22f) The MILPproblem (21a)ndash(21f) is relatively simple and therefore it canbe solved to optimality by any technique from the family ofthe branch-and-bound methods (eg [46])

Note that in general finding an optimal solution to anyjoint resource allocation problem with integrality constraintsis NP-hard (which has been shown in [47]) Consequentlymost of the recent approaches to deal with such kindof problems focus on finding the high-quality suboptimalsolutions using for example relaxation (by removing allthe integer restrictions as it has been done in [47 48]) oriterative two-stage algorithms for determining the optimalintegral solutions given fixed power levels and then findingthe optimal power allocation with fixed integral points (eg[49]) In this paper instead of relaxation or iteration wedirectly apply a heuristic FP algorithm that has a polynomial119874(119899119888) time complexity in the size n of the problem (withc being some real constant) [43] (Note that in our casethe size n of the problem (18a)ndash(18f) is proportional to|A119871| times |C| times |P| = 1198733 times 119870 The numerical results showingthe complexity of a proposed algorithm will be presentedin Section 4) Hence the presented heuristic approach hasmoderate complexity compared to the previously proposedalgorithms for resource allocationwith integrality constraintswhose complexity ranges from linear [21 30 47 48 50] topolynomial [2 3 49 51ndash53]

4 Algorithm Implementation

41 Resource Allocation Procedure We now discuss theimplementation of the proposed algorithms (presented inSection 3) in an LTE-A network The following schedulingprocedure is repeated at the beginning of each slot t as follows

(i) All users send their SRs to the eNB via dedicatedPUCCHs Note that the SRs may contain someuseful control information such as updated targetSINR level SINRtar

119899 or observed throughput on theunlicensed channel 119903119880119899 (119905)

(ii) After receiving the SRs from all of the users the eNBperforms resource allocation (by assigning themodesRBs and unlicensed channels and power levels to userpairs according to Algorithms 1 and 2) and sends theSGs with optimal allocations to the correspondingusers via PDCCHs

(iii) After receiving the SGs the users start their datatransmissions over allocated RBsunlicensed chan-nels with assigned mode and power levels

As it was already beenmentioned we deploy a CSMACAfor outband D2D access using a procedure described in IEEE80211 [54] As dictated by [54] if a certain D2D pair PUn 119899 isinD is allocated with one or more unlicensed channels thenprior to transmission one of the users must first sense thechannel (to determine whether it is idle) for the duration ofa distributed coordination function interframe space (DIFS)DIFS (which is 34 120583s long) consists of a short interframespace (SIFS) equaling 16 120583s and 2 Wi-Fi slots (each equals9 120583s) After DIFS a user must typically defer its transmissionfor a random number of slots generated from 0 to CW-1(contentionwindow size) to allow the other devices to share achannel in a fair manner Given that the minimumCW valueis CWmin = 16 the device will on an average wait for about75Wi-Fi slots before transmissionThus the average channelaccess delay is 16120583s + 95 times 9 120583s = 1015 120583s (independent ofservice rate) Since the slot duration in LTE system (119879119904 =1ms) is much longer than the average channel access delay(1015120583s) it is expected that (in average) a D2D pair will beable to exchange the data within the scheduled period Inthis case each of the users in a D2D pair should observeachieved throughput 119903119880119899 (119905) and report this value to the eNBwhen sending its SR Otherwise (if a D2D pair is not able toexchange the data within one slot) the D2D users send thevalue 119903119880119899 (119905) = 0 to the eNB Note that a CSMACA does notallow two-way data transmission Hence the second devicein a D2D pair can start the data transmission only after thefirst user has finished its transmission

It is worth mentioning that at some point in time aJUSTE-RL will reach its equilibrium state However evenafter the equilibriumhas been reached the eNB continues thelearning process because the network environment (channelquality network traffic and the number of active users) islikely to change over time resulting in different optimalmodeRBunlicensed channel and power allocations

42 Simulation Model A simulation model of the networkhas been implemented upon a standard LTE-A platformusing the OPNET simulation and development package [55]The model consists of one eNB and N user pairs randomlypositioned inside a three-sector hexagonal cell (with theantenna pattern specified in [56]) It is assumed that theusers operate outdoors in a typical urban environment andare stationary throughout all simulation runs Each userdevice has its own traffic generator enabling a variety oftraffic patterns For simplicity in the examples below theuser traffic is modeled as a full buffer with load of 10packets per second and packet size of 1500 bytes In allsimulations |C| = 10 cellular pairs 120592120573 = 11120592120572 120592120574 = 11120592120573119879 = 1000 slots 119879119861 = 106 K and I = 1000 iterations thetarget SINR levels for each device pair are set as SINRtar

119899 =SINRtar = 0 dB for all 119899 isin N The licensed band of theeNB comprises K = 100 RBs (equivalent to 20MHz) Theunlicensed band comprises M = 4 nonoverlapping OFDMchannels with 120596119880119898 = 10MHz for 119898 isin M The main simu-lation parameters of our model are listed in Table 1 Otherparameters are set in accordance with 3GPP specifications[56]

12 Mobile Information Systems

Initialization(1) Input 119868 119879(2)While (119905 le 119879) do(3) Input 119903119880119899 (119905) a119880119905 b119905(4) Solve (20a)ndash(20f) to find the optimal (a1198711119905 c1119905 p1119905 )Main Loop(5)While (119894 le 119868) doRounding(6) Solve (21a)ndash(21f) to find the optimal (a119871119894119905 c119894119905 p119894119905)(7) If ((a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905)) then breakProjection(8) Solve (22a)ndash(22f) to find the optimal (a119871119894+1119905 c119894+1119905 p119894+1119905 )(9) Set 119894 larr 119894 + 1

(10) End

Algorithm 2 FP algorithm for inband resource allocation

Table 1 Simulation parameters of LTE-A Model

Parameter ValueCell radius 500mFrame structure Type 2 (time division duplex)Slot duration 1msTDD configuration 0eNodeB Tx power 46 dBmUE activeidle Tx power 232 dBmNoise power minus174 dBmHzPath loss and cellular link 1281 + 376 log(119889) 119889 [km]NLOS path loss and D2Dlink

40 log(119889) + 30 log(119891) + 49 119889 [km]119891 [Hz]LOS path loss and D2Dlink

169 log(119889) + 20 log(1198915) + 468 119889[m] 119891 [GHz]

Shadowing st dev 10 B (cell mode) 12 dB (D2D mode)

In this paper the evaluation of a proposed approachfor inband and outband resource allocation referred to asJRA (JUSTE-RL based resource allocation) is divided intotwo parts In the first part we analyze the performanceof JUSTE-RL with regret for unlicensed channel allocation(Algorithm 1) In the second part we examine the efficiencyof a proposed joint inbandoutband resource allocation(Algorithms 1 and 2) In the following the performanceof JRA is compared with the performance of the followingresource allocation techniques

(i) First is joint inbandoutband resource allocation with120576-greedy Q-learning (GQL) [57] based on formu-lations (12a)ndash(12e) and (18a)ndash(18f) where the unli-censed channels are allocated to the users by theLTE eNB In GQL at any slot t an action with thelargest Q-value is selected with probability 1 minus 120576 andthe other actions are selected uniformly at randomwith probability 120576 In all simulation experiments thevalue of 120576 is set in accordance with the most commonsuggestions (provided eg in [22]) as 120576 = 01

(ii) Second is centralized optimal strategy (COS) wherethe inband and outband network resources are allo-cated to the users by solving (11a)ndash(11f) directly basedon global channel and network knowledge Note thatCOS corresponds to the most efficient (in terms ofnetwork utility maximization) strategy although it isnot practically realizable (since in the real networkdeployment scenarios the precise information aboutquality and availability of unlicensed channels is notavailable) (In this paper we use an FP algorithm tofind the optimal solution to (11a)ndash(11f) or (18a)ndash(18f)in GQL and COS)

(iii) Third is social heuristic for multimode D2D com-munication (SMD) in an LTE-A network proposedin [21] to reduce the complexity of an original opti-mization problem for joint inbandoutband resourceallocation This algorithm assigns user modes andresources to maximize the social welfare based onthe global channel and network knowledge The eNBcreates a randomly ordered list of the D2D pairsThen it computes the aggregated network utility foreach mode of the first user in the list and assigns thisuser with amode that provides the highest aggregatedutility This process is repeated for all D2D pairs

(iv) Fourth is greedy heuristic for multimode D2D com-munication (GMD) in LTE-A networks [21] wherethe modes and inbandoutband network resourcesare allocated tomaximize the individual usersrsquo welfarebased on the global channel and network knowledgeSimilar to SMD the eNB creates a randomly orderedlist of the D2D pairs After this it computes the utilityfor each mode of the first user in the list and assignsthis user a mode assuring the highest individualutility This process is repeated for all D2D pairs

(v) Fifth is ranked heuristic for multimode D2D com-munication (RMD) in LTE-A networks [21] Here theeNB evaluates the utility of each user in each mode(based on the global channel andnetwork knowledge)

Mobile Information Systems 13Av

erag

e num

ber o

f RL

itera

tions

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e str

ateg

y co

nver

genc

e

Figure 2 The average number of RL iterations (slots) necessary forconvergence of strategies in JRAwith different values of 120592120572 and fixed|C| = 10

and sorts the D2D pairs according to their utilities ina descending order Next the eNB allocates the firstuser in the list a mode that guarantees the highestaggregatednetwork utilityThis process is repeated forall D2D pairs

Note that all algorithms used in our performance evaluationare simulated with identical system parameters

43 Performance of a Learning Algorithm We start withthe performance evaluation of JUSTE-RL with regret forunlicensed channel allocation (outlined in Algorithm 1)Figures 2 and 3 demonstrate the learning speed of JRAFigure 2 shows the average number of RL iterations (slots)necessary for convergence of strategies in JRA (at the pointwhere 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880) with differentvalues of 120592120572 isin [055 08] 120592120573 = 11120592120572 isin [061 088] and120592120574 = 11120592120573 isin [067 097] and a varying number of user pairs119873 isin [50 500] The average number of RL iterations (slots)necessary for convergence of utilities in JRA (at the pointwhere 119906119905(119861) = 119906119905minus1(119861) for all 119861 isin A119880) with 120592120572 isin [055 08]120592120573 isin [061 088] and 120592120574 isin [067 097] and 119873 isin [50 500])is plotted in Figure 3 The accuracy of estimation in JRA ispresented in Figures 4 and 5 Figure 4 shows the absoluteerror of strategy estimation in JRA denoted as 120575120587 defined asa sum of the absolute differences between the actual optimalstrategies and the estimated strategies upon the algorithmtermination That is

120575120587 = sum119861isinA119880

1003816100381610038161003816120587119879 (119861) minus 120587lowast (119861)1003816100381610038161003816 (23)

where 120587119879(119861) is an optimal strategy estimated in JRA uponthe algorithm termination (at slot 119879) and 120587lowast(119861) is the actual

Aver

age n

umbe

r of R

L ite

ratio

ns

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e util

ity co

nver

genc

e

Figure 3 The average number of RL iterations (slots) necessary forconvergence of utilities in JRA with different values of 120592120572 and fixed|C| = 10

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of s

trat

egy

0

001

002

003

004

005

estim

atio

n120575120587

Figure 4 The absolute error of strategy estimation 120575120587 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

optimal strategy obtained by playing an action 119861 isin A119880Figure 5 demonstrates the absolute error of utility estimationin JRA denoted 120575119906 defined as a sum of the absolutedifferences between the actual and the estimated optimalnetwork utilities upon the algorithm termination that is

120575119906 = sum119861isinA119880

1003816100381610038161003816119906119879 (119861) minus 119906lowast (119861)1003816100381610038161003816 (24)

14 Mobile Information Systems

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of u

tility

0

001

002

003

004

005

estim

atio

n120575u

Figure 5 The absolute error of utility estimation 120575119906 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

where 119906119879(119861) is an optimal network utility estimated in JRAupon the algorithm termination (at slot 119879) and 119906lowast(119861) is theactual optimal network utility obtained by playing an action119861 isin A119880 The observations in Figures 2ndash5 show that the ratesof convergence of strategies and utilities and the accuracyof strategy and utility estimation are almost the sameFurthermore we find that the number of iterations necessaryfor the algorithm convergence and absolute estimation errorstrongly depend on the setting of the parameters 120592120572 120592120573 and120592120574 the worst performance is attainedwith 120592120572 = 08 120592120573 = 061and 120592120574 = 067 and the best with 120592120572 = 055 120592120573 = 088and 120592120574 = 097 Such results are rather predictable since theparameters 120592120572 120592120573 and 120592120574 are related to the parameters 120572119905120573119905 and 120574119905 (see (16b)) which have a direct influence on thelearning rate of JRA [22 23]

In Figures 6 and 7 the instantaneous network utility 119906119905 =sum119899isinN 119906119899(119905) is presented as a function of time in scenarioswith low network load (119873 = 100) and high network load(119873 = 500) and fixed Here a proposed JRA technique issimulated with the settings 120592120572 = 05 120592120573 = 061 and 120592120574 =067 The graphs in these figures show that the efficiency ofJRA and GQL improves gradually over time After about 300slots (which is the average time necessary for the convergenceof strategies and utilities in Algorithm 1) JRA demonstratesnear-optimal results GQL needs a little longer time (asymp400slots) to converge after which its performance also becomesvery close to the performance of COS Unlike JRA and GQLthe performance of COS SMDGMD and RMD is consistentover time (since these algorithms do not involve any learningprocess) We also observe that the network utility attained inSMD GMD and RMD is much smaller than that in COSTo understand such poor performance of SMD GMD andRMD note that in these algorithms the original resource

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

7

8

9

10

11

12

ut

(lowast10

6)

Figure 6 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and |C| = 10

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

40

42

44

46

48

50

52

54

56

58

60ut

(lowast10

6)

Figure 7 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 500 and |C| = 10allocation problem is divided into two separate problems(i) mode selection and (ii) packet scheduling After that themode selection problem is solved using very plain heuristics(social greedy or ranked) which reduces the complexityof an original optimization problem (from exponential tolinear) but has a negative impact on the performance of thesetechniques in terms of network utility maximization [21]

44 Performance of Joint InbandOutband Allocation Wenow evaluate the efficiency of a proposed inbandoutbandresource allocation (Algorithms 1 and 2) The graphs inFigures 8ndash10 demonstrate the computational complexitysolution time and solution accuracy of different resource

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 3: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

Mobile Information Systems 3

2 Network Model

In this paper the problem of resource allocation for D2Dcommunication is investigated for both the uplink (UL) anddownlink (DL) directions Similarly the discussion throughthe rest of the paper is applicable (if not stated otherwise) toeither direction Consider a basic LTE-A network consistingof one eNB and 119873 user pairs denoted PU1 PU119873 withN = 1 119873 being the set of user pairsrsquo indices It isassumed that a fixed licensed spectrum band of the eNBspans119870 resource blocks (RBs) numberedRB1 RB119870 withK = 1 119870 denoting the set of RBsrsquo indices comprisingthe bandwidth The network runs on a slotted-time basiswith the time axis partitioned into equal nonoverlapping timeintervals (slots) of the length 119879119904 with t denoting an integer-valued slot index Each pair of users can communicate witheach other either by the traditional cellular mode (CM) viathe eNB or in a D2D mode (DM) without traversing theeNB Let C sube N be the set of the indices of device pairsthat can operate only in CM and let D = N C denotethe set of the indices of potential D2D pairs (The indices inC and D can be determined based on eg user application(such as video sharing gaming and proximity-aware socialnetworking) in which the pair of devices could potentiallybe in range for the direct communication Such informationcan be acquired from a standard session initiation protocol(SIP) procedure (which handles the session setups and usersarrivals in LTE networks) Interested readers are referred to[31] for a comprehensive description of an SIP procedure andits use in the D2D access)

In our network any potential D2D pair can be allocatedwith cellular or D2D mode (based on the results of resourceallocation procedure) Consequently we define a binarymode allocation variable 119888119899(119905) 119899 isin N equaling 1 if PUn isallocated CM at slot t and 0 otherwise Note that 119888119899(119905) = 1for all 119899 isin C Further we consider the following models ofD2D access

(i) Inband D2D a D2D pair operates within the licensedLTE spectrum in an underlay to cellular communica-tion

(ii) Outband D2D a D2D pair transmits over the unli-censed band by exploiting other RATs such as Wi-Fi Direct [11] ZigBee [12] or Bluetooth [13] (It isassumed that all user devices are equipped withthe corresponding wireless interfaces to be able tocommunicate using a suitable RAT) We assume thatthere is no coordination andor information exchangebetween different wireless interfaces

To differentiate the pairs according to their D2D access wedefine a binary channel access variable 119887119899(119905) 119899 isin N equaling1 if PUn operates inband at slot t and 0 otherwise Note thatall cellular users can access only the LTEbandsHence 119887119899(119905) =1 for all 119899 isin C

21 Inband Network Operation In LTELTE-A RBs areallocated to cellular users by the eNBs using a standard packetscheduling procedure [32] The use of packet scheduling in aD2D-enabled LTE-A network is described in detail in [33]

In short a packet scheduling process can be explained asfollows In the UL direction at the beginning of any slott each user is required to collect and transmit its bufferstatus information After collecting this data a user sendsthe scheduling request (SR) with its buffer status informationto the eNB via a dedicated physical uplink control channel(PUCCH) After receiving all the SRs the eNB allocates theRBs to the users (according to a certain scheduling algorithm)and responds to all the SRs by sending the schedulinggrants (SGs) together with the allocation information to thecorresponding users via dedicated physical downlink controlchannels (PDCCHs) [33] In theDL the eNB readily finds outtheDL buffer status for each user allocates the RBs and sendsthe SGs with allocation information via PDCCHs [33] In theframework used in this paper the above scheduling processis applied for both the cellular and D2D communication withsome modifications (the corresponding resource allocationprocedure will be described in Section 4)

Let us further define a binary RB allocation variable 119886119896119899(119905)119899 isin N 119896 isin K equaling 1 if PUn is allocated with RBk at slott and 0 otherwise Each RB can be allocated to at most onecellular user Hence

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (1a)

The number of D2D users operating on the same RBs isunlimited Additionally to maximize the network utilizationwe enforce each RB to be allocated to at least one user Thatis

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (1b)

Note that both the OFDMA used for DL transmissionsand single carrier frequency division multiple access (SC-FDMA) applied in the UL direction provide orthogonality ofresource allocation to cellular communications This allowsachieving a minimal level of cochannel interference betweenthe transmitter-receiver pairs located within one cell [34]Thus when information is transmitted by cellularD2D userit will be distorted only by the users operating on the sameRB(s)

Let 119866119896119899119898 119899 isin N 119898 isin N and 119896 isin K denote the channelgain coefficient between the transmitter and receiver of PUnand PUm operating on RBk (for 119899 isin C 119866119896119899119899 indicates thechannel gain coefficient between PUn operating on RBk andthe eNB) In LTE system the instantaneous values of 119866119896119899119898can be obtained from the channel state information (CSI)through the use of special reference signals (RSs) [35] andhence they are known to the eNB and the usersThen for anyPUn operating on RBk the SINR at slot t in the UL directionis described by

SINR119896119899 (119905) = 119886119896119899 (119905) 119901119899 (119905) 119866119896119899119899sum119895isinN119899 119886119896119899 (119905) 119886119896119895 (119905) 119901119899 (119905) 119866119896119895119899 + 1198730 forall119899 isin N forall119896 isin K

(2)

where 1198730 is the variance of zero-mean additive white Gaus-sian noise (AWGN) power and 119901119899(119905) is the transmission

4 Mobile Information Systems

power allocated to PUn at slot t Clearly 119901119899(119905) is nonnegativeand cannot exceed some predefined maximal level 119875max

119899 thatis

0 le 119901119899 (119905) le 119875max119899 forall119899 isin N (3)

At any t the inband service rate of PUn depends on thenumber of RBs allocated to this device pair and the SINR ineach RB That is

119903119871119899 (119905) = 120596119871 sum119896isinK

119886119896119899 (119905) log (1 + SINR119896119899 (119905)) forall119899 isin N (4)

where 119903119871119899 (119905) is the service rate of PUn (in bits per slot or bps)over licensed (inband) spectrum and 120596119871 is the bandwidth ofone LTE RB (120596 = 180 kHz)

22 Outband Network Operation We consider M separateoutband wireless channels numbered for notation consis-tency as 119862119870+1 119862119870+119872 (In this paper we consider ageneral scenario when the unlicensed outband access can bebased on OFDMA CSMACA (in case of Wi-Fi Direct) FH-CDMA (in case of Bluetooth) or any other multiple accessmethod) We denote by M = 119870 + 1 119870 + 119872 the set ofchannel indices within the unlicensed band and use a binarychannel allocation variable 119886119898119899 (119905) 119899 isin N 119898 isin M to indicateif PUn is allocated with the unlicensed channel 119862119898 (in whichcase 119886119898119899 (119905) = 1) or not (119886119898119899 (119905) = 0) Note that 119886119898119899 (119905) = 0 forall 119899 isin C and 119898 isin M (since cellular users can access only theLTE bands) For 119899 isin D 119887119899(119905) equals 0 ifsum119898isinM 119886119898119899 (119905) ge 1 and1 otherwise (ie if sum119898isinM 119886119898119899 (119905) = 0) Hence

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (5a)

To avoid collisions the D2D pairs use a CSMACA methodwhen operating outband As a result each unlicensed channel119862119898 is available to D2D communication only when it is idleAdditionally to reduce the possibility of collisions betweenD2D users we assume that at any slot t at most one devicepair can transmit over each unlicensed channel 119862119898 That is

sum119899isinD

119886119898119899 (119905) le 1sum119899isinD

(1 minus 119887119899 (119905)) le 119872forall119898 isin M

(5b)

The transmission procedure for the pair of D2D usersoperating outband is described as follows At the beginning ofslot t one of the users starts sensing the allocated unlicensedchannel 119862119898 (for simplicity we assume perfect sensing) If thechannel is free the transmission phase (of the length119879119898119903 suchthat 0 le 119879119898119903 le 119879119904) begins Note that the duration of 119879119898119903is random It depends on the availability of the channel 119862119898and the applied CSMACA scheme The probability densityfunction (pdf) of 119879119898119903 is not calculated here (since it has noimpact on the further analysis in this paper) An example ofsuch calculations can be found in [36]

Let119866119898119899119899 for all119898 isin M denote the channel gain coefficientbetween the transmitter and receiver of PUn operating onunlicensed channel 119862119898 Then the SINR of PUn transmittingover the channel 119862119898 at slot t can be expressed by

SINR119898119899 (119905) = 119886119898119899 (119905) 119901119899 (119905) 1198661198981198991198991198730 forall119899 isin D forall119898 isin M (6a)

and the service rate of PUn over unlicensed (outband)spectrum is described by

119903119880119899 (119905) = sum119898isinM

119879119898119903 120596119880119898119879119904 119886119898119899 (119905) log (1 + SINR119898119899 (119905)) forall119899 isin D

(6b)

where 120596119880119898 is the bandwidth (in Hz) of unlicensed channel119862119898 Note that neither the eNB nor D2D users have priorinformation about quality and availability of unlicensedchannels Therefore the exact values of 119866119898119899119899 and 119879119898119903 areunknown to the eNB and the D2D users

3 Resource Allocation Problem

31 Problem Statement We define a binary 119873 times 119870-dimensional RB allocation matrix a119871 and a binary 119873 times 119872-dimensional unlicensed channel allocation matrix a119880 as

a119871119905 = [[[[[

11988611 (119905) sdot sdot sdot 1198861198701 (119905) 1198861119873 (119905) sdot sdot sdot 119886119870119873 (119905)

]]]]]

a119880119905 = [[[[[

119886119870+11 (119905) sdot sdot sdot 119886119870+1198721 (119905) 119886119870+1119873 (119905) sdot sdot sdot 119886119870+119872119873 (119905)

]]]]]

(7)

respectively We also define a binary N-dimensional D2Daccess allocation vector b119905 = (1198871(119905) 119887119873(119905)) a binary N-dimensional mode allocation vector c119905 = (1198881(119905) 119888119873(119905))and a real-valued N-dimensional power allocation vectorp119905 = (1199011(119905) 119901119872(119905)) Then the sets of all admissible valuesfor a119871119905 a

119880119905 b119905 c119905 and p119905 are described by

A119871 = a119871119905 | 119886119896119899 (119905) isin 0 1 forall119899 isin N forall119896 isin K (8a)

A119880 = a119880119905 | 119886119898119899 (119905) isin 0 1 119886119898119894 (119905) = 0 sum119895isinD

119886119898119895 (119905)le 1 forall119899 isin D forall119894 isin C forall119898 isin M

(8b)

B = b119905 | 119887119899 (119905) isin 0 1 119887119894 (119905) = 1 sum119895isinD

(1 minus 119887119895 (119905))

le 119872 forall119899 isin D forall119894 isin C

(8c)

Mobile Information Systems 5

eNB

D2D link

Cellular link

PU1

PU2

PU3 PU4

PU5PU6PU7

PU8

PU1

PU2

PU3

PU4

PU5

PU7

PU8

RB1 RB2 RB3 RB4 RB5 RB6 C7 C8 C9

aLt =

((((((((((((((

1 1 0 0 0 0

0 1 1 0 0 0

0 0 0 1 1 0

0 0 0 0 0 1

0 0 1 1 0 0

0 0 0 0 0 0

0 0 0 0 1 1

0 0 0 0 0 0

))))))))))))))

aUt =

((((((((((((((

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

1 1 0

0 0 0

0 0 1

))))))))))))))

bt =

((((((((((((((

1

1

1

1

1

0

1

0

))))))))))))))

ct =

((((((((((((((

0

1

0

0

1

0

1

0

))))))))))))))

Figure 1 A D2D-enabled cellular network with three cellular pairs (PU2 PU5 and PU7) and five D2D pairs (PU1 PU3 PU4 PU6 and PU8)Three of the D2D pairs (PU1 PU3 and PU4) use inband access and two D2D pairs (PU6 PU8) are allocated with the unlicensed channelsIn this example different cellular and D2D pairs interfere with each other when transmitting over RB2 (where PU2 interferes with PU1) RB3(PU1 interferes with PU5) RB4 (PU5 interferes with PU3) RB5 (PU7 interferes with PU3) and RB6 (PU4 interferes with PU7)

C = c119905 | 119888119899 (119905) isin 0 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (8d)

P = p119905 | 0 le 119901119899 (119905) le 119875max119899 forall119899 isin N (8e)

Example of a D2D-enabled network with all defined opti-mization variables is shown in Figure 1

Ideally at any slot t the eNB should distribute the net-work resources among the users tomaximize their aggregatedservice rate That is to maximize the sum

sum119899isinN

119903119899 (119905) = sum119899isinN

(119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905)) (9)

where 119903119899(119905) represents the service rate of PUn (operatingeither inband or outband) However when communicating

6 Mobile Information Systems

over the unlicensed spectrum eachD2Dpair should transmitat a maximal power level to achieve the high SINR regime(and consequently service rate) which in turn results inincreased power consumption of mobile terminals There-fore when formulating the utility of each device pair weshould also consider the cost of power consumption toquantify the trade-off between the achieved rate and powerlevel (as in [37]) Accordingly we can define a utility 119906119899(119905)

of PUn at slot 119905 as the difference between its instantaneousservice rate 119903119899(119905) and the cost of power consumption

119906119899 (119905) = 119903119899 (119905) minus 120592119899119901119899 (119905)= 119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905) (10)

where 120592119899 ge 0 is the cost per unit (W) level of power for PUnUsing the above definition we can express our resource

allocation problem as follows

maximize sum119899isinN 119906119899 (119905) = sum119899isinN [119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905)] (11a)

subject to a119871119905 isin A119871a119880119905 isin A119880b119905 isin Bc119905 isin Cp119905 isin P

(11b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (11c)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (11d)

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (11e)

SINR119896119899 (119905) le SINRtar119899 forall119899 isin N forall119896 isin K cup M (11f)

where the constraint (11f) is necessary to protect the usersfrom heavy interference (here SINRtar

119899 stands for theminimalSINR level acceptable by PUn) Note that information onthe sets C and D is readily available at the eNB The valuesof 119866119896119899119898 for 119899 isin N 119898 isin N and 119896 isin K are obtained bythe eNB from the CSI carried by the RSs The only missinginformation is related to 119903119880119899 (119905) that depends on the parameters119879119898119903 (representing the availability of the unlicensed channel119862119896 in our model) and 119866119898119899119899 (which defines the quality ofunlicensed channel 119862119896) for all 119898 isin M The latter parameteris determined by the unlicensed channel allocations andhence the eNB can adapt to the changes of 119866119898119899119899 in time andspace Since there is no coordination (and no informationexchange) between the LTE and outband RAT interfacessolving (11a)ndash(11f) to optimality might be impossible whichis a rather strong argument in favor of applying a well-knownreinforcement learning (RL) for resource allocation

The main idea behind RL is that the actions (unlicensedchannel allocations) leading to the higher network utility atslot 119905 should be granted with higher probabilities at slot 119905 + 1and vice versa [22] In the simplest form of RL (presentedin [24]) the learning agent estimates its possible strategiesbased on the locally observed utility without any prior infor-mation about the operating environment This form of RL

requires only algebraic operations but does not guarantee theconvergence to an equilibrium [25] In Q-learning [22] theagentrsquos utility is estimated using some value-action functionGiven the certain (easy to follow) conditions this algorithmconverges (with probability 1) to an NE state However itrequires maximization of the action-value at every slot t(which can be computationally demanding depending onthe structure of a chosen value-action function) [20] InJUSTE-RL algorithm [23] the learning agent estimates notonly its own strategy but also the expected utility for allof its actions Unlike Q-learning JUSTE-RL does not needto perform optimization of the action-value (since onlyalgebraic operations are required to update the strategies)and hence it has a lower computational complexity On theother hand compared to a basic RL algorithm JUSTE-RLconverges to a 120576-NE [23 25] We now show how a JUSTE-RLwith regret can be applied to our problem

32 Unlicensed Channel Allocation To apply JUSTE-RL withregret to our problem we represent it as a game with oneplayer (the eNB) having no information about the operatingenvironment A finite set of the eNBrsquos actions A119880 representsthe set of all admissible unlicensed channel allocation deci-sions The objective of the eNB is to select at any slot t an

Mobile Information Systems 7

action 119860 119905 = a119880119905 isin A119880 to maximize the eNBrsquos utility 119906119905 =sum119899isinN 119906119899(119905) In the following we use notation 119886119898119899 (119905) forall119898 isin Mto specify the eNBrsquos decision regarding the allocation of anunlicensed channel 119862119896 to a pair PUn and a119880119905 to describe allunlicensed channel allocations by the eNB when selectinga particular action 119860 119905 at slot t We also use b119905 to denote

the D2D access allocation vector and 119903119880119899 (119905) to indicate theoutband service rate achieved by playing the action 119860 119905 Aftertaking an action 119860 119905 = a119880119905 at slot t the eNB observes the(random) service rate 119903119880119899 (119905) and estimates the network utility119906119905 = 119906119905(119860 119905) by solving the following problem

maximize 119906119905 = sum119899isinN

[119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905)] (12a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(12b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (12c)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (12d)

SINR119899 (119905) le SINRtar119899 forall119899 isin N (12e)

where

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (12f)

and 119887119899(119905) = 0 forall119899 isin C Note that unlike problem (11a)ndash(11f)problem (12a)ndash(12e) can be solved to optimality (since 119903119880119899 (119905) isknown) It has three optimization variables a119871119905 c119905 and p119905 andhence its complexity is lower than the complexity of (11a)ndash(11f) (the method for solving (12a)ndash(12e) is presented in thenext subsection)

We also define a mixed-strategy probability 120587119905 of playingan action 119860 119905 = a119880119905 at slot t as

120587119905 (119860 119905) = 120587119905 (119861)119861isinA119880 120587119905 (119861) = Pr 119860 119905 = 119861

forall119860 119905 isin A119880(13a)

and a regret 120588119905(119860 119905) for not playing this action at slot t as

120588119905 (119860 119905) = max 0 119906119905minus1 minus 119906119905 forall119860 119905 isin A119880 (13b)

In JUSTE-RL the probability distribution of a regret over allpossible actions becomes the BoltzmannndashGibbs distribution(aka canonical ensemble) given by [22]

119866 120588119905 (119860 119905) = exp minus120588119905 (119860 119905) 119896119879119861sum119861isinA119880 exp minus120588119905 (119861) 119896119879119861 forall119860 119905 isin A119880

(14)

where k = 138064852 times 10minus23 JK is the Boltzmann constant119879119861 is the system temperature (in K) High temperatures make

all actions almost equiprobable and low temperatures resultin greedy action selection [22]

Using the above definitions the dynamics of a JUSTE-RLwith regret can be described as [23]

119906119905+1 (119861) = 119906119905 (119861) + 1205721199051119860119905=119861 (119906119905 minus 119906119905 (119861)) 120588119905+1 (119861) = 120588119905 (119861) + 120573119905 (119906119905 (119861) minus 119906119905 minus 120588119905 (119861)) 120587119905+1 (119861) = 120587119905 (119861) + 120574119905 (119866 120588119905 (119861) minus 120587119899 (119861))

(15)

for all 119861 isin A119880 where 120572119905 120573119905 and 120574119905 are the learning rates suchthat [23]

lim119905rarrinfin

119905sum120591=1

120572120591 = +infinlim119905rarrinfin

119905sum120591=1

1205722120591 lt +infinlim119905rarrinfin

119905sum120591=1

120573120591 = +infinlim119905rarrinfin

119905sum120591=1

1205732120591 lt +infinlim119905rarrinfin

119905sum120591=1

120574120591 = +infinlim119905rarrinfin

119905sum120591=1

1205742120591 lt +infin

8 Mobile Information Systems

Initialization(1) Input 120592120572 120592120573 120592120574 119879(2) For all 1198600 isin A119880 set 1199060(1198600) larr 0 1205880(1198600) larr 0 1205870(1198600) larr 0(3) For all 119899 isin N set 119903119880119899 (0) larr 0Main Loop(4)While (119905 le 119879) do(5) Select 119860 119905 larr argmax119861isinA119880(120587119905(119861)) and set a119880119905 larr 119860 119905(6) For all 119899 isin D set 119887119899(119905) = max0 1 minus sum119898isinM 119886119898119899 (119905)(7) For all 119899 isin C set 119887119899(119905) = 0(8) Execute 119860 119905 and observe 119903119880119899 (119905) for all 119899 isin N(9) Solve (12a)ndash(12e) to find an optimal 119906119905(119860 119905) larr 119906119905(10) For all 119861 isin A119880 update 119906119905(119861) 120588119905(119861) 120587119905(119861) using (15)

(11) End

Algorithm 1 JUSTE-RL with regret for unlicensed channel allocation RL algorithm for outband channel allocation

lim119905rarrinfin

120574119905120572119905 = 0lim119905rarrinfin

120573119905120574119905 = 0(16a)

Typically the learning rates are set equal [22]

120572119905 = 1(119905 + 1)120592120572 120573119905 = 1(119905 + 1)120592120573 120574119905 = 1(119905 + 1)120592120574

(16b)

where 120592120572 isin (05 1] 120592120573 isin (05 1] 120592120574 isin (05 1] and 120592120572 le120592120573 le 120592120574 The initializations 1199060(119860) ge 0 1205880(119860) ge 0 and1205870(119860) ge 0 should be sufficiently close to zero for all 119860 isin A119880The dynamics (15) converge to the 120576-Nash equilibrium Notethat a Nash equilibrium point for (15) is given by [23]

lim119905rarrinfin

120587119905 (119860) = 120587lowast (119860) forall119860 isin A119880 (17)

The corresponding learning algorithm for unlicensed chan-nel allocation is presented in Algorithm 1 (where T indicatesthe total simulation length in slots) Note that this algorithmconverges when 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880 Thecomplexity of JUSTE-RL with regret is mainly determinedby the size of an action set A119880 since at any slot t we have

to select an action 119860 119905 isin A119880 that maximizes 120587119905(119860 119905) (thedynamics in (15) require only algebraic operations and thusits computational complexity is negligible) Consequently theworst-case time complexity of Algorithm 1 is 119874(119899) where119899 = |A119880| le 119873 times 119872 is the size of our action set

33 Inband Resource Allocation Consider (12a)ndash(12e) thatrepresents a joint mode RB and power level allocationproblemThis problem has two binary optimization variablesa119871119905 and c119905 one real-valued variable p119905 nonlinear objective(12a) and nonlinear constraints (12c) and (12e) Hence itbelongs to a family of the mixed-integer nonlinear program-ming (MINLP) problems It has been well established inthe past (see eg [38]) that all MINLP problems involvingbinary variables (such as (12a)ndash(12e)) are NondeterministicPolynomial-time- (NP-) hard For immediate NP-hardnessproof for a considered problem note that given that a119871119905 canbe either 0 or 1 any feasible solution to (12a)ndash(12e) is a subsetof vertices The constraint (12d) also implies that at least oneend point of every edge is included in this subset Hence thesolution to this problem describes a vertex cover for whichfinding a minimum is NP-hard

Most of the MINLP solution techniques involve theconstruction of the following relaxations to the consideredproblem a nonlinear programming (NLP) relaxation (theoriginal problem without integer restrictions) and a mixed-integer linear-programming (MILP) relaxation (an originalproblem where the nonlinearities are replaced by supportinghyperplanes) To form the MILP and NLP relaxations to(12a)ndash(12e) let us first represent in equivalent form thefollowing

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (18a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(18b)

Mobile Information Systems 9

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (18c)

1198921119896 (a119871119905 c119905) = sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) minus 1 le 0 forall119896 isin K (18d)

1198922119899 (a119871119905 p119905) = SINR119899 (119905) minus SINRtar119899 le 0 forall119899 isin N (18e)

1198923119899119896 (a119871119905 p119905) = SINR119896119899 (119905) minus 2119889119896119899 + 1 le 0 forall119899 isin N forall119896 isin K (18f)

where objective (18a) and constraints (18b) and (18c) arelinear while constraints (18d)ndash(18f) are nonlinear The MILP

relaxation to (18a)ndash(18f) in a given point (a1198710119905 c0119905 p0119905 ) is given

by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (19a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(19b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (19c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (19d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (19e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (19f)

The NLP relaxation to (18a)ndash(18f) is given by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (20a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(20b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (20c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (20d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (20e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (20f)

10 Mobile Information Systems

where

A119871 = a119871119905 | 0 le 119886119896119899 (119905) le 1 forall119899 isin N forall119896 isin K (20g)

C

= c119905 | 0 le 119888119899 (119905) le 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (20h)

In general allMINLPproblems can be solved using eitherexact techniques (eg branch-and-bound [39]) or heuristicmethods (such as local branching [40] large neighborhoodsearch [41] and feasibility pump [42]) Since we are inter-ested in a reasonably simple and fast algorithm it is moreconvenient to use heuristics to solve (18a)ndash(18f) Amongnumerous heuristic techniques feasibility pump (FP) [43] isperhaps the most simple and effective method for producingmore and better solutions in a shorter average running

time (the local convergence properties of FP for nonconvexproblems have been proved in [44]) The fundamental ideaof an FP heuristic is to decompose the MINLP probleminto two parts integer feasibility and constraint feasibilityThe former is achieved by rounding (solving the MILPrelaxation to an original problem) the latter by projection(solving the NLP relaxation) The algorithm generates twosequences of integral and rounding pointsThe first sequenceof integral points (a119871119894119905 c119894119905 p119894119905)119868119894=1 119868 = 1 2 containsthe solutions that may violate the nonlinear constraints thesecond sequence (a119871119894119905 c119894119905 p119894119905)119868119894=1 comprises the roundingpoints that are feasible for the MILP relaxation but might notbe integral

Particularly with the input (a1198711119905 c1119905 p1119905 ) being a solution toan NLP relaxation (20a)ndash(20f) FP generates two sequencesby solving the following problems for 119894 = 1 119868

(a119871119894119905 c119894119905 p119894119905) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038171 (21a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(21b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (21c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (21d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (21e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (21f)

(a119871119894+1119905 c119894+1119905 p119894+1119905 ) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038172 (22a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(22b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (22c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (22d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (22e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (22f)

where sdot1 and sdot2 are l1-norm and l2-norm respectivelyTherounding is carried out by solving the problem (21a)ndash(21f)

and the projection is the solution to (22a)ndash(22f) Conse-quently an FP algorithmalternates between the rounding and

Mobile Information Systems 11

projection steps until (a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905) (which impliesfeasibility) or until the number of iterations i has reached itspredefined limit ITheworkflow of the algorithm is presentedin Algorithm 2 Note that to retain the local convergencethe problems (21a)ndash(21f) and (22a)ndash(22f) have to be solvedexactly in the FP algorithm The problem (22a)ndash(22f) (andhence (20a)ndash(20f)) can be solved using any standard NLPmethod In this paper an interior point algorithm (describedeg in [45]) which has a polynomial 119874(1198992) time complexityis applied to solve (20a)ndash(20f) and (22a)ndash(22f) The MILPproblem (21a)ndash(21f) is relatively simple and therefore it canbe solved to optimality by any technique from the family ofthe branch-and-bound methods (eg [46])

Note that in general finding an optimal solution to anyjoint resource allocation problem with integrality constraintsis NP-hard (which has been shown in [47]) Consequentlymost of the recent approaches to deal with such kindof problems focus on finding the high-quality suboptimalsolutions using for example relaxation (by removing allthe integer restrictions as it has been done in [47 48]) oriterative two-stage algorithms for determining the optimalintegral solutions given fixed power levels and then findingthe optimal power allocation with fixed integral points (eg[49]) In this paper instead of relaxation or iteration wedirectly apply a heuristic FP algorithm that has a polynomial119874(119899119888) time complexity in the size n of the problem (withc being some real constant) [43] (Note that in our casethe size n of the problem (18a)ndash(18f) is proportional to|A119871| times |C| times |P| = 1198733 times 119870 The numerical results showingthe complexity of a proposed algorithm will be presentedin Section 4) Hence the presented heuristic approach hasmoderate complexity compared to the previously proposedalgorithms for resource allocationwith integrality constraintswhose complexity ranges from linear [21 30 47 48 50] topolynomial [2 3 49 51ndash53]

4 Algorithm Implementation

41 Resource Allocation Procedure We now discuss theimplementation of the proposed algorithms (presented inSection 3) in an LTE-A network The following schedulingprocedure is repeated at the beginning of each slot t as follows

(i) All users send their SRs to the eNB via dedicatedPUCCHs Note that the SRs may contain someuseful control information such as updated targetSINR level SINRtar

119899 or observed throughput on theunlicensed channel 119903119880119899 (119905)

(ii) After receiving the SRs from all of the users the eNBperforms resource allocation (by assigning themodesRBs and unlicensed channels and power levels to userpairs according to Algorithms 1 and 2) and sends theSGs with optimal allocations to the correspondingusers via PDCCHs

(iii) After receiving the SGs the users start their datatransmissions over allocated RBsunlicensed chan-nels with assigned mode and power levels

As it was already beenmentioned we deploy a CSMACAfor outband D2D access using a procedure described in IEEE80211 [54] As dictated by [54] if a certain D2D pair PUn 119899 isinD is allocated with one or more unlicensed channels thenprior to transmission one of the users must first sense thechannel (to determine whether it is idle) for the duration ofa distributed coordination function interframe space (DIFS)DIFS (which is 34 120583s long) consists of a short interframespace (SIFS) equaling 16 120583s and 2 Wi-Fi slots (each equals9 120583s) After DIFS a user must typically defer its transmissionfor a random number of slots generated from 0 to CW-1(contentionwindow size) to allow the other devices to share achannel in a fair manner Given that the minimumCW valueis CWmin = 16 the device will on an average wait for about75Wi-Fi slots before transmissionThus the average channelaccess delay is 16120583s + 95 times 9 120583s = 1015 120583s (independent ofservice rate) Since the slot duration in LTE system (119879119904 =1ms) is much longer than the average channel access delay(1015120583s) it is expected that (in average) a D2D pair will beable to exchange the data within the scheduled period Inthis case each of the users in a D2D pair should observeachieved throughput 119903119880119899 (119905) and report this value to the eNBwhen sending its SR Otherwise (if a D2D pair is not able toexchange the data within one slot) the D2D users send thevalue 119903119880119899 (119905) = 0 to the eNB Note that a CSMACA does notallow two-way data transmission Hence the second devicein a D2D pair can start the data transmission only after thefirst user has finished its transmission

It is worth mentioning that at some point in time aJUSTE-RL will reach its equilibrium state However evenafter the equilibriumhas been reached the eNB continues thelearning process because the network environment (channelquality network traffic and the number of active users) islikely to change over time resulting in different optimalmodeRBunlicensed channel and power allocations

42 Simulation Model A simulation model of the networkhas been implemented upon a standard LTE-A platformusing the OPNET simulation and development package [55]The model consists of one eNB and N user pairs randomlypositioned inside a three-sector hexagonal cell (with theantenna pattern specified in [56]) It is assumed that theusers operate outdoors in a typical urban environment andare stationary throughout all simulation runs Each userdevice has its own traffic generator enabling a variety oftraffic patterns For simplicity in the examples below theuser traffic is modeled as a full buffer with load of 10packets per second and packet size of 1500 bytes In allsimulations |C| = 10 cellular pairs 120592120573 = 11120592120572 120592120574 = 11120592120573119879 = 1000 slots 119879119861 = 106 K and I = 1000 iterations thetarget SINR levels for each device pair are set as SINRtar

119899 =SINRtar = 0 dB for all 119899 isin N The licensed band of theeNB comprises K = 100 RBs (equivalent to 20MHz) Theunlicensed band comprises M = 4 nonoverlapping OFDMchannels with 120596119880119898 = 10MHz for 119898 isin M The main simu-lation parameters of our model are listed in Table 1 Otherparameters are set in accordance with 3GPP specifications[56]

12 Mobile Information Systems

Initialization(1) Input 119868 119879(2)While (119905 le 119879) do(3) Input 119903119880119899 (119905) a119880119905 b119905(4) Solve (20a)ndash(20f) to find the optimal (a1198711119905 c1119905 p1119905 )Main Loop(5)While (119894 le 119868) doRounding(6) Solve (21a)ndash(21f) to find the optimal (a119871119894119905 c119894119905 p119894119905)(7) If ((a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905)) then breakProjection(8) Solve (22a)ndash(22f) to find the optimal (a119871119894+1119905 c119894+1119905 p119894+1119905 )(9) Set 119894 larr 119894 + 1

(10) End

Algorithm 2 FP algorithm for inband resource allocation

Table 1 Simulation parameters of LTE-A Model

Parameter ValueCell radius 500mFrame structure Type 2 (time division duplex)Slot duration 1msTDD configuration 0eNodeB Tx power 46 dBmUE activeidle Tx power 232 dBmNoise power minus174 dBmHzPath loss and cellular link 1281 + 376 log(119889) 119889 [km]NLOS path loss and D2Dlink

40 log(119889) + 30 log(119891) + 49 119889 [km]119891 [Hz]LOS path loss and D2Dlink

169 log(119889) + 20 log(1198915) + 468 119889[m] 119891 [GHz]

Shadowing st dev 10 B (cell mode) 12 dB (D2D mode)

In this paper the evaluation of a proposed approachfor inband and outband resource allocation referred to asJRA (JUSTE-RL based resource allocation) is divided intotwo parts In the first part we analyze the performanceof JUSTE-RL with regret for unlicensed channel allocation(Algorithm 1) In the second part we examine the efficiencyof a proposed joint inbandoutband resource allocation(Algorithms 1 and 2) In the following the performanceof JRA is compared with the performance of the followingresource allocation techniques

(i) First is joint inbandoutband resource allocation with120576-greedy Q-learning (GQL) [57] based on formu-lations (12a)ndash(12e) and (18a)ndash(18f) where the unli-censed channels are allocated to the users by theLTE eNB In GQL at any slot t an action with thelargest Q-value is selected with probability 1 minus 120576 andthe other actions are selected uniformly at randomwith probability 120576 In all simulation experiments thevalue of 120576 is set in accordance with the most commonsuggestions (provided eg in [22]) as 120576 = 01

(ii) Second is centralized optimal strategy (COS) wherethe inband and outband network resources are allo-cated to the users by solving (11a)ndash(11f) directly basedon global channel and network knowledge Note thatCOS corresponds to the most efficient (in terms ofnetwork utility maximization) strategy although it isnot practically realizable (since in the real networkdeployment scenarios the precise information aboutquality and availability of unlicensed channels is notavailable) (In this paper we use an FP algorithm tofind the optimal solution to (11a)ndash(11f) or (18a)ndash(18f)in GQL and COS)

(iii) Third is social heuristic for multimode D2D com-munication (SMD) in an LTE-A network proposedin [21] to reduce the complexity of an original opti-mization problem for joint inbandoutband resourceallocation This algorithm assigns user modes andresources to maximize the social welfare based onthe global channel and network knowledge The eNBcreates a randomly ordered list of the D2D pairsThen it computes the aggregated network utility foreach mode of the first user in the list and assigns thisuser with amode that provides the highest aggregatedutility This process is repeated for all D2D pairs

(iv) Fourth is greedy heuristic for multimode D2D com-munication (GMD) in LTE-A networks [21] wherethe modes and inbandoutband network resourcesare allocated tomaximize the individual usersrsquo welfarebased on the global channel and network knowledgeSimilar to SMD the eNB creates a randomly orderedlist of the D2D pairs After this it computes the utilityfor each mode of the first user in the list and assignsthis user a mode assuring the highest individualutility This process is repeated for all D2D pairs

(v) Fifth is ranked heuristic for multimode D2D com-munication (RMD) in LTE-A networks [21] Here theeNB evaluates the utility of each user in each mode(based on the global channel andnetwork knowledge)

Mobile Information Systems 13Av

erag

e num

ber o

f RL

itera

tions

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e str

ateg

y co

nver

genc

e

Figure 2 The average number of RL iterations (slots) necessary forconvergence of strategies in JRAwith different values of 120592120572 and fixed|C| = 10

and sorts the D2D pairs according to their utilities ina descending order Next the eNB allocates the firstuser in the list a mode that guarantees the highestaggregatednetwork utilityThis process is repeated forall D2D pairs

Note that all algorithms used in our performance evaluationare simulated with identical system parameters

43 Performance of a Learning Algorithm We start withthe performance evaluation of JUSTE-RL with regret forunlicensed channel allocation (outlined in Algorithm 1)Figures 2 and 3 demonstrate the learning speed of JRAFigure 2 shows the average number of RL iterations (slots)necessary for convergence of strategies in JRA (at the pointwhere 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880) with differentvalues of 120592120572 isin [055 08] 120592120573 = 11120592120572 isin [061 088] and120592120574 = 11120592120573 isin [067 097] and a varying number of user pairs119873 isin [50 500] The average number of RL iterations (slots)necessary for convergence of utilities in JRA (at the pointwhere 119906119905(119861) = 119906119905minus1(119861) for all 119861 isin A119880) with 120592120572 isin [055 08]120592120573 isin [061 088] and 120592120574 isin [067 097] and 119873 isin [50 500])is plotted in Figure 3 The accuracy of estimation in JRA ispresented in Figures 4 and 5 Figure 4 shows the absoluteerror of strategy estimation in JRA denoted as 120575120587 defined asa sum of the absolute differences between the actual optimalstrategies and the estimated strategies upon the algorithmtermination That is

120575120587 = sum119861isinA119880

1003816100381610038161003816120587119879 (119861) minus 120587lowast (119861)1003816100381610038161003816 (23)

where 120587119879(119861) is an optimal strategy estimated in JRA uponthe algorithm termination (at slot 119879) and 120587lowast(119861) is the actual

Aver

age n

umbe

r of R

L ite

ratio

ns

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e util

ity co

nver

genc

e

Figure 3 The average number of RL iterations (slots) necessary forconvergence of utilities in JRA with different values of 120592120572 and fixed|C| = 10

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of s

trat

egy

0

001

002

003

004

005

estim

atio

n120575120587

Figure 4 The absolute error of strategy estimation 120575120587 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

optimal strategy obtained by playing an action 119861 isin A119880Figure 5 demonstrates the absolute error of utility estimationin JRA denoted 120575119906 defined as a sum of the absolutedifferences between the actual and the estimated optimalnetwork utilities upon the algorithm termination that is

120575119906 = sum119861isinA119880

1003816100381610038161003816119906119879 (119861) minus 119906lowast (119861)1003816100381610038161003816 (24)

14 Mobile Information Systems

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of u

tility

0

001

002

003

004

005

estim

atio

n120575u

Figure 5 The absolute error of utility estimation 120575119906 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

where 119906119879(119861) is an optimal network utility estimated in JRAupon the algorithm termination (at slot 119879) and 119906lowast(119861) is theactual optimal network utility obtained by playing an action119861 isin A119880 The observations in Figures 2ndash5 show that the ratesof convergence of strategies and utilities and the accuracyof strategy and utility estimation are almost the sameFurthermore we find that the number of iterations necessaryfor the algorithm convergence and absolute estimation errorstrongly depend on the setting of the parameters 120592120572 120592120573 and120592120574 the worst performance is attainedwith 120592120572 = 08 120592120573 = 061and 120592120574 = 067 and the best with 120592120572 = 055 120592120573 = 088and 120592120574 = 097 Such results are rather predictable since theparameters 120592120572 120592120573 and 120592120574 are related to the parameters 120572119905120573119905 and 120574119905 (see (16b)) which have a direct influence on thelearning rate of JRA [22 23]

In Figures 6 and 7 the instantaneous network utility 119906119905 =sum119899isinN 119906119899(119905) is presented as a function of time in scenarioswith low network load (119873 = 100) and high network load(119873 = 500) and fixed Here a proposed JRA technique issimulated with the settings 120592120572 = 05 120592120573 = 061 and 120592120574 =067 The graphs in these figures show that the efficiency ofJRA and GQL improves gradually over time After about 300slots (which is the average time necessary for the convergenceof strategies and utilities in Algorithm 1) JRA demonstratesnear-optimal results GQL needs a little longer time (asymp400slots) to converge after which its performance also becomesvery close to the performance of COS Unlike JRA and GQLthe performance of COS SMDGMD and RMD is consistentover time (since these algorithms do not involve any learningprocess) We also observe that the network utility attained inSMD GMD and RMD is much smaller than that in COSTo understand such poor performance of SMD GMD andRMD note that in these algorithms the original resource

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

7

8

9

10

11

12

ut

(lowast10

6)

Figure 6 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and |C| = 10

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

40

42

44

46

48

50

52

54

56

58

60ut

(lowast10

6)

Figure 7 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 500 and |C| = 10allocation problem is divided into two separate problems(i) mode selection and (ii) packet scheduling After that themode selection problem is solved using very plain heuristics(social greedy or ranked) which reduces the complexityof an original optimization problem (from exponential tolinear) but has a negative impact on the performance of thesetechniques in terms of network utility maximization [21]

44 Performance of Joint InbandOutband Allocation Wenow evaluate the efficiency of a proposed inbandoutbandresource allocation (Algorithms 1 and 2) The graphs inFigures 8ndash10 demonstrate the computational complexitysolution time and solution accuracy of different resource

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 4: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

4 Mobile Information Systems

power allocated to PUn at slot t Clearly 119901119899(119905) is nonnegativeand cannot exceed some predefined maximal level 119875max

119899 thatis

0 le 119901119899 (119905) le 119875max119899 forall119899 isin N (3)

At any t the inband service rate of PUn depends on thenumber of RBs allocated to this device pair and the SINR ineach RB That is

119903119871119899 (119905) = 120596119871 sum119896isinK

119886119896119899 (119905) log (1 + SINR119896119899 (119905)) forall119899 isin N (4)

where 119903119871119899 (119905) is the service rate of PUn (in bits per slot or bps)over licensed (inband) spectrum and 120596119871 is the bandwidth ofone LTE RB (120596 = 180 kHz)

22 Outband Network Operation We consider M separateoutband wireless channels numbered for notation consis-tency as 119862119870+1 119862119870+119872 (In this paper we consider ageneral scenario when the unlicensed outband access can bebased on OFDMA CSMACA (in case of Wi-Fi Direct) FH-CDMA (in case of Bluetooth) or any other multiple accessmethod) We denote by M = 119870 + 1 119870 + 119872 the set ofchannel indices within the unlicensed band and use a binarychannel allocation variable 119886119898119899 (119905) 119899 isin N 119898 isin M to indicateif PUn is allocated with the unlicensed channel 119862119898 (in whichcase 119886119898119899 (119905) = 1) or not (119886119898119899 (119905) = 0) Note that 119886119898119899 (119905) = 0 forall 119899 isin C and 119898 isin M (since cellular users can access only theLTE bands) For 119899 isin D 119887119899(119905) equals 0 ifsum119898isinM 119886119898119899 (119905) ge 1 and1 otherwise (ie if sum119898isinM 119886119898119899 (119905) = 0) Hence

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (5a)

To avoid collisions the D2D pairs use a CSMACA methodwhen operating outband As a result each unlicensed channel119862119898 is available to D2D communication only when it is idleAdditionally to reduce the possibility of collisions betweenD2D users we assume that at any slot t at most one devicepair can transmit over each unlicensed channel 119862119898 That is

sum119899isinD

119886119898119899 (119905) le 1sum119899isinD

(1 minus 119887119899 (119905)) le 119872forall119898 isin M

(5b)

The transmission procedure for the pair of D2D usersoperating outband is described as follows At the beginning ofslot t one of the users starts sensing the allocated unlicensedchannel 119862119898 (for simplicity we assume perfect sensing) If thechannel is free the transmission phase (of the length119879119898119903 suchthat 0 le 119879119898119903 le 119879119904) begins Note that the duration of 119879119898119903is random It depends on the availability of the channel 119862119898and the applied CSMACA scheme The probability densityfunction (pdf) of 119879119898119903 is not calculated here (since it has noimpact on the further analysis in this paper) An example ofsuch calculations can be found in [36]

Let119866119898119899119899 for all119898 isin M denote the channel gain coefficientbetween the transmitter and receiver of PUn operating onunlicensed channel 119862119898 Then the SINR of PUn transmittingover the channel 119862119898 at slot t can be expressed by

SINR119898119899 (119905) = 119886119898119899 (119905) 119901119899 (119905) 1198661198981198991198991198730 forall119899 isin D forall119898 isin M (6a)

and the service rate of PUn over unlicensed (outband)spectrum is described by

119903119880119899 (119905) = sum119898isinM

119879119898119903 120596119880119898119879119904 119886119898119899 (119905) log (1 + SINR119898119899 (119905)) forall119899 isin D

(6b)

where 120596119880119898 is the bandwidth (in Hz) of unlicensed channel119862119898 Note that neither the eNB nor D2D users have priorinformation about quality and availability of unlicensedchannels Therefore the exact values of 119866119898119899119899 and 119879119898119903 areunknown to the eNB and the D2D users

3 Resource Allocation Problem

31 Problem Statement We define a binary 119873 times 119870-dimensional RB allocation matrix a119871 and a binary 119873 times 119872-dimensional unlicensed channel allocation matrix a119880 as

a119871119905 = [[[[[

11988611 (119905) sdot sdot sdot 1198861198701 (119905) 1198861119873 (119905) sdot sdot sdot 119886119870119873 (119905)

]]]]]

a119880119905 = [[[[[

119886119870+11 (119905) sdot sdot sdot 119886119870+1198721 (119905) 119886119870+1119873 (119905) sdot sdot sdot 119886119870+119872119873 (119905)

]]]]]

(7)

respectively We also define a binary N-dimensional D2Daccess allocation vector b119905 = (1198871(119905) 119887119873(119905)) a binary N-dimensional mode allocation vector c119905 = (1198881(119905) 119888119873(119905))and a real-valued N-dimensional power allocation vectorp119905 = (1199011(119905) 119901119872(119905)) Then the sets of all admissible valuesfor a119871119905 a

119880119905 b119905 c119905 and p119905 are described by

A119871 = a119871119905 | 119886119896119899 (119905) isin 0 1 forall119899 isin N forall119896 isin K (8a)

A119880 = a119880119905 | 119886119898119899 (119905) isin 0 1 119886119898119894 (119905) = 0 sum119895isinD

119886119898119895 (119905)le 1 forall119899 isin D forall119894 isin C forall119898 isin M

(8b)

B = b119905 | 119887119899 (119905) isin 0 1 119887119894 (119905) = 1 sum119895isinD

(1 minus 119887119895 (119905))

le 119872 forall119899 isin D forall119894 isin C

(8c)

Mobile Information Systems 5

eNB

D2D link

Cellular link

PU1

PU2

PU3 PU4

PU5PU6PU7

PU8

PU1

PU2

PU3

PU4

PU5

PU7

PU8

RB1 RB2 RB3 RB4 RB5 RB6 C7 C8 C9

aLt =

((((((((((((((

1 1 0 0 0 0

0 1 1 0 0 0

0 0 0 1 1 0

0 0 0 0 0 1

0 0 1 1 0 0

0 0 0 0 0 0

0 0 0 0 1 1

0 0 0 0 0 0

))))))))))))))

aUt =

((((((((((((((

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

1 1 0

0 0 0

0 0 1

))))))))))))))

bt =

((((((((((((((

1

1

1

1

1

0

1

0

))))))))))))))

ct =

((((((((((((((

0

1

0

0

1

0

1

0

))))))))))))))

Figure 1 A D2D-enabled cellular network with three cellular pairs (PU2 PU5 and PU7) and five D2D pairs (PU1 PU3 PU4 PU6 and PU8)Three of the D2D pairs (PU1 PU3 and PU4) use inband access and two D2D pairs (PU6 PU8) are allocated with the unlicensed channelsIn this example different cellular and D2D pairs interfere with each other when transmitting over RB2 (where PU2 interferes with PU1) RB3(PU1 interferes with PU5) RB4 (PU5 interferes with PU3) RB5 (PU7 interferes with PU3) and RB6 (PU4 interferes with PU7)

C = c119905 | 119888119899 (119905) isin 0 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (8d)

P = p119905 | 0 le 119901119899 (119905) le 119875max119899 forall119899 isin N (8e)

Example of a D2D-enabled network with all defined opti-mization variables is shown in Figure 1

Ideally at any slot t the eNB should distribute the net-work resources among the users tomaximize their aggregatedservice rate That is to maximize the sum

sum119899isinN

119903119899 (119905) = sum119899isinN

(119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905)) (9)

where 119903119899(119905) represents the service rate of PUn (operatingeither inband or outband) However when communicating

6 Mobile Information Systems

over the unlicensed spectrum eachD2Dpair should transmitat a maximal power level to achieve the high SINR regime(and consequently service rate) which in turn results inincreased power consumption of mobile terminals There-fore when formulating the utility of each device pair weshould also consider the cost of power consumption toquantify the trade-off between the achieved rate and powerlevel (as in [37]) Accordingly we can define a utility 119906119899(119905)

of PUn at slot 119905 as the difference between its instantaneousservice rate 119903119899(119905) and the cost of power consumption

119906119899 (119905) = 119903119899 (119905) minus 120592119899119901119899 (119905)= 119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905) (10)

where 120592119899 ge 0 is the cost per unit (W) level of power for PUnUsing the above definition we can express our resource

allocation problem as follows

maximize sum119899isinN 119906119899 (119905) = sum119899isinN [119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905)] (11a)

subject to a119871119905 isin A119871a119880119905 isin A119880b119905 isin Bc119905 isin Cp119905 isin P

(11b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (11c)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (11d)

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (11e)

SINR119896119899 (119905) le SINRtar119899 forall119899 isin N forall119896 isin K cup M (11f)

where the constraint (11f) is necessary to protect the usersfrom heavy interference (here SINRtar

119899 stands for theminimalSINR level acceptable by PUn) Note that information onthe sets C and D is readily available at the eNB The valuesof 119866119896119899119898 for 119899 isin N 119898 isin N and 119896 isin K are obtained bythe eNB from the CSI carried by the RSs The only missinginformation is related to 119903119880119899 (119905) that depends on the parameters119879119898119903 (representing the availability of the unlicensed channel119862119896 in our model) and 119866119898119899119899 (which defines the quality ofunlicensed channel 119862119896) for all 119898 isin M The latter parameteris determined by the unlicensed channel allocations andhence the eNB can adapt to the changes of 119866119898119899119899 in time andspace Since there is no coordination (and no informationexchange) between the LTE and outband RAT interfacessolving (11a)ndash(11f) to optimality might be impossible whichis a rather strong argument in favor of applying a well-knownreinforcement learning (RL) for resource allocation

The main idea behind RL is that the actions (unlicensedchannel allocations) leading to the higher network utility atslot 119905 should be granted with higher probabilities at slot 119905 + 1and vice versa [22] In the simplest form of RL (presentedin [24]) the learning agent estimates its possible strategiesbased on the locally observed utility without any prior infor-mation about the operating environment This form of RL

requires only algebraic operations but does not guarantee theconvergence to an equilibrium [25] In Q-learning [22] theagentrsquos utility is estimated using some value-action functionGiven the certain (easy to follow) conditions this algorithmconverges (with probability 1) to an NE state However itrequires maximization of the action-value at every slot t(which can be computationally demanding depending onthe structure of a chosen value-action function) [20] InJUSTE-RL algorithm [23] the learning agent estimates notonly its own strategy but also the expected utility for allof its actions Unlike Q-learning JUSTE-RL does not needto perform optimization of the action-value (since onlyalgebraic operations are required to update the strategies)and hence it has a lower computational complexity On theother hand compared to a basic RL algorithm JUSTE-RLconverges to a 120576-NE [23 25] We now show how a JUSTE-RLwith regret can be applied to our problem

32 Unlicensed Channel Allocation To apply JUSTE-RL withregret to our problem we represent it as a game with oneplayer (the eNB) having no information about the operatingenvironment A finite set of the eNBrsquos actions A119880 representsthe set of all admissible unlicensed channel allocation deci-sions The objective of the eNB is to select at any slot t an

Mobile Information Systems 7

action 119860 119905 = a119880119905 isin A119880 to maximize the eNBrsquos utility 119906119905 =sum119899isinN 119906119899(119905) In the following we use notation 119886119898119899 (119905) forall119898 isin Mto specify the eNBrsquos decision regarding the allocation of anunlicensed channel 119862119896 to a pair PUn and a119880119905 to describe allunlicensed channel allocations by the eNB when selectinga particular action 119860 119905 at slot t We also use b119905 to denote

the D2D access allocation vector and 119903119880119899 (119905) to indicate theoutband service rate achieved by playing the action 119860 119905 Aftertaking an action 119860 119905 = a119880119905 at slot t the eNB observes the(random) service rate 119903119880119899 (119905) and estimates the network utility119906119905 = 119906119905(119860 119905) by solving the following problem

maximize 119906119905 = sum119899isinN

[119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905)] (12a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(12b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (12c)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (12d)

SINR119899 (119905) le SINRtar119899 forall119899 isin N (12e)

where

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (12f)

and 119887119899(119905) = 0 forall119899 isin C Note that unlike problem (11a)ndash(11f)problem (12a)ndash(12e) can be solved to optimality (since 119903119880119899 (119905) isknown) It has three optimization variables a119871119905 c119905 and p119905 andhence its complexity is lower than the complexity of (11a)ndash(11f) (the method for solving (12a)ndash(12e) is presented in thenext subsection)

We also define a mixed-strategy probability 120587119905 of playingan action 119860 119905 = a119880119905 at slot t as

120587119905 (119860 119905) = 120587119905 (119861)119861isinA119880 120587119905 (119861) = Pr 119860 119905 = 119861

forall119860 119905 isin A119880(13a)

and a regret 120588119905(119860 119905) for not playing this action at slot t as

120588119905 (119860 119905) = max 0 119906119905minus1 minus 119906119905 forall119860 119905 isin A119880 (13b)

In JUSTE-RL the probability distribution of a regret over allpossible actions becomes the BoltzmannndashGibbs distribution(aka canonical ensemble) given by [22]

119866 120588119905 (119860 119905) = exp minus120588119905 (119860 119905) 119896119879119861sum119861isinA119880 exp minus120588119905 (119861) 119896119879119861 forall119860 119905 isin A119880

(14)

where k = 138064852 times 10minus23 JK is the Boltzmann constant119879119861 is the system temperature (in K) High temperatures make

all actions almost equiprobable and low temperatures resultin greedy action selection [22]

Using the above definitions the dynamics of a JUSTE-RLwith regret can be described as [23]

119906119905+1 (119861) = 119906119905 (119861) + 1205721199051119860119905=119861 (119906119905 minus 119906119905 (119861)) 120588119905+1 (119861) = 120588119905 (119861) + 120573119905 (119906119905 (119861) minus 119906119905 minus 120588119905 (119861)) 120587119905+1 (119861) = 120587119905 (119861) + 120574119905 (119866 120588119905 (119861) minus 120587119899 (119861))

(15)

for all 119861 isin A119880 where 120572119905 120573119905 and 120574119905 are the learning rates suchthat [23]

lim119905rarrinfin

119905sum120591=1

120572120591 = +infinlim119905rarrinfin

119905sum120591=1

1205722120591 lt +infinlim119905rarrinfin

119905sum120591=1

120573120591 = +infinlim119905rarrinfin

119905sum120591=1

1205732120591 lt +infinlim119905rarrinfin

119905sum120591=1

120574120591 = +infinlim119905rarrinfin

119905sum120591=1

1205742120591 lt +infin

8 Mobile Information Systems

Initialization(1) Input 120592120572 120592120573 120592120574 119879(2) For all 1198600 isin A119880 set 1199060(1198600) larr 0 1205880(1198600) larr 0 1205870(1198600) larr 0(3) For all 119899 isin N set 119903119880119899 (0) larr 0Main Loop(4)While (119905 le 119879) do(5) Select 119860 119905 larr argmax119861isinA119880(120587119905(119861)) and set a119880119905 larr 119860 119905(6) For all 119899 isin D set 119887119899(119905) = max0 1 minus sum119898isinM 119886119898119899 (119905)(7) For all 119899 isin C set 119887119899(119905) = 0(8) Execute 119860 119905 and observe 119903119880119899 (119905) for all 119899 isin N(9) Solve (12a)ndash(12e) to find an optimal 119906119905(119860 119905) larr 119906119905(10) For all 119861 isin A119880 update 119906119905(119861) 120588119905(119861) 120587119905(119861) using (15)

(11) End

Algorithm 1 JUSTE-RL with regret for unlicensed channel allocation RL algorithm for outband channel allocation

lim119905rarrinfin

120574119905120572119905 = 0lim119905rarrinfin

120573119905120574119905 = 0(16a)

Typically the learning rates are set equal [22]

120572119905 = 1(119905 + 1)120592120572 120573119905 = 1(119905 + 1)120592120573 120574119905 = 1(119905 + 1)120592120574

(16b)

where 120592120572 isin (05 1] 120592120573 isin (05 1] 120592120574 isin (05 1] and 120592120572 le120592120573 le 120592120574 The initializations 1199060(119860) ge 0 1205880(119860) ge 0 and1205870(119860) ge 0 should be sufficiently close to zero for all 119860 isin A119880The dynamics (15) converge to the 120576-Nash equilibrium Notethat a Nash equilibrium point for (15) is given by [23]

lim119905rarrinfin

120587119905 (119860) = 120587lowast (119860) forall119860 isin A119880 (17)

The corresponding learning algorithm for unlicensed chan-nel allocation is presented in Algorithm 1 (where T indicatesthe total simulation length in slots) Note that this algorithmconverges when 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880 Thecomplexity of JUSTE-RL with regret is mainly determinedby the size of an action set A119880 since at any slot t we have

to select an action 119860 119905 isin A119880 that maximizes 120587119905(119860 119905) (thedynamics in (15) require only algebraic operations and thusits computational complexity is negligible) Consequently theworst-case time complexity of Algorithm 1 is 119874(119899) where119899 = |A119880| le 119873 times 119872 is the size of our action set

33 Inband Resource Allocation Consider (12a)ndash(12e) thatrepresents a joint mode RB and power level allocationproblemThis problem has two binary optimization variablesa119871119905 and c119905 one real-valued variable p119905 nonlinear objective(12a) and nonlinear constraints (12c) and (12e) Hence itbelongs to a family of the mixed-integer nonlinear program-ming (MINLP) problems It has been well established inthe past (see eg [38]) that all MINLP problems involvingbinary variables (such as (12a)ndash(12e)) are NondeterministicPolynomial-time- (NP-) hard For immediate NP-hardnessproof for a considered problem note that given that a119871119905 canbe either 0 or 1 any feasible solution to (12a)ndash(12e) is a subsetof vertices The constraint (12d) also implies that at least oneend point of every edge is included in this subset Hence thesolution to this problem describes a vertex cover for whichfinding a minimum is NP-hard

Most of the MINLP solution techniques involve theconstruction of the following relaxations to the consideredproblem a nonlinear programming (NLP) relaxation (theoriginal problem without integer restrictions) and a mixed-integer linear-programming (MILP) relaxation (an originalproblem where the nonlinearities are replaced by supportinghyperplanes) To form the MILP and NLP relaxations to(12a)ndash(12e) let us first represent in equivalent form thefollowing

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (18a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(18b)

Mobile Information Systems 9

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (18c)

1198921119896 (a119871119905 c119905) = sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) minus 1 le 0 forall119896 isin K (18d)

1198922119899 (a119871119905 p119905) = SINR119899 (119905) minus SINRtar119899 le 0 forall119899 isin N (18e)

1198923119899119896 (a119871119905 p119905) = SINR119896119899 (119905) minus 2119889119896119899 + 1 le 0 forall119899 isin N forall119896 isin K (18f)

where objective (18a) and constraints (18b) and (18c) arelinear while constraints (18d)ndash(18f) are nonlinear The MILP

relaxation to (18a)ndash(18f) in a given point (a1198710119905 c0119905 p0119905 ) is given

by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (19a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(19b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (19c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (19d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (19e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (19f)

The NLP relaxation to (18a)ndash(18f) is given by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (20a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(20b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (20c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (20d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (20e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (20f)

10 Mobile Information Systems

where

A119871 = a119871119905 | 0 le 119886119896119899 (119905) le 1 forall119899 isin N forall119896 isin K (20g)

C

= c119905 | 0 le 119888119899 (119905) le 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (20h)

In general allMINLPproblems can be solved using eitherexact techniques (eg branch-and-bound [39]) or heuristicmethods (such as local branching [40] large neighborhoodsearch [41] and feasibility pump [42]) Since we are inter-ested in a reasonably simple and fast algorithm it is moreconvenient to use heuristics to solve (18a)ndash(18f) Amongnumerous heuristic techniques feasibility pump (FP) [43] isperhaps the most simple and effective method for producingmore and better solutions in a shorter average running

time (the local convergence properties of FP for nonconvexproblems have been proved in [44]) The fundamental ideaof an FP heuristic is to decompose the MINLP probleminto two parts integer feasibility and constraint feasibilityThe former is achieved by rounding (solving the MILPrelaxation to an original problem) the latter by projection(solving the NLP relaxation) The algorithm generates twosequences of integral and rounding pointsThe first sequenceof integral points (a119871119894119905 c119894119905 p119894119905)119868119894=1 119868 = 1 2 containsthe solutions that may violate the nonlinear constraints thesecond sequence (a119871119894119905 c119894119905 p119894119905)119868119894=1 comprises the roundingpoints that are feasible for the MILP relaxation but might notbe integral

Particularly with the input (a1198711119905 c1119905 p1119905 ) being a solution toan NLP relaxation (20a)ndash(20f) FP generates two sequencesby solving the following problems for 119894 = 1 119868

(a119871119894119905 c119894119905 p119894119905) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038171 (21a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(21b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (21c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (21d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (21e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (21f)

(a119871119894+1119905 c119894+1119905 p119894+1119905 ) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038172 (22a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(22b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (22c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (22d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (22e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (22f)

where sdot1 and sdot2 are l1-norm and l2-norm respectivelyTherounding is carried out by solving the problem (21a)ndash(21f)

and the projection is the solution to (22a)ndash(22f) Conse-quently an FP algorithmalternates between the rounding and

Mobile Information Systems 11

projection steps until (a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905) (which impliesfeasibility) or until the number of iterations i has reached itspredefined limit ITheworkflow of the algorithm is presentedin Algorithm 2 Note that to retain the local convergencethe problems (21a)ndash(21f) and (22a)ndash(22f) have to be solvedexactly in the FP algorithm The problem (22a)ndash(22f) (andhence (20a)ndash(20f)) can be solved using any standard NLPmethod In this paper an interior point algorithm (describedeg in [45]) which has a polynomial 119874(1198992) time complexityis applied to solve (20a)ndash(20f) and (22a)ndash(22f) The MILPproblem (21a)ndash(21f) is relatively simple and therefore it canbe solved to optimality by any technique from the family ofthe branch-and-bound methods (eg [46])

Note that in general finding an optimal solution to anyjoint resource allocation problem with integrality constraintsis NP-hard (which has been shown in [47]) Consequentlymost of the recent approaches to deal with such kindof problems focus on finding the high-quality suboptimalsolutions using for example relaxation (by removing allthe integer restrictions as it has been done in [47 48]) oriterative two-stage algorithms for determining the optimalintegral solutions given fixed power levels and then findingthe optimal power allocation with fixed integral points (eg[49]) In this paper instead of relaxation or iteration wedirectly apply a heuristic FP algorithm that has a polynomial119874(119899119888) time complexity in the size n of the problem (withc being some real constant) [43] (Note that in our casethe size n of the problem (18a)ndash(18f) is proportional to|A119871| times |C| times |P| = 1198733 times 119870 The numerical results showingthe complexity of a proposed algorithm will be presentedin Section 4) Hence the presented heuristic approach hasmoderate complexity compared to the previously proposedalgorithms for resource allocationwith integrality constraintswhose complexity ranges from linear [21 30 47 48 50] topolynomial [2 3 49 51ndash53]

4 Algorithm Implementation

41 Resource Allocation Procedure We now discuss theimplementation of the proposed algorithms (presented inSection 3) in an LTE-A network The following schedulingprocedure is repeated at the beginning of each slot t as follows

(i) All users send their SRs to the eNB via dedicatedPUCCHs Note that the SRs may contain someuseful control information such as updated targetSINR level SINRtar

119899 or observed throughput on theunlicensed channel 119903119880119899 (119905)

(ii) After receiving the SRs from all of the users the eNBperforms resource allocation (by assigning themodesRBs and unlicensed channels and power levels to userpairs according to Algorithms 1 and 2) and sends theSGs with optimal allocations to the correspondingusers via PDCCHs

(iii) After receiving the SGs the users start their datatransmissions over allocated RBsunlicensed chan-nels with assigned mode and power levels

As it was already beenmentioned we deploy a CSMACAfor outband D2D access using a procedure described in IEEE80211 [54] As dictated by [54] if a certain D2D pair PUn 119899 isinD is allocated with one or more unlicensed channels thenprior to transmission one of the users must first sense thechannel (to determine whether it is idle) for the duration ofa distributed coordination function interframe space (DIFS)DIFS (which is 34 120583s long) consists of a short interframespace (SIFS) equaling 16 120583s and 2 Wi-Fi slots (each equals9 120583s) After DIFS a user must typically defer its transmissionfor a random number of slots generated from 0 to CW-1(contentionwindow size) to allow the other devices to share achannel in a fair manner Given that the minimumCW valueis CWmin = 16 the device will on an average wait for about75Wi-Fi slots before transmissionThus the average channelaccess delay is 16120583s + 95 times 9 120583s = 1015 120583s (independent ofservice rate) Since the slot duration in LTE system (119879119904 =1ms) is much longer than the average channel access delay(1015120583s) it is expected that (in average) a D2D pair will beable to exchange the data within the scheduled period Inthis case each of the users in a D2D pair should observeachieved throughput 119903119880119899 (119905) and report this value to the eNBwhen sending its SR Otherwise (if a D2D pair is not able toexchange the data within one slot) the D2D users send thevalue 119903119880119899 (119905) = 0 to the eNB Note that a CSMACA does notallow two-way data transmission Hence the second devicein a D2D pair can start the data transmission only after thefirst user has finished its transmission

It is worth mentioning that at some point in time aJUSTE-RL will reach its equilibrium state However evenafter the equilibriumhas been reached the eNB continues thelearning process because the network environment (channelquality network traffic and the number of active users) islikely to change over time resulting in different optimalmodeRBunlicensed channel and power allocations

42 Simulation Model A simulation model of the networkhas been implemented upon a standard LTE-A platformusing the OPNET simulation and development package [55]The model consists of one eNB and N user pairs randomlypositioned inside a three-sector hexagonal cell (with theantenna pattern specified in [56]) It is assumed that theusers operate outdoors in a typical urban environment andare stationary throughout all simulation runs Each userdevice has its own traffic generator enabling a variety oftraffic patterns For simplicity in the examples below theuser traffic is modeled as a full buffer with load of 10packets per second and packet size of 1500 bytes In allsimulations |C| = 10 cellular pairs 120592120573 = 11120592120572 120592120574 = 11120592120573119879 = 1000 slots 119879119861 = 106 K and I = 1000 iterations thetarget SINR levels for each device pair are set as SINRtar

119899 =SINRtar = 0 dB for all 119899 isin N The licensed band of theeNB comprises K = 100 RBs (equivalent to 20MHz) Theunlicensed band comprises M = 4 nonoverlapping OFDMchannels with 120596119880119898 = 10MHz for 119898 isin M The main simu-lation parameters of our model are listed in Table 1 Otherparameters are set in accordance with 3GPP specifications[56]

12 Mobile Information Systems

Initialization(1) Input 119868 119879(2)While (119905 le 119879) do(3) Input 119903119880119899 (119905) a119880119905 b119905(4) Solve (20a)ndash(20f) to find the optimal (a1198711119905 c1119905 p1119905 )Main Loop(5)While (119894 le 119868) doRounding(6) Solve (21a)ndash(21f) to find the optimal (a119871119894119905 c119894119905 p119894119905)(7) If ((a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905)) then breakProjection(8) Solve (22a)ndash(22f) to find the optimal (a119871119894+1119905 c119894+1119905 p119894+1119905 )(9) Set 119894 larr 119894 + 1

(10) End

Algorithm 2 FP algorithm for inband resource allocation

Table 1 Simulation parameters of LTE-A Model

Parameter ValueCell radius 500mFrame structure Type 2 (time division duplex)Slot duration 1msTDD configuration 0eNodeB Tx power 46 dBmUE activeidle Tx power 232 dBmNoise power minus174 dBmHzPath loss and cellular link 1281 + 376 log(119889) 119889 [km]NLOS path loss and D2Dlink

40 log(119889) + 30 log(119891) + 49 119889 [km]119891 [Hz]LOS path loss and D2Dlink

169 log(119889) + 20 log(1198915) + 468 119889[m] 119891 [GHz]

Shadowing st dev 10 B (cell mode) 12 dB (D2D mode)

In this paper the evaluation of a proposed approachfor inband and outband resource allocation referred to asJRA (JUSTE-RL based resource allocation) is divided intotwo parts In the first part we analyze the performanceof JUSTE-RL with regret for unlicensed channel allocation(Algorithm 1) In the second part we examine the efficiencyof a proposed joint inbandoutband resource allocation(Algorithms 1 and 2) In the following the performanceof JRA is compared with the performance of the followingresource allocation techniques

(i) First is joint inbandoutband resource allocation with120576-greedy Q-learning (GQL) [57] based on formu-lations (12a)ndash(12e) and (18a)ndash(18f) where the unli-censed channels are allocated to the users by theLTE eNB In GQL at any slot t an action with thelargest Q-value is selected with probability 1 minus 120576 andthe other actions are selected uniformly at randomwith probability 120576 In all simulation experiments thevalue of 120576 is set in accordance with the most commonsuggestions (provided eg in [22]) as 120576 = 01

(ii) Second is centralized optimal strategy (COS) wherethe inband and outband network resources are allo-cated to the users by solving (11a)ndash(11f) directly basedon global channel and network knowledge Note thatCOS corresponds to the most efficient (in terms ofnetwork utility maximization) strategy although it isnot practically realizable (since in the real networkdeployment scenarios the precise information aboutquality and availability of unlicensed channels is notavailable) (In this paper we use an FP algorithm tofind the optimal solution to (11a)ndash(11f) or (18a)ndash(18f)in GQL and COS)

(iii) Third is social heuristic for multimode D2D com-munication (SMD) in an LTE-A network proposedin [21] to reduce the complexity of an original opti-mization problem for joint inbandoutband resourceallocation This algorithm assigns user modes andresources to maximize the social welfare based onthe global channel and network knowledge The eNBcreates a randomly ordered list of the D2D pairsThen it computes the aggregated network utility foreach mode of the first user in the list and assigns thisuser with amode that provides the highest aggregatedutility This process is repeated for all D2D pairs

(iv) Fourth is greedy heuristic for multimode D2D com-munication (GMD) in LTE-A networks [21] wherethe modes and inbandoutband network resourcesare allocated tomaximize the individual usersrsquo welfarebased on the global channel and network knowledgeSimilar to SMD the eNB creates a randomly orderedlist of the D2D pairs After this it computes the utilityfor each mode of the first user in the list and assignsthis user a mode assuring the highest individualutility This process is repeated for all D2D pairs

(v) Fifth is ranked heuristic for multimode D2D com-munication (RMD) in LTE-A networks [21] Here theeNB evaluates the utility of each user in each mode(based on the global channel andnetwork knowledge)

Mobile Information Systems 13Av

erag

e num

ber o

f RL

itera

tions

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e str

ateg

y co

nver

genc

e

Figure 2 The average number of RL iterations (slots) necessary forconvergence of strategies in JRAwith different values of 120592120572 and fixed|C| = 10

and sorts the D2D pairs according to their utilities ina descending order Next the eNB allocates the firstuser in the list a mode that guarantees the highestaggregatednetwork utilityThis process is repeated forall D2D pairs

Note that all algorithms used in our performance evaluationare simulated with identical system parameters

43 Performance of a Learning Algorithm We start withthe performance evaluation of JUSTE-RL with regret forunlicensed channel allocation (outlined in Algorithm 1)Figures 2 and 3 demonstrate the learning speed of JRAFigure 2 shows the average number of RL iterations (slots)necessary for convergence of strategies in JRA (at the pointwhere 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880) with differentvalues of 120592120572 isin [055 08] 120592120573 = 11120592120572 isin [061 088] and120592120574 = 11120592120573 isin [067 097] and a varying number of user pairs119873 isin [50 500] The average number of RL iterations (slots)necessary for convergence of utilities in JRA (at the pointwhere 119906119905(119861) = 119906119905minus1(119861) for all 119861 isin A119880) with 120592120572 isin [055 08]120592120573 isin [061 088] and 120592120574 isin [067 097] and 119873 isin [50 500])is plotted in Figure 3 The accuracy of estimation in JRA ispresented in Figures 4 and 5 Figure 4 shows the absoluteerror of strategy estimation in JRA denoted as 120575120587 defined asa sum of the absolute differences between the actual optimalstrategies and the estimated strategies upon the algorithmtermination That is

120575120587 = sum119861isinA119880

1003816100381610038161003816120587119879 (119861) minus 120587lowast (119861)1003816100381610038161003816 (23)

where 120587119879(119861) is an optimal strategy estimated in JRA uponthe algorithm termination (at slot 119879) and 120587lowast(119861) is the actual

Aver

age n

umbe

r of R

L ite

ratio

ns

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e util

ity co

nver

genc

e

Figure 3 The average number of RL iterations (slots) necessary forconvergence of utilities in JRA with different values of 120592120572 and fixed|C| = 10

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of s

trat

egy

0

001

002

003

004

005

estim

atio

n120575120587

Figure 4 The absolute error of strategy estimation 120575120587 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

optimal strategy obtained by playing an action 119861 isin A119880Figure 5 demonstrates the absolute error of utility estimationin JRA denoted 120575119906 defined as a sum of the absolutedifferences between the actual and the estimated optimalnetwork utilities upon the algorithm termination that is

120575119906 = sum119861isinA119880

1003816100381610038161003816119906119879 (119861) minus 119906lowast (119861)1003816100381610038161003816 (24)

14 Mobile Information Systems

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of u

tility

0

001

002

003

004

005

estim

atio

n120575u

Figure 5 The absolute error of utility estimation 120575119906 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

where 119906119879(119861) is an optimal network utility estimated in JRAupon the algorithm termination (at slot 119879) and 119906lowast(119861) is theactual optimal network utility obtained by playing an action119861 isin A119880 The observations in Figures 2ndash5 show that the ratesof convergence of strategies and utilities and the accuracyof strategy and utility estimation are almost the sameFurthermore we find that the number of iterations necessaryfor the algorithm convergence and absolute estimation errorstrongly depend on the setting of the parameters 120592120572 120592120573 and120592120574 the worst performance is attainedwith 120592120572 = 08 120592120573 = 061and 120592120574 = 067 and the best with 120592120572 = 055 120592120573 = 088and 120592120574 = 097 Such results are rather predictable since theparameters 120592120572 120592120573 and 120592120574 are related to the parameters 120572119905120573119905 and 120574119905 (see (16b)) which have a direct influence on thelearning rate of JRA [22 23]

In Figures 6 and 7 the instantaneous network utility 119906119905 =sum119899isinN 119906119899(119905) is presented as a function of time in scenarioswith low network load (119873 = 100) and high network load(119873 = 500) and fixed Here a proposed JRA technique issimulated with the settings 120592120572 = 05 120592120573 = 061 and 120592120574 =067 The graphs in these figures show that the efficiency ofJRA and GQL improves gradually over time After about 300slots (which is the average time necessary for the convergenceof strategies and utilities in Algorithm 1) JRA demonstratesnear-optimal results GQL needs a little longer time (asymp400slots) to converge after which its performance also becomesvery close to the performance of COS Unlike JRA and GQLthe performance of COS SMDGMD and RMD is consistentover time (since these algorithms do not involve any learningprocess) We also observe that the network utility attained inSMD GMD and RMD is much smaller than that in COSTo understand such poor performance of SMD GMD andRMD note that in these algorithms the original resource

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

7

8

9

10

11

12

ut

(lowast10

6)

Figure 6 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and |C| = 10

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

40

42

44

46

48

50

52

54

56

58

60ut

(lowast10

6)

Figure 7 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 500 and |C| = 10allocation problem is divided into two separate problems(i) mode selection and (ii) packet scheduling After that themode selection problem is solved using very plain heuristics(social greedy or ranked) which reduces the complexityof an original optimization problem (from exponential tolinear) but has a negative impact on the performance of thesetechniques in terms of network utility maximization [21]

44 Performance of Joint InbandOutband Allocation Wenow evaluate the efficiency of a proposed inbandoutbandresource allocation (Algorithms 1 and 2) The graphs inFigures 8ndash10 demonstrate the computational complexitysolution time and solution accuracy of different resource

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 5: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

Mobile Information Systems 5

eNB

D2D link

Cellular link

PU1

PU2

PU3 PU4

PU5PU6PU7

PU8

PU1

PU2

PU3

PU4

PU5

PU7

PU8

RB1 RB2 RB3 RB4 RB5 RB6 C7 C8 C9

aLt =

((((((((((((((

1 1 0 0 0 0

0 1 1 0 0 0

0 0 0 1 1 0

0 0 0 0 0 1

0 0 1 1 0 0

0 0 0 0 0 0

0 0 0 0 1 1

0 0 0 0 0 0

))))))))))))))

aUt =

((((((((((((((

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

1 1 0

0 0 0

0 0 1

))))))))))))))

bt =

((((((((((((((

1

1

1

1

1

0

1

0

))))))))))))))

ct =

((((((((((((((

0

1

0

0

1

0

1

0

))))))))))))))

Figure 1 A D2D-enabled cellular network with three cellular pairs (PU2 PU5 and PU7) and five D2D pairs (PU1 PU3 PU4 PU6 and PU8)Three of the D2D pairs (PU1 PU3 and PU4) use inband access and two D2D pairs (PU6 PU8) are allocated with the unlicensed channelsIn this example different cellular and D2D pairs interfere with each other when transmitting over RB2 (where PU2 interferes with PU1) RB3(PU1 interferes with PU5) RB4 (PU5 interferes with PU3) RB5 (PU7 interferes with PU3) and RB6 (PU4 interferes with PU7)

C = c119905 | 119888119899 (119905) isin 0 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (8d)

P = p119905 | 0 le 119901119899 (119905) le 119875max119899 forall119899 isin N (8e)

Example of a D2D-enabled network with all defined opti-mization variables is shown in Figure 1

Ideally at any slot t the eNB should distribute the net-work resources among the users tomaximize their aggregatedservice rate That is to maximize the sum

sum119899isinN

119903119899 (119905) = sum119899isinN

(119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905)) (9)

where 119903119899(119905) represents the service rate of PUn (operatingeither inband or outband) However when communicating

6 Mobile Information Systems

over the unlicensed spectrum eachD2Dpair should transmitat a maximal power level to achieve the high SINR regime(and consequently service rate) which in turn results inincreased power consumption of mobile terminals There-fore when formulating the utility of each device pair weshould also consider the cost of power consumption toquantify the trade-off between the achieved rate and powerlevel (as in [37]) Accordingly we can define a utility 119906119899(119905)

of PUn at slot 119905 as the difference between its instantaneousservice rate 119903119899(119905) and the cost of power consumption

119906119899 (119905) = 119903119899 (119905) minus 120592119899119901119899 (119905)= 119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905) (10)

where 120592119899 ge 0 is the cost per unit (W) level of power for PUnUsing the above definition we can express our resource

allocation problem as follows

maximize sum119899isinN 119906119899 (119905) = sum119899isinN [119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905)] (11a)

subject to a119871119905 isin A119871a119880119905 isin A119880b119905 isin Bc119905 isin Cp119905 isin P

(11b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (11c)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (11d)

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (11e)

SINR119896119899 (119905) le SINRtar119899 forall119899 isin N forall119896 isin K cup M (11f)

where the constraint (11f) is necessary to protect the usersfrom heavy interference (here SINRtar

119899 stands for theminimalSINR level acceptable by PUn) Note that information onthe sets C and D is readily available at the eNB The valuesof 119866119896119899119898 for 119899 isin N 119898 isin N and 119896 isin K are obtained bythe eNB from the CSI carried by the RSs The only missinginformation is related to 119903119880119899 (119905) that depends on the parameters119879119898119903 (representing the availability of the unlicensed channel119862119896 in our model) and 119866119898119899119899 (which defines the quality ofunlicensed channel 119862119896) for all 119898 isin M The latter parameteris determined by the unlicensed channel allocations andhence the eNB can adapt to the changes of 119866119898119899119899 in time andspace Since there is no coordination (and no informationexchange) between the LTE and outband RAT interfacessolving (11a)ndash(11f) to optimality might be impossible whichis a rather strong argument in favor of applying a well-knownreinforcement learning (RL) for resource allocation

The main idea behind RL is that the actions (unlicensedchannel allocations) leading to the higher network utility atslot 119905 should be granted with higher probabilities at slot 119905 + 1and vice versa [22] In the simplest form of RL (presentedin [24]) the learning agent estimates its possible strategiesbased on the locally observed utility without any prior infor-mation about the operating environment This form of RL

requires only algebraic operations but does not guarantee theconvergence to an equilibrium [25] In Q-learning [22] theagentrsquos utility is estimated using some value-action functionGiven the certain (easy to follow) conditions this algorithmconverges (with probability 1) to an NE state However itrequires maximization of the action-value at every slot t(which can be computationally demanding depending onthe structure of a chosen value-action function) [20] InJUSTE-RL algorithm [23] the learning agent estimates notonly its own strategy but also the expected utility for allof its actions Unlike Q-learning JUSTE-RL does not needto perform optimization of the action-value (since onlyalgebraic operations are required to update the strategies)and hence it has a lower computational complexity On theother hand compared to a basic RL algorithm JUSTE-RLconverges to a 120576-NE [23 25] We now show how a JUSTE-RLwith regret can be applied to our problem

32 Unlicensed Channel Allocation To apply JUSTE-RL withregret to our problem we represent it as a game with oneplayer (the eNB) having no information about the operatingenvironment A finite set of the eNBrsquos actions A119880 representsthe set of all admissible unlicensed channel allocation deci-sions The objective of the eNB is to select at any slot t an

Mobile Information Systems 7

action 119860 119905 = a119880119905 isin A119880 to maximize the eNBrsquos utility 119906119905 =sum119899isinN 119906119899(119905) In the following we use notation 119886119898119899 (119905) forall119898 isin Mto specify the eNBrsquos decision regarding the allocation of anunlicensed channel 119862119896 to a pair PUn and a119880119905 to describe allunlicensed channel allocations by the eNB when selectinga particular action 119860 119905 at slot t We also use b119905 to denote

the D2D access allocation vector and 119903119880119899 (119905) to indicate theoutband service rate achieved by playing the action 119860 119905 Aftertaking an action 119860 119905 = a119880119905 at slot t the eNB observes the(random) service rate 119903119880119899 (119905) and estimates the network utility119906119905 = 119906119905(119860 119905) by solving the following problem

maximize 119906119905 = sum119899isinN

[119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905)] (12a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(12b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (12c)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (12d)

SINR119899 (119905) le SINRtar119899 forall119899 isin N (12e)

where

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (12f)

and 119887119899(119905) = 0 forall119899 isin C Note that unlike problem (11a)ndash(11f)problem (12a)ndash(12e) can be solved to optimality (since 119903119880119899 (119905) isknown) It has three optimization variables a119871119905 c119905 and p119905 andhence its complexity is lower than the complexity of (11a)ndash(11f) (the method for solving (12a)ndash(12e) is presented in thenext subsection)

We also define a mixed-strategy probability 120587119905 of playingan action 119860 119905 = a119880119905 at slot t as

120587119905 (119860 119905) = 120587119905 (119861)119861isinA119880 120587119905 (119861) = Pr 119860 119905 = 119861

forall119860 119905 isin A119880(13a)

and a regret 120588119905(119860 119905) for not playing this action at slot t as

120588119905 (119860 119905) = max 0 119906119905minus1 minus 119906119905 forall119860 119905 isin A119880 (13b)

In JUSTE-RL the probability distribution of a regret over allpossible actions becomes the BoltzmannndashGibbs distribution(aka canonical ensemble) given by [22]

119866 120588119905 (119860 119905) = exp minus120588119905 (119860 119905) 119896119879119861sum119861isinA119880 exp minus120588119905 (119861) 119896119879119861 forall119860 119905 isin A119880

(14)

where k = 138064852 times 10minus23 JK is the Boltzmann constant119879119861 is the system temperature (in K) High temperatures make

all actions almost equiprobable and low temperatures resultin greedy action selection [22]

Using the above definitions the dynamics of a JUSTE-RLwith regret can be described as [23]

119906119905+1 (119861) = 119906119905 (119861) + 1205721199051119860119905=119861 (119906119905 minus 119906119905 (119861)) 120588119905+1 (119861) = 120588119905 (119861) + 120573119905 (119906119905 (119861) minus 119906119905 minus 120588119905 (119861)) 120587119905+1 (119861) = 120587119905 (119861) + 120574119905 (119866 120588119905 (119861) minus 120587119899 (119861))

(15)

for all 119861 isin A119880 where 120572119905 120573119905 and 120574119905 are the learning rates suchthat [23]

lim119905rarrinfin

119905sum120591=1

120572120591 = +infinlim119905rarrinfin

119905sum120591=1

1205722120591 lt +infinlim119905rarrinfin

119905sum120591=1

120573120591 = +infinlim119905rarrinfin

119905sum120591=1

1205732120591 lt +infinlim119905rarrinfin

119905sum120591=1

120574120591 = +infinlim119905rarrinfin

119905sum120591=1

1205742120591 lt +infin

8 Mobile Information Systems

Initialization(1) Input 120592120572 120592120573 120592120574 119879(2) For all 1198600 isin A119880 set 1199060(1198600) larr 0 1205880(1198600) larr 0 1205870(1198600) larr 0(3) For all 119899 isin N set 119903119880119899 (0) larr 0Main Loop(4)While (119905 le 119879) do(5) Select 119860 119905 larr argmax119861isinA119880(120587119905(119861)) and set a119880119905 larr 119860 119905(6) For all 119899 isin D set 119887119899(119905) = max0 1 minus sum119898isinM 119886119898119899 (119905)(7) For all 119899 isin C set 119887119899(119905) = 0(8) Execute 119860 119905 and observe 119903119880119899 (119905) for all 119899 isin N(9) Solve (12a)ndash(12e) to find an optimal 119906119905(119860 119905) larr 119906119905(10) For all 119861 isin A119880 update 119906119905(119861) 120588119905(119861) 120587119905(119861) using (15)

(11) End

Algorithm 1 JUSTE-RL with regret for unlicensed channel allocation RL algorithm for outband channel allocation

lim119905rarrinfin

120574119905120572119905 = 0lim119905rarrinfin

120573119905120574119905 = 0(16a)

Typically the learning rates are set equal [22]

120572119905 = 1(119905 + 1)120592120572 120573119905 = 1(119905 + 1)120592120573 120574119905 = 1(119905 + 1)120592120574

(16b)

where 120592120572 isin (05 1] 120592120573 isin (05 1] 120592120574 isin (05 1] and 120592120572 le120592120573 le 120592120574 The initializations 1199060(119860) ge 0 1205880(119860) ge 0 and1205870(119860) ge 0 should be sufficiently close to zero for all 119860 isin A119880The dynamics (15) converge to the 120576-Nash equilibrium Notethat a Nash equilibrium point for (15) is given by [23]

lim119905rarrinfin

120587119905 (119860) = 120587lowast (119860) forall119860 isin A119880 (17)

The corresponding learning algorithm for unlicensed chan-nel allocation is presented in Algorithm 1 (where T indicatesthe total simulation length in slots) Note that this algorithmconverges when 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880 Thecomplexity of JUSTE-RL with regret is mainly determinedby the size of an action set A119880 since at any slot t we have

to select an action 119860 119905 isin A119880 that maximizes 120587119905(119860 119905) (thedynamics in (15) require only algebraic operations and thusits computational complexity is negligible) Consequently theworst-case time complexity of Algorithm 1 is 119874(119899) where119899 = |A119880| le 119873 times 119872 is the size of our action set

33 Inband Resource Allocation Consider (12a)ndash(12e) thatrepresents a joint mode RB and power level allocationproblemThis problem has two binary optimization variablesa119871119905 and c119905 one real-valued variable p119905 nonlinear objective(12a) and nonlinear constraints (12c) and (12e) Hence itbelongs to a family of the mixed-integer nonlinear program-ming (MINLP) problems It has been well established inthe past (see eg [38]) that all MINLP problems involvingbinary variables (such as (12a)ndash(12e)) are NondeterministicPolynomial-time- (NP-) hard For immediate NP-hardnessproof for a considered problem note that given that a119871119905 canbe either 0 or 1 any feasible solution to (12a)ndash(12e) is a subsetof vertices The constraint (12d) also implies that at least oneend point of every edge is included in this subset Hence thesolution to this problem describes a vertex cover for whichfinding a minimum is NP-hard

Most of the MINLP solution techniques involve theconstruction of the following relaxations to the consideredproblem a nonlinear programming (NLP) relaxation (theoriginal problem without integer restrictions) and a mixed-integer linear-programming (MILP) relaxation (an originalproblem where the nonlinearities are replaced by supportinghyperplanes) To form the MILP and NLP relaxations to(12a)ndash(12e) let us first represent in equivalent form thefollowing

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (18a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(18b)

Mobile Information Systems 9

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (18c)

1198921119896 (a119871119905 c119905) = sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) minus 1 le 0 forall119896 isin K (18d)

1198922119899 (a119871119905 p119905) = SINR119899 (119905) minus SINRtar119899 le 0 forall119899 isin N (18e)

1198923119899119896 (a119871119905 p119905) = SINR119896119899 (119905) minus 2119889119896119899 + 1 le 0 forall119899 isin N forall119896 isin K (18f)

where objective (18a) and constraints (18b) and (18c) arelinear while constraints (18d)ndash(18f) are nonlinear The MILP

relaxation to (18a)ndash(18f) in a given point (a1198710119905 c0119905 p0119905 ) is given

by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (19a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(19b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (19c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (19d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (19e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (19f)

The NLP relaxation to (18a)ndash(18f) is given by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (20a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(20b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (20c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (20d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (20e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (20f)

10 Mobile Information Systems

where

A119871 = a119871119905 | 0 le 119886119896119899 (119905) le 1 forall119899 isin N forall119896 isin K (20g)

C

= c119905 | 0 le 119888119899 (119905) le 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (20h)

In general allMINLPproblems can be solved using eitherexact techniques (eg branch-and-bound [39]) or heuristicmethods (such as local branching [40] large neighborhoodsearch [41] and feasibility pump [42]) Since we are inter-ested in a reasonably simple and fast algorithm it is moreconvenient to use heuristics to solve (18a)ndash(18f) Amongnumerous heuristic techniques feasibility pump (FP) [43] isperhaps the most simple and effective method for producingmore and better solutions in a shorter average running

time (the local convergence properties of FP for nonconvexproblems have been proved in [44]) The fundamental ideaof an FP heuristic is to decompose the MINLP probleminto two parts integer feasibility and constraint feasibilityThe former is achieved by rounding (solving the MILPrelaxation to an original problem) the latter by projection(solving the NLP relaxation) The algorithm generates twosequences of integral and rounding pointsThe first sequenceof integral points (a119871119894119905 c119894119905 p119894119905)119868119894=1 119868 = 1 2 containsthe solutions that may violate the nonlinear constraints thesecond sequence (a119871119894119905 c119894119905 p119894119905)119868119894=1 comprises the roundingpoints that are feasible for the MILP relaxation but might notbe integral

Particularly with the input (a1198711119905 c1119905 p1119905 ) being a solution toan NLP relaxation (20a)ndash(20f) FP generates two sequencesby solving the following problems for 119894 = 1 119868

(a119871119894119905 c119894119905 p119894119905) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038171 (21a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(21b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (21c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (21d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (21e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (21f)

(a119871119894+1119905 c119894+1119905 p119894+1119905 ) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038172 (22a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(22b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (22c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (22d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (22e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (22f)

where sdot1 and sdot2 are l1-norm and l2-norm respectivelyTherounding is carried out by solving the problem (21a)ndash(21f)

and the projection is the solution to (22a)ndash(22f) Conse-quently an FP algorithmalternates between the rounding and

Mobile Information Systems 11

projection steps until (a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905) (which impliesfeasibility) or until the number of iterations i has reached itspredefined limit ITheworkflow of the algorithm is presentedin Algorithm 2 Note that to retain the local convergencethe problems (21a)ndash(21f) and (22a)ndash(22f) have to be solvedexactly in the FP algorithm The problem (22a)ndash(22f) (andhence (20a)ndash(20f)) can be solved using any standard NLPmethod In this paper an interior point algorithm (describedeg in [45]) which has a polynomial 119874(1198992) time complexityis applied to solve (20a)ndash(20f) and (22a)ndash(22f) The MILPproblem (21a)ndash(21f) is relatively simple and therefore it canbe solved to optimality by any technique from the family ofthe branch-and-bound methods (eg [46])

Note that in general finding an optimal solution to anyjoint resource allocation problem with integrality constraintsis NP-hard (which has been shown in [47]) Consequentlymost of the recent approaches to deal with such kindof problems focus on finding the high-quality suboptimalsolutions using for example relaxation (by removing allthe integer restrictions as it has been done in [47 48]) oriterative two-stage algorithms for determining the optimalintegral solutions given fixed power levels and then findingthe optimal power allocation with fixed integral points (eg[49]) In this paper instead of relaxation or iteration wedirectly apply a heuristic FP algorithm that has a polynomial119874(119899119888) time complexity in the size n of the problem (withc being some real constant) [43] (Note that in our casethe size n of the problem (18a)ndash(18f) is proportional to|A119871| times |C| times |P| = 1198733 times 119870 The numerical results showingthe complexity of a proposed algorithm will be presentedin Section 4) Hence the presented heuristic approach hasmoderate complexity compared to the previously proposedalgorithms for resource allocationwith integrality constraintswhose complexity ranges from linear [21 30 47 48 50] topolynomial [2 3 49 51ndash53]

4 Algorithm Implementation

41 Resource Allocation Procedure We now discuss theimplementation of the proposed algorithms (presented inSection 3) in an LTE-A network The following schedulingprocedure is repeated at the beginning of each slot t as follows

(i) All users send their SRs to the eNB via dedicatedPUCCHs Note that the SRs may contain someuseful control information such as updated targetSINR level SINRtar

119899 or observed throughput on theunlicensed channel 119903119880119899 (119905)

(ii) After receiving the SRs from all of the users the eNBperforms resource allocation (by assigning themodesRBs and unlicensed channels and power levels to userpairs according to Algorithms 1 and 2) and sends theSGs with optimal allocations to the correspondingusers via PDCCHs

(iii) After receiving the SGs the users start their datatransmissions over allocated RBsunlicensed chan-nels with assigned mode and power levels

As it was already beenmentioned we deploy a CSMACAfor outband D2D access using a procedure described in IEEE80211 [54] As dictated by [54] if a certain D2D pair PUn 119899 isinD is allocated with one or more unlicensed channels thenprior to transmission one of the users must first sense thechannel (to determine whether it is idle) for the duration ofa distributed coordination function interframe space (DIFS)DIFS (which is 34 120583s long) consists of a short interframespace (SIFS) equaling 16 120583s and 2 Wi-Fi slots (each equals9 120583s) After DIFS a user must typically defer its transmissionfor a random number of slots generated from 0 to CW-1(contentionwindow size) to allow the other devices to share achannel in a fair manner Given that the minimumCW valueis CWmin = 16 the device will on an average wait for about75Wi-Fi slots before transmissionThus the average channelaccess delay is 16120583s + 95 times 9 120583s = 1015 120583s (independent ofservice rate) Since the slot duration in LTE system (119879119904 =1ms) is much longer than the average channel access delay(1015120583s) it is expected that (in average) a D2D pair will beable to exchange the data within the scheduled period Inthis case each of the users in a D2D pair should observeachieved throughput 119903119880119899 (119905) and report this value to the eNBwhen sending its SR Otherwise (if a D2D pair is not able toexchange the data within one slot) the D2D users send thevalue 119903119880119899 (119905) = 0 to the eNB Note that a CSMACA does notallow two-way data transmission Hence the second devicein a D2D pair can start the data transmission only after thefirst user has finished its transmission

It is worth mentioning that at some point in time aJUSTE-RL will reach its equilibrium state However evenafter the equilibriumhas been reached the eNB continues thelearning process because the network environment (channelquality network traffic and the number of active users) islikely to change over time resulting in different optimalmodeRBunlicensed channel and power allocations

42 Simulation Model A simulation model of the networkhas been implemented upon a standard LTE-A platformusing the OPNET simulation and development package [55]The model consists of one eNB and N user pairs randomlypositioned inside a three-sector hexagonal cell (with theantenna pattern specified in [56]) It is assumed that theusers operate outdoors in a typical urban environment andare stationary throughout all simulation runs Each userdevice has its own traffic generator enabling a variety oftraffic patterns For simplicity in the examples below theuser traffic is modeled as a full buffer with load of 10packets per second and packet size of 1500 bytes In allsimulations |C| = 10 cellular pairs 120592120573 = 11120592120572 120592120574 = 11120592120573119879 = 1000 slots 119879119861 = 106 K and I = 1000 iterations thetarget SINR levels for each device pair are set as SINRtar

119899 =SINRtar = 0 dB for all 119899 isin N The licensed band of theeNB comprises K = 100 RBs (equivalent to 20MHz) Theunlicensed band comprises M = 4 nonoverlapping OFDMchannels with 120596119880119898 = 10MHz for 119898 isin M The main simu-lation parameters of our model are listed in Table 1 Otherparameters are set in accordance with 3GPP specifications[56]

12 Mobile Information Systems

Initialization(1) Input 119868 119879(2)While (119905 le 119879) do(3) Input 119903119880119899 (119905) a119880119905 b119905(4) Solve (20a)ndash(20f) to find the optimal (a1198711119905 c1119905 p1119905 )Main Loop(5)While (119894 le 119868) doRounding(6) Solve (21a)ndash(21f) to find the optimal (a119871119894119905 c119894119905 p119894119905)(7) If ((a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905)) then breakProjection(8) Solve (22a)ndash(22f) to find the optimal (a119871119894+1119905 c119894+1119905 p119894+1119905 )(9) Set 119894 larr 119894 + 1

(10) End

Algorithm 2 FP algorithm for inband resource allocation

Table 1 Simulation parameters of LTE-A Model

Parameter ValueCell radius 500mFrame structure Type 2 (time division duplex)Slot duration 1msTDD configuration 0eNodeB Tx power 46 dBmUE activeidle Tx power 232 dBmNoise power minus174 dBmHzPath loss and cellular link 1281 + 376 log(119889) 119889 [km]NLOS path loss and D2Dlink

40 log(119889) + 30 log(119891) + 49 119889 [km]119891 [Hz]LOS path loss and D2Dlink

169 log(119889) + 20 log(1198915) + 468 119889[m] 119891 [GHz]

Shadowing st dev 10 B (cell mode) 12 dB (D2D mode)

In this paper the evaluation of a proposed approachfor inband and outband resource allocation referred to asJRA (JUSTE-RL based resource allocation) is divided intotwo parts In the first part we analyze the performanceof JUSTE-RL with regret for unlicensed channel allocation(Algorithm 1) In the second part we examine the efficiencyof a proposed joint inbandoutband resource allocation(Algorithms 1 and 2) In the following the performanceof JRA is compared with the performance of the followingresource allocation techniques

(i) First is joint inbandoutband resource allocation with120576-greedy Q-learning (GQL) [57] based on formu-lations (12a)ndash(12e) and (18a)ndash(18f) where the unli-censed channels are allocated to the users by theLTE eNB In GQL at any slot t an action with thelargest Q-value is selected with probability 1 minus 120576 andthe other actions are selected uniformly at randomwith probability 120576 In all simulation experiments thevalue of 120576 is set in accordance with the most commonsuggestions (provided eg in [22]) as 120576 = 01

(ii) Second is centralized optimal strategy (COS) wherethe inband and outband network resources are allo-cated to the users by solving (11a)ndash(11f) directly basedon global channel and network knowledge Note thatCOS corresponds to the most efficient (in terms ofnetwork utility maximization) strategy although it isnot practically realizable (since in the real networkdeployment scenarios the precise information aboutquality and availability of unlicensed channels is notavailable) (In this paper we use an FP algorithm tofind the optimal solution to (11a)ndash(11f) or (18a)ndash(18f)in GQL and COS)

(iii) Third is social heuristic for multimode D2D com-munication (SMD) in an LTE-A network proposedin [21] to reduce the complexity of an original opti-mization problem for joint inbandoutband resourceallocation This algorithm assigns user modes andresources to maximize the social welfare based onthe global channel and network knowledge The eNBcreates a randomly ordered list of the D2D pairsThen it computes the aggregated network utility foreach mode of the first user in the list and assigns thisuser with amode that provides the highest aggregatedutility This process is repeated for all D2D pairs

(iv) Fourth is greedy heuristic for multimode D2D com-munication (GMD) in LTE-A networks [21] wherethe modes and inbandoutband network resourcesare allocated tomaximize the individual usersrsquo welfarebased on the global channel and network knowledgeSimilar to SMD the eNB creates a randomly orderedlist of the D2D pairs After this it computes the utilityfor each mode of the first user in the list and assignsthis user a mode assuring the highest individualutility This process is repeated for all D2D pairs

(v) Fifth is ranked heuristic for multimode D2D com-munication (RMD) in LTE-A networks [21] Here theeNB evaluates the utility of each user in each mode(based on the global channel andnetwork knowledge)

Mobile Information Systems 13Av

erag

e num

ber o

f RL

itera

tions

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e str

ateg

y co

nver

genc

e

Figure 2 The average number of RL iterations (slots) necessary forconvergence of strategies in JRAwith different values of 120592120572 and fixed|C| = 10

and sorts the D2D pairs according to their utilities ina descending order Next the eNB allocates the firstuser in the list a mode that guarantees the highestaggregatednetwork utilityThis process is repeated forall D2D pairs

Note that all algorithms used in our performance evaluationare simulated with identical system parameters

43 Performance of a Learning Algorithm We start withthe performance evaluation of JUSTE-RL with regret forunlicensed channel allocation (outlined in Algorithm 1)Figures 2 and 3 demonstrate the learning speed of JRAFigure 2 shows the average number of RL iterations (slots)necessary for convergence of strategies in JRA (at the pointwhere 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880) with differentvalues of 120592120572 isin [055 08] 120592120573 = 11120592120572 isin [061 088] and120592120574 = 11120592120573 isin [067 097] and a varying number of user pairs119873 isin [50 500] The average number of RL iterations (slots)necessary for convergence of utilities in JRA (at the pointwhere 119906119905(119861) = 119906119905minus1(119861) for all 119861 isin A119880) with 120592120572 isin [055 08]120592120573 isin [061 088] and 120592120574 isin [067 097] and 119873 isin [50 500])is plotted in Figure 3 The accuracy of estimation in JRA ispresented in Figures 4 and 5 Figure 4 shows the absoluteerror of strategy estimation in JRA denoted as 120575120587 defined asa sum of the absolute differences between the actual optimalstrategies and the estimated strategies upon the algorithmtermination That is

120575120587 = sum119861isinA119880

1003816100381610038161003816120587119879 (119861) minus 120587lowast (119861)1003816100381610038161003816 (23)

where 120587119879(119861) is an optimal strategy estimated in JRA uponthe algorithm termination (at slot 119879) and 120587lowast(119861) is the actual

Aver

age n

umbe

r of R

L ite

ratio

ns

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e util

ity co

nver

genc

e

Figure 3 The average number of RL iterations (slots) necessary forconvergence of utilities in JRA with different values of 120592120572 and fixed|C| = 10

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of s

trat

egy

0

001

002

003

004

005

estim

atio

n120575120587

Figure 4 The absolute error of strategy estimation 120575120587 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

optimal strategy obtained by playing an action 119861 isin A119880Figure 5 demonstrates the absolute error of utility estimationin JRA denoted 120575119906 defined as a sum of the absolutedifferences between the actual and the estimated optimalnetwork utilities upon the algorithm termination that is

120575119906 = sum119861isinA119880

1003816100381610038161003816119906119879 (119861) minus 119906lowast (119861)1003816100381610038161003816 (24)

14 Mobile Information Systems

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of u

tility

0

001

002

003

004

005

estim

atio

n120575u

Figure 5 The absolute error of utility estimation 120575119906 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

where 119906119879(119861) is an optimal network utility estimated in JRAupon the algorithm termination (at slot 119879) and 119906lowast(119861) is theactual optimal network utility obtained by playing an action119861 isin A119880 The observations in Figures 2ndash5 show that the ratesof convergence of strategies and utilities and the accuracyof strategy and utility estimation are almost the sameFurthermore we find that the number of iterations necessaryfor the algorithm convergence and absolute estimation errorstrongly depend on the setting of the parameters 120592120572 120592120573 and120592120574 the worst performance is attainedwith 120592120572 = 08 120592120573 = 061and 120592120574 = 067 and the best with 120592120572 = 055 120592120573 = 088and 120592120574 = 097 Such results are rather predictable since theparameters 120592120572 120592120573 and 120592120574 are related to the parameters 120572119905120573119905 and 120574119905 (see (16b)) which have a direct influence on thelearning rate of JRA [22 23]

In Figures 6 and 7 the instantaneous network utility 119906119905 =sum119899isinN 119906119899(119905) is presented as a function of time in scenarioswith low network load (119873 = 100) and high network load(119873 = 500) and fixed Here a proposed JRA technique issimulated with the settings 120592120572 = 05 120592120573 = 061 and 120592120574 =067 The graphs in these figures show that the efficiency ofJRA and GQL improves gradually over time After about 300slots (which is the average time necessary for the convergenceof strategies and utilities in Algorithm 1) JRA demonstratesnear-optimal results GQL needs a little longer time (asymp400slots) to converge after which its performance also becomesvery close to the performance of COS Unlike JRA and GQLthe performance of COS SMDGMD and RMD is consistentover time (since these algorithms do not involve any learningprocess) We also observe that the network utility attained inSMD GMD and RMD is much smaller than that in COSTo understand such poor performance of SMD GMD andRMD note that in these algorithms the original resource

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

7

8

9

10

11

12

ut

(lowast10

6)

Figure 6 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and |C| = 10

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

40

42

44

46

48

50

52

54

56

58

60ut

(lowast10

6)

Figure 7 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 500 and |C| = 10allocation problem is divided into two separate problems(i) mode selection and (ii) packet scheduling After that themode selection problem is solved using very plain heuristics(social greedy or ranked) which reduces the complexityof an original optimization problem (from exponential tolinear) but has a negative impact on the performance of thesetechniques in terms of network utility maximization [21]

44 Performance of Joint InbandOutband Allocation Wenow evaluate the efficiency of a proposed inbandoutbandresource allocation (Algorithms 1 and 2) The graphs inFigures 8ndash10 demonstrate the computational complexitysolution time and solution accuracy of different resource

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 6: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

6 Mobile Information Systems

over the unlicensed spectrum eachD2Dpair should transmitat a maximal power level to achieve the high SINR regime(and consequently service rate) which in turn results inincreased power consumption of mobile terminals There-fore when formulating the utility of each device pair weshould also consider the cost of power consumption toquantify the trade-off between the achieved rate and powerlevel (as in [37]) Accordingly we can define a utility 119906119899(119905)

of PUn at slot 119905 as the difference between its instantaneousservice rate 119903119899(119905) and the cost of power consumption

119906119899 (119905) = 119903119899 (119905) minus 120592119899119901119899 (119905)= 119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905) (10)

where 120592119899 ge 0 is the cost per unit (W) level of power for PUnUsing the above definition we can express our resource

allocation problem as follows

maximize sum119899isinN 119906119899 (119905) = sum119899isinN [119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905)] (11a)

subject to a119871119905 isin A119871a119880119905 isin A119880b119905 isin Bc119905 isin Cp119905 isin P

(11b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (11c)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (11d)

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (11e)

SINR119896119899 (119905) le SINRtar119899 forall119899 isin N forall119896 isin K cup M (11f)

where the constraint (11f) is necessary to protect the usersfrom heavy interference (here SINRtar

119899 stands for theminimalSINR level acceptable by PUn) Note that information onthe sets C and D is readily available at the eNB The valuesof 119866119896119899119898 for 119899 isin N 119898 isin N and 119896 isin K are obtained bythe eNB from the CSI carried by the RSs The only missinginformation is related to 119903119880119899 (119905) that depends on the parameters119879119898119903 (representing the availability of the unlicensed channel119862119896 in our model) and 119866119898119899119899 (which defines the quality ofunlicensed channel 119862119896) for all 119898 isin M The latter parameteris determined by the unlicensed channel allocations andhence the eNB can adapt to the changes of 119866119898119899119899 in time andspace Since there is no coordination (and no informationexchange) between the LTE and outband RAT interfacessolving (11a)ndash(11f) to optimality might be impossible whichis a rather strong argument in favor of applying a well-knownreinforcement learning (RL) for resource allocation

The main idea behind RL is that the actions (unlicensedchannel allocations) leading to the higher network utility atslot 119905 should be granted with higher probabilities at slot 119905 + 1and vice versa [22] In the simplest form of RL (presentedin [24]) the learning agent estimates its possible strategiesbased on the locally observed utility without any prior infor-mation about the operating environment This form of RL

requires only algebraic operations but does not guarantee theconvergence to an equilibrium [25] In Q-learning [22] theagentrsquos utility is estimated using some value-action functionGiven the certain (easy to follow) conditions this algorithmconverges (with probability 1) to an NE state However itrequires maximization of the action-value at every slot t(which can be computationally demanding depending onthe structure of a chosen value-action function) [20] InJUSTE-RL algorithm [23] the learning agent estimates notonly its own strategy but also the expected utility for allof its actions Unlike Q-learning JUSTE-RL does not needto perform optimization of the action-value (since onlyalgebraic operations are required to update the strategies)and hence it has a lower computational complexity On theother hand compared to a basic RL algorithm JUSTE-RLconverges to a 120576-NE [23 25] We now show how a JUSTE-RLwith regret can be applied to our problem

32 Unlicensed Channel Allocation To apply JUSTE-RL withregret to our problem we represent it as a game with oneplayer (the eNB) having no information about the operatingenvironment A finite set of the eNBrsquos actions A119880 representsthe set of all admissible unlicensed channel allocation deci-sions The objective of the eNB is to select at any slot t an

Mobile Information Systems 7

action 119860 119905 = a119880119905 isin A119880 to maximize the eNBrsquos utility 119906119905 =sum119899isinN 119906119899(119905) In the following we use notation 119886119898119899 (119905) forall119898 isin Mto specify the eNBrsquos decision regarding the allocation of anunlicensed channel 119862119896 to a pair PUn and a119880119905 to describe allunlicensed channel allocations by the eNB when selectinga particular action 119860 119905 at slot t We also use b119905 to denote

the D2D access allocation vector and 119903119880119899 (119905) to indicate theoutband service rate achieved by playing the action 119860 119905 Aftertaking an action 119860 119905 = a119880119905 at slot t the eNB observes the(random) service rate 119903119880119899 (119905) and estimates the network utility119906119905 = 119906119905(119860 119905) by solving the following problem

maximize 119906119905 = sum119899isinN

[119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905)] (12a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(12b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (12c)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (12d)

SINR119899 (119905) le SINRtar119899 forall119899 isin N (12e)

where

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (12f)

and 119887119899(119905) = 0 forall119899 isin C Note that unlike problem (11a)ndash(11f)problem (12a)ndash(12e) can be solved to optimality (since 119903119880119899 (119905) isknown) It has three optimization variables a119871119905 c119905 and p119905 andhence its complexity is lower than the complexity of (11a)ndash(11f) (the method for solving (12a)ndash(12e) is presented in thenext subsection)

We also define a mixed-strategy probability 120587119905 of playingan action 119860 119905 = a119880119905 at slot t as

120587119905 (119860 119905) = 120587119905 (119861)119861isinA119880 120587119905 (119861) = Pr 119860 119905 = 119861

forall119860 119905 isin A119880(13a)

and a regret 120588119905(119860 119905) for not playing this action at slot t as

120588119905 (119860 119905) = max 0 119906119905minus1 minus 119906119905 forall119860 119905 isin A119880 (13b)

In JUSTE-RL the probability distribution of a regret over allpossible actions becomes the BoltzmannndashGibbs distribution(aka canonical ensemble) given by [22]

119866 120588119905 (119860 119905) = exp minus120588119905 (119860 119905) 119896119879119861sum119861isinA119880 exp minus120588119905 (119861) 119896119879119861 forall119860 119905 isin A119880

(14)

where k = 138064852 times 10minus23 JK is the Boltzmann constant119879119861 is the system temperature (in K) High temperatures make

all actions almost equiprobable and low temperatures resultin greedy action selection [22]

Using the above definitions the dynamics of a JUSTE-RLwith regret can be described as [23]

119906119905+1 (119861) = 119906119905 (119861) + 1205721199051119860119905=119861 (119906119905 minus 119906119905 (119861)) 120588119905+1 (119861) = 120588119905 (119861) + 120573119905 (119906119905 (119861) minus 119906119905 minus 120588119905 (119861)) 120587119905+1 (119861) = 120587119905 (119861) + 120574119905 (119866 120588119905 (119861) minus 120587119899 (119861))

(15)

for all 119861 isin A119880 where 120572119905 120573119905 and 120574119905 are the learning rates suchthat [23]

lim119905rarrinfin

119905sum120591=1

120572120591 = +infinlim119905rarrinfin

119905sum120591=1

1205722120591 lt +infinlim119905rarrinfin

119905sum120591=1

120573120591 = +infinlim119905rarrinfin

119905sum120591=1

1205732120591 lt +infinlim119905rarrinfin

119905sum120591=1

120574120591 = +infinlim119905rarrinfin

119905sum120591=1

1205742120591 lt +infin

8 Mobile Information Systems

Initialization(1) Input 120592120572 120592120573 120592120574 119879(2) For all 1198600 isin A119880 set 1199060(1198600) larr 0 1205880(1198600) larr 0 1205870(1198600) larr 0(3) For all 119899 isin N set 119903119880119899 (0) larr 0Main Loop(4)While (119905 le 119879) do(5) Select 119860 119905 larr argmax119861isinA119880(120587119905(119861)) and set a119880119905 larr 119860 119905(6) For all 119899 isin D set 119887119899(119905) = max0 1 minus sum119898isinM 119886119898119899 (119905)(7) For all 119899 isin C set 119887119899(119905) = 0(8) Execute 119860 119905 and observe 119903119880119899 (119905) for all 119899 isin N(9) Solve (12a)ndash(12e) to find an optimal 119906119905(119860 119905) larr 119906119905(10) For all 119861 isin A119880 update 119906119905(119861) 120588119905(119861) 120587119905(119861) using (15)

(11) End

Algorithm 1 JUSTE-RL with regret for unlicensed channel allocation RL algorithm for outband channel allocation

lim119905rarrinfin

120574119905120572119905 = 0lim119905rarrinfin

120573119905120574119905 = 0(16a)

Typically the learning rates are set equal [22]

120572119905 = 1(119905 + 1)120592120572 120573119905 = 1(119905 + 1)120592120573 120574119905 = 1(119905 + 1)120592120574

(16b)

where 120592120572 isin (05 1] 120592120573 isin (05 1] 120592120574 isin (05 1] and 120592120572 le120592120573 le 120592120574 The initializations 1199060(119860) ge 0 1205880(119860) ge 0 and1205870(119860) ge 0 should be sufficiently close to zero for all 119860 isin A119880The dynamics (15) converge to the 120576-Nash equilibrium Notethat a Nash equilibrium point for (15) is given by [23]

lim119905rarrinfin

120587119905 (119860) = 120587lowast (119860) forall119860 isin A119880 (17)

The corresponding learning algorithm for unlicensed chan-nel allocation is presented in Algorithm 1 (where T indicatesthe total simulation length in slots) Note that this algorithmconverges when 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880 Thecomplexity of JUSTE-RL with regret is mainly determinedby the size of an action set A119880 since at any slot t we have

to select an action 119860 119905 isin A119880 that maximizes 120587119905(119860 119905) (thedynamics in (15) require only algebraic operations and thusits computational complexity is negligible) Consequently theworst-case time complexity of Algorithm 1 is 119874(119899) where119899 = |A119880| le 119873 times 119872 is the size of our action set

33 Inband Resource Allocation Consider (12a)ndash(12e) thatrepresents a joint mode RB and power level allocationproblemThis problem has two binary optimization variablesa119871119905 and c119905 one real-valued variable p119905 nonlinear objective(12a) and nonlinear constraints (12c) and (12e) Hence itbelongs to a family of the mixed-integer nonlinear program-ming (MINLP) problems It has been well established inthe past (see eg [38]) that all MINLP problems involvingbinary variables (such as (12a)ndash(12e)) are NondeterministicPolynomial-time- (NP-) hard For immediate NP-hardnessproof for a considered problem note that given that a119871119905 canbe either 0 or 1 any feasible solution to (12a)ndash(12e) is a subsetof vertices The constraint (12d) also implies that at least oneend point of every edge is included in this subset Hence thesolution to this problem describes a vertex cover for whichfinding a minimum is NP-hard

Most of the MINLP solution techniques involve theconstruction of the following relaxations to the consideredproblem a nonlinear programming (NLP) relaxation (theoriginal problem without integer restrictions) and a mixed-integer linear-programming (MILP) relaxation (an originalproblem where the nonlinearities are replaced by supportinghyperplanes) To form the MILP and NLP relaxations to(12a)ndash(12e) let us first represent in equivalent form thefollowing

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (18a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(18b)

Mobile Information Systems 9

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (18c)

1198921119896 (a119871119905 c119905) = sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) minus 1 le 0 forall119896 isin K (18d)

1198922119899 (a119871119905 p119905) = SINR119899 (119905) minus SINRtar119899 le 0 forall119899 isin N (18e)

1198923119899119896 (a119871119905 p119905) = SINR119896119899 (119905) minus 2119889119896119899 + 1 le 0 forall119899 isin N forall119896 isin K (18f)

where objective (18a) and constraints (18b) and (18c) arelinear while constraints (18d)ndash(18f) are nonlinear The MILP

relaxation to (18a)ndash(18f) in a given point (a1198710119905 c0119905 p0119905 ) is given

by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (19a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(19b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (19c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (19d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (19e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (19f)

The NLP relaxation to (18a)ndash(18f) is given by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (20a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(20b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (20c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (20d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (20e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (20f)

10 Mobile Information Systems

where

A119871 = a119871119905 | 0 le 119886119896119899 (119905) le 1 forall119899 isin N forall119896 isin K (20g)

C

= c119905 | 0 le 119888119899 (119905) le 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (20h)

In general allMINLPproblems can be solved using eitherexact techniques (eg branch-and-bound [39]) or heuristicmethods (such as local branching [40] large neighborhoodsearch [41] and feasibility pump [42]) Since we are inter-ested in a reasonably simple and fast algorithm it is moreconvenient to use heuristics to solve (18a)ndash(18f) Amongnumerous heuristic techniques feasibility pump (FP) [43] isperhaps the most simple and effective method for producingmore and better solutions in a shorter average running

time (the local convergence properties of FP for nonconvexproblems have been proved in [44]) The fundamental ideaof an FP heuristic is to decompose the MINLP probleminto two parts integer feasibility and constraint feasibilityThe former is achieved by rounding (solving the MILPrelaxation to an original problem) the latter by projection(solving the NLP relaxation) The algorithm generates twosequences of integral and rounding pointsThe first sequenceof integral points (a119871119894119905 c119894119905 p119894119905)119868119894=1 119868 = 1 2 containsthe solutions that may violate the nonlinear constraints thesecond sequence (a119871119894119905 c119894119905 p119894119905)119868119894=1 comprises the roundingpoints that are feasible for the MILP relaxation but might notbe integral

Particularly with the input (a1198711119905 c1119905 p1119905 ) being a solution toan NLP relaxation (20a)ndash(20f) FP generates two sequencesby solving the following problems for 119894 = 1 119868

(a119871119894119905 c119894119905 p119894119905) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038171 (21a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(21b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (21c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (21d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (21e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (21f)

(a119871119894+1119905 c119894+1119905 p119894+1119905 ) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038172 (22a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(22b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (22c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (22d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (22e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (22f)

where sdot1 and sdot2 are l1-norm and l2-norm respectivelyTherounding is carried out by solving the problem (21a)ndash(21f)

and the projection is the solution to (22a)ndash(22f) Conse-quently an FP algorithmalternates between the rounding and

Mobile Information Systems 11

projection steps until (a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905) (which impliesfeasibility) or until the number of iterations i has reached itspredefined limit ITheworkflow of the algorithm is presentedin Algorithm 2 Note that to retain the local convergencethe problems (21a)ndash(21f) and (22a)ndash(22f) have to be solvedexactly in the FP algorithm The problem (22a)ndash(22f) (andhence (20a)ndash(20f)) can be solved using any standard NLPmethod In this paper an interior point algorithm (describedeg in [45]) which has a polynomial 119874(1198992) time complexityis applied to solve (20a)ndash(20f) and (22a)ndash(22f) The MILPproblem (21a)ndash(21f) is relatively simple and therefore it canbe solved to optimality by any technique from the family ofthe branch-and-bound methods (eg [46])

Note that in general finding an optimal solution to anyjoint resource allocation problem with integrality constraintsis NP-hard (which has been shown in [47]) Consequentlymost of the recent approaches to deal with such kindof problems focus on finding the high-quality suboptimalsolutions using for example relaxation (by removing allthe integer restrictions as it has been done in [47 48]) oriterative two-stage algorithms for determining the optimalintegral solutions given fixed power levels and then findingthe optimal power allocation with fixed integral points (eg[49]) In this paper instead of relaxation or iteration wedirectly apply a heuristic FP algorithm that has a polynomial119874(119899119888) time complexity in the size n of the problem (withc being some real constant) [43] (Note that in our casethe size n of the problem (18a)ndash(18f) is proportional to|A119871| times |C| times |P| = 1198733 times 119870 The numerical results showingthe complexity of a proposed algorithm will be presentedin Section 4) Hence the presented heuristic approach hasmoderate complexity compared to the previously proposedalgorithms for resource allocationwith integrality constraintswhose complexity ranges from linear [21 30 47 48 50] topolynomial [2 3 49 51ndash53]

4 Algorithm Implementation

41 Resource Allocation Procedure We now discuss theimplementation of the proposed algorithms (presented inSection 3) in an LTE-A network The following schedulingprocedure is repeated at the beginning of each slot t as follows

(i) All users send their SRs to the eNB via dedicatedPUCCHs Note that the SRs may contain someuseful control information such as updated targetSINR level SINRtar

119899 or observed throughput on theunlicensed channel 119903119880119899 (119905)

(ii) After receiving the SRs from all of the users the eNBperforms resource allocation (by assigning themodesRBs and unlicensed channels and power levels to userpairs according to Algorithms 1 and 2) and sends theSGs with optimal allocations to the correspondingusers via PDCCHs

(iii) After receiving the SGs the users start their datatransmissions over allocated RBsunlicensed chan-nels with assigned mode and power levels

As it was already beenmentioned we deploy a CSMACAfor outband D2D access using a procedure described in IEEE80211 [54] As dictated by [54] if a certain D2D pair PUn 119899 isinD is allocated with one or more unlicensed channels thenprior to transmission one of the users must first sense thechannel (to determine whether it is idle) for the duration ofa distributed coordination function interframe space (DIFS)DIFS (which is 34 120583s long) consists of a short interframespace (SIFS) equaling 16 120583s and 2 Wi-Fi slots (each equals9 120583s) After DIFS a user must typically defer its transmissionfor a random number of slots generated from 0 to CW-1(contentionwindow size) to allow the other devices to share achannel in a fair manner Given that the minimumCW valueis CWmin = 16 the device will on an average wait for about75Wi-Fi slots before transmissionThus the average channelaccess delay is 16120583s + 95 times 9 120583s = 1015 120583s (independent ofservice rate) Since the slot duration in LTE system (119879119904 =1ms) is much longer than the average channel access delay(1015120583s) it is expected that (in average) a D2D pair will beable to exchange the data within the scheduled period Inthis case each of the users in a D2D pair should observeachieved throughput 119903119880119899 (119905) and report this value to the eNBwhen sending its SR Otherwise (if a D2D pair is not able toexchange the data within one slot) the D2D users send thevalue 119903119880119899 (119905) = 0 to the eNB Note that a CSMACA does notallow two-way data transmission Hence the second devicein a D2D pair can start the data transmission only after thefirst user has finished its transmission

It is worth mentioning that at some point in time aJUSTE-RL will reach its equilibrium state However evenafter the equilibriumhas been reached the eNB continues thelearning process because the network environment (channelquality network traffic and the number of active users) islikely to change over time resulting in different optimalmodeRBunlicensed channel and power allocations

42 Simulation Model A simulation model of the networkhas been implemented upon a standard LTE-A platformusing the OPNET simulation and development package [55]The model consists of one eNB and N user pairs randomlypositioned inside a three-sector hexagonal cell (with theantenna pattern specified in [56]) It is assumed that theusers operate outdoors in a typical urban environment andare stationary throughout all simulation runs Each userdevice has its own traffic generator enabling a variety oftraffic patterns For simplicity in the examples below theuser traffic is modeled as a full buffer with load of 10packets per second and packet size of 1500 bytes In allsimulations |C| = 10 cellular pairs 120592120573 = 11120592120572 120592120574 = 11120592120573119879 = 1000 slots 119879119861 = 106 K and I = 1000 iterations thetarget SINR levels for each device pair are set as SINRtar

119899 =SINRtar = 0 dB for all 119899 isin N The licensed band of theeNB comprises K = 100 RBs (equivalent to 20MHz) Theunlicensed band comprises M = 4 nonoverlapping OFDMchannels with 120596119880119898 = 10MHz for 119898 isin M The main simu-lation parameters of our model are listed in Table 1 Otherparameters are set in accordance with 3GPP specifications[56]

12 Mobile Information Systems

Initialization(1) Input 119868 119879(2)While (119905 le 119879) do(3) Input 119903119880119899 (119905) a119880119905 b119905(4) Solve (20a)ndash(20f) to find the optimal (a1198711119905 c1119905 p1119905 )Main Loop(5)While (119894 le 119868) doRounding(6) Solve (21a)ndash(21f) to find the optimal (a119871119894119905 c119894119905 p119894119905)(7) If ((a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905)) then breakProjection(8) Solve (22a)ndash(22f) to find the optimal (a119871119894+1119905 c119894+1119905 p119894+1119905 )(9) Set 119894 larr 119894 + 1

(10) End

Algorithm 2 FP algorithm for inband resource allocation

Table 1 Simulation parameters of LTE-A Model

Parameter ValueCell radius 500mFrame structure Type 2 (time division duplex)Slot duration 1msTDD configuration 0eNodeB Tx power 46 dBmUE activeidle Tx power 232 dBmNoise power minus174 dBmHzPath loss and cellular link 1281 + 376 log(119889) 119889 [km]NLOS path loss and D2Dlink

40 log(119889) + 30 log(119891) + 49 119889 [km]119891 [Hz]LOS path loss and D2Dlink

169 log(119889) + 20 log(1198915) + 468 119889[m] 119891 [GHz]

Shadowing st dev 10 B (cell mode) 12 dB (D2D mode)

In this paper the evaluation of a proposed approachfor inband and outband resource allocation referred to asJRA (JUSTE-RL based resource allocation) is divided intotwo parts In the first part we analyze the performanceof JUSTE-RL with regret for unlicensed channel allocation(Algorithm 1) In the second part we examine the efficiencyof a proposed joint inbandoutband resource allocation(Algorithms 1 and 2) In the following the performanceof JRA is compared with the performance of the followingresource allocation techniques

(i) First is joint inbandoutband resource allocation with120576-greedy Q-learning (GQL) [57] based on formu-lations (12a)ndash(12e) and (18a)ndash(18f) where the unli-censed channels are allocated to the users by theLTE eNB In GQL at any slot t an action with thelargest Q-value is selected with probability 1 minus 120576 andthe other actions are selected uniformly at randomwith probability 120576 In all simulation experiments thevalue of 120576 is set in accordance with the most commonsuggestions (provided eg in [22]) as 120576 = 01

(ii) Second is centralized optimal strategy (COS) wherethe inband and outband network resources are allo-cated to the users by solving (11a)ndash(11f) directly basedon global channel and network knowledge Note thatCOS corresponds to the most efficient (in terms ofnetwork utility maximization) strategy although it isnot practically realizable (since in the real networkdeployment scenarios the precise information aboutquality and availability of unlicensed channels is notavailable) (In this paper we use an FP algorithm tofind the optimal solution to (11a)ndash(11f) or (18a)ndash(18f)in GQL and COS)

(iii) Third is social heuristic for multimode D2D com-munication (SMD) in an LTE-A network proposedin [21] to reduce the complexity of an original opti-mization problem for joint inbandoutband resourceallocation This algorithm assigns user modes andresources to maximize the social welfare based onthe global channel and network knowledge The eNBcreates a randomly ordered list of the D2D pairsThen it computes the aggregated network utility foreach mode of the first user in the list and assigns thisuser with amode that provides the highest aggregatedutility This process is repeated for all D2D pairs

(iv) Fourth is greedy heuristic for multimode D2D com-munication (GMD) in LTE-A networks [21] wherethe modes and inbandoutband network resourcesare allocated tomaximize the individual usersrsquo welfarebased on the global channel and network knowledgeSimilar to SMD the eNB creates a randomly orderedlist of the D2D pairs After this it computes the utilityfor each mode of the first user in the list and assignsthis user a mode assuring the highest individualutility This process is repeated for all D2D pairs

(v) Fifth is ranked heuristic for multimode D2D com-munication (RMD) in LTE-A networks [21] Here theeNB evaluates the utility of each user in each mode(based on the global channel andnetwork knowledge)

Mobile Information Systems 13Av

erag

e num

ber o

f RL

itera

tions

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e str

ateg

y co

nver

genc

e

Figure 2 The average number of RL iterations (slots) necessary forconvergence of strategies in JRAwith different values of 120592120572 and fixed|C| = 10

and sorts the D2D pairs according to their utilities ina descending order Next the eNB allocates the firstuser in the list a mode that guarantees the highestaggregatednetwork utilityThis process is repeated forall D2D pairs

Note that all algorithms used in our performance evaluationare simulated with identical system parameters

43 Performance of a Learning Algorithm We start withthe performance evaluation of JUSTE-RL with regret forunlicensed channel allocation (outlined in Algorithm 1)Figures 2 and 3 demonstrate the learning speed of JRAFigure 2 shows the average number of RL iterations (slots)necessary for convergence of strategies in JRA (at the pointwhere 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880) with differentvalues of 120592120572 isin [055 08] 120592120573 = 11120592120572 isin [061 088] and120592120574 = 11120592120573 isin [067 097] and a varying number of user pairs119873 isin [50 500] The average number of RL iterations (slots)necessary for convergence of utilities in JRA (at the pointwhere 119906119905(119861) = 119906119905minus1(119861) for all 119861 isin A119880) with 120592120572 isin [055 08]120592120573 isin [061 088] and 120592120574 isin [067 097] and 119873 isin [50 500])is plotted in Figure 3 The accuracy of estimation in JRA ispresented in Figures 4 and 5 Figure 4 shows the absoluteerror of strategy estimation in JRA denoted as 120575120587 defined asa sum of the absolute differences between the actual optimalstrategies and the estimated strategies upon the algorithmtermination That is

120575120587 = sum119861isinA119880

1003816100381610038161003816120587119879 (119861) minus 120587lowast (119861)1003816100381610038161003816 (23)

where 120587119879(119861) is an optimal strategy estimated in JRA uponthe algorithm termination (at slot 119879) and 120587lowast(119861) is the actual

Aver

age n

umbe

r of R

L ite

ratio

ns

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e util

ity co

nver

genc

e

Figure 3 The average number of RL iterations (slots) necessary forconvergence of utilities in JRA with different values of 120592120572 and fixed|C| = 10

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of s

trat

egy

0

001

002

003

004

005

estim

atio

n120575120587

Figure 4 The absolute error of strategy estimation 120575120587 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

optimal strategy obtained by playing an action 119861 isin A119880Figure 5 demonstrates the absolute error of utility estimationin JRA denoted 120575119906 defined as a sum of the absolutedifferences between the actual and the estimated optimalnetwork utilities upon the algorithm termination that is

120575119906 = sum119861isinA119880

1003816100381610038161003816119906119879 (119861) minus 119906lowast (119861)1003816100381610038161003816 (24)

14 Mobile Information Systems

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of u

tility

0

001

002

003

004

005

estim

atio

n120575u

Figure 5 The absolute error of utility estimation 120575119906 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

where 119906119879(119861) is an optimal network utility estimated in JRAupon the algorithm termination (at slot 119879) and 119906lowast(119861) is theactual optimal network utility obtained by playing an action119861 isin A119880 The observations in Figures 2ndash5 show that the ratesof convergence of strategies and utilities and the accuracyof strategy and utility estimation are almost the sameFurthermore we find that the number of iterations necessaryfor the algorithm convergence and absolute estimation errorstrongly depend on the setting of the parameters 120592120572 120592120573 and120592120574 the worst performance is attainedwith 120592120572 = 08 120592120573 = 061and 120592120574 = 067 and the best with 120592120572 = 055 120592120573 = 088and 120592120574 = 097 Such results are rather predictable since theparameters 120592120572 120592120573 and 120592120574 are related to the parameters 120572119905120573119905 and 120574119905 (see (16b)) which have a direct influence on thelearning rate of JRA [22 23]

In Figures 6 and 7 the instantaneous network utility 119906119905 =sum119899isinN 119906119899(119905) is presented as a function of time in scenarioswith low network load (119873 = 100) and high network load(119873 = 500) and fixed Here a proposed JRA technique issimulated with the settings 120592120572 = 05 120592120573 = 061 and 120592120574 =067 The graphs in these figures show that the efficiency ofJRA and GQL improves gradually over time After about 300slots (which is the average time necessary for the convergenceof strategies and utilities in Algorithm 1) JRA demonstratesnear-optimal results GQL needs a little longer time (asymp400slots) to converge after which its performance also becomesvery close to the performance of COS Unlike JRA and GQLthe performance of COS SMDGMD and RMD is consistentover time (since these algorithms do not involve any learningprocess) We also observe that the network utility attained inSMD GMD and RMD is much smaller than that in COSTo understand such poor performance of SMD GMD andRMD note that in these algorithms the original resource

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

7

8

9

10

11

12

ut

(lowast10

6)

Figure 6 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and |C| = 10

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

40

42

44

46

48

50

52

54

56

58

60ut

(lowast10

6)

Figure 7 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 500 and |C| = 10allocation problem is divided into two separate problems(i) mode selection and (ii) packet scheduling After that themode selection problem is solved using very plain heuristics(social greedy or ranked) which reduces the complexityof an original optimization problem (from exponential tolinear) but has a negative impact on the performance of thesetechniques in terms of network utility maximization [21]

44 Performance of Joint InbandOutband Allocation Wenow evaluate the efficiency of a proposed inbandoutbandresource allocation (Algorithms 1 and 2) The graphs inFigures 8ndash10 demonstrate the computational complexitysolution time and solution accuracy of different resource

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 7: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

Mobile Information Systems 7

action 119860 119905 = a119880119905 isin A119880 to maximize the eNBrsquos utility 119906119905 =sum119899isinN 119906119899(119905) In the following we use notation 119886119898119899 (119905) forall119898 isin Mto specify the eNBrsquos decision regarding the allocation of anunlicensed channel 119862119896 to a pair PUn and a119880119905 to describe allunlicensed channel allocations by the eNB when selectinga particular action 119860 119905 at slot t We also use b119905 to denote

the D2D access allocation vector and 119903119880119899 (119905) to indicate theoutband service rate achieved by playing the action 119860 119905 Aftertaking an action 119860 119905 = a119880119905 at slot t the eNB observes the(random) service rate 119903119880119899 (119905) and estimates the network utility119906119905 = 119906119905(119860 119905) by solving the following problem

maximize 119906119905 = sum119899isinN

[119887119899 (119905) 119903119871119899 (119905) + (1 minus 119887119899 (119905)) 119903119880119899 (119905) minus 120592119899119901119899 (119905)] (12a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(12b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) le 1 forall119896 isin K (12c)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (12d)

SINR119899 (119905) le SINRtar119899 forall119899 isin N (12e)

where

119887119899 (119905) = max0 1 minus sum119898isinM

119886119898119899 (119905) forall119899 isin D (12f)

and 119887119899(119905) = 0 forall119899 isin C Note that unlike problem (11a)ndash(11f)problem (12a)ndash(12e) can be solved to optimality (since 119903119880119899 (119905) isknown) It has three optimization variables a119871119905 c119905 and p119905 andhence its complexity is lower than the complexity of (11a)ndash(11f) (the method for solving (12a)ndash(12e) is presented in thenext subsection)

We also define a mixed-strategy probability 120587119905 of playingan action 119860 119905 = a119880119905 at slot t as

120587119905 (119860 119905) = 120587119905 (119861)119861isinA119880 120587119905 (119861) = Pr 119860 119905 = 119861

forall119860 119905 isin A119880(13a)

and a regret 120588119905(119860 119905) for not playing this action at slot t as

120588119905 (119860 119905) = max 0 119906119905minus1 minus 119906119905 forall119860 119905 isin A119880 (13b)

In JUSTE-RL the probability distribution of a regret over allpossible actions becomes the BoltzmannndashGibbs distribution(aka canonical ensemble) given by [22]

119866 120588119905 (119860 119905) = exp minus120588119905 (119860 119905) 119896119879119861sum119861isinA119880 exp minus120588119905 (119861) 119896119879119861 forall119860 119905 isin A119880

(14)

where k = 138064852 times 10minus23 JK is the Boltzmann constant119879119861 is the system temperature (in K) High temperatures make

all actions almost equiprobable and low temperatures resultin greedy action selection [22]

Using the above definitions the dynamics of a JUSTE-RLwith regret can be described as [23]

119906119905+1 (119861) = 119906119905 (119861) + 1205721199051119860119905=119861 (119906119905 minus 119906119905 (119861)) 120588119905+1 (119861) = 120588119905 (119861) + 120573119905 (119906119905 (119861) minus 119906119905 minus 120588119905 (119861)) 120587119905+1 (119861) = 120587119905 (119861) + 120574119905 (119866 120588119905 (119861) minus 120587119899 (119861))

(15)

for all 119861 isin A119880 where 120572119905 120573119905 and 120574119905 are the learning rates suchthat [23]

lim119905rarrinfin

119905sum120591=1

120572120591 = +infinlim119905rarrinfin

119905sum120591=1

1205722120591 lt +infinlim119905rarrinfin

119905sum120591=1

120573120591 = +infinlim119905rarrinfin

119905sum120591=1

1205732120591 lt +infinlim119905rarrinfin

119905sum120591=1

120574120591 = +infinlim119905rarrinfin

119905sum120591=1

1205742120591 lt +infin

8 Mobile Information Systems

Initialization(1) Input 120592120572 120592120573 120592120574 119879(2) For all 1198600 isin A119880 set 1199060(1198600) larr 0 1205880(1198600) larr 0 1205870(1198600) larr 0(3) For all 119899 isin N set 119903119880119899 (0) larr 0Main Loop(4)While (119905 le 119879) do(5) Select 119860 119905 larr argmax119861isinA119880(120587119905(119861)) and set a119880119905 larr 119860 119905(6) For all 119899 isin D set 119887119899(119905) = max0 1 minus sum119898isinM 119886119898119899 (119905)(7) For all 119899 isin C set 119887119899(119905) = 0(8) Execute 119860 119905 and observe 119903119880119899 (119905) for all 119899 isin N(9) Solve (12a)ndash(12e) to find an optimal 119906119905(119860 119905) larr 119906119905(10) For all 119861 isin A119880 update 119906119905(119861) 120588119905(119861) 120587119905(119861) using (15)

(11) End

Algorithm 1 JUSTE-RL with regret for unlicensed channel allocation RL algorithm for outband channel allocation

lim119905rarrinfin

120574119905120572119905 = 0lim119905rarrinfin

120573119905120574119905 = 0(16a)

Typically the learning rates are set equal [22]

120572119905 = 1(119905 + 1)120592120572 120573119905 = 1(119905 + 1)120592120573 120574119905 = 1(119905 + 1)120592120574

(16b)

where 120592120572 isin (05 1] 120592120573 isin (05 1] 120592120574 isin (05 1] and 120592120572 le120592120573 le 120592120574 The initializations 1199060(119860) ge 0 1205880(119860) ge 0 and1205870(119860) ge 0 should be sufficiently close to zero for all 119860 isin A119880The dynamics (15) converge to the 120576-Nash equilibrium Notethat a Nash equilibrium point for (15) is given by [23]

lim119905rarrinfin

120587119905 (119860) = 120587lowast (119860) forall119860 isin A119880 (17)

The corresponding learning algorithm for unlicensed chan-nel allocation is presented in Algorithm 1 (where T indicatesthe total simulation length in slots) Note that this algorithmconverges when 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880 Thecomplexity of JUSTE-RL with regret is mainly determinedby the size of an action set A119880 since at any slot t we have

to select an action 119860 119905 isin A119880 that maximizes 120587119905(119860 119905) (thedynamics in (15) require only algebraic operations and thusits computational complexity is negligible) Consequently theworst-case time complexity of Algorithm 1 is 119874(119899) where119899 = |A119880| le 119873 times 119872 is the size of our action set

33 Inband Resource Allocation Consider (12a)ndash(12e) thatrepresents a joint mode RB and power level allocationproblemThis problem has two binary optimization variablesa119871119905 and c119905 one real-valued variable p119905 nonlinear objective(12a) and nonlinear constraints (12c) and (12e) Hence itbelongs to a family of the mixed-integer nonlinear program-ming (MINLP) problems It has been well established inthe past (see eg [38]) that all MINLP problems involvingbinary variables (such as (12a)ndash(12e)) are NondeterministicPolynomial-time- (NP-) hard For immediate NP-hardnessproof for a considered problem note that given that a119871119905 canbe either 0 or 1 any feasible solution to (12a)ndash(12e) is a subsetof vertices The constraint (12d) also implies that at least oneend point of every edge is included in this subset Hence thesolution to this problem describes a vertex cover for whichfinding a minimum is NP-hard

Most of the MINLP solution techniques involve theconstruction of the following relaxations to the consideredproblem a nonlinear programming (NLP) relaxation (theoriginal problem without integer restrictions) and a mixed-integer linear-programming (MILP) relaxation (an originalproblem where the nonlinearities are replaced by supportinghyperplanes) To form the MILP and NLP relaxations to(12a)ndash(12e) let us first represent in equivalent form thefollowing

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (18a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(18b)

Mobile Information Systems 9

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (18c)

1198921119896 (a119871119905 c119905) = sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) minus 1 le 0 forall119896 isin K (18d)

1198922119899 (a119871119905 p119905) = SINR119899 (119905) minus SINRtar119899 le 0 forall119899 isin N (18e)

1198923119899119896 (a119871119905 p119905) = SINR119896119899 (119905) minus 2119889119896119899 + 1 le 0 forall119899 isin N forall119896 isin K (18f)

where objective (18a) and constraints (18b) and (18c) arelinear while constraints (18d)ndash(18f) are nonlinear The MILP

relaxation to (18a)ndash(18f) in a given point (a1198710119905 c0119905 p0119905 ) is given

by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (19a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(19b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (19c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (19d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (19e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (19f)

The NLP relaxation to (18a)ndash(18f) is given by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (20a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(20b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (20c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (20d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (20e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (20f)

10 Mobile Information Systems

where

A119871 = a119871119905 | 0 le 119886119896119899 (119905) le 1 forall119899 isin N forall119896 isin K (20g)

C

= c119905 | 0 le 119888119899 (119905) le 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (20h)

In general allMINLPproblems can be solved using eitherexact techniques (eg branch-and-bound [39]) or heuristicmethods (such as local branching [40] large neighborhoodsearch [41] and feasibility pump [42]) Since we are inter-ested in a reasonably simple and fast algorithm it is moreconvenient to use heuristics to solve (18a)ndash(18f) Amongnumerous heuristic techniques feasibility pump (FP) [43] isperhaps the most simple and effective method for producingmore and better solutions in a shorter average running

time (the local convergence properties of FP for nonconvexproblems have been proved in [44]) The fundamental ideaof an FP heuristic is to decompose the MINLP probleminto two parts integer feasibility and constraint feasibilityThe former is achieved by rounding (solving the MILPrelaxation to an original problem) the latter by projection(solving the NLP relaxation) The algorithm generates twosequences of integral and rounding pointsThe first sequenceof integral points (a119871119894119905 c119894119905 p119894119905)119868119894=1 119868 = 1 2 containsthe solutions that may violate the nonlinear constraints thesecond sequence (a119871119894119905 c119894119905 p119894119905)119868119894=1 comprises the roundingpoints that are feasible for the MILP relaxation but might notbe integral

Particularly with the input (a1198711119905 c1119905 p1119905 ) being a solution toan NLP relaxation (20a)ndash(20f) FP generates two sequencesby solving the following problems for 119894 = 1 119868

(a119871119894119905 c119894119905 p119894119905) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038171 (21a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(21b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (21c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (21d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (21e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (21f)

(a119871119894+1119905 c119894+1119905 p119894+1119905 ) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038172 (22a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(22b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (22c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (22d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (22e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (22f)

where sdot1 and sdot2 are l1-norm and l2-norm respectivelyTherounding is carried out by solving the problem (21a)ndash(21f)

and the projection is the solution to (22a)ndash(22f) Conse-quently an FP algorithmalternates between the rounding and

Mobile Information Systems 11

projection steps until (a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905) (which impliesfeasibility) or until the number of iterations i has reached itspredefined limit ITheworkflow of the algorithm is presentedin Algorithm 2 Note that to retain the local convergencethe problems (21a)ndash(21f) and (22a)ndash(22f) have to be solvedexactly in the FP algorithm The problem (22a)ndash(22f) (andhence (20a)ndash(20f)) can be solved using any standard NLPmethod In this paper an interior point algorithm (describedeg in [45]) which has a polynomial 119874(1198992) time complexityis applied to solve (20a)ndash(20f) and (22a)ndash(22f) The MILPproblem (21a)ndash(21f) is relatively simple and therefore it canbe solved to optimality by any technique from the family ofthe branch-and-bound methods (eg [46])

Note that in general finding an optimal solution to anyjoint resource allocation problem with integrality constraintsis NP-hard (which has been shown in [47]) Consequentlymost of the recent approaches to deal with such kindof problems focus on finding the high-quality suboptimalsolutions using for example relaxation (by removing allthe integer restrictions as it has been done in [47 48]) oriterative two-stage algorithms for determining the optimalintegral solutions given fixed power levels and then findingthe optimal power allocation with fixed integral points (eg[49]) In this paper instead of relaxation or iteration wedirectly apply a heuristic FP algorithm that has a polynomial119874(119899119888) time complexity in the size n of the problem (withc being some real constant) [43] (Note that in our casethe size n of the problem (18a)ndash(18f) is proportional to|A119871| times |C| times |P| = 1198733 times 119870 The numerical results showingthe complexity of a proposed algorithm will be presentedin Section 4) Hence the presented heuristic approach hasmoderate complexity compared to the previously proposedalgorithms for resource allocationwith integrality constraintswhose complexity ranges from linear [21 30 47 48 50] topolynomial [2 3 49 51ndash53]

4 Algorithm Implementation

41 Resource Allocation Procedure We now discuss theimplementation of the proposed algorithms (presented inSection 3) in an LTE-A network The following schedulingprocedure is repeated at the beginning of each slot t as follows

(i) All users send their SRs to the eNB via dedicatedPUCCHs Note that the SRs may contain someuseful control information such as updated targetSINR level SINRtar

119899 or observed throughput on theunlicensed channel 119903119880119899 (119905)

(ii) After receiving the SRs from all of the users the eNBperforms resource allocation (by assigning themodesRBs and unlicensed channels and power levels to userpairs according to Algorithms 1 and 2) and sends theSGs with optimal allocations to the correspondingusers via PDCCHs

(iii) After receiving the SGs the users start their datatransmissions over allocated RBsunlicensed chan-nels with assigned mode and power levels

As it was already beenmentioned we deploy a CSMACAfor outband D2D access using a procedure described in IEEE80211 [54] As dictated by [54] if a certain D2D pair PUn 119899 isinD is allocated with one or more unlicensed channels thenprior to transmission one of the users must first sense thechannel (to determine whether it is idle) for the duration ofa distributed coordination function interframe space (DIFS)DIFS (which is 34 120583s long) consists of a short interframespace (SIFS) equaling 16 120583s and 2 Wi-Fi slots (each equals9 120583s) After DIFS a user must typically defer its transmissionfor a random number of slots generated from 0 to CW-1(contentionwindow size) to allow the other devices to share achannel in a fair manner Given that the minimumCW valueis CWmin = 16 the device will on an average wait for about75Wi-Fi slots before transmissionThus the average channelaccess delay is 16120583s + 95 times 9 120583s = 1015 120583s (independent ofservice rate) Since the slot duration in LTE system (119879119904 =1ms) is much longer than the average channel access delay(1015120583s) it is expected that (in average) a D2D pair will beable to exchange the data within the scheduled period Inthis case each of the users in a D2D pair should observeachieved throughput 119903119880119899 (119905) and report this value to the eNBwhen sending its SR Otherwise (if a D2D pair is not able toexchange the data within one slot) the D2D users send thevalue 119903119880119899 (119905) = 0 to the eNB Note that a CSMACA does notallow two-way data transmission Hence the second devicein a D2D pair can start the data transmission only after thefirst user has finished its transmission

It is worth mentioning that at some point in time aJUSTE-RL will reach its equilibrium state However evenafter the equilibriumhas been reached the eNB continues thelearning process because the network environment (channelquality network traffic and the number of active users) islikely to change over time resulting in different optimalmodeRBunlicensed channel and power allocations

42 Simulation Model A simulation model of the networkhas been implemented upon a standard LTE-A platformusing the OPNET simulation and development package [55]The model consists of one eNB and N user pairs randomlypositioned inside a three-sector hexagonal cell (with theantenna pattern specified in [56]) It is assumed that theusers operate outdoors in a typical urban environment andare stationary throughout all simulation runs Each userdevice has its own traffic generator enabling a variety oftraffic patterns For simplicity in the examples below theuser traffic is modeled as a full buffer with load of 10packets per second and packet size of 1500 bytes In allsimulations |C| = 10 cellular pairs 120592120573 = 11120592120572 120592120574 = 11120592120573119879 = 1000 slots 119879119861 = 106 K and I = 1000 iterations thetarget SINR levels for each device pair are set as SINRtar

119899 =SINRtar = 0 dB for all 119899 isin N The licensed band of theeNB comprises K = 100 RBs (equivalent to 20MHz) Theunlicensed band comprises M = 4 nonoverlapping OFDMchannels with 120596119880119898 = 10MHz for 119898 isin M The main simu-lation parameters of our model are listed in Table 1 Otherparameters are set in accordance with 3GPP specifications[56]

12 Mobile Information Systems

Initialization(1) Input 119868 119879(2)While (119905 le 119879) do(3) Input 119903119880119899 (119905) a119880119905 b119905(4) Solve (20a)ndash(20f) to find the optimal (a1198711119905 c1119905 p1119905 )Main Loop(5)While (119894 le 119868) doRounding(6) Solve (21a)ndash(21f) to find the optimal (a119871119894119905 c119894119905 p119894119905)(7) If ((a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905)) then breakProjection(8) Solve (22a)ndash(22f) to find the optimal (a119871119894+1119905 c119894+1119905 p119894+1119905 )(9) Set 119894 larr 119894 + 1

(10) End

Algorithm 2 FP algorithm for inband resource allocation

Table 1 Simulation parameters of LTE-A Model

Parameter ValueCell radius 500mFrame structure Type 2 (time division duplex)Slot duration 1msTDD configuration 0eNodeB Tx power 46 dBmUE activeidle Tx power 232 dBmNoise power minus174 dBmHzPath loss and cellular link 1281 + 376 log(119889) 119889 [km]NLOS path loss and D2Dlink

40 log(119889) + 30 log(119891) + 49 119889 [km]119891 [Hz]LOS path loss and D2Dlink

169 log(119889) + 20 log(1198915) + 468 119889[m] 119891 [GHz]

Shadowing st dev 10 B (cell mode) 12 dB (D2D mode)

In this paper the evaluation of a proposed approachfor inband and outband resource allocation referred to asJRA (JUSTE-RL based resource allocation) is divided intotwo parts In the first part we analyze the performanceof JUSTE-RL with regret for unlicensed channel allocation(Algorithm 1) In the second part we examine the efficiencyof a proposed joint inbandoutband resource allocation(Algorithms 1 and 2) In the following the performanceof JRA is compared with the performance of the followingresource allocation techniques

(i) First is joint inbandoutband resource allocation with120576-greedy Q-learning (GQL) [57] based on formu-lations (12a)ndash(12e) and (18a)ndash(18f) where the unli-censed channels are allocated to the users by theLTE eNB In GQL at any slot t an action with thelargest Q-value is selected with probability 1 minus 120576 andthe other actions are selected uniformly at randomwith probability 120576 In all simulation experiments thevalue of 120576 is set in accordance with the most commonsuggestions (provided eg in [22]) as 120576 = 01

(ii) Second is centralized optimal strategy (COS) wherethe inband and outband network resources are allo-cated to the users by solving (11a)ndash(11f) directly basedon global channel and network knowledge Note thatCOS corresponds to the most efficient (in terms ofnetwork utility maximization) strategy although it isnot practically realizable (since in the real networkdeployment scenarios the precise information aboutquality and availability of unlicensed channels is notavailable) (In this paper we use an FP algorithm tofind the optimal solution to (11a)ndash(11f) or (18a)ndash(18f)in GQL and COS)

(iii) Third is social heuristic for multimode D2D com-munication (SMD) in an LTE-A network proposedin [21] to reduce the complexity of an original opti-mization problem for joint inbandoutband resourceallocation This algorithm assigns user modes andresources to maximize the social welfare based onthe global channel and network knowledge The eNBcreates a randomly ordered list of the D2D pairsThen it computes the aggregated network utility foreach mode of the first user in the list and assigns thisuser with amode that provides the highest aggregatedutility This process is repeated for all D2D pairs

(iv) Fourth is greedy heuristic for multimode D2D com-munication (GMD) in LTE-A networks [21] wherethe modes and inbandoutband network resourcesare allocated tomaximize the individual usersrsquo welfarebased on the global channel and network knowledgeSimilar to SMD the eNB creates a randomly orderedlist of the D2D pairs After this it computes the utilityfor each mode of the first user in the list and assignsthis user a mode assuring the highest individualutility This process is repeated for all D2D pairs

(v) Fifth is ranked heuristic for multimode D2D com-munication (RMD) in LTE-A networks [21] Here theeNB evaluates the utility of each user in each mode(based on the global channel andnetwork knowledge)

Mobile Information Systems 13Av

erag

e num

ber o

f RL

itera

tions

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e str

ateg

y co

nver

genc

e

Figure 2 The average number of RL iterations (slots) necessary forconvergence of strategies in JRAwith different values of 120592120572 and fixed|C| = 10

and sorts the D2D pairs according to their utilities ina descending order Next the eNB allocates the firstuser in the list a mode that guarantees the highestaggregatednetwork utilityThis process is repeated forall D2D pairs

Note that all algorithms used in our performance evaluationare simulated with identical system parameters

43 Performance of a Learning Algorithm We start withthe performance evaluation of JUSTE-RL with regret forunlicensed channel allocation (outlined in Algorithm 1)Figures 2 and 3 demonstrate the learning speed of JRAFigure 2 shows the average number of RL iterations (slots)necessary for convergence of strategies in JRA (at the pointwhere 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880) with differentvalues of 120592120572 isin [055 08] 120592120573 = 11120592120572 isin [061 088] and120592120574 = 11120592120573 isin [067 097] and a varying number of user pairs119873 isin [50 500] The average number of RL iterations (slots)necessary for convergence of utilities in JRA (at the pointwhere 119906119905(119861) = 119906119905minus1(119861) for all 119861 isin A119880) with 120592120572 isin [055 08]120592120573 isin [061 088] and 120592120574 isin [067 097] and 119873 isin [50 500])is plotted in Figure 3 The accuracy of estimation in JRA ispresented in Figures 4 and 5 Figure 4 shows the absoluteerror of strategy estimation in JRA denoted as 120575120587 defined asa sum of the absolute differences between the actual optimalstrategies and the estimated strategies upon the algorithmtermination That is

120575120587 = sum119861isinA119880

1003816100381610038161003816120587119879 (119861) minus 120587lowast (119861)1003816100381610038161003816 (23)

where 120587119879(119861) is an optimal strategy estimated in JRA uponthe algorithm termination (at slot 119879) and 120587lowast(119861) is the actual

Aver

age n

umbe

r of R

L ite

ratio

ns

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e util

ity co

nver

genc

e

Figure 3 The average number of RL iterations (slots) necessary forconvergence of utilities in JRA with different values of 120592120572 and fixed|C| = 10

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of s

trat

egy

0

001

002

003

004

005

estim

atio

n120575120587

Figure 4 The absolute error of strategy estimation 120575120587 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

optimal strategy obtained by playing an action 119861 isin A119880Figure 5 demonstrates the absolute error of utility estimationin JRA denoted 120575119906 defined as a sum of the absolutedifferences between the actual and the estimated optimalnetwork utilities upon the algorithm termination that is

120575119906 = sum119861isinA119880

1003816100381610038161003816119906119879 (119861) minus 119906lowast (119861)1003816100381610038161003816 (24)

14 Mobile Information Systems

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of u

tility

0

001

002

003

004

005

estim

atio

n120575u

Figure 5 The absolute error of utility estimation 120575119906 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

where 119906119879(119861) is an optimal network utility estimated in JRAupon the algorithm termination (at slot 119879) and 119906lowast(119861) is theactual optimal network utility obtained by playing an action119861 isin A119880 The observations in Figures 2ndash5 show that the ratesof convergence of strategies and utilities and the accuracyof strategy and utility estimation are almost the sameFurthermore we find that the number of iterations necessaryfor the algorithm convergence and absolute estimation errorstrongly depend on the setting of the parameters 120592120572 120592120573 and120592120574 the worst performance is attainedwith 120592120572 = 08 120592120573 = 061and 120592120574 = 067 and the best with 120592120572 = 055 120592120573 = 088and 120592120574 = 097 Such results are rather predictable since theparameters 120592120572 120592120573 and 120592120574 are related to the parameters 120572119905120573119905 and 120574119905 (see (16b)) which have a direct influence on thelearning rate of JRA [22 23]

In Figures 6 and 7 the instantaneous network utility 119906119905 =sum119899isinN 119906119899(119905) is presented as a function of time in scenarioswith low network load (119873 = 100) and high network load(119873 = 500) and fixed Here a proposed JRA technique issimulated with the settings 120592120572 = 05 120592120573 = 061 and 120592120574 =067 The graphs in these figures show that the efficiency ofJRA and GQL improves gradually over time After about 300slots (which is the average time necessary for the convergenceof strategies and utilities in Algorithm 1) JRA demonstratesnear-optimal results GQL needs a little longer time (asymp400slots) to converge after which its performance also becomesvery close to the performance of COS Unlike JRA and GQLthe performance of COS SMDGMD and RMD is consistentover time (since these algorithms do not involve any learningprocess) We also observe that the network utility attained inSMD GMD and RMD is much smaller than that in COSTo understand such poor performance of SMD GMD andRMD note that in these algorithms the original resource

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

7

8

9

10

11

12

ut

(lowast10

6)

Figure 6 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and |C| = 10

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

40

42

44

46

48

50

52

54

56

58

60ut

(lowast10

6)

Figure 7 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 500 and |C| = 10allocation problem is divided into two separate problems(i) mode selection and (ii) packet scheduling After that themode selection problem is solved using very plain heuristics(social greedy or ranked) which reduces the complexityof an original optimization problem (from exponential tolinear) but has a negative impact on the performance of thesetechniques in terms of network utility maximization [21]

44 Performance of Joint InbandOutband Allocation Wenow evaluate the efficiency of a proposed inbandoutbandresource allocation (Algorithms 1 and 2) The graphs inFigures 8ndash10 demonstrate the computational complexitysolution time and solution accuracy of different resource

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 8: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

8 Mobile Information Systems

Initialization(1) Input 120592120572 120592120573 120592120574 119879(2) For all 1198600 isin A119880 set 1199060(1198600) larr 0 1205880(1198600) larr 0 1205870(1198600) larr 0(3) For all 119899 isin N set 119903119880119899 (0) larr 0Main Loop(4)While (119905 le 119879) do(5) Select 119860 119905 larr argmax119861isinA119880(120587119905(119861)) and set a119880119905 larr 119860 119905(6) For all 119899 isin D set 119887119899(119905) = max0 1 minus sum119898isinM 119886119898119899 (119905)(7) For all 119899 isin C set 119887119899(119905) = 0(8) Execute 119860 119905 and observe 119903119880119899 (119905) for all 119899 isin N(9) Solve (12a)ndash(12e) to find an optimal 119906119905(119860 119905) larr 119906119905(10) For all 119861 isin A119880 update 119906119905(119861) 120588119905(119861) 120587119905(119861) using (15)

(11) End

Algorithm 1 JUSTE-RL with regret for unlicensed channel allocation RL algorithm for outband channel allocation

lim119905rarrinfin

120574119905120572119905 = 0lim119905rarrinfin

120573119905120574119905 = 0(16a)

Typically the learning rates are set equal [22]

120572119905 = 1(119905 + 1)120592120572 120573119905 = 1(119905 + 1)120592120573 120574119905 = 1(119905 + 1)120592120574

(16b)

where 120592120572 isin (05 1] 120592120573 isin (05 1] 120592120574 isin (05 1] and 120592120572 le120592120573 le 120592120574 The initializations 1199060(119860) ge 0 1205880(119860) ge 0 and1205870(119860) ge 0 should be sufficiently close to zero for all 119860 isin A119880The dynamics (15) converge to the 120576-Nash equilibrium Notethat a Nash equilibrium point for (15) is given by [23]

lim119905rarrinfin

120587119905 (119860) = 120587lowast (119860) forall119860 isin A119880 (17)

The corresponding learning algorithm for unlicensed chan-nel allocation is presented in Algorithm 1 (where T indicatesthe total simulation length in slots) Note that this algorithmconverges when 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880 Thecomplexity of JUSTE-RL with regret is mainly determinedby the size of an action set A119880 since at any slot t we have

to select an action 119860 119905 isin A119880 that maximizes 120587119905(119860 119905) (thedynamics in (15) require only algebraic operations and thusits computational complexity is negligible) Consequently theworst-case time complexity of Algorithm 1 is 119874(119899) where119899 = |A119880| le 119873 times 119872 is the size of our action set

33 Inband Resource Allocation Consider (12a)ndash(12e) thatrepresents a joint mode RB and power level allocationproblemThis problem has two binary optimization variablesa119871119905 and c119905 one real-valued variable p119905 nonlinear objective(12a) and nonlinear constraints (12c) and (12e) Hence itbelongs to a family of the mixed-integer nonlinear program-ming (MINLP) problems It has been well established inthe past (see eg [38]) that all MINLP problems involvingbinary variables (such as (12a)ndash(12e)) are NondeterministicPolynomial-time- (NP-) hard For immediate NP-hardnessproof for a considered problem note that given that a119871119905 canbe either 0 or 1 any feasible solution to (12a)ndash(12e) is a subsetof vertices The constraint (12d) also implies that at least oneend point of every edge is included in this subset Hence thesolution to this problem describes a vertex cover for whichfinding a minimum is NP-hard

Most of the MINLP solution techniques involve theconstruction of the following relaxations to the consideredproblem a nonlinear programming (NLP) relaxation (theoriginal problem without integer restrictions) and a mixed-integer linear-programming (MILP) relaxation (an originalproblem where the nonlinearities are replaced by supportinghyperplanes) To form the MILP and NLP relaxations to(12a)ndash(12e) let us first represent in equivalent form thefollowing

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (18a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(18b)

Mobile Information Systems 9

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (18c)

1198921119896 (a119871119905 c119905) = sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) minus 1 le 0 forall119896 isin K (18d)

1198922119899 (a119871119905 p119905) = SINR119899 (119905) minus SINRtar119899 le 0 forall119899 isin N (18e)

1198923119899119896 (a119871119905 p119905) = SINR119896119899 (119905) minus 2119889119896119899 + 1 le 0 forall119899 isin N forall119896 isin K (18f)

where objective (18a) and constraints (18b) and (18c) arelinear while constraints (18d)ndash(18f) are nonlinear The MILP

relaxation to (18a)ndash(18f) in a given point (a1198710119905 c0119905 p0119905 ) is given

by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (19a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(19b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (19c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (19d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (19e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (19f)

The NLP relaxation to (18a)ndash(18f) is given by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (20a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(20b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (20c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (20d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (20e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (20f)

10 Mobile Information Systems

where

A119871 = a119871119905 | 0 le 119886119896119899 (119905) le 1 forall119899 isin N forall119896 isin K (20g)

C

= c119905 | 0 le 119888119899 (119905) le 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (20h)

In general allMINLPproblems can be solved using eitherexact techniques (eg branch-and-bound [39]) or heuristicmethods (such as local branching [40] large neighborhoodsearch [41] and feasibility pump [42]) Since we are inter-ested in a reasonably simple and fast algorithm it is moreconvenient to use heuristics to solve (18a)ndash(18f) Amongnumerous heuristic techniques feasibility pump (FP) [43] isperhaps the most simple and effective method for producingmore and better solutions in a shorter average running

time (the local convergence properties of FP for nonconvexproblems have been proved in [44]) The fundamental ideaof an FP heuristic is to decompose the MINLP probleminto two parts integer feasibility and constraint feasibilityThe former is achieved by rounding (solving the MILPrelaxation to an original problem) the latter by projection(solving the NLP relaxation) The algorithm generates twosequences of integral and rounding pointsThe first sequenceof integral points (a119871119894119905 c119894119905 p119894119905)119868119894=1 119868 = 1 2 containsthe solutions that may violate the nonlinear constraints thesecond sequence (a119871119894119905 c119894119905 p119894119905)119868119894=1 comprises the roundingpoints that are feasible for the MILP relaxation but might notbe integral

Particularly with the input (a1198711119905 c1119905 p1119905 ) being a solution toan NLP relaxation (20a)ndash(20f) FP generates two sequencesby solving the following problems for 119894 = 1 119868

(a119871119894119905 c119894119905 p119894119905) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038171 (21a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(21b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (21c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (21d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (21e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (21f)

(a119871119894+1119905 c119894+1119905 p119894+1119905 ) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038172 (22a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(22b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (22c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (22d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (22e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (22f)

where sdot1 and sdot2 are l1-norm and l2-norm respectivelyTherounding is carried out by solving the problem (21a)ndash(21f)

and the projection is the solution to (22a)ndash(22f) Conse-quently an FP algorithmalternates between the rounding and

Mobile Information Systems 11

projection steps until (a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905) (which impliesfeasibility) or until the number of iterations i has reached itspredefined limit ITheworkflow of the algorithm is presentedin Algorithm 2 Note that to retain the local convergencethe problems (21a)ndash(21f) and (22a)ndash(22f) have to be solvedexactly in the FP algorithm The problem (22a)ndash(22f) (andhence (20a)ndash(20f)) can be solved using any standard NLPmethod In this paper an interior point algorithm (describedeg in [45]) which has a polynomial 119874(1198992) time complexityis applied to solve (20a)ndash(20f) and (22a)ndash(22f) The MILPproblem (21a)ndash(21f) is relatively simple and therefore it canbe solved to optimality by any technique from the family ofthe branch-and-bound methods (eg [46])

Note that in general finding an optimal solution to anyjoint resource allocation problem with integrality constraintsis NP-hard (which has been shown in [47]) Consequentlymost of the recent approaches to deal with such kindof problems focus on finding the high-quality suboptimalsolutions using for example relaxation (by removing allthe integer restrictions as it has been done in [47 48]) oriterative two-stage algorithms for determining the optimalintegral solutions given fixed power levels and then findingthe optimal power allocation with fixed integral points (eg[49]) In this paper instead of relaxation or iteration wedirectly apply a heuristic FP algorithm that has a polynomial119874(119899119888) time complexity in the size n of the problem (withc being some real constant) [43] (Note that in our casethe size n of the problem (18a)ndash(18f) is proportional to|A119871| times |C| times |P| = 1198733 times 119870 The numerical results showingthe complexity of a proposed algorithm will be presentedin Section 4) Hence the presented heuristic approach hasmoderate complexity compared to the previously proposedalgorithms for resource allocationwith integrality constraintswhose complexity ranges from linear [21 30 47 48 50] topolynomial [2 3 49 51ndash53]

4 Algorithm Implementation

41 Resource Allocation Procedure We now discuss theimplementation of the proposed algorithms (presented inSection 3) in an LTE-A network The following schedulingprocedure is repeated at the beginning of each slot t as follows

(i) All users send their SRs to the eNB via dedicatedPUCCHs Note that the SRs may contain someuseful control information such as updated targetSINR level SINRtar

119899 or observed throughput on theunlicensed channel 119903119880119899 (119905)

(ii) After receiving the SRs from all of the users the eNBperforms resource allocation (by assigning themodesRBs and unlicensed channels and power levels to userpairs according to Algorithms 1 and 2) and sends theSGs with optimal allocations to the correspondingusers via PDCCHs

(iii) After receiving the SGs the users start their datatransmissions over allocated RBsunlicensed chan-nels with assigned mode and power levels

As it was already beenmentioned we deploy a CSMACAfor outband D2D access using a procedure described in IEEE80211 [54] As dictated by [54] if a certain D2D pair PUn 119899 isinD is allocated with one or more unlicensed channels thenprior to transmission one of the users must first sense thechannel (to determine whether it is idle) for the duration ofa distributed coordination function interframe space (DIFS)DIFS (which is 34 120583s long) consists of a short interframespace (SIFS) equaling 16 120583s and 2 Wi-Fi slots (each equals9 120583s) After DIFS a user must typically defer its transmissionfor a random number of slots generated from 0 to CW-1(contentionwindow size) to allow the other devices to share achannel in a fair manner Given that the minimumCW valueis CWmin = 16 the device will on an average wait for about75Wi-Fi slots before transmissionThus the average channelaccess delay is 16120583s + 95 times 9 120583s = 1015 120583s (independent ofservice rate) Since the slot duration in LTE system (119879119904 =1ms) is much longer than the average channel access delay(1015120583s) it is expected that (in average) a D2D pair will beable to exchange the data within the scheduled period Inthis case each of the users in a D2D pair should observeachieved throughput 119903119880119899 (119905) and report this value to the eNBwhen sending its SR Otherwise (if a D2D pair is not able toexchange the data within one slot) the D2D users send thevalue 119903119880119899 (119905) = 0 to the eNB Note that a CSMACA does notallow two-way data transmission Hence the second devicein a D2D pair can start the data transmission only after thefirst user has finished its transmission

It is worth mentioning that at some point in time aJUSTE-RL will reach its equilibrium state However evenafter the equilibriumhas been reached the eNB continues thelearning process because the network environment (channelquality network traffic and the number of active users) islikely to change over time resulting in different optimalmodeRBunlicensed channel and power allocations

42 Simulation Model A simulation model of the networkhas been implemented upon a standard LTE-A platformusing the OPNET simulation and development package [55]The model consists of one eNB and N user pairs randomlypositioned inside a three-sector hexagonal cell (with theantenna pattern specified in [56]) It is assumed that theusers operate outdoors in a typical urban environment andare stationary throughout all simulation runs Each userdevice has its own traffic generator enabling a variety oftraffic patterns For simplicity in the examples below theuser traffic is modeled as a full buffer with load of 10packets per second and packet size of 1500 bytes In allsimulations |C| = 10 cellular pairs 120592120573 = 11120592120572 120592120574 = 11120592120573119879 = 1000 slots 119879119861 = 106 K and I = 1000 iterations thetarget SINR levels for each device pair are set as SINRtar

119899 =SINRtar = 0 dB for all 119899 isin N The licensed band of theeNB comprises K = 100 RBs (equivalent to 20MHz) Theunlicensed band comprises M = 4 nonoverlapping OFDMchannels with 120596119880119898 = 10MHz for 119898 isin M The main simu-lation parameters of our model are listed in Table 1 Otherparameters are set in accordance with 3GPP specifications[56]

12 Mobile Information Systems

Initialization(1) Input 119868 119879(2)While (119905 le 119879) do(3) Input 119903119880119899 (119905) a119880119905 b119905(4) Solve (20a)ndash(20f) to find the optimal (a1198711119905 c1119905 p1119905 )Main Loop(5)While (119894 le 119868) doRounding(6) Solve (21a)ndash(21f) to find the optimal (a119871119894119905 c119894119905 p119894119905)(7) If ((a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905)) then breakProjection(8) Solve (22a)ndash(22f) to find the optimal (a119871119894+1119905 c119894+1119905 p119894+1119905 )(9) Set 119894 larr 119894 + 1

(10) End

Algorithm 2 FP algorithm for inband resource allocation

Table 1 Simulation parameters of LTE-A Model

Parameter ValueCell radius 500mFrame structure Type 2 (time division duplex)Slot duration 1msTDD configuration 0eNodeB Tx power 46 dBmUE activeidle Tx power 232 dBmNoise power minus174 dBmHzPath loss and cellular link 1281 + 376 log(119889) 119889 [km]NLOS path loss and D2Dlink

40 log(119889) + 30 log(119891) + 49 119889 [km]119891 [Hz]LOS path loss and D2Dlink

169 log(119889) + 20 log(1198915) + 468 119889[m] 119891 [GHz]

Shadowing st dev 10 B (cell mode) 12 dB (D2D mode)

In this paper the evaluation of a proposed approachfor inband and outband resource allocation referred to asJRA (JUSTE-RL based resource allocation) is divided intotwo parts In the first part we analyze the performanceof JUSTE-RL with regret for unlicensed channel allocation(Algorithm 1) In the second part we examine the efficiencyof a proposed joint inbandoutband resource allocation(Algorithms 1 and 2) In the following the performanceof JRA is compared with the performance of the followingresource allocation techniques

(i) First is joint inbandoutband resource allocation with120576-greedy Q-learning (GQL) [57] based on formu-lations (12a)ndash(12e) and (18a)ndash(18f) where the unli-censed channels are allocated to the users by theLTE eNB In GQL at any slot t an action with thelargest Q-value is selected with probability 1 minus 120576 andthe other actions are selected uniformly at randomwith probability 120576 In all simulation experiments thevalue of 120576 is set in accordance with the most commonsuggestions (provided eg in [22]) as 120576 = 01

(ii) Second is centralized optimal strategy (COS) wherethe inband and outband network resources are allo-cated to the users by solving (11a)ndash(11f) directly basedon global channel and network knowledge Note thatCOS corresponds to the most efficient (in terms ofnetwork utility maximization) strategy although it isnot practically realizable (since in the real networkdeployment scenarios the precise information aboutquality and availability of unlicensed channels is notavailable) (In this paper we use an FP algorithm tofind the optimal solution to (11a)ndash(11f) or (18a)ndash(18f)in GQL and COS)

(iii) Third is social heuristic for multimode D2D com-munication (SMD) in an LTE-A network proposedin [21] to reduce the complexity of an original opti-mization problem for joint inbandoutband resourceallocation This algorithm assigns user modes andresources to maximize the social welfare based onthe global channel and network knowledge The eNBcreates a randomly ordered list of the D2D pairsThen it computes the aggregated network utility foreach mode of the first user in the list and assigns thisuser with amode that provides the highest aggregatedutility This process is repeated for all D2D pairs

(iv) Fourth is greedy heuristic for multimode D2D com-munication (GMD) in LTE-A networks [21] wherethe modes and inbandoutband network resourcesare allocated tomaximize the individual usersrsquo welfarebased on the global channel and network knowledgeSimilar to SMD the eNB creates a randomly orderedlist of the D2D pairs After this it computes the utilityfor each mode of the first user in the list and assignsthis user a mode assuring the highest individualutility This process is repeated for all D2D pairs

(v) Fifth is ranked heuristic for multimode D2D com-munication (RMD) in LTE-A networks [21] Here theeNB evaluates the utility of each user in each mode(based on the global channel andnetwork knowledge)

Mobile Information Systems 13Av

erag

e num

ber o

f RL

itera

tions

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e str

ateg

y co

nver

genc

e

Figure 2 The average number of RL iterations (slots) necessary forconvergence of strategies in JRAwith different values of 120592120572 and fixed|C| = 10

and sorts the D2D pairs according to their utilities ina descending order Next the eNB allocates the firstuser in the list a mode that guarantees the highestaggregatednetwork utilityThis process is repeated forall D2D pairs

Note that all algorithms used in our performance evaluationare simulated with identical system parameters

43 Performance of a Learning Algorithm We start withthe performance evaluation of JUSTE-RL with regret forunlicensed channel allocation (outlined in Algorithm 1)Figures 2 and 3 demonstrate the learning speed of JRAFigure 2 shows the average number of RL iterations (slots)necessary for convergence of strategies in JRA (at the pointwhere 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880) with differentvalues of 120592120572 isin [055 08] 120592120573 = 11120592120572 isin [061 088] and120592120574 = 11120592120573 isin [067 097] and a varying number of user pairs119873 isin [50 500] The average number of RL iterations (slots)necessary for convergence of utilities in JRA (at the pointwhere 119906119905(119861) = 119906119905minus1(119861) for all 119861 isin A119880) with 120592120572 isin [055 08]120592120573 isin [061 088] and 120592120574 isin [067 097] and 119873 isin [50 500])is plotted in Figure 3 The accuracy of estimation in JRA ispresented in Figures 4 and 5 Figure 4 shows the absoluteerror of strategy estimation in JRA denoted as 120575120587 defined asa sum of the absolute differences between the actual optimalstrategies and the estimated strategies upon the algorithmtermination That is

120575120587 = sum119861isinA119880

1003816100381610038161003816120587119879 (119861) minus 120587lowast (119861)1003816100381610038161003816 (23)

where 120587119879(119861) is an optimal strategy estimated in JRA uponthe algorithm termination (at slot 119879) and 120587lowast(119861) is the actual

Aver

age n

umbe

r of R

L ite

ratio

ns

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e util

ity co

nver

genc

e

Figure 3 The average number of RL iterations (slots) necessary forconvergence of utilities in JRA with different values of 120592120572 and fixed|C| = 10

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of s

trat

egy

0

001

002

003

004

005

estim

atio

n120575120587

Figure 4 The absolute error of strategy estimation 120575120587 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

optimal strategy obtained by playing an action 119861 isin A119880Figure 5 demonstrates the absolute error of utility estimationin JRA denoted 120575119906 defined as a sum of the absolutedifferences between the actual and the estimated optimalnetwork utilities upon the algorithm termination that is

120575119906 = sum119861isinA119880

1003816100381610038161003816119906119879 (119861) minus 119906lowast (119861)1003816100381610038161003816 (24)

14 Mobile Information Systems

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of u

tility

0

001

002

003

004

005

estim

atio

n120575u

Figure 5 The absolute error of utility estimation 120575119906 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

where 119906119879(119861) is an optimal network utility estimated in JRAupon the algorithm termination (at slot 119879) and 119906lowast(119861) is theactual optimal network utility obtained by playing an action119861 isin A119880 The observations in Figures 2ndash5 show that the ratesof convergence of strategies and utilities and the accuracyof strategy and utility estimation are almost the sameFurthermore we find that the number of iterations necessaryfor the algorithm convergence and absolute estimation errorstrongly depend on the setting of the parameters 120592120572 120592120573 and120592120574 the worst performance is attainedwith 120592120572 = 08 120592120573 = 061and 120592120574 = 067 and the best with 120592120572 = 055 120592120573 = 088and 120592120574 = 097 Such results are rather predictable since theparameters 120592120572 120592120573 and 120592120574 are related to the parameters 120572119905120573119905 and 120574119905 (see (16b)) which have a direct influence on thelearning rate of JRA [22 23]

In Figures 6 and 7 the instantaneous network utility 119906119905 =sum119899isinN 119906119899(119905) is presented as a function of time in scenarioswith low network load (119873 = 100) and high network load(119873 = 500) and fixed Here a proposed JRA technique issimulated with the settings 120592120572 = 05 120592120573 = 061 and 120592120574 =067 The graphs in these figures show that the efficiency ofJRA and GQL improves gradually over time After about 300slots (which is the average time necessary for the convergenceof strategies and utilities in Algorithm 1) JRA demonstratesnear-optimal results GQL needs a little longer time (asymp400slots) to converge after which its performance also becomesvery close to the performance of COS Unlike JRA and GQLthe performance of COS SMDGMD and RMD is consistentover time (since these algorithms do not involve any learningprocess) We also observe that the network utility attained inSMD GMD and RMD is much smaller than that in COSTo understand such poor performance of SMD GMD andRMD note that in these algorithms the original resource

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

7

8

9

10

11

12

ut

(lowast10

6)

Figure 6 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and |C| = 10

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

40

42

44

46

48

50

52

54

56

58

60ut

(lowast10

6)

Figure 7 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 500 and |C| = 10allocation problem is divided into two separate problems(i) mode selection and (ii) packet scheduling After that themode selection problem is solved using very plain heuristics(social greedy or ranked) which reduces the complexityof an original optimization problem (from exponential tolinear) but has a negative impact on the performance of thesetechniques in terms of network utility maximization [21]

44 Performance of Joint InbandOutband Allocation Wenow evaluate the efficiency of a proposed inbandoutbandresource allocation (Algorithms 1 and 2) The graphs inFigures 8ndash10 demonstrate the computational complexitysolution time and solution accuracy of different resource

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 9: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

Mobile Information Systems 9

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (18c)

1198921119896 (a119871119905 c119905) = sum119899isinN

119886119896119899 (119905) 119887119899 (119905) 119888119899 (119905) minus 1 le 0 forall119896 isin K (18d)

1198922119899 (a119871119905 p119905) = SINR119899 (119905) minus SINRtar119899 le 0 forall119899 isin N (18e)

1198923119899119896 (a119871119905 p119905) = SINR119896119899 (119905) minus 2119889119896119899 + 1 le 0 forall119899 isin N forall119896 isin K (18f)

where objective (18a) and constraints (18b) and (18c) arelinear while constraints (18d)ndash(18f) are nonlinear The MILP

relaxation to (18a)ndash(18f) in a given point (a1198710119905 c0119905 p0119905 ) is given

by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (19a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(19b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (19c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (19d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (19e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (19f)

The NLP relaxation to (18a)ndash(18f) is given by

minimize sum119899isinN

[120592119899119901119899 (119905) minus 120596119871119887119899 (119905) sum119896isinK

119886119896119899 (119905) 119889119896119899 minus (1 minus 119887119899 (119905)) 119903119880119899 (119905)] (20a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(20b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (20c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (20d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (20e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (20f)

10 Mobile Information Systems

where

A119871 = a119871119905 | 0 le 119886119896119899 (119905) le 1 forall119899 isin N forall119896 isin K (20g)

C

= c119905 | 0 le 119888119899 (119905) le 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (20h)

In general allMINLPproblems can be solved using eitherexact techniques (eg branch-and-bound [39]) or heuristicmethods (such as local branching [40] large neighborhoodsearch [41] and feasibility pump [42]) Since we are inter-ested in a reasonably simple and fast algorithm it is moreconvenient to use heuristics to solve (18a)ndash(18f) Amongnumerous heuristic techniques feasibility pump (FP) [43] isperhaps the most simple and effective method for producingmore and better solutions in a shorter average running

time (the local convergence properties of FP for nonconvexproblems have been proved in [44]) The fundamental ideaof an FP heuristic is to decompose the MINLP probleminto two parts integer feasibility and constraint feasibilityThe former is achieved by rounding (solving the MILPrelaxation to an original problem) the latter by projection(solving the NLP relaxation) The algorithm generates twosequences of integral and rounding pointsThe first sequenceof integral points (a119871119894119905 c119894119905 p119894119905)119868119894=1 119868 = 1 2 containsthe solutions that may violate the nonlinear constraints thesecond sequence (a119871119894119905 c119894119905 p119894119905)119868119894=1 comprises the roundingpoints that are feasible for the MILP relaxation but might notbe integral

Particularly with the input (a1198711119905 c1119905 p1119905 ) being a solution toan NLP relaxation (20a)ndash(20f) FP generates two sequencesby solving the following problems for 119894 = 1 119868

(a119871119894119905 c119894119905 p119894119905) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038171 (21a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(21b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (21c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (21d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (21e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (21f)

(a119871119894+1119905 c119894+1119905 p119894+1119905 ) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038172 (22a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(22b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (22c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (22d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (22e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (22f)

where sdot1 and sdot2 are l1-norm and l2-norm respectivelyTherounding is carried out by solving the problem (21a)ndash(21f)

and the projection is the solution to (22a)ndash(22f) Conse-quently an FP algorithmalternates between the rounding and

Mobile Information Systems 11

projection steps until (a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905) (which impliesfeasibility) or until the number of iterations i has reached itspredefined limit ITheworkflow of the algorithm is presentedin Algorithm 2 Note that to retain the local convergencethe problems (21a)ndash(21f) and (22a)ndash(22f) have to be solvedexactly in the FP algorithm The problem (22a)ndash(22f) (andhence (20a)ndash(20f)) can be solved using any standard NLPmethod In this paper an interior point algorithm (describedeg in [45]) which has a polynomial 119874(1198992) time complexityis applied to solve (20a)ndash(20f) and (22a)ndash(22f) The MILPproblem (21a)ndash(21f) is relatively simple and therefore it canbe solved to optimality by any technique from the family ofthe branch-and-bound methods (eg [46])

Note that in general finding an optimal solution to anyjoint resource allocation problem with integrality constraintsis NP-hard (which has been shown in [47]) Consequentlymost of the recent approaches to deal with such kindof problems focus on finding the high-quality suboptimalsolutions using for example relaxation (by removing allthe integer restrictions as it has been done in [47 48]) oriterative two-stage algorithms for determining the optimalintegral solutions given fixed power levels and then findingthe optimal power allocation with fixed integral points (eg[49]) In this paper instead of relaxation or iteration wedirectly apply a heuristic FP algorithm that has a polynomial119874(119899119888) time complexity in the size n of the problem (withc being some real constant) [43] (Note that in our casethe size n of the problem (18a)ndash(18f) is proportional to|A119871| times |C| times |P| = 1198733 times 119870 The numerical results showingthe complexity of a proposed algorithm will be presentedin Section 4) Hence the presented heuristic approach hasmoderate complexity compared to the previously proposedalgorithms for resource allocationwith integrality constraintswhose complexity ranges from linear [21 30 47 48 50] topolynomial [2 3 49 51ndash53]

4 Algorithm Implementation

41 Resource Allocation Procedure We now discuss theimplementation of the proposed algorithms (presented inSection 3) in an LTE-A network The following schedulingprocedure is repeated at the beginning of each slot t as follows

(i) All users send their SRs to the eNB via dedicatedPUCCHs Note that the SRs may contain someuseful control information such as updated targetSINR level SINRtar

119899 or observed throughput on theunlicensed channel 119903119880119899 (119905)

(ii) After receiving the SRs from all of the users the eNBperforms resource allocation (by assigning themodesRBs and unlicensed channels and power levels to userpairs according to Algorithms 1 and 2) and sends theSGs with optimal allocations to the correspondingusers via PDCCHs

(iii) After receiving the SGs the users start their datatransmissions over allocated RBsunlicensed chan-nels with assigned mode and power levels

As it was already beenmentioned we deploy a CSMACAfor outband D2D access using a procedure described in IEEE80211 [54] As dictated by [54] if a certain D2D pair PUn 119899 isinD is allocated with one or more unlicensed channels thenprior to transmission one of the users must first sense thechannel (to determine whether it is idle) for the duration ofa distributed coordination function interframe space (DIFS)DIFS (which is 34 120583s long) consists of a short interframespace (SIFS) equaling 16 120583s and 2 Wi-Fi slots (each equals9 120583s) After DIFS a user must typically defer its transmissionfor a random number of slots generated from 0 to CW-1(contentionwindow size) to allow the other devices to share achannel in a fair manner Given that the minimumCW valueis CWmin = 16 the device will on an average wait for about75Wi-Fi slots before transmissionThus the average channelaccess delay is 16120583s + 95 times 9 120583s = 1015 120583s (independent ofservice rate) Since the slot duration in LTE system (119879119904 =1ms) is much longer than the average channel access delay(1015120583s) it is expected that (in average) a D2D pair will beable to exchange the data within the scheduled period Inthis case each of the users in a D2D pair should observeachieved throughput 119903119880119899 (119905) and report this value to the eNBwhen sending its SR Otherwise (if a D2D pair is not able toexchange the data within one slot) the D2D users send thevalue 119903119880119899 (119905) = 0 to the eNB Note that a CSMACA does notallow two-way data transmission Hence the second devicein a D2D pair can start the data transmission only after thefirst user has finished its transmission

It is worth mentioning that at some point in time aJUSTE-RL will reach its equilibrium state However evenafter the equilibriumhas been reached the eNB continues thelearning process because the network environment (channelquality network traffic and the number of active users) islikely to change over time resulting in different optimalmodeRBunlicensed channel and power allocations

42 Simulation Model A simulation model of the networkhas been implemented upon a standard LTE-A platformusing the OPNET simulation and development package [55]The model consists of one eNB and N user pairs randomlypositioned inside a three-sector hexagonal cell (with theantenna pattern specified in [56]) It is assumed that theusers operate outdoors in a typical urban environment andare stationary throughout all simulation runs Each userdevice has its own traffic generator enabling a variety oftraffic patterns For simplicity in the examples below theuser traffic is modeled as a full buffer with load of 10packets per second and packet size of 1500 bytes In allsimulations |C| = 10 cellular pairs 120592120573 = 11120592120572 120592120574 = 11120592120573119879 = 1000 slots 119879119861 = 106 K and I = 1000 iterations thetarget SINR levels for each device pair are set as SINRtar

119899 =SINRtar = 0 dB for all 119899 isin N The licensed band of theeNB comprises K = 100 RBs (equivalent to 20MHz) Theunlicensed band comprises M = 4 nonoverlapping OFDMchannels with 120596119880119898 = 10MHz for 119898 isin M The main simu-lation parameters of our model are listed in Table 1 Otherparameters are set in accordance with 3GPP specifications[56]

12 Mobile Information Systems

Initialization(1) Input 119868 119879(2)While (119905 le 119879) do(3) Input 119903119880119899 (119905) a119880119905 b119905(4) Solve (20a)ndash(20f) to find the optimal (a1198711119905 c1119905 p1119905 )Main Loop(5)While (119894 le 119868) doRounding(6) Solve (21a)ndash(21f) to find the optimal (a119871119894119905 c119894119905 p119894119905)(7) If ((a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905)) then breakProjection(8) Solve (22a)ndash(22f) to find the optimal (a119871119894+1119905 c119894+1119905 p119894+1119905 )(9) Set 119894 larr 119894 + 1

(10) End

Algorithm 2 FP algorithm for inband resource allocation

Table 1 Simulation parameters of LTE-A Model

Parameter ValueCell radius 500mFrame structure Type 2 (time division duplex)Slot duration 1msTDD configuration 0eNodeB Tx power 46 dBmUE activeidle Tx power 232 dBmNoise power minus174 dBmHzPath loss and cellular link 1281 + 376 log(119889) 119889 [km]NLOS path loss and D2Dlink

40 log(119889) + 30 log(119891) + 49 119889 [km]119891 [Hz]LOS path loss and D2Dlink

169 log(119889) + 20 log(1198915) + 468 119889[m] 119891 [GHz]

Shadowing st dev 10 B (cell mode) 12 dB (D2D mode)

In this paper the evaluation of a proposed approachfor inband and outband resource allocation referred to asJRA (JUSTE-RL based resource allocation) is divided intotwo parts In the first part we analyze the performanceof JUSTE-RL with regret for unlicensed channel allocation(Algorithm 1) In the second part we examine the efficiencyof a proposed joint inbandoutband resource allocation(Algorithms 1 and 2) In the following the performanceof JRA is compared with the performance of the followingresource allocation techniques

(i) First is joint inbandoutband resource allocation with120576-greedy Q-learning (GQL) [57] based on formu-lations (12a)ndash(12e) and (18a)ndash(18f) where the unli-censed channels are allocated to the users by theLTE eNB In GQL at any slot t an action with thelargest Q-value is selected with probability 1 minus 120576 andthe other actions are selected uniformly at randomwith probability 120576 In all simulation experiments thevalue of 120576 is set in accordance with the most commonsuggestions (provided eg in [22]) as 120576 = 01

(ii) Second is centralized optimal strategy (COS) wherethe inband and outband network resources are allo-cated to the users by solving (11a)ndash(11f) directly basedon global channel and network knowledge Note thatCOS corresponds to the most efficient (in terms ofnetwork utility maximization) strategy although it isnot practically realizable (since in the real networkdeployment scenarios the precise information aboutquality and availability of unlicensed channels is notavailable) (In this paper we use an FP algorithm tofind the optimal solution to (11a)ndash(11f) or (18a)ndash(18f)in GQL and COS)

(iii) Third is social heuristic for multimode D2D com-munication (SMD) in an LTE-A network proposedin [21] to reduce the complexity of an original opti-mization problem for joint inbandoutband resourceallocation This algorithm assigns user modes andresources to maximize the social welfare based onthe global channel and network knowledge The eNBcreates a randomly ordered list of the D2D pairsThen it computes the aggregated network utility foreach mode of the first user in the list and assigns thisuser with amode that provides the highest aggregatedutility This process is repeated for all D2D pairs

(iv) Fourth is greedy heuristic for multimode D2D com-munication (GMD) in LTE-A networks [21] wherethe modes and inbandoutband network resourcesare allocated tomaximize the individual usersrsquo welfarebased on the global channel and network knowledgeSimilar to SMD the eNB creates a randomly orderedlist of the D2D pairs After this it computes the utilityfor each mode of the first user in the list and assignsthis user a mode assuring the highest individualutility This process is repeated for all D2D pairs

(v) Fifth is ranked heuristic for multimode D2D com-munication (RMD) in LTE-A networks [21] Here theeNB evaluates the utility of each user in each mode(based on the global channel andnetwork knowledge)

Mobile Information Systems 13Av

erag

e num

ber o

f RL

itera

tions

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e str

ateg

y co

nver

genc

e

Figure 2 The average number of RL iterations (slots) necessary forconvergence of strategies in JRAwith different values of 120592120572 and fixed|C| = 10

and sorts the D2D pairs according to their utilities ina descending order Next the eNB allocates the firstuser in the list a mode that guarantees the highestaggregatednetwork utilityThis process is repeated forall D2D pairs

Note that all algorithms used in our performance evaluationare simulated with identical system parameters

43 Performance of a Learning Algorithm We start withthe performance evaluation of JUSTE-RL with regret forunlicensed channel allocation (outlined in Algorithm 1)Figures 2 and 3 demonstrate the learning speed of JRAFigure 2 shows the average number of RL iterations (slots)necessary for convergence of strategies in JRA (at the pointwhere 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880) with differentvalues of 120592120572 isin [055 08] 120592120573 = 11120592120572 isin [061 088] and120592120574 = 11120592120573 isin [067 097] and a varying number of user pairs119873 isin [50 500] The average number of RL iterations (slots)necessary for convergence of utilities in JRA (at the pointwhere 119906119905(119861) = 119906119905minus1(119861) for all 119861 isin A119880) with 120592120572 isin [055 08]120592120573 isin [061 088] and 120592120574 isin [067 097] and 119873 isin [50 500])is plotted in Figure 3 The accuracy of estimation in JRA ispresented in Figures 4 and 5 Figure 4 shows the absoluteerror of strategy estimation in JRA denoted as 120575120587 defined asa sum of the absolute differences between the actual optimalstrategies and the estimated strategies upon the algorithmtermination That is

120575120587 = sum119861isinA119880

1003816100381610038161003816120587119879 (119861) minus 120587lowast (119861)1003816100381610038161003816 (23)

where 120587119879(119861) is an optimal strategy estimated in JRA uponthe algorithm termination (at slot 119879) and 120587lowast(119861) is the actual

Aver

age n

umbe

r of R

L ite

ratio

ns

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e util

ity co

nver

genc

e

Figure 3 The average number of RL iterations (slots) necessary forconvergence of utilities in JRA with different values of 120592120572 and fixed|C| = 10

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of s

trat

egy

0

001

002

003

004

005

estim

atio

n120575120587

Figure 4 The absolute error of strategy estimation 120575120587 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

optimal strategy obtained by playing an action 119861 isin A119880Figure 5 demonstrates the absolute error of utility estimationin JRA denoted 120575119906 defined as a sum of the absolutedifferences between the actual and the estimated optimalnetwork utilities upon the algorithm termination that is

120575119906 = sum119861isinA119880

1003816100381610038161003816119906119879 (119861) minus 119906lowast (119861)1003816100381610038161003816 (24)

14 Mobile Information Systems

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of u

tility

0

001

002

003

004

005

estim

atio

n120575u

Figure 5 The absolute error of utility estimation 120575119906 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

where 119906119879(119861) is an optimal network utility estimated in JRAupon the algorithm termination (at slot 119879) and 119906lowast(119861) is theactual optimal network utility obtained by playing an action119861 isin A119880 The observations in Figures 2ndash5 show that the ratesof convergence of strategies and utilities and the accuracyof strategy and utility estimation are almost the sameFurthermore we find that the number of iterations necessaryfor the algorithm convergence and absolute estimation errorstrongly depend on the setting of the parameters 120592120572 120592120573 and120592120574 the worst performance is attainedwith 120592120572 = 08 120592120573 = 061and 120592120574 = 067 and the best with 120592120572 = 055 120592120573 = 088and 120592120574 = 097 Such results are rather predictable since theparameters 120592120572 120592120573 and 120592120574 are related to the parameters 120572119905120573119905 and 120574119905 (see (16b)) which have a direct influence on thelearning rate of JRA [22 23]

In Figures 6 and 7 the instantaneous network utility 119906119905 =sum119899isinN 119906119899(119905) is presented as a function of time in scenarioswith low network load (119873 = 100) and high network load(119873 = 500) and fixed Here a proposed JRA technique issimulated with the settings 120592120572 = 05 120592120573 = 061 and 120592120574 =067 The graphs in these figures show that the efficiency ofJRA and GQL improves gradually over time After about 300slots (which is the average time necessary for the convergenceof strategies and utilities in Algorithm 1) JRA demonstratesnear-optimal results GQL needs a little longer time (asymp400slots) to converge after which its performance also becomesvery close to the performance of COS Unlike JRA and GQLthe performance of COS SMDGMD and RMD is consistentover time (since these algorithms do not involve any learningprocess) We also observe that the network utility attained inSMD GMD and RMD is much smaller than that in COSTo understand such poor performance of SMD GMD andRMD note that in these algorithms the original resource

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

7

8

9

10

11

12

ut

(lowast10

6)

Figure 6 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and |C| = 10

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

40

42

44

46

48

50

52

54

56

58

60ut

(lowast10

6)

Figure 7 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 500 and |C| = 10allocation problem is divided into two separate problems(i) mode selection and (ii) packet scheduling After that themode selection problem is solved using very plain heuristics(social greedy or ranked) which reduces the complexityof an original optimization problem (from exponential tolinear) but has a negative impact on the performance of thesetechniques in terms of network utility maximization [21]

44 Performance of Joint InbandOutband Allocation Wenow evaluate the efficiency of a proposed inbandoutbandresource allocation (Algorithms 1 and 2) The graphs inFigures 8ndash10 demonstrate the computational complexitysolution time and solution accuracy of different resource

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 10: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

10 Mobile Information Systems

where

A119871 = a119871119905 | 0 le 119886119896119899 (119905) le 1 forall119899 isin N forall119896 isin K (20g)

C

= c119905 | 0 le 119888119899 (119905) le 1 119888119894 (119905) = 1 forall119899 isin N forall119894 isin C (20h)

In general allMINLPproblems can be solved using eitherexact techniques (eg branch-and-bound [39]) or heuristicmethods (such as local branching [40] large neighborhoodsearch [41] and feasibility pump [42]) Since we are inter-ested in a reasonably simple and fast algorithm it is moreconvenient to use heuristics to solve (18a)ndash(18f) Amongnumerous heuristic techniques feasibility pump (FP) [43] isperhaps the most simple and effective method for producingmore and better solutions in a shorter average running

time (the local convergence properties of FP for nonconvexproblems have been proved in [44]) The fundamental ideaof an FP heuristic is to decompose the MINLP probleminto two parts integer feasibility and constraint feasibilityThe former is achieved by rounding (solving the MILPrelaxation to an original problem) the latter by projection(solving the NLP relaxation) The algorithm generates twosequences of integral and rounding pointsThe first sequenceof integral points (a119871119894119905 c119894119905 p119894119905)119868119894=1 119868 = 1 2 containsthe solutions that may violate the nonlinear constraints thesecond sequence (a119871119894119905 c119894119905 p119894119905)119868119894=1 comprises the roundingpoints that are feasible for the MILP relaxation but might notbe integral

Particularly with the input (a1198711119905 c1119905 p1119905 ) being a solution toan NLP relaxation (20a)ndash(20f) FP generates two sequencesby solving the following problems for 119894 = 1 119868

(a119871119894119905 c119894119905 p119894119905) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038171 (21a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(21b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (21c)

1198921119896 (a119871119905 c119905) + nabla1198921119896 (a1198710119905 c0119905 )119879 [a119871119905 minus a1198710119905c119905 minus c0119905

] le 0 forall119896 isin K (21d)

1198922119899 (a119871119905 p119905) + nabla1198922119899 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N (21e)

1198923119899119896 (a119871119905 p119905) + nabla1198923119899119896 (a1198710119905 p0119905 )119879 [a119871119905 minus a1198710119905p119905 minus p0119905

] le 0 forall119899 isin N forall119896 isin K (21f)

(a119871119894+1119905 c119894+1119905 p119894+1119905 ) = argmin100381710038171003817100381710038171003817(a119871119905 c119905 p119905) minus (a119871119894119905 c119894119905 p119894119905)1003817100381710038171003817100381710038172 (22a)

subject to a119871119905 isin A119871c119905 isin Cp119905 isin P

(22b)

sum119899isinN

119886119896119899 (119905) 119887119899 (119905) ge 1 forall119896 isin K (22c)

1198921119896 (a119871119905 c119905) le 0 forall119896 isin K (22d)

1198922119899 (a119871119905 p119905) le 0 forall119899 isin N (22e)

1198923119899119896 (a119871119905 p119905) le 0 forall119899 isin N forall119896 isin K (22f)

where sdot1 and sdot2 are l1-norm and l2-norm respectivelyTherounding is carried out by solving the problem (21a)ndash(21f)

and the projection is the solution to (22a)ndash(22f) Conse-quently an FP algorithmalternates between the rounding and

Mobile Information Systems 11

projection steps until (a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905) (which impliesfeasibility) or until the number of iterations i has reached itspredefined limit ITheworkflow of the algorithm is presentedin Algorithm 2 Note that to retain the local convergencethe problems (21a)ndash(21f) and (22a)ndash(22f) have to be solvedexactly in the FP algorithm The problem (22a)ndash(22f) (andhence (20a)ndash(20f)) can be solved using any standard NLPmethod In this paper an interior point algorithm (describedeg in [45]) which has a polynomial 119874(1198992) time complexityis applied to solve (20a)ndash(20f) and (22a)ndash(22f) The MILPproblem (21a)ndash(21f) is relatively simple and therefore it canbe solved to optimality by any technique from the family ofthe branch-and-bound methods (eg [46])

Note that in general finding an optimal solution to anyjoint resource allocation problem with integrality constraintsis NP-hard (which has been shown in [47]) Consequentlymost of the recent approaches to deal with such kindof problems focus on finding the high-quality suboptimalsolutions using for example relaxation (by removing allthe integer restrictions as it has been done in [47 48]) oriterative two-stage algorithms for determining the optimalintegral solutions given fixed power levels and then findingthe optimal power allocation with fixed integral points (eg[49]) In this paper instead of relaxation or iteration wedirectly apply a heuristic FP algorithm that has a polynomial119874(119899119888) time complexity in the size n of the problem (withc being some real constant) [43] (Note that in our casethe size n of the problem (18a)ndash(18f) is proportional to|A119871| times |C| times |P| = 1198733 times 119870 The numerical results showingthe complexity of a proposed algorithm will be presentedin Section 4) Hence the presented heuristic approach hasmoderate complexity compared to the previously proposedalgorithms for resource allocationwith integrality constraintswhose complexity ranges from linear [21 30 47 48 50] topolynomial [2 3 49 51ndash53]

4 Algorithm Implementation

41 Resource Allocation Procedure We now discuss theimplementation of the proposed algorithms (presented inSection 3) in an LTE-A network The following schedulingprocedure is repeated at the beginning of each slot t as follows

(i) All users send their SRs to the eNB via dedicatedPUCCHs Note that the SRs may contain someuseful control information such as updated targetSINR level SINRtar

119899 or observed throughput on theunlicensed channel 119903119880119899 (119905)

(ii) After receiving the SRs from all of the users the eNBperforms resource allocation (by assigning themodesRBs and unlicensed channels and power levels to userpairs according to Algorithms 1 and 2) and sends theSGs with optimal allocations to the correspondingusers via PDCCHs

(iii) After receiving the SGs the users start their datatransmissions over allocated RBsunlicensed chan-nels with assigned mode and power levels

As it was already beenmentioned we deploy a CSMACAfor outband D2D access using a procedure described in IEEE80211 [54] As dictated by [54] if a certain D2D pair PUn 119899 isinD is allocated with one or more unlicensed channels thenprior to transmission one of the users must first sense thechannel (to determine whether it is idle) for the duration ofa distributed coordination function interframe space (DIFS)DIFS (which is 34 120583s long) consists of a short interframespace (SIFS) equaling 16 120583s and 2 Wi-Fi slots (each equals9 120583s) After DIFS a user must typically defer its transmissionfor a random number of slots generated from 0 to CW-1(contentionwindow size) to allow the other devices to share achannel in a fair manner Given that the minimumCW valueis CWmin = 16 the device will on an average wait for about75Wi-Fi slots before transmissionThus the average channelaccess delay is 16120583s + 95 times 9 120583s = 1015 120583s (independent ofservice rate) Since the slot duration in LTE system (119879119904 =1ms) is much longer than the average channel access delay(1015120583s) it is expected that (in average) a D2D pair will beable to exchange the data within the scheduled period Inthis case each of the users in a D2D pair should observeachieved throughput 119903119880119899 (119905) and report this value to the eNBwhen sending its SR Otherwise (if a D2D pair is not able toexchange the data within one slot) the D2D users send thevalue 119903119880119899 (119905) = 0 to the eNB Note that a CSMACA does notallow two-way data transmission Hence the second devicein a D2D pair can start the data transmission only after thefirst user has finished its transmission

It is worth mentioning that at some point in time aJUSTE-RL will reach its equilibrium state However evenafter the equilibriumhas been reached the eNB continues thelearning process because the network environment (channelquality network traffic and the number of active users) islikely to change over time resulting in different optimalmodeRBunlicensed channel and power allocations

42 Simulation Model A simulation model of the networkhas been implemented upon a standard LTE-A platformusing the OPNET simulation and development package [55]The model consists of one eNB and N user pairs randomlypositioned inside a three-sector hexagonal cell (with theantenna pattern specified in [56]) It is assumed that theusers operate outdoors in a typical urban environment andare stationary throughout all simulation runs Each userdevice has its own traffic generator enabling a variety oftraffic patterns For simplicity in the examples below theuser traffic is modeled as a full buffer with load of 10packets per second and packet size of 1500 bytes In allsimulations |C| = 10 cellular pairs 120592120573 = 11120592120572 120592120574 = 11120592120573119879 = 1000 slots 119879119861 = 106 K and I = 1000 iterations thetarget SINR levels for each device pair are set as SINRtar

119899 =SINRtar = 0 dB for all 119899 isin N The licensed band of theeNB comprises K = 100 RBs (equivalent to 20MHz) Theunlicensed band comprises M = 4 nonoverlapping OFDMchannels with 120596119880119898 = 10MHz for 119898 isin M The main simu-lation parameters of our model are listed in Table 1 Otherparameters are set in accordance with 3GPP specifications[56]

12 Mobile Information Systems

Initialization(1) Input 119868 119879(2)While (119905 le 119879) do(3) Input 119903119880119899 (119905) a119880119905 b119905(4) Solve (20a)ndash(20f) to find the optimal (a1198711119905 c1119905 p1119905 )Main Loop(5)While (119894 le 119868) doRounding(6) Solve (21a)ndash(21f) to find the optimal (a119871119894119905 c119894119905 p119894119905)(7) If ((a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905)) then breakProjection(8) Solve (22a)ndash(22f) to find the optimal (a119871119894+1119905 c119894+1119905 p119894+1119905 )(9) Set 119894 larr 119894 + 1

(10) End

Algorithm 2 FP algorithm for inband resource allocation

Table 1 Simulation parameters of LTE-A Model

Parameter ValueCell radius 500mFrame structure Type 2 (time division duplex)Slot duration 1msTDD configuration 0eNodeB Tx power 46 dBmUE activeidle Tx power 232 dBmNoise power minus174 dBmHzPath loss and cellular link 1281 + 376 log(119889) 119889 [km]NLOS path loss and D2Dlink

40 log(119889) + 30 log(119891) + 49 119889 [km]119891 [Hz]LOS path loss and D2Dlink

169 log(119889) + 20 log(1198915) + 468 119889[m] 119891 [GHz]

Shadowing st dev 10 B (cell mode) 12 dB (D2D mode)

In this paper the evaluation of a proposed approachfor inband and outband resource allocation referred to asJRA (JUSTE-RL based resource allocation) is divided intotwo parts In the first part we analyze the performanceof JUSTE-RL with regret for unlicensed channel allocation(Algorithm 1) In the second part we examine the efficiencyof a proposed joint inbandoutband resource allocation(Algorithms 1 and 2) In the following the performanceof JRA is compared with the performance of the followingresource allocation techniques

(i) First is joint inbandoutband resource allocation with120576-greedy Q-learning (GQL) [57] based on formu-lations (12a)ndash(12e) and (18a)ndash(18f) where the unli-censed channels are allocated to the users by theLTE eNB In GQL at any slot t an action with thelargest Q-value is selected with probability 1 minus 120576 andthe other actions are selected uniformly at randomwith probability 120576 In all simulation experiments thevalue of 120576 is set in accordance with the most commonsuggestions (provided eg in [22]) as 120576 = 01

(ii) Second is centralized optimal strategy (COS) wherethe inband and outband network resources are allo-cated to the users by solving (11a)ndash(11f) directly basedon global channel and network knowledge Note thatCOS corresponds to the most efficient (in terms ofnetwork utility maximization) strategy although it isnot practically realizable (since in the real networkdeployment scenarios the precise information aboutquality and availability of unlicensed channels is notavailable) (In this paper we use an FP algorithm tofind the optimal solution to (11a)ndash(11f) or (18a)ndash(18f)in GQL and COS)

(iii) Third is social heuristic for multimode D2D com-munication (SMD) in an LTE-A network proposedin [21] to reduce the complexity of an original opti-mization problem for joint inbandoutband resourceallocation This algorithm assigns user modes andresources to maximize the social welfare based onthe global channel and network knowledge The eNBcreates a randomly ordered list of the D2D pairsThen it computes the aggregated network utility foreach mode of the first user in the list and assigns thisuser with amode that provides the highest aggregatedutility This process is repeated for all D2D pairs

(iv) Fourth is greedy heuristic for multimode D2D com-munication (GMD) in LTE-A networks [21] wherethe modes and inbandoutband network resourcesare allocated tomaximize the individual usersrsquo welfarebased on the global channel and network knowledgeSimilar to SMD the eNB creates a randomly orderedlist of the D2D pairs After this it computes the utilityfor each mode of the first user in the list and assignsthis user a mode assuring the highest individualutility This process is repeated for all D2D pairs

(v) Fifth is ranked heuristic for multimode D2D com-munication (RMD) in LTE-A networks [21] Here theeNB evaluates the utility of each user in each mode(based on the global channel andnetwork knowledge)

Mobile Information Systems 13Av

erag

e num

ber o

f RL

itera

tions

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e str

ateg

y co

nver

genc

e

Figure 2 The average number of RL iterations (slots) necessary forconvergence of strategies in JRAwith different values of 120592120572 and fixed|C| = 10

and sorts the D2D pairs according to their utilities ina descending order Next the eNB allocates the firstuser in the list a mode that guarantees the highestaggregatednetwork utilityThis process is repeated forall D2D pairs

Note that all algorithms used in our performance evaluationare simulated with identical system parameters

43 Performance of a Learning Algorithm We start withthe performance evaluation of JUSTE-RL with regret forunlicensed channel allocation (outlined in Algorithm 1)Figures 2 and 3 demonstrate the learning speed of JRAFigure 2 shows the average number of RL iterations (slots)necessary for convergence of strategies in JRA (at the pointwhere 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880) with differentvalues of 120592120572 isin [055 08] 120592120573 = 11120592120572 isin [061 088] and120592120574 = 11120592120573 isin [067 097] and a varying number of user pairs119873 isin [50 500] The average number of RL iterations (slots)necessary for convergence of utilities in JRA (at the pointwhere 119906119905(119861) = 119906119905minus1(119861) for all 119861 isin A119880) with 120592120572 isin [055 08]120592120573 isin [061 088] and 120592120574 isin [067 097] and 119873 isin [50 500])is plotted in Figure 3 The accuracy of estimation in JRA ispresented in Figures 4 and 5 Figure 4 shows the absoluteerror of strategy estimation in JRA denoted as 120575120587 defined asa sum of the absolute differences between the actual optimalstrategies and the estimated strategies upon the algorithmtermination That is

120575120587 = sum119861isinA119880

1003816100381610038161003816120587119879 (119861) minus 120587lowast (119861)1003816100381610038161003816 (23)

where 120587119879(119861) is an optimal strategy estimated in JRA uponthe algorithm termination (at slot 119879) and 120587lowast(119861) is the actual

Aver

age n

umbe

r of R

L ite

ratio

ns

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e util

ity co

nver

genc

e

Figure 3 The average number of RL iterations (slots) necessary forconvergence of utilities in JRA with different values of 120592120572 and fixed|C| = 10

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of s

trat

egy

0

001

002

003

004

005

estim

atio

n120575120587

Figure 4 The absolute error of strategy estimation 120575120587 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

optimal strategy obtained by playing an action 119861 isin A119880Figure 5 demonstrates the absolute error of utility estimationin JRA denoted 120575119906 defined as a sum of the absolutedifferences between the actual and the estimated optimalnetwork utilities upon the algorithm termination that is

120575119906 = sum119861isinA119880

1003816100381610038161003816119906119879 (119861) minus 119906lowast (119861)1003816100381610038161003816 (24)

14 Mobile Information Systems

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of u

tility

0

001

002

003

004

005

estim

atio

n120575u

Figure 5 The absolute error of utility estimation 120575119906 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

where 119906119879(119861) is an optimal network utility estimated in JRAupon the algorithm termination (at slot 119879) and 119906lowast(119861) is theactual optimal network utility obtained by playing an action119861 isin A119880 The observations in Figures 2ndash5 show that the ratesof convergence of strategies and utilities and the accuracyof strategy and utility estimation are almost the sameFurthermore we find that the number of iterations necessaryfor the algorithm convergence and absolute estimation errorstrongly depend on the setting of the parameters 120592120572 120592120573 and120592120574 the worst performance is attainedwith 120592120572 = 08 120592120573 = 061and 120592120574 = 067 and the best with 120592120572 = 055 120592120573 = 088and 120592120574 = 097 Such results are rather predictable since theparameters 120592120572 120592120573 and 120592120574 are related to the parameters 120572119905120573119905 and 120574119905 (see (16b)) which have a direct influence on thelearning rate of JRA [22 23]

In Figures 6 and 7 the instantaneous network utility 119906119905 =sum119899isinN 119906119899(119905) is presented as a function of time in scenarioswith low network load (119873 = 100) and high network load(119873 = 500) and fixed Here a proposed JRA technique issimulated with the settings 120592120572 = 05 120592120573 = 061 and 120592120574 =067 The graphs in these figures show that the efficiency ofJRA and GQL improves gradually over time After about 300slots (which is the average time necessary for the convergenceof strategies and utilities in Algorithm 1) JRA demonstratesnear-optimal results GQL needs a little longer time (asymp400slots) to converge after which its performance also becomesvery close to the performance of COS Unlike JRA and GQLthe performance of COS SMDGMD and RMD is consistentover time (since these algorithms do not involve any learningprocess) We also observe that the network utility attained inSMD GMD and RMD is much smaller than that in COSTo understand such poor performance of SMD GMD andRMD note that in these algorithms the original resource

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

7

8

9

10

11

12

ut

(lowast10

6)

Figure 6 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and |C| = 10

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

40

42

44

46

48

50

52

54

56

58

60ut

(lowast10

6)

Figure 7 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 500 and |C| = 10allocation problem is divided into two separate problems(i) mode selection and (ii) packet scheduling After that themode selection problem is solved using very plain heuristics(social greedy or ranked) which reduces the complexityof an original optimization problem (from exponential tolinear) but has a negative impact on the performance of thesetechniques in terms of network utility maximization [21]

44 Performance of Joint InbandOutband Allocation Wenow evaluate the efficiency of a proposed inbandoutbandresource allocation (Algorithms 1 and 2) The graphs inFigures 8ndash10 demonstrate the computational complexitysolution time and solution accuracy of different resource

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 11: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

Mobile Information Systems 11

projection steps until (a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905) (which impliesfeasibility) or until the number of iterations i has reached itspredefined limit ITheworkflow of the algorithm is presentedin Algorithm 2 Note that to retain the local convergencethe problems (21a)ndash(21f) and (22a)ndash(22f) have to be solvedexactly in the FP algorithm The problem (22a)ndash(22f) (andhence (20a)ndash(20f)) can be solved using any standard NLPmethod In this paper an interior point algorithm (describedeg in [45]) which has a polynomial 119874(1198992) time complexityis applied to solve (20a)ndash(20f) and (22a)ndash(22f) The MILPproblem (21a)ndash(21f) is relatively simple and therefore it canbe solved to optimality by any technique from the family ofthe branch-and-bound methods (eg [46])

Note that in general finding an optimal solution to anyjoint resource allocation problem with integrality constraintsis NP-hard (which has been shown in [47]) Consequentlymost of the recent approaches to deal with such kindof problems focus on finding the high-quality suboptimalsolutions using for example relaxation (by removing allthe integer restrictions as it has been done in [47 48]) oriterative two-stage algorithms for determining the optimalintegral solutions given fixed power levels and then findingthe optimal power allocation with fixed integral points (eg[49]) In this paper instead of relaxation or iteration wedirectly apply a heuristic FP algorithm that has a polynomial119874(119899119888) time complexity in the size n of the problem (withc being some real constant) [43] (Note that in our casethe size n of the problem (18a)ndash(18f) is proportional to|A119871| times |C| times |P| = 1198733 times 119870 The numerical results showingthe complexity of a proposed algorithm will be presentedin Section 4) Hence the presented heuristic approach hasmoderate complexity compared to the previously proposedalgorithms for resource allocationwith integrality constraintswhose complexity ranges from linear [21 30 47 48 50] topolynomial [2 3 49 51ndash53]

4 Algorithm Implementation

41 Resource Allocation Procedure We now discuss theimplementation of the proposed algorithms (presented inSection 3) in an LTE-A network The following schedulingprocedure is repeated at the beginning of each slot t as follows

(i) All users send their SRs to the eNB via dedicatedPUCCHs Note that the SRs may contain someuseful control information such as updated targetSINR level SINRtar

119899 or observed throughput on theunlicensed channel 119903119880119899 (119905)

(ii) After receiving the SRs from all of the users the eNBperforms resource allocation (by assigning themodesRBs and unlicensed channels and power levels to userpairs according to Algorithms 1 and 2) and sends theSGs with optimal allocations to the correspondingusers via PDCCHs

(iii) After receiving the SGs the users start their datatransmissions over allocated RBsunlicensed chan-nels with assigned mode and power levels

As it was already beenmentioned we deploy a CSMACAfor outband D2D access using a procedure described in IEEE80211 [54] As dictated by [54] if a certain D2D pair PUn 119899 isinD is allocated with one or more unlicensed channels thenprior to transmission one of the users must first sense thechannel (to determine whether it is idle) for the duration ofa distributed coordination function interframe space (DIFS)DIFS (which is 34 120583s long) consists of a short interframespace (SIFS) equaling 16 120583s and 2 Wi-Fi slots (each equals9 120583s) After DIFS a user must typically defer its transmissionfor a random number of slots generated from 0 to CW-1(contentionwindow size) to allow the other devices to share achannel in a fair manner Given that the minimumCW valueis CWmin = 16 the device will on an average wait for about75Wi-Fi slots before transmissionThus the average channelaccess delay is 16120583s + 95 times 9 120583s = 1015 120583s (independent ofservice rate) Since the slot duration in LTE system (119879119904 =1ms) is much longer than the average channel access delay(1015120583s) it is expected that (in average) a D2D pair will beable to exchange the data within the scheduled period Inthis case each of the users in a D2D pair should observeachieved throughput 119903119880119899 (119905) and report this value to the eNBwhen sending its SR Otherwise (if a D2D pair is not able toexchange the data within one slot) the D2D users send thevalue 119903119880119899 (119905) = 0 to the eNB Note that a CSMACA does notallow two-way data transmission Hence the second devicein a D2D pair can start the data transmission only after thefirst user has finished its transmission

It is worth mentioning that at some point in time aJUSTE-RL will reach its equilibrium state However evenafter the equilibriumhas been reached the eNB continues thelearning process because the network environment (channelquality network traffic and the number of active users) islikely to change over time resulting in different optimalmodeRBunlicensed channel and power allocations

42 Simulation Model A simulation model of the networkhas been implemented upon a standard LTE-A platformusing the OPNET simulation and development package [55]The model consists of one eNB and N user pairs randomlypositioned inside a three-sector hexagonal cell (with theantenna pattern specified in [56]) It is assumed that theusers operate outdoors in a typical urban environment andare stationary throughout all simulation runs Each userdevice has its own traffic generator enabling a variety oftraffic patterns For simplicity in the examples below theuser traffic is modeled as a full buffer with load of 10packets per second and packet size of 1500 bytes In allsimulations |C| = 10 cellular pairs 120592120573 = 11120592120572 120592120574 = 11120592120573119879 = 1000 slots 119879119861 = 106 K and I = 1000 iterations thetarget SINR levels for each device pair are set as SINRtar

119899 =SINRtar = 0 dB for all 119899 isin N The licensed band of theeNB comprises K = 100 RBs (equivalent to 20MHz) Theunlicensed band comprises M = 4 nonoverlapping OFDMchannels with 120596119880119898 = 10MHz for 119898 isin M The main simu-lation parameters of our model are listed in Table 1 Otherparameters are set in accordance with 3GPP specifications[56]

12 Mobile Information Systems

Initialization(1) Input 119868 119879(2)While (119905 le 119879) do(3) Input 119903119880119899 (119905) a119880119905 b119905(4) Solve (20a)ndash(20f) to find the optimal (a1198711119905 c1119905 p1119905 )Main Loop(5)While (119894 le 119868) doRounding(6) Solve (21a)ndash(21f) to find the optimal (a119871119894119905 c119894119905 p119894119905)(7) If ((a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905)) then breakProjection(8) Solve (22a)ndash(22f) to find the optimal (a119871119894+1119905 c119894+1119905 p119894+1119905 )(9) Set 119894 larr 119894 + 1

(10) End

Algorithm 2 FP algorithm for inband resource allocation

Table 1 Simulation parameters of LTE-A Model

Parameter ValueCell radius 500mFrame structure Type 2 (time division duplex)Slot duration 1msTDD configuration 0eNodeB Tx power 46 dBmUE activeidle Tx power 232 dBmNoise power minus174 dBmHzPath loss and cellular link 1281 + 376 log(119889) 119889 [km]NLOS path loss and D2Dlink

40 log(119889) + 30 log(119891) + 49 119889 [km]119891 [Hz]LOS path loss and D2Dlink

169 log(119889) + 20 log(1198915) + 468 119889[m] 119891 [GHz]

Shadowing st dev 10 B (cell mode) 12 dB (D2D mode)

In this paper the evaluation of a proposed approachfor inband and outband resource allocation referred to asJRA (JUSTE-RL based resource allocation) is divided intotwo parts In the first part we analyze the performanceof JUSTE-RL with regret for unlicensed channel allocation(Algorithm 1) In the second part we examine the efficiencyof a proposed joint inbandoutband resource allocation(Algorithms 1 and 2) In the following the performanceof JRA is compared with the performance of the followingresource allocation techniques

(i) First is joint inbandoutband resource allocation with120576-greedy Q-learning (GQL) [57] based on formu-lations (12a)ndash(12e) and (18a)ndash(18f) where the unli-censed channels are allocated to the users by theLTE eNB In GQL at any slot t an action with thelargest Q-value is selected with probability 1 minus 120576 andthe other actions are selected uniformly at randomwith probability 120576 In all simulation experiments thevalue of 120576 is set in accordance with the most commonsuggestions (provided eg in [22]) as 120576 = 01

(ii) Second is centralized optimal strategy (COS) wherethe inband and outband network resources are allo-cated to the users by solving (11a)ndash(11f) directly basedon global channel and network knowledge Note thatCOS corresponds to the most efficient (in terms ofnetwork utility maximization) strategy although it isnot practically realizable (since in the real networkdeployment scenarios the precise information aboutquality and availability of unlicensed channels is notavailable) (In this paper we use an FP algorithm tofind the optimal solution to (11a)ndash(11f) or (18a)ndash(18f)in GQL and COS)

(iii) Third is social heuristic for multimode D2D com-munication (SMD) in an LTE-A network proposedin [21] to reduce the complexity of an original opti-mization problem for joint inbandoutband resourceallocation This algorithm assigns user modes andresources to maximize the social welfare based onthe global channel and network knowledge The eNBcreates a randomly ordered list of the D2D pairsThen it computes the aggregated network utility foreach mode of the first user in the list and assigns thisuser with amode that provides the highest aggregatedutility This process is repeated for all D2D pairs

(iv) Fourth is greedy heuristic for multimode D2D com-munication (GMD) in LTE-A networks [21] wherethe modes and inbandoutband network resourcesare allocated tomaximize the individual usersrsquo welfarebased on the global channel and network knowledgeSimilar to SMD the eNB creates a randomly orderedlist of the D2D pairs After this it computes the utilityfor each mode of the first user in the list and assignsthis user a mode assuring the highest individualutility This process is repeated for all D2D pairs

(v) Fifth is ranked heuristic for multimode D2D com-munication (RMD) in LTE-A networks [21] Here theeNB evaluates the utility of each user in each mode(based on the global channel andnetwork knowledge)

Mobile Information Systems 13Av

erag

e num

ber o

f RL

itera

tions

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e str

ateg

y co

nver

genc

e

Figure 2 The average number of RL iterations (slots) necessary forconvergence of strategies in JRAwith different values of 120592120572 and fixed|C| = 10

and sorts the D2D pairs according to their utilities ina descending order Next the eNB allocates the firstuser in the list a mode that guarantees the highestaggregatednetwork utilityThis process is repeated forall D2D pairs

Note that all algorithms used in our performance evaluationare simulated with identical system parameters

43 Performance of a Learning Algorithm We start withthe performance evaluation of JUSTE-RL with regret forunlicensed channel allocation (outlined in Algorithm 1)Figures 2 and 3 demonstrate the learning speed of JRAFigure 2 shows the average number of RL iterations (slots)necessary for convergence of strategies in JRA (at the pointwhere 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880) with differentvalues of 120592120572 isin [055 08] 120592120573 = 11120592120572 isin [061 088] and120592120574 = 11120592120573 isin [067 097] and a varying number of user pairs119873 isin [50 500] The average number of RL iterations (slots)necessary for convergence of utilities in JRA (at the pointwhere 119906119905(119861) = 119906119905minus1(119861) for all 119861 isin A119880) with 120592120572 isin [055 08]120592120573 isin [061 088] and 120592120574 isin [067 097] and 119873 isin [50 500])is plotted in Figure 3 The accuracy of estimation in JRA ispresented in Figures 4 and 5 Figure 4 shows the absoluteerror of strategy estimation in JRA denoted as 120575120587 defined asa sum of the absolute differences between the actual optimalstrategies and the estimated strategies upon the algorithmtermination That is

120575120587 = sum119861isinA119880

1003816100381610038161003816120587119879 (119861) minus 120587lowast (119861)1003816100381610038161003816 (23)

where 120587119879(119861) is an optimal strategy estimated in JRA uponthe algorithm termination (at slot 119879) and 120587lowast(119861) is the actual

Aver

age n

umbe

r of R

L ite

ratio

ns

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e util

ity co

nver

genc

e

Figure 3 The average number of RL iterations (slots) necessary forconvergence of utilities in JRA with different values of 120592120572 and fixed|C| = 10

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of s

trat

egy

0

001

002

003

004

005

estim

atio

n120575120587

Figure 4 The absolute error of strategy estimation 120575120587 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

optimal strategy obtained by playing an action 119861 isin A119880Figure 5 demonstrates the absolute error of utility estimationin JRA denoted 120575119906 defined as a sum of the absolutedifferences between the actual and the estimated optimalnetwork utilities upon the algorithm termination that is

120575119906 = sum119861isinA119880

1003816100381610038161003816119906119879 (119861) minus 119906lowast (119861)1003816100381610038161003816 (24)

14 Mobile Information Systems

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of u

tility

0

001

002

003

004

005

estim

atio

n120575u

Figure 5 The absolute error of utility estimation 120575119906 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

where 119906119879(119861) is an optimal network utility estimated in JRAupon the algorithm termination (at slot 119879) and 119906lowast(119861) is theactual optimal network utility obtained by playing an action119861 isin A119880 The observations in Figures 2ndash5 show that the ratesof convergence of strategies and utilities and the accuracyof strategy and utility estimation are almost the sameFurthermore we find that the number of iterations necessaryfor the algorithm convergence and absolute estimation errorstrongly depend on the setting of the parameters 120592120572 120592120573 and120592120574 the worst performance is attainedwith 120592120572 = 08 120592120573 = 061and 120592120574 = 067 and the best with 120592120572 = 055 120592120573 = 088and 120592120574 = 097 Such results are rather predictable since theparameters 120592120572 120592120573 and 120592120574 are related to the parameters 120572119905120573119905 and 120574119905 (see (16b)) which have a direct influence on thelearning rate of JRA [22 23]

In Figures 6 and 7 the instantaneous network utility 119906119905 =sum119899isinN 119906119899(119905) is presented as a function of time in scenarioswith low network load (119873 = 100) and high network load(119873 = 500) and fixed Here a proposed JRA technique issimulated with the settings 120592120572 = 05 120592120573 = 061 and 120592120574 =067 The graphs in these figures show that the efficiency ofJRA and GQL improves gradually over time After about 300slots (which is the average time necessary for the convergenceof strategies and utilities in Algorithm 1) JRA demonstratesnear-optimal results GQL needs a little longer time (asymp400slots) to converge after which its performance also becomesvery close to the performance of COS Unlike JRA and GQLthe performance of COS SMDGMD and RMD is consistentover time (since these algorithms do not involve any learningprocess) We also observe that the network utility attained inSMD GMD and RMD is much smaller than that in COSTo understand such poor performance of SMD GMD andRMD note that in these algorithms the original resource

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

7

8

9

10

11

12

ut

(lowast10

6)

Figure 6 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and |C| = 10

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

40

42

44

46

48

50

52

54

56

58

60ut

(lowast10

6)

Figure 7 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 500 and |C| = 10allocation problem is divided into two separate problems(i) mode selection and (ii) packet scheduling After that themode selection problem is solved using very plain heuristics(social greedy or ranked) which reduces the complexityof an original optimization problem (from exponential tolinear) but has a negative impact on the performance of thesetechniques in terms of network utility maximization [21]

44 Performance of Joint InbandOutband Allocation Wenow evaluate the efficiency of a proposed inbandoutbandresource allocation (Algorithms 1 and 2) The graphs inFigures 8ndash10 demonstrate the computational complexitysolution time and solution accuracy of different resource

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 12: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

12 Mobile Information Systems

Initialization(1) Input 119868 119879(2)While (119905 le 119879) do(3) Input 119903119880119899 (119905) a119880119905 b119905(4) Solve (20a)ndash(20f) to find the optimal (a1198711119905 c1119905 p1119905 )Main Loop(5)While (119894 le 119868) doRounding(6) Solve (21a)ndash(21f) to find the optimal (a119871119894119905 c119894119905 p119894119905)(7) If ((a119871119894119905 c119894119905 p119894119905) = (a119871119894119905 c119894119905 p119894119905)) then breakProjection(8) Solve (22a)ndash(22f) to find the optimal (a119871119894+1119905 c119894+1119905 p119894+1119905 )(9) Set 119894 larr 119894 + 1

(10) End

Algorithm 2 FP algorithm for inband resource allocation

Table 1 Simulation parameters of LTE-A Model

Parameter ValueCell radius 500mFrame structure Type 2 (time division duplex)Slot duration 1msTDD configuration 0eNodeB Tx power 46 dBmUE activeidle Tx power 232 dBmNoise power minus174 dBmHzPath loss and cellular link 1281 + 376 log(119889) 119889 [km]NLOS path loss and D2Dlink

40 log(119889) + 30 log(119891) + 49 119889 [km]119891 [Hz]LOS path loss and D2Dlink

169 log(119889) + 20 log(1198915) + 468 119889[m] 119891 [GHz]

Shadowing st dev 10 B (cell mode) 12 dB (D2D mode)

In this paper the evaluation of a proposed approachfor inband and outband resource allocation referred to asJRA (JUSTE-RL based resource allocation) is divided intotwo parts In the first part we analyze the performanceof JUSTE-RL with regret for unlicensed channel allocation(Algorithm 1) In the second part we examine the efficiencyof a proposed joint inbandoutband resource allocation(Algorithms 1 and 2) In the following the performanceof JRA is compared with the performance of the followingresource allocation techniques

(i) First is joint inbandoutband resource allocation with120576-greedy Q-learning (GQL) [57] based on formu-lations (12a)ndash(12e) and (18a)ndash(18f) where the unli-censed channels are allocated to the users by theLTE eNB In GQL at any slot t an action with thelargest Q-value is selected with probability 1 minus 120576 andthe other actions are selected uniformly at randomwith probability 120576 In all simulation experiments thevalue of 120576 is set in accordance with the most commonsuggestions (provided eg in [22]) as 120576 = 01

(ii) Second is centralized optimal strategy (COS) wherethe inband and outband network resources are allo-cated to the users by solving (11a)ndash(11f) directly basedon global channel and network knowledge Note thatCOS corresponds to the most efficient (in terms ofnetwork utility maximization) strategy although it isnot practically realizable (since in the real networkdeployment scenarios the precise information aboutquality and availability of unlicensed channels is notavailable) (In this paper we use an FP algorithm tofind the optimal solution to (11a)ndash(11f) or (18a)ndash(18f)in GQL and COS)

(iii) Third is social heuristic for multimode D2D com-munication (SMD) in an LTE-A network proposedin [21] to reduce the complexity of an original opti-mization problem for joint inbandoutband resourceallocation This algorithm assigns user modes andresources to maximize the social welfare based onthe global channel and network knowledge The eNBcreates a randomly ordered list of the D2D pairsThen it computes the aggregated network utility foreach mode of the first user in the list and assigns thisuser with amode that provides the highest aggregatedutility This process is repeated for all D2D pairs

(iv) Fourth is greedy heuristic for multimode D2D com-munication (GMD) in LTE-A networks [21] wherethe modes and inbandoutband network resourcesare allocated tomaximize the individual usersrsquo welfarebased on the global channel and network knowledgeSimilar to SMD the eNB creates a randomly orderedlist of the D2D pairs After this it computes the utilityfor each mode of the first user in the list and assignsthis user a mode assuring the highest individualutility This process is repeated for all D2D pairs

(v) Fifth is ranked heuristic for multimode D2D com-munication (RMD) in LTE-A networks [21] Here theeNB evaluates the utility of each user in each mode(based on the global channel andnetwork knowledge)

Mobile Information Systems 13Av

erag

e num

ber o

f RL

itera

tions

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e str

ateg

y co

nver

genc

e

Figure 2 The average number of RL iterations (slots) necessary forconvergence of strategies in JRAwith different values of 120592120572 and fixed|C| = 10

and sorts the D2D pairs according to their utilities ina descending order Next the eNB allocates the firstuser in the list a mode that guarantees the highestaggregatednetwork utilityThis process is repeated forall D2D pairs

Note that all algorithms used in our performance evaluationare simulated with identical system parameters

43 Performance of a Learning Algorithm We start withthe performance evaluation of JUSTE-RL with regret forunlicensed channel allocation (outlined in Algorithm 1)Figures 2 and 3 demonstrate the learning speed of JRAFigure 2 shows the average number of RL iterations (slots)necessary for convergence of strategies in JRA (at the pointwhere 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880) with differentvalues of 120592120572 isin [055 08] 120592120573 = 11120592120572 isin [061 088] and120592120574 = 11120592120573 isin [067 097] and a varying number of user pairs119873 isin [50 500] The average number of RL iterations (slots)necessary for convergence of utilities in JRA (at the pointwhere 119906119905(119861) = 119906119905minus1(119861) for all 119861 isin A119880) with 120592120572 isin [055 08]120592120573 isin [061 088] and 120592120574 isin [067 097] and 119873 isin [50 500])is plotted in Figure 3 The accuracy of estimation in JRA ispresented in Figures 4 and 5 Figure 4 shows the absoluteerror of strategy estimation in JRA denoted as 120575120587 defined asa sum of the absolute differences between the actual optimalstrategies and the estimated strategies upon the algorithmtermination That is

120575120587 = sum119861isinA119880

1003816100381610038161003816120587119879 (119861) minus 120587lowast (119861)1003816100381610038161003816 (23)

where 120587119879(119861) is an optimal strategy estimated in JRA uponthe algorithm termination (at slot 119879) and 120587lowast(119861) is the actual

Aver

age n

umbe

r of R

L ite

ratio

ns

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e util

ity co

nver

genc

e

Figure 3 The average number of RL iterations (slots) necessary forconvergence of utilities in JRA with different values of 120592120572 and fixed|C| = 10

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of s

trat

egy

0

001

002

003

004

005

estim

atio

n120575120587

Figure 4 The absolute error of strategy estimation 120575120587 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

optimal strategy obtained by playing an action 119861 isin A119880Figure 5 demonstrates the absolute error of utility estimationin JRA denoted 120575119906 defined as a sum of the absolutedifferences between the actual and the estimated optimalnetwork utilities upon the algorithm termination that is

120575119906 = sum119861isinA119880

1003816100381610038161003816119906119879 (119861) minus 119906lowast (119861)1003816100381610038161003816 (24)

14 Mobile Information Systems

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of u

tility

0

001

002

003

004

005

estim

atio

n120575u

Figure 5 The absolute error of utility estimation 120575119906 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

where 119906119879(119861) is an optimal network utility estimated in JRAupon the algorithm termination (at slot 119879) and 119906lowast(119861) is theactual optimal network utility obtained by playing an action119861 isin A119880 The observations in Figures 2ndash5 show that the ratesof convergence of strategies and utilities and the accuracyof strategy and utility estimation are almost the sameFurthermore we find that the number of iterations necessaryfor the algorithm convergence and absolute estimation errorstrongly depend on the setting of the parameters 120592120572 120592120573 and120592120574 the worst performance is attainedwith 120592120572 = 08 120592120573 = 061and 120592120574 = 067 and the best with 120592120572 = 055 120592120573 = 088and 120592120574 = 097 Such results are rather predictable since theparameters 120592120572 120592120573 and 120592120574 are related to the parameters 120572119905120573119905 and 120574119905 (see (16b)) which have a direct influence on thelearning rate of JRA [22 23]

In Figures 6 and 7 the instantaneous network utility 119906119905 =sum119899isinN 119906119899(119905) is presented as a function of time in scenarioswith low network load (119873 = 100) and high network load(119873 = 500) and fixed Here a proposed JRA technique issimulated with the settings 120592120572 = 05 120592120573 = 061 and 120592120574 =067 The graphs in these figures show that the efficiency ofJRA and GQL improves gradually over time After about 300slots (which is the average time necessary for the convergenceof strategies and utilities in Algorithm 1) JRA demonstratesnear-optimal results GQL needs a little longer time (asymp400slots) to converge after which its performance also becomesvery close to the performance of COS Unlike JRA and GQLthe performance of COS SMDGMD and RMD is consistentover time (since these algorithms do not involve any learningprocess) We also observe that the network utility attained inSMD GMD and RMD is much smaller than that in COSTo understand such poor performance of SMD GMD andRMD note that in these algorithms the original resource

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

7

8

9

10

11

12

ut

(lowast10

6)

Figure 6 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and |C| = 10

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

40

42

44

46

48

50

52

54

56

58

60ut

(lowast10

6)

Figure 7 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 500 and |C| = 10allocation problem is divided into two separate problems(i) mode selection and (ii) packet scheduling After that themode selection problem is solved using very plain heuristics(social greedy or ranked) which reduces the complexityof an original optimization problem (from exponential tolinear) but has a negative impact on the performance of thesetechniques in terms of network utility maximization [21]

44 Performance of Joint InbandOutband Allocation Wenow evaluate the efficiency of a proposed inbandoutbandresource allocation (Algorithms 1 and 2) The graphs inFigures 8ndash10 demonstrate the computational complexitysolution time and solution accuracy of different resource

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 13: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

Mobile Information Systems 13Av

erag

e num

ber o

f RL

itera

tions

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e str

ateg

y co

nver

genc

e

Figure 2 The average number of RL iterations (slots) necessary forconvergence of strategies in JRAwith different values of 120592120572 and fixed|C| = 10

and sorts the D2D pairs according to their utilities ina descending order Next the eNB allocates the firstuser in the list a mode that guarantees the highestaggregatednetwork utilityThis process is repeated forall D2D pairs

Note that all algorithms used in our performance evaluationare simulated with identical system parameters

43 Performance of a Learning Algorithm We start withthe performance evaluation of JUSTE-RL with regret forunlicensed channel allocation (outlined in Algorithm 1)Figures 2 and 3 demonstrate the learning speed of JRAFigure 2 shows the average number of RL iterations (slots)necessary for convergence of strategies in JRA (at the pointwhere 120587119905(119861) = 120587119905minus1(119861) for all 119861 isin A119880) with differentvalues of 120592120572 isin [055 08] 120592120573 = 11120592120572 isin [061 088] and120592120574 = 11120592120573 isin [067 097] and a varying number of user pairs119873 isin [50 500] The average number of RL iterations (slots)necessary for convergence of utilities in JRA (at the pointwhere 119906119905(119861) = 119906119905minus1(119861) for all 119861 isin A119880) with 120592120572 isin [055 08]120592120573 isin [061 088] and 120592120574 isin [067 097] and 119873 isin [50 500])is plotted in Figure 3 The accuracy of estimation in JRA ispresented in Figures 4 and 5 Figure 4 shows the absoluteerror of strategy estimation in JRA denoted as 120575120587 defined asa sum of the absolute differences between the actual optimalstrategies and the estimated strategies upon the algorithmtermination That is

120575120587 = sum119861isinA119880

1003816100381610038161003816120587119879 (119861) minus 120587lowast (119861)1003816100381610038161003816 (23)

where 120587119879(119861) is an optimal strategy estimated in JRA uponthe algorithm termination (at slot 119879) and 120587lowast(119861) is the actual

Aver

age n

umbe

r of R

L ite

ratio

ns

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

150

200

250

300

350

400

(slo

ts) b

efor

e util

ity co

nver

genc

e

Figure 3 The average number of RL iterations (slots) necessary forconvergence of utilities in JRA with different values of 120592120572 and fixed|C| = 10

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of s

trat

egy

0

001

002

003

004

005

estim

atio

n120575120587

Figure 4 The absolute error of strategy estimation 120575120587 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

optimal strategy obtained by playing an action 119861 isin A119880Figure 5 demonstrates the absolute error of utility estimationin JRA denoted 120575119906 defined as a sum of the absolutedifferences between the actual and the estimated optimalnetwork utilities upon the algorithm termination that is

120575119906 = sum119861isinA119880

1003816100381610038161003816119906119879 (119861) minus 119906lowast (119861)1003816100381610038161003816 (24)

14 Mobile Information Systems

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of u

tility

0

001

002

003

004

005

estim

atio

n120575u

Figure 5 The absolute error of utility estimation 120575119906 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

where 119906119879(119861) is an optimal network utility estimated in JRAupon the algorithm termination (at slot 119879) and 119906lowast(119861) is theactual optimal network utility obtained by playing an action119861 isin A119880 The observations in Figures 2ndash5 show that the ratesof convergence of strategies and utilities and the accuracyof strategy and utility estimation are almost the sameFurthermore we find that the number of iterations necessaryfor the algorithm convergence and absolute estimation errorstrongly depend on the setting of the parameters 120592120572 120592120573 and120592120574 the worst performance is attainedwith 120592120572 = 08 120592120573 = 061and 120592120574 = 067 and the best with 120592120572 = 055 120592120573 = 088and 120592120574 = 097 Such results are rather predictable since theparameters 120592120572 120592120573 and 120592120574 are related to the parameters 120572119905120573119905 and 120574119905 (see (16b)) which have a direct influence on thelearning rate of JRA [22 23]

In Figures 6 and 7 the instantaneous network utility 119906119905 =sum119899isinN 119906119899(119905) is presented as a function of time in scenarioswith low network load (119873 = 100) and high network load(119873 = 500) and fixed Here a proposed JRA technique issimulated with the settings 120592120572 = 05 120592120573 = 061 and 120592120574 =067 The graphs in these figures show that the efficiency ofJRA and GQL improves gradually over time After about 300slots (which is the average time necessary for the convergenceof strategies and utilities in Algorithm 1) JRA demonstratesnear-optimal results GQL needs a little longer time (asymp400slots) to converge after which its performance also becomesvery close to the performance of COS Unlike JRA and GQLthe performance of COS SMDGMD and RMD is consistentover time (since these algorithms do not involve any learningprocess) We also observe that the network utility attained inSMD GMD and RMD is much smaller than that in COSTo understand such poor performance of SMD GMD andRMD note that in these algorithms the original resource

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

7

8

9

10

11

12

ut

(lowast10

6)

Figure 6 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and |C| = 10

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

40

42

44

46

48

50

52

54

56

58

60ut

(lowast10

6)

Figure 7 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 500 and |C| = 10allocation problem is divided into two separate problems(i) mode selection and (ii) packet scheduling After that themode selection problem is solved using very plain heuristics(social greedy or ranked) which reduces the complexityof an original optimization problem (from exponential tolinear) but has a negative impact on the performance of thesetechniques in terms of network utility maximization [21]

44 Performance of Joint InbandOutband Allocation Wenow evaluate the efficiency of a proposed inbandoutbandresource allocation (Algorithms 1 and 2) The graphs inFigures 8ndash10 demonstrate the computational complexitysolution time and solution accuracy of different resource

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 14: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

14 Mobile Information Systems

120592120572 = 08

120592120572 = 075

120592120572 = 07

120592120572 = 065

120592120572 = 06

120592120572 = 055

100 150 200 250 300 350 400 450 50050

Number of user pairs N

The a

bsol

ute e

rror

of u

tility

0

001

002

003

004

005

estim

atio

n120575u

Figure 5 The absolute error of utility estimation 120575119906 in JRAcalculated upon the algorithm termination with different values of120592120572 and fixed |C| = 10

where 119906119879(119861) is an optimal network utility estimated in JRAupon the algorithm termination (at slot 119879) and 119906lowast(119861) is theactual optimal network utility obtained by playing an action119861 isin A119880 The observations in Figures 2ndash5 show that the ratesof convergence of strategies and utilities and the accuracyof strategy and utility estimation are almost the sameFurthermore we find that the number of iterations necessaryfor the algorithm convergence and absolute estimation errorstrongly depend on the setting of the parameters 120592120572 120592120573 and120592120574 the worst performance is attainedwith 120592120572 = 08 120592120573 = 061and 120592120574 = 067 and the best with 120592120572 = 055 120592120573 = 088and 120592120574 = 097 Such results are rather predictable since theparameters 120592120572 120592120573 and 120592120574 are related to the parameters 120572119905120573119905 and 120574119905 (see (16b)) which have a direct influence on thelearning rate of JRA [22 23]

In Figures 6 and 7 the instantaneous network utility 119906119905 =sum119899isinN 119906119899(119905) is presented as a function of time in scenarioswith low network load (119873 = 100) and high network load(119873 = 500) and fixed Here a proposed JRA technique issimulated with the settings 120592120572 = 05 120592120573 = 061 and 120592120574 =067 The graphs in these figures show that the efficiency ofJRA and GQL improves gradually over time After about 300slots (which is the average time necessary for the convergenceof strategies and utilities in Algorithm 1) JRA demonstratesnear-optimal results GQL needs a little longer time (asymp400slots) to converge after which its performance also becomesvery close to the performance of COS Unlike JRA and GQLthe performance of COS SMDGMD and RMD is consistentover time (since these algorithms do not involve any learningprocess) We also observe that the network utility attained inSMD GMD and RMD is much smaller than that in COSTo understand such poor performance of SMD GMD andRMD note that in these algorithms the original resource

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

7

8

9

10

11

12

ut

(lowast10

6)

Figure 6 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and |C| = 10

GQL

100 150 200 250 300 350 400 450 50050

Number of iterations (slots) t

JRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

40

42

44

46

48

50

52

54

56

58

60ut

(lowast10

6)

Figure 7 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 500 and |C| = 10allocation problem is divided into two separate problems(i) mode selection and (ii) packet scheduling After that themode selection problem is solved using very plain heuristics(social greedy or ranked) which reduces the complexityof an original optimization problem (from exponential tolinear) but has a negative impact on the performance of thesetechniques in terms of network utility maximization [21]

44 Performance of Joint InbandOutband Allocation Wenow evaluate the efficiency of a proposed inbandoutbandresource allocation (Algorithms 1 and 2) The graphs inFigures 8ndash10 demonstrate the computational complexitysolution time and solution accuracy of different resource

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 15: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

Mobile Information Systems 15

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age n

umbe

r of F

P ite

ratio

ns

0

150

300

450

600

750

(per

slot

) bef

ore c

onve

rgen

ce

Figure 8 The average number of FP iterations (per slot) necessaryfor the convergence of the algorithms with fixed |C| = 10 collectedduring T slots

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

0

30

60

90

120

150

Aver

age s

olut

ion

time (

120583s)

Figure 9 The average solution time (in 120583s) of different algorithmswith fixed |C| = 10 collected during 119879 slots

allocation techniques in the experiments with 120592120572 = 055 120592120573= 061 and 120592120574 = 067 (in JRA) collected during the entiresimulation period T Particularly in Figures 8 and 9 theaverage number of algorithm iterations (per slot) and solutiontime (in 120583s) are presented as a function of the number of userpairs N Figure 10 shows the average relative deviation fromthe optimal solution denoted as Δ and calculated accordingto

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

The a

vera

ge re

lativ

e dev

iatio

n fro

m

0

02

04

06

08

1

the o

ptim

al so

lutio

Figure 10 The average relative deviation from the optimal solutionΔ in different algorithmswith fixed |C| = 10 collected duringT slots

Δ = 1119879119879sum119905=1

(10038161003816100381610038161003816a119871119894119905 minus a119871119905lowast10038161003816100381610038161003816

a119871119905lowast + 10038161003816100381610038161003816a119880119894119905 minus a119880119905

lowast10038161003816100381610038161003816a119880119905lowast + 10038161003816100381610038161003816c119894119905 minus c119905lowast

10038161003816100381610038161003816c119905lowast

+ 10038161003816100381610038161003816p119894119905 minus p119905lowast10038161003816100381610038161003816

p119905lowast)

(25)

in GQL JRA and COS and

Δ = 1119879119879sum119905=1

10038161003816100381610038161003816119909119894119905 minus 119909119905lowast10038161003816100381610038161003816119909119905lowast (26)

in SMD FMD and RMD In the above equations(a119871119894119905 a119880119894119905 c119894119905 p119894119905) is the optimal solution found in GQLJRA and COS (a119871119905 lowast a119880119905 lowast c119905lowast p119905lowast) is the actual optimalsolution to the original resource allocation problem (11a)ndash(11f) 119909119894119905 is the mode allocation in SMD FDM and RMDand 119909119905lowast is the actual optimal allocation (a solution to theoptimization problem originally stated in [21]) It followsfrom these figures that all simulated strategies have moderatecomputational complexity Predictably COS has the highestcomplexity because the number of optimization variables(a119871119905 a

119880119905 c119905 p119905) in this algorithm is bigger than that in JRA

GQL SMD FMD and RMD The lowest complexity andsolution accuracy are achieved in SMD FMD and RMD(which have a linear time complexity but are based on veryraw approximations and plain heuristic assumptions)

Figures 11ndash13 present the observations collected at slott = 500 with 120592120572 = 055 120592120573 = 061 and 120592120574 = 067 (in JRA)The average user throughput 119903119905 (in kbitss) and the average

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 16: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

16 Mobile Information Systems

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age u

ser t

hrou

ghpu

tr t

50

65

80

95

110

125(k

bits

s)

Figure 11 The average user throughput 119903119905 (in kbitss) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

GQL

100 150 200 250 300 350 400 450 50050

Number of user pairs N

JRACOS

SMDGMDRMD

Aver

age t

rans

mit

pow

er p

er u

ser

14

16

18

20

22

24

pt

(dBm

)

Figure 12 The average user transmit power (in dBm) in differentalgorithms with fixed |C| = 10 observed at slot t = 500

transmission power (per user) 119901119905 (in dBm) in differentalgorithms estimated according to

119903119905 = 1119873 sum119899isinN

(119903119871119899 (119905) + 119903119880119899 (119905)) 119901119905 = 1119873 sum

119899isinN119901119899 (119905)

(27)

are shown in Figures 11 and 12 respectivelyThe instantaneousnetwork utility 119906119905 in different algorithms depending on thetarget SINR level SINRtar with fixed number of user pairs119873 = 100 is plotted in Figure 13 The obtained results

8

9

10

11

12

13

minus8 minus6 minus4 minus2 0 2 4 6 8 10

GQLJRACOS

SMDGMDRMD

Insta

ntan

eous

net

wor

k ut

ility

Target SINR level SINRtar

ut

(lowast10

6)

Figure 13 The instantaneous network utility 119906119905 in different algo-rithms with fixed 119873 = 100 and fixed |C| = 10 observed at slot t =500

demonstrate that the average user throughput decreaseswith the number of user pairs N (Figure 11) This is ratherpredictable because when the network load increases thenumber of RBs or unlicensed channels available for eachuser decreases resulting in a reduced throughput Besides toachieve the desired SINR levels the users tend to transmit ata higher power level (see Figure 12) when the total number ofuser pairs in the network increases The graphs in Figure 13show that the network utilities in different resource allocationschemes are described by some concave functions of SINRtarTo understand such results note that with too low settingsof SINRtar (SINRtar lt 0 dB) the total user throughputreduces because of the bad channel conditions leading tothe decreased network utility On the other hand whenSINRtar is too high (SINRtar gt 4 dB) the throughput (andconsequently network utility) degrades due to the shortageof available bandwidth since the number of channels withsuitable data transmission conditions becomes very small(because not all of them satisfy the SINR requirements ofthe users) We also observe that in all simulated scenariosthe performance of JRA is very close to optimal (ie the oneachieved in COS) GQL performs a little worse than JRA butstill better thanheuristic algorithms (SMDGMD andRMD)

5 Conclusion

This paper introduces a JRA algorithm for a D2D-enabledLTE-A network with access to unlicensed band providedby one or more RATs based on different channel accessmethods (OFDMA CSMACA FH-CDMA etc) In thepresented framework the inbandoutbandnetwork resources(cellularD2D modes spectrum and power) are allocatedjointly by the LTE eNB to maximize the total networkutility Unlikemost of the previously proposed techniques for

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 17: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

Mobile Information Systems 17

outbandD2D communication (which presume a certain levelof coordination and information exchange between licensedand unlicensed systems) our JUSTE-RL based approachfor unlicensed channel assignment is fully autonomousand has demonstrated relatively fast (asymp300 RL iterations)convergence to 120576-Nash equilibrium (given the appropriatesettings of learning rates) Simulations results also showthat the proposed joint inbandoutband resource allocationstrategy outperforms other relevant spectrum and powermanagement schemes in terms of energy efficiency andthroughput maximization

Competing Interests

The authors declare that they have no competing interests

References

[1] A Asadi Q Wang and V Mancuso ldquoA survey on device-to-device communication in cellular networksrdquo IEEE Communi-cations Surveys and Tutorials vol 16 no 4 pp 1801ndash1819 2014

[2] A Asheralieva and Y Miyanaga ldquoDynamic buffer status-basedcontrol for LTE-a networkwith underlayD2DcommunicationrdquoIEEE Transactions on Communications vol 64 no 3 pp 1342ndash1355 2016

[3] AAsheralieva andYMiyanaga ldquoQoS orientedmode spectrumand power allocation for D2D communication underlayingLTE-A networkrdquo IEEE Transactions on Vehicular Technology2016

[4] J Kim S Kim J Bang and D Hong ldquoAdaptive mode selectionin D2D communications considering the bursty traffic modelrdquoIEEE Communications Letters vol 20 no 4 pp 712ndash715 2016

[5] D Penda L Fu andM Johansson ldquoEnergy efficient D2D com-munications in dynamic TDD systemsrdquo httpsarxivorgabs150600412

[6] K Yang S Martin L Boukhatem J Wu and X Bu ldquoEnergy-efficient resource allocation for device-to-device communica-tions overlaying LTE networksrdquo in Proceedings of the IEEE 82ndVehicular Technology Conference (VTC Fall rsquo15) pp 1ndash6 BostonMass USA September 2015

[7] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo Acm Sigmetrics vol 42 no 2 pp 55ndash57 2014

[8] F Malandrino C Casetti C F Chiasserini and Z LimanildquoUplink and downlink resource allocation in D2D-enabledheterogeneous networksrdquo in Proceedings of the IEEE Wire-less Communications and Networking Conference Workshops(WCNCW rsquo14) pp 87ndash92 April 2014

[9] D Feng L Lu Y-W Yi G Y Li G Feng and S Li ldquoDevice-to-device communications underlaying cellular networksrdquo IEEETransactions on Communications vol 61 no 8 pp 3541ndash35512013

[10] L Su Y Ji P Wang and F Liu ldquoResource allocation usingparticle swarm optimization for D2D communication underlayof cellular networksrdquo in Proceedings of the IEEE WirelessCommunications and Networking Conference (WCNC rsquo13) pp129ndash133 IEEE Shanghai China April 2013

[11] Wi-Fi Alliance Wi-Fi Peer-to-Peer (P2P) Specification version11 Wi-Fi Alliance Specification 1 2010

[12] Z Alliance Zigbee Specification Document 053474r06 (ver-sion) 1 2006

[13] Bluetooth Specification Bluetooth Specification version 112001 httpwwwbluetoothcom

[14] A Asadi and V Mancuso ldquoEnergy efficient opportunisticuplink packet forwarding in hybrid wireless networksrdquo inProceedings of the 4th ACM International Conference on FutureEnergy Systems (e-Energy rsquo13) pp 261ndash262 Berkeley Calif USAMay 2013

[15] A Asadi and V Mancuso ldquoOn the compound impact ofopportunistic scheduling and D2D communications in cellularnetworksrdquo in Proceedings of the 16th ACM International Con-ference on Modeling Analysis and Simulation of Wireless andMobile Systems (MSWiM rsquo13) pp 279ndash287 November 2013

[16] A Asadi andVMancuso ldquoWiFiDirect and LTED2D in actionrdquoin Proceedings of the 6th IFIPIEEE Wireless Days Conference(WD rsquo13) pp 1ndash8 Valencia Spain November 2013

[17] Q Wang and B Rengarajan ldquoRecouping opportunistic gain indense base station layouts through energy-aware user coopera-tionrdquo in Proceedings of the IEEE 14th International Symposiumon a World of Wireless Mobile and Multimedia Networks(WoWMoM rsquo13) pp 1ndash9 June 2013

[18] B Zhou S Ma J Xu and Z Li ldquoGroup-wise channel sensingand resource pre-allocation for LTE D2D on ISM bandrdquo in Pro-ceedings of the IEEE Wireless Communications and NetworkingConference (WCNC rsquo13) pp 118ndash122 IEEE Shanghai ChinaApril 2013

[19] M Ji G Caire and A F Molisch ldquoWireless device-to-devicecaching networks basic principles and system performancerdquoIEEE Journal on Selected Areas in Communications vol 34 no1 pp 176ndash189 2016

[20] H Cai I Koprulu and N B Shroff ldquoExploiting double oppor-tunities for deadline based content propagation in wirelessnetworksrdquo in Proceedings of the 32nd IEEE Conference onComputer Communications (IEEE INFOCOM rsquo13) pp 764ndash772Turin Italy April 2013

[21] AAsadi P Jacko andVMancuso ldquoModelingmulti-modeD2Dcommunications in LTErdquo httpsarxivorgabs14056689

[22] R S Sutton and A G Barto Reinforcement Learning An Intro-duction MIT Press Cambridge Mass USA 1998

[23] S M Perlaza H Tembine and S Lasaulce ldquoHow can ignorantbut patient cognitive terminals learn their strategy and utilityrdquoin Proceedings of the IEEE 11th InternationalWorkshop on SignalProcessing Advances in Wireless Communications (SPAWC rsquo10)June 2010

[24] Y Xing and R Chandramouli ldquoStochastic learning solutionfor distributed discrete power control game in wireless datanetworksrdquo IEEEACM Transactions on Networking vol 16 no4 pp 932ndash944 2008

[25] L Rose S Lasaulce S M Perlaza and M Debbah ldquoLearningequilibria with partial information in decentralized wirelessnetworksrdquo IEEE Communications Magazine vol 49 no 8 pp136ndash142 2011

[26] Y Xu Q Wu L Shen J Wang and A Anpalagan ldquoOppor-tunistic spectrum access with spatial reuse graphical game anduncoupled learning solutionsrdquo IEEE Transactions on WirelessCommunications vol 12 no 10 pp 4814ndash4826 2013

[27] Y Xu Q Wu J Wang L Shen and A Anpalagan ldquoOppor-tunistic spectrum access using partially overlapping channelsgraphical game and uncoupled learningrdquo IEEE Transactions onCommunications vol 61 no 9 pp 3906ndash3918 2013

[28] D Kalathil N Nayyar and R Jain ldquoDecentralized learningfor multiplayer multiarmed banditsrdquo IEEE Transactions onInformation Theory vol 60 no 4 pp 2331ndash2345 2014

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 18: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

18 Mobile Information Systems

[29] S Maghsudi and S Stanczak ldquoChannel selection for network-assisted D2D communication via no-regret bandit learningwith calibrated forecastingrdquo IEEE Transactions on WirelessCommunications vol 14 no 3 pp 1309ndash1322 2015

[30] A Asheralieva and Y Miyanaga ldquoAn autonomous learning-based algorithm for joint channel and power level selection byD2D pairs in heterogeneous cellular networksrdquo IEEE Transac-tions on Communications vol 64 no 9 pp 3996ndash4012 2016

[31] 3rd Generation Partnership Project ldquoPhysical channels andmodulationrdquo Technical Specification 3GPP TS 36211 V9102010

[32] 3rd Generation Partnership Project Technical SpecificationldquoE-UTRA MAC protocol specificationrdquo 3GPP TS 36211V1250 2015

[33] L Lei Z Zhong C Lin and X Shen ldquoOperator controlleddevice-to-device communications in LTE-advanced networksrdquoIEEE Wireless Communications vol 19 no 3 pp 96ndash104 2012

[34] H Holma and A Toskala LTE for UMTS Evolution to LTE-Advanced John Wiley amp Sons New York NY USA 2011

[35] I F Akyildiz D M Gutierrez-Estevez and E C Reyes ldquoTheevolution to 4G cellular systems LTE-Advancedrdquo PhysicalCommunication vol 3 no 4 pp 217ndash244 2010

[36] Y Xu J Wang Q Wu A Anpalagan and Y-D Yao ldquoOppor-tunistic spectrum access in unknown dynamic environment agame-theoretic stochastic learning solutionrdquo IEEE Transactionson Wireless Communications vol 11 no 4 pp 1380ndash1391 2012

[37] H Li ldquoMulti-agent Q-learning of channel selection in multi-user cognitive radio systems a two by two caserdquo in Proceedingsof the IEEE International Conference on Systems Man andCybernetics (SMC rsquo09) pp 1893ndash1898 October 2009

[38] T Berthold Heuristic Algorithms in Global MINLP SolversVerlag Dr Hut 2014

[39] S Burer and A N Letchford ldquoNon-convex mixed-integer non-linear programming a surveyrdquo Surveys in Operations Researchand Management Science vol 17 no 2 pp 97ndash106 2012

[40] M Fischetti and A Lodi ldquoLocal branchingrdquo MathematicalProgramming vol 98 no 1ndash3 pp 23ndash47 2003

[41] A Lodi ldquoThe heuristic (dark) side of MIP solversrdquo in HybridMetaheuristics vol 434 of Studies in Computational Intelligencepp 273ndash284 Springer Berlin Germany 2013

[42] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming B vol 136 no 2 pp 375ndash402 2012

[43] M Fischetti and D Salvagnin ldquoFeasibility pump 20rdquo Mathe-matical Programming Computation vol 1 no 2-3 pp 201ndash2222009

[44] C DrsquoAmbrosio A Frangioni L Liberti and A Lodi ldquoA stormof feasibility pumps for nonconvex MINLPrdquo MathematicalProgramming vol 136 no 2 pp 375ndash402 2012

[45] S P Boyd and L Vandenberghe Convex Optimization Cam-bridge University Press 2004

[46] S Leyffer ldquoIntegrating SQP and branch-and-bound for mixedinteger nonlinear programmingrdquo Computational Optimizationand Applications vol 18 no 3 pp 295ndash309 2001

[47] R Sun M Hong and Z-Q Luo ldquoJoint downlink base stationassociation and power control for max-min fairness com-putation and complexityrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 6 pp 1040ndash1054 2015

[48] Q Kuang W Utschick and A Dotzler ldquoOptimal joint userassociation and resource allocation in heterogeneous networksvia sparsity pursuitrdquo httpsarxivorgabs14085091

[49] K Shen andW Yu ldquoDistributed pricing-based user associationfor downlink heterogeneous cellular networksrdquo IEEE Journal onSelected Areas in Communications vol 32 no 6 pp 1100ndash11132014

[50] M Peng X Xie Q Hu J Zhang and H V Poor ldquoContract-based interference coordination in heterogeneous cloud radioaccess networksrdquo IEEE Journal on Selected Areas in Communi-cations vol 33 no 6 pp 1140ndash1153 2015

[51] D Fooladivanda and C Rosenberg ldquoJoint resource allocationand user association for heterogeneous wireless cellular net-worksrdquo IEEE Transactions on Wireless Communications vol 12no 1 pp 248ndash257 2013

[52] M Sanjabi M Razaviyayn and Z-Q Luo ldquoOptimal jointbase station assignment and beamforming for heterogeneousnetworksrdquo IEEE Transactions on Signal Processing vol 62 no8 pp 1950ndash1961 2014

[53] Q Han B Yang X Wang K Ma C Chen and X GuanldquoHierarchical-game-based uplink power control in femtocellnetworksrdquo IEEE Transactions on Vehicular Technology vol 63no 6 pp 2819ndash2835 2014

[54] IEEE Standards Association IEEE 802 Local andMetropolitanArea Network Standards IEEE 80211 Standard 2012

[55] OPNET httpwwwopnetcom[56] ldquoEvolved Universal Terrestrial Radio Access (E-UTRA) and

Evolved Universal Terrestrial Radio Access Network (E-UTRAN)rdquo 3GPP TS 36300 version V940 2010

[57] M Bennis S Guruacharya and D Niyato ldquoDistributed learn-ing strategies for interferencemitigation in femtocell networksrdquoin Proceedings of the IEEE Global Telecommunications Confer-ence (GLOBECOM rsquo11) pp 1ndash5 Houston Tex USA December2011

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 19: Research Article Dynamic Resource Allocation with Integrated Reinforcement Learning ...downloads.hindawi.com/journals/misy/2016/4565203.pdf · 2019-07-30 · Research Article Dynamic

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014