smart vehicular communication via 5g...

Smart Vehicular Communication via 5G mmWaves

Xiaotong Lia, Ruiting Zhoub,!, Ying-Jun Angela Zhangc, Lei Jiaod, Zongpeng Lia

aSchool of Computer Science, Wuhan University, Wuhan, ChinabKey Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education,

School of Cyber Science and Engineering, Wuhan University, Wuhan, ChinacDepartment of Information Engineering, the Chinese University of Hong Kong, Shatin, New Territories, Hong Kong

dDepartment of Computer and Information Science, University of Oregon, Eugene, OR, USA

Abstract

Millimeter-Wave (mmWave) communication is conceived as a viable approach for 5G vehicular communicationsystems, where vehicles are equipped with more sensors that generate Gbps data for future autonomous driving.However, such directional mmWave communication relies on accurate beam alignment and is sensitive toblockage. Dense deployment of mmWave base stations (mmBSs) and high mobility of vehicles also lead tofrequent handovers and complex beam alignment calculation. 5G mmWave vehicular communication calls fora smart and stable solution. To this end, we propose an online learning scheme to address the problem of beamselection with blockage-free guarantee in 5G mmWave vehicular networks. We first model this problem asa contextual combinatorial multi-armed bandit (MAB) problem with QoS constraints and delayed feedback.Next, we propose an online learning algorithm, BPG, to predict beam directions, with provable sub-linearregret and blockage-free bounds. BPG exploits the context space and learns the expected weight of each beamfrom arrived vehicles’ contexts and the delayed feedback. To validate the e!ciency of BPG, we also conducttrace-driven simulations based on real-world tra!c patterns. Simulation results show that BPG achievesclose-to-optimal throughput with low violation and outperforms other benchmark algorithms.

1. Introduction

Vehicle-to-everything (V2X) represents a criticalarchitecture for vehicular communication, paving theway for the upcoming fully autonomous driving era.Future intelligent transportation systems emphasizethe need for communication links with high band-width, low latency and high reliability. For exam-ple, cooperative collision avoidance and high-densityplatooning leverage enormous Gbps of data gener-ated from sensors [1], which requires new levels ofcommunication reliability and bandwidth; V2X aug-mented reality transmits large amounts of real-timevideo for navigation systems [2], which challenges ex-isting crowded sub-6GHz bands. Since lower bandsare heavily congested with Wi-Fi, Bluetooth and cur-

!Corresponding authorEmail addresses: [email protected] (Xiaotong Li),

[email protected] (Ruiting Zhou),[email protected]. (Ying-Jun Angela Zhang),[email protected] (Lei Jiao), [email protected](Zongpeng Li)

rent 4G LTE signals, mmWaves (30G-300GHz) withsignificantly larger bandwidth have become a viablecandidate to power 5G networks. However, mmWavecommunication is still challenging in terms of severepropagation loss and sensitivity to blockage [3]. Tocompensate for these impairments, beamforming (BF)is adopted as an essential technique, utilizing direc-tional beams for mmWave communications. Densedeployment of mmWave base stations (mmBSs) bringseven more potential for wide coverage. All the devel-opments have triggered the interest in developing ahigh reliability and low latency 5G mmWave vehicu-lar communication system.

However, supporting high mobility vehicular net-works via mmWaves still faces severe challenges. First,directional mmWave communication between mmBSsand vehicles requires accurate beam alignments to en-sure successful data transmission. Di"erent beamsneed to be chosen to deal with di"erent tra!c pat-tern. Second, mmWaves are sensitive to blockagesdue to weak di"raction ability [3]. Surrounding build-ings, moving vehicles and passengers constitute pos-

@ 2020. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/

sible obstacles during communication. Once block-age happens, the communication link is interrupted,which severely harms vehicular communication. Inorder to establish stable communication links, it isextremely important to take blockage into consider-ation. Thus, a minimum threshold which guaranteesthe average blockage-free probability needs to be met.Therefore, mmBSs should balance between maximiz-ing the throughput and avoiding blockages as muchas possible to better guarantee system performance.Third, given that mmWave has high propagation lossand poor penetration, the e"ective communicationrange of a mmBS remains around 100m at best [4].5G mobile communication no longer relies on the de-ployment structure of large base stations, and a largenumber of small base stations will become a newtrend, which can cover the peripheral communicationthat cannot be touched by large base stations. Vehi-cles with high-mobility in a dense mmWave deploy-ments need to frequently hand over between neigh-bouring mmBSs (from the source mmBS to the targetmmBS). Frequent handovers bring about high beamrealignment overhead and decision latency for the tar-get mmBS. Therefore, we come up with an idea thatthe source mmBS can make beam alignment predic-tions ahead for the target mmBS according to currenttra!c pattern. As a result, smart beam predictionmethods, which can autonomously adapt to the real-time tra!c situations and blockages with low com-plexity and latency, are in urgent need.

A fundamental problem for the source mmBS ex-ists. How to select beams for the target mmBS suchthat the throughput is maximized and the average block-age free probability satisfies the minimum threshold?To e"ectively handle handovers and keep the vehic-ular communication stable, we propose a smart ve-hicular communication scheme to predict the beamdirections of the target mmBS, from the perspectiveof the source mmBS.

The basic idea of our method is that the sourcemmBS (which is communicating with vehicles cur-rently) infers the beam directions for target mmBSin advance according to current tra!c pattern. Wecarefully design an online beam prediction algorithmwith performance guarantee, BPG, leveraging the ba-sic idea of contextual multi-armed bandit. Di"er-ent from existing beam selection algorithms whicheither neglect the system QoS requirement [5][6] orhardly deal with the handover situations [7][8], ouralgorithm, BPG, shows better performance especially

in the following aspects: (i) as an online learningmethod, BPG autonomously explores and exploitsthe search space, adapts to the environment and im-proves its prediction strategy based on the informa-tion learned from past choices. It can reduce thedecision latency caused by frequent handovers andconverges to near-optimal solutions faster than otherstate-of-the-art beam selection methods; (ii) BPG si-multaneously and separately learns the best beam se-lection strategy for di"erent types of vehicles, thus itcan deal with dynamic tra!c patterns; (iii) BPG es-pecially focuses on the communication stability re-quirement of the vehicular communication system,and strikes a good balance between maximizing thetotal throughput and minimizing the violation of av-erage blockage-free guarantee; (iv) to eliminate theimpact of delayed feedback, BPG transfers the orig-inal problem into a non-delayed setting by splittingtime slots into consecutive and overlapped segments.The technical contributions of our algorithm BPG aresummarized as follows:

First, we model the beam prediction problemfor the target mmBS as an online learning problemunder the 5G network framework. We propose anew contextual-combinatorial MAB framework, tak-ing blockage-free requirements and delayed feedbackinto consideration. The proposed policy, BPG, suc-cessfully balances between maximizing the through-put and controlling the violation of blockage-free con-straints. It can also make specific beam predictionsunder di"erent tra!c patterns and blockage situa-tions. Our model can be adapted to various con-texts and can be extended to fit more complicatedconstrained settings. Di"erent from existing beamselection policies aiming to maximize the overall ag-gregated received data, our algorithm further tries tosatisfy the minimum blockage-free guarantee thresh-old to keep vehicular communication stable.

Second, we tackle three main challenges when de-signing BPG: (i) we leverage the theory of Lagrangianmethod in constrained optimization and carefully uti-lize an adjustable penalty coe!cient to balance be-tween maximizing the throughput and lowering theviolation; (ii) we partition the context space into sub-hypercubes based on the natural assumption that ve-hicles with similar contexts tend to reveal the samefeedback. Each sub-hypercube maintains a set ofweights for each beam, which can be used to calcu-late the expected weight under inferred tra!c pattern.We obtain the selection probabilities based on the ex-

2

pected weights and use a dependent rounding algo-rithm to conduct beam predictions; (iii) by splittingthe entire time span into consecutive overlapped timesegments, the delayed problem can be transformedinto a standard non-delayed setting.

Third, through rigorous theoretical analysis, weobtain sub-linear bounds for both regret and viola-tion by taking blockage constraints into considera-tion. The sub-linear bounds suggest that our onlinelearning algorithm makes asymptotically optimal pre-dictions. We further conduct extensive simulationstudies to verify the e"ectiveness of BPG by usingreal-world trace of taxi cabs in Rome [9]. The re-sults show that BPG achieves 95% of the optimalthroughput and always generates 25%! 70% less vi-olations than other benchmark algorithms. In sum-mary, BPG significantly outperforms other strategiesand always shows better system performance in termsof maximizing the throughput while maintaining thelowest performance violation in vehicular communi-cation systems.

In the rest of the paper, related work is reviewedin Sec. 2. We present the system model in Sec. 3. Thedetailed online beam prediction algorithm and thetheoretical analysis are presented in Sec. 4. Sec. 5 isthe simulation study and Sec. 6 concludes the paper.

2. Related Work

Beam Selection Problems. Traditional beam se-lection solutions rely heavily on accurate localizationinformation and complex transceiver chain[10][11][12],resulting in high overhead and latency. Several learn-ing-based architectures have been proposed recentlyfor beam selection. Ali et al. [13] leverage out-of-band side information, such as similarity extractedfrom sub-6 GHz and mmWave, to help beam training.Va et al. [7] make beam alignment using a risk-awareonline learning approach, which utilizes sensor dataabout position for beam selections. Wang et al. [5]leverage a situation-aware machine learning methodto obtain beam information. Asadi et al. [14] proposean online learning method to solve the environment-aware beam-selection problem. However, the aboveliterature fails to deal with requirements of systemperformance guarantee, and only focuses on maxi-mizing the system throughput. Therefore, the abovemethods may lead to great violation of mmWave com-munication requirements, which severely harms thesystem performance in the long term.

Due to the high mobility of vehicles, the afore-mentioned methods can barely handle handover situ-ations. Alkhateeb et al. [15] leverage a deep learningbased beamforming method to enable highly-mobilevehicular systems. Mavromatis et al. [16] proposea MAC-layer based smart motion-prediction beamalignment algorithm. Given the CSI of source BS ob-tained previously, Chen et al. [6] leverage a sequence-to-sequence neural network to make beam predictionsat target BS. Di"erent from the above literature, ourbeam prediction framework can adapt to dynamictra!c systems. This is because our algorithm, BPG,can autonomously and separately learns beam predic-tion strategy for di"erent types of vehicles from theenvironment each time.Multi-armed bandit (MAB) optimization. MABis a classic online learning method to address sequen-tial decision making problems under partial feedback.The basic MAB focuses on learning how to choose thebest single arm among a finite set of arms to maxi-mize the total reward, without considering any con-straints. Our bandit formulation in BPG is related tocombinatorial-contextual bandits with delayed feed-back. Especially, combinatorial bandits allows multi-ple plays each round [17][18], while contextual banditsconsiders context-dependent rewards [19][20]. Mulleret al. [21] propose a bandit model combining con-textual bandits together with combinatorial bandits.Nue et al. [22] show a multiplicative regret for theadversarial bandit with delayed feedback. Joulani etal. [23] provide a systematic algorithm framework tohandle delayed feedback for di"erent online learningalgorithms. Chen et al. [24], rather recently, combinethese concerns together.

However, the above algorithms are all formulatedwithout considering communication stability. Whenapplying MAB to realistic applications, communica-tion QoS constraints cannot be neglected. Therefore,constrained bandits becomes another concern. Mah-davi et al. [25] design an e!cient online learning al-gorithm under stochastic constraints. This algorithmleverages the theory of Lagrangian method in con-strained optimization and takes both the regret andviolation into consideration. Cai et al. [26] extend theformer algorithm to the multi-play setting and pro-pose a MAB framework with multi-level rewards forseveral network applications. Inspired but di"erentfrom the literature above, our algorithm BPG extendsthe constrained combinatorial MAB into contextualand delayed settings, which is more suitable for vehic-

3

ular communication system requirements, since we doneed to consider vehicles with di"erent types to ad-just beam predictions. Our algorithm achieves near-to-optimal performance when applied in vehicular com-munication system, make a good balance betweenmaximizing the aggregated received data and satis-fying the system performance guarantee.

3. Problem Model

We consider a scenario where multiple mmBSsdensely overlay the coverage area of a 5G NR gNB(New Radio gNodeB). The mmBSs are connected tothe gNB via a backhaul link, and each mmBS’s infor-mation can be learned from the gNB. Vehicles withhigh mobility arrive from time to time, coming acrossmultiple coverage areas of neighbouring mmBSs. Eachvehicle is equipped with two types of interfaces forcommunication: an NR interface connected to thegNB and an mmWave interface for ultra-low latencydirectional mmWave communication with mmBSs.Limited by physical constraints such as RF chains,mmBSs can simultaneously transmit a limited num-ber of beams. Both the gNB and the mmBSs have noprior knowledge of their surroundings, the only infor-mation they have access to is vehicles’ contexts (i.e.,directions and speeds) upon arrival. In this work, wefocus on the downlink model, while the solution canalso be adapted to the uplink situation, except thatthe mmBS needs to identify the owner of each signalit receives. Another problem in the uplink scenario iscollision avoidance. If more than one user transmitsin the same beam, then interference occurs. To avoidcollision or to cancel interference is beyond the scopeof the current paper.

Once the gNB detects that a group of vehicles aregoing to hand over from one mm-cell to another, itinforms the source mmBS the target mmBS’s beaminformation and current vehicle contexts. The sourcemmBS analyzes the future tra!c pattern accordingto current vehicle contexts, and predicts the mostpromising beam directions for the target mmBS. Inthis way, the performance of target mmBS can beguaranteed especially during peak hours. After choos-ing specific beams according to the predictions, thetarget mmBS receives feedback from vehicles and sendsit back to the source mmBS with a fixed delay (onetime slot).

As shown in Fig. 1, the red car is going to handover from the source mmBS to the target mmBS. The

mmBSmmBS coverage boundarymmWave beamblockage

Predictions

Source mmBS

Target mmBS

Figure 1: System model.

source mmBS sends predicted beam directions to tar-get mmBS according to the red car’s current context.After choosing the suggested beams, target mmBSsends the feedback to source mmBS in the next timeslot.

To better describe our problem, we assume thatthe target mmBS has M available distinct orthogonalbeams, but can only simultaneously select m beams(1 " m " M) each round due to physical constraints.Let [X] denote the integer set {1, 2, . . . , X}. Duringsystem life span T , let It denote the set of selectedbeams at time t # [T ], i.e., It $ [M ], |It| = m. Letpt = (pt1, ..., p

ti, ..., p

tM ) be chosen probability vector for

target mmBS’s beams at time t. Here, pti refers tothe probability of selecting beam i at t. We use a de-pendent rounding technique [27] to choose m beamsin time slot t, i.e., 1!pt = m, where 1 = (1, . . . , 1).There are total Nt vehicles arriving at time t. Eachvehicle j # [Nt] registers to the mmBS via the gNB,coming with its own context xj,t. Let !t denote thecontext set received from vehicles at time t, i.e., !t =

{{xj,t}j=1,...,Nt}, where each element xj,t is a vector

with dX dimensions taken from the bounded contextspace ! = [0, 1]dX .

Suppose that each beam i is associated with anunknown random process V i

x(t), %t # [T ], which char-acterizes achievable throughput (without blockage)at each time slot t for each context x # !. V i

x(t)is not necessarily stationary but bounded across i.Let vti,x be the realization of V i

x(t), which denotes thethroughput of beam i at time t under context x. Dueto the property that mmWave signals can be easilya"ected by static or moving blockages, only beamswithout blockage can successfully transmit data to ve-

4

hicles. However, mmBSs have no knowledge of theirsurroundings (the pattern of arrivals and the prob-ability of existence of blockage), thus for each beami, it is also combined with an unknown random pro-cess, U i

x(t), %t # [T ], which characterizes blockage-free probability with context x at time t. Let uti,x be

the realization of U ix(t). Given that the environment

is relatively stable, the tra!c pattern is regular inthe long run. We make an assumption that U i

x(t) isstationary. Suppose that U i

x(t) is stationary and inde-pendent across i with unknown mean ui,x = E[U i

x(t)],ux = {u1,x, ..., uM,x}. In order to guarantee the sys-tem performance, there exists a minimum blockage-free threshold " > 0. For each context x # !t, theaverage of the sum of blockage-free probability needsto be above this threshold, i.e., E[u!

xpt] > ". Wenormalize U i

x(t) # [0, 1] and V ix(t) # [0, 1] and as-

sume that they are independent of each other. There-fore, the successful throughput at time t with contextx is characterized by a compound throughput vectorgtx = {gt1,x, ..., gtM,x}, where each compound through-put gti,x of a beam i at time t is generated by the ran-dom process Gi

x(t) = U ix(t)V

ix(t), thus gti,x = ut

i,xvti,x.

At time t, the expected total compound throughputis E[

!

x!!tgt!x pt]. Since the source mmBS can only

observes the performance of predictions at the end ofthe next time slot, one-slot delay feedback exists.

Specifically, the o#ine optimization problem isformulated into a linear program (LP) as follows:

maximize"

t![T ]

"

x!!t

"

i![M ]

gti,xpti (1)

subject to:"

i![M ]

pti = m, %t # [T ], (1a)

"

i![M ]

uti,xp

ti > ", %t # [T ], %x # !t, (1b)

pti # [0, 1], %i # [M ], t # [T ]. (1c)

Note that in the online setting, LP (1) is non-trivial to solve since feedback can be observed onlywhen beams are selected, which means the value ofuti,x and vti,x are observed one time-slot later. Ourmodel can be seen as an online learning problem, inwhich data becomes available in a sequential orderand is used to update the decision policy for futuredata during each time slot.

The fundamental problem for each source mmBSis to learn how to e!ciently predict a subset of m

beams for target mmBS, taking vehicles’ contexts,performance constraints and delayed feedback intoconsideration. Therefore, we formulate the beam pre-diction problem into a contextual-combinatorial MABproblem with performance constraints and delayedfeedback. Selecting beams with large throughput helpsto achieve better network performance, and choosingblockage-free beams helps to maintain stable vehicu-lar communication.

We consider the optimal solution of LP (1) as anoracle, in which we know all the information in ad-vance and make the best selection each round fromGod’s perspective. However, learning a policy tomaximize the compound reward is challenging in re-alistic settings without full knowledge in prior. Wechange our goal into designing a beam prediction pol-icy #, which updates the beam chosen vectors pt fortarget mmBS, such that the regret, which is referredto a loss compared to the Oracle, is as small as pos-sible. The regret for a learning policy # is defined asfollows:

R(T ) = maxu

!

xpt>"

"

t![T ]

"

x!!t

gt!x pt ! E[

"

t![T ]

"

x!!t

gt!x p!

t ]. (2)

where pt denotes the optimal beam selection vectorwhile p!t denote the beam prediction decision madeby our policy #. In order to achieve close-to-oraclereward, the source mmBS needs to balance betweenexploration and exploitation in each time slot to learnfrom each choice to improve its policy. Note that dur-ing early time exploration, source mmBS might makebeam predictions that violate the blockage constraint,since it has little knowledge of its surroundings. Ifthe source mmBS violates the constraint during ex-ploration, a low throughput or even zero throughputmay be obtained. To make sure that the total vi-olation of constraints after T time slots as small aspossible, we define the violation as follows:

V (T ) = E["

t![T ]

"

x!!t

("! u!

xp!t )]

+, (3)

where [x]+ = max{0, x}. A lower regret means that #gets closer to the Oracle. A smaller violation meansthat # becomes better in satisfying the constraint astime t increases. We next design an algorithm inorder to balance between the regret and the violation.

4. An Online Learning Algorithm

In this section, we design an online learning al-gorithm BPG to predict beam directions. We first

5

Table 1: Notation

M # of beams T # of time slotsm # of selected beams X integer set {1, 2, ...X}Nt # of vehicles at time t It beam set selected at tdX # of context dimensions PT # of sub-hypercubesnT # of parts each dimension can be divided intopti chosen probability of beam i at txj,t vehicle j’s context at t!t context set received from vehicles at time t

V ix(t) random process of beam i at t

which characterizes varying throughputU ix(t) random process of beam i at t

which characterizes blockage-free probabilityGi

x(t) = U ix(t)V

ix(t), random process of compound throughput

vix(t) throughput of beam i at tuix(t) blockage-free probability of beam i at tgix(t) compound throughput of beam i at t" minimum blockage-free threshold

wti,h weight for beam i in sub-hypercube h at t

htj sub-hypercube which vehicular j belongs to at t

Ht set of sub-hypercubes at t!t collection of arrival distributions of hypercube h at t$th the percentage of contexts which belong

to hypercube h at tSht set of weights which need to be reduced

in sub-hypercube h at t%ht Lagrange multiplier in sub-hypercube h at t

present several algorithm design challenges and pro-pose corresponding solutions in Sec. 4.1, and we elab-orate the details of the algorithm design in Sec. 4.2.Finally, we carry out rigorous theoretical analysis inSec. 4.3.

4.1. Algorithm Design Challenges

Towards designing the beam prediction policy, akey challenge exists. How to balance between max-imizing the aggregated received data and satisfyingthe performance constraint? Inspired by [25][26], weleverage the theory of Lagrangian method in con-strained optimization. Our target is to minimize amodified regret function with the component of vio-lation and an adjustable Lagrange multiplier %(T ). Ifthe constraint is substantially violated, our algorithmplaces more weight on the Lagrange multiplier %(T );it lowers the weight when the constraint is satisfiedreasonably. Our algorithm introduces a sub-linearbound for the modified regret function as follows:

R(T ) + %(T )(V (T ))2 " T 1"#, 0 < & < 1. (4)

Since !R(T ) " O(mT ) for any policy #, we derivea bound for R(T ) and V (T ):

R(T ) " O(T 1"#), V (T ) "#

O(T 1"# +mT )/%(T ), (5)

We can achieve sub-linear bounds for both R(T ) andV (T ) if we properly adjust %(T ).

Considering the basic idea of our algorithm is tochoose m beams according to the chosen probabil-ity vector pt, the second challenge is how to updateeach time slot. Our solution is to calculate the chosenprobabilities by maintaining a set of M weights forM beams. Due to the fact that vehicles with di"er-ent contexts tend to have di"erent beam preference,even the same beam’s weight varies when facing dif-ferent contexts. How to set and update the weightsfor M beams each time becomes another challenge.A straightforward approach to tackle this problem isto maintain separate weights for di"erent contexts.However, considering the large amount of contexts,this method incurs high computation complexity. Inorder to deal with the large context space, we proposetwo basic assumptions: (i) vehicles with similar con-textual information reveal similar feedback under thesame condition; (ii) vehicles with di"erent contextsprefer di"erent sets of beams. Under these assump-tions, similarities between contexts can be exploitedonline for future predictions.

Our algorithm BPG partitions the large contextspace into uniform sub-hypercubes, each representinga certain type of vehicles. Each sub-hypercube in ouralgorithm maintains a set of weights for M beams,which can be used to compute the chosen probabil-ities under specific context and will be modified ineach time slot according to hitherto feedback. Theaccuracy of weights for each sub-hypercube increasesas time elapses. However, since we need to take thecombination of all the arrival contexts into consid-eration, we propose an expected combination hyper-cube, which maintains a set of expected weights for Mbeams. The expected weights are calculated by cur-rent context distribution (tra!c pattern) and corre-sponding sub-hypercubes’s weights (see Fig. 3). Thisis easy to understand, since the more a certain typeof vehicles come, the more biased it is to choose pre-ferred beam directions for that type. Based on theexpected weights, our policy calculates the proba-bilistic distribution pt for M arms. According to pt,BPG selects m beams from M beams using the de-pendent rounding method. Then the feedback of cho-sen beams are used to modify each sub-hypercube’sweights. Finally, we use calculated weights to com-pute each beam’s chosen probability.

Traditional MAB problems receive feedback (re-ward) at the current time slot. In contrast, in our

6

beam prediction problem, dealing with the delayedfeedback becomes the last challenge. Rewards andviolations can be observed only one time slot later,which will influence the analysis of the regret boundand violation bound. Nonetheless, we can transformthe delayed setting into non-delayed setting by com-bining two continuous time slots into one compoundslot. We split T time slot sequence into &K(T )' con-secutive overlapped segments {[t#n, t#n+2)}

$K(T )%n=1 ,n # N+,

and K(T ) = t2

3+dX log(t), &K(T )' = n for any t #[t#n, t

#n+2). In this way, this problem can be seen as a

standard non-delayed problem, where we make beampredictions at t"n and receive feedback at the end oft"n+1, n # &K(T )'.

4.2. Algorithm Design

The general idea of our algorithm BPG is as fol-lows: First, we partition the context space into uni-form sets of similar contexts (sub-hypercubes). Ineach time period, BPG observes the arriving vehi-cles’ contexts and classifies them into di"erent sub-hypercubes. The algorithm then calculates the ex-pected weights according to the context distributionand corresponding sub-hypercube’s weights. The cho-sen probability vector pt is computed based on theexpected weights. Finally, BPG updates each sub-hypercube’s weights and Lagrange multiplier accord-ing to the delayed feedback.

Context dimension 1

Co

ntex

t d

imen

sio

n 2

Contex

t

dimen

sion

3

Central point

Contextual information point

n =2T

The selected sub-hypercube

Figure 2: Context Partion Model.

The details of BPG are shown in Alg. 1. In itsinitialize phase, BPG partitions the contextual space! = [0, 1]dX into (nT )dX hypercubes. Each sub-hypercubehas an identical size 1

(nT )dX, where nT is an input con-

stant which determines the number of parts each di-mension can be divided into (see Fig. 2). During eachtime slot, we first observe the number of arriving vehi-cles, Nt, and their contextual information {xj,t}j!Nt .

For each context xj,t, BPG determines which hyper-cube it belongs to. As shown in Fig. 3, we willuse the center point xh to represent all context in-formation points in each sub-hypercube h # PT forcomputation. Let Ht = {hj,t}j!Nt denote the collec-tion of hypercubes at time t. Since we need to choosethe beams given the combination of all these con-texts, we also observe the arrival distribution of each

sub-hypercubes !t = {$th}h!Ht , where $th =|!

j:{xj,t!h} |Nt

denotes the percentage of contexts which belong tohypercube h. (Lines 4-5). For each sub-hypercubeh # PT , BPG maintains a set of weights for M beamsrespectively, i.e., wt

h = {wt1,h, ..., w

tM,h}, which are ini-

tialized into 1 at the beginning.

Weights for beam i of sub-hypercube n

wi,2wi,1 wi,3

Arriving distributionof each sub-hypercube

t t t

Ũ1t Ũ2

t Ũ3t

× ××

wi,1t Ũ1

t wi,2t Ũ2

t wi,3t Ũ3

t＋＋

w it

Expected weight

Figure 3: Expected Weight Model.

Since we need to choose m beams for the com-bination of these vehicles with di"erent contexts to-gether, we transform the set of hypercubes into oneexpected hypercube. As can be seen in Fig. 3, theexpected weights for M beams are calculated follow-ing wt

i =!

h!Ht$thw

ti,h. Let wt = {wt

1, wt2, ..., w

tM} de-

note the set of expected weights for M arms at timet, which is used to calculate the chosen probabilityvector pt. At line 17, we show the similar idea toExp3 [28] for exploration and exploitation by using(1!')wt

i/!M

i=1 wti for exploitation and '/M for explo-

ration. Lines 6- 14 guarantee that the probabilitiesin pt are less than or equal to 1.

At line 19, we utilize a dependent rounding al-gorithm (Alg. 2) to update pt until 1!pt = m, pti #{0, 1}. Since the feedback of predictions can onlybe obtained after one time slot, we obtain each ve-hicle’s feedback ut#1

i,hj,tand vt#1

i,hj,tat time t(t ( 2).

For each sub-hypercube, we calculate the unbiased

7

estimates of uti,h and gti,h for each arm i # M atline 24, where ut

i,xh = (!

j:xj,t!h uti,xj,t

)/!

j:xj,t!h ·1 andvti,xh = (

!

j:xj,t!h vti,xj,t

)/!

j:xj,t!h ·1. It is easy to ver-ify that E[ut

i,h] = uti,xh and E[gti ] = ut

i,xhvti,xh . Let(K) be the indicator function: (K) = 1 if the event

K happens and 0 otherwise. Finally, for each sub-hypercube h, the weight vector {wt

i,h}i$M and the

Lagrange multiplier %ht are updated using the esti-

mations at the end of each iteration as shown in lines26-27.

Algorithm 1 Online Beam Direction Predictionwith Performance Guarantee: BPG1: Initialize context partition: Create partition PT

of context space into (nT )dX hypercubes of iden-tical size

2: Initialize w1h = 1,%1 = 0, " > 0,( = ( 1

m ! $M )(1 !

'), ) = $%m(%+m)M , * = 4(e"2)$m

1"$ , Sht = 0, It = 0

3: for t = 1, 2, ..., T do4: Observe vehicle contexts set !t = {xj,t}j=1,...,Nt

5: Find Ht = {hj,t}j=1,...,Nt such that xj,t # hj,t #PT , observe !t = {$th}h!Ht

6: for each sub-hypercube h # Ht do7: if maxi!M wt

i,h ( (!M

i=1 wti,h then

8: Find +h,t such that

+h,t/!M

i=1,wti,h&&h,t

+h,t +!M

i=1,wti,h'&h,t

wti,h = (

9: Sht = {i : wt

i,h ( +h,t}10: end if11: for i = 1, ...,M do12: wt

i,h = +h,t if i # Sht , otherwise, w

ti,h = wt

i,h13: end for14: end for15: The expected weight wt

i =!

h!Ht$thw

ti,h

16: for i = 1, ...,M do17: pti = m[(1! ')wt

i/!M

i=1 wti + '/M ]

18: end for19: It = DRA(m, pt)20: if t ( 2 then21: Receive vehicles’ rewards ut"1 and vt"1

22: for each hypercube h # Ht"1 do23: for each beam i # M do24: ut"1

i,h = ut"1i,xh/p

t"1i (i # It"1),

gt"1i,h = (ut"1

i,xhvt"1i,xh)/p

t"1i (i # It"1)

25: end for

26: wt+1i,h =

$

wti,h i # Sh

t

wti,h ) exp()(gt"1

i,h + %tut"1i,h )) i /# Sh

t

27: %ht+1 = [(1! *))%h

t ! )(ut"1

h pt"1

1"$ ! ")]+

28: end for29: end if30: end for

Algorithm 2 Dependent Rounding Algorithm:DRA1: Initialize It = !

2: while exist i * pti # (0, 1) do3: Randomly select two jobs i1, i2, i1 += i2, pi1pi2 #

(0, 1)

4: Define $1 ! min{1! pti1 , pti2}

5: Define $2 ! min{pti1 , 1! pti2}6: With probability !2

!1+!2, set

7: pti1 = pti1 +$1, pti2 = pti2 !$1

8: With probability !1

!1+!2, set

9: pti1 = pti1 !$2, pti2 = pti2 +$2

10: end while11: return It = {i # [M ] : pti = 1}

4.3. Regret Analysis

The theorem below shows that both the regretand violation of BPG are sub-linear in the time hori-zon T , i.e., limT()

R(T )T = 0 and limT()

V (T )T = 0.

It guarantees that the algorithm converges to the op-timal beam prediction policy over time, and has anear-to-optimal performance when T is large enough.The regret and violation of BPG can be bounded asfollows:

Theorem 1. Let ) = $%m(%+m)M , nT = &T

19dX ' and ' =

min(1,%

2(e"2)M+Mmmln(M/m)T 1/2 ). By running the policy #, we

achieve sub-linear bounds for both the regret and vio-lation as follows:

RD(T ) " O(Ld!2

XmM ln(M)T56" !

6dX )

VD(T ) " O(L12 d

!4

Xm12M

12T

56 ). (6)

Proof. We first consider the regret and violation boundin one hypercube h # PT , without considering the in-fluence of delayed feedback, and then obtain its sum-mation over the number of sub-hypercubes (nT )dX .Finally, we discuss the influence of the bound due todelayed feedback.

Before considering the upper-bound in each hy-percube, we use the Lipschitz condition to define thedissimilarity between di"erent contexts:

Assumption 1. There exists constant L > 0 suchthat for all context xi, xj # !, we have DX(xi, xj) "L||xi ! xj ||&, where || " || denoted the Euclidian normin RdX .

Note that the Lipschitz constants L and the sim-ilarity degree parameter + are not required to beknown by our prediction algorithms. They will only

8

be used when quantifying BPG’ performance and bothof them will appear in our regret and violation bounds.

Lemma 1. For each sub-hypercube h, let rth = gth +

%ht u

th, where gt

h, uth # RM

+ , wht+1 = wh

t · exp()rht ). Letpt be an arbitrary probabilistic selection vector whichsatisfied pti # [0, 1], i # [M ],1!pt = m,ut!

h pt ( ". Let pt

denote the chosen probability vector of the predictionpolicy # at time t, we can get the following inequality:

E[T"

t=1

rt!h pt !1

1! '

T"

t=1

rt!h pt]

"m

)ln

M

m+

2(e! 2))M

1! 'T +

2(e! 2))M

1! '

T"

t=1

(%ht )

2. (7)

Proof. Let %ht+1 = [(1 ! *))%h

t ! )((ut"1

h )!pt"1i

1"$ ! ")]+ "[(1 ! *))%h

t + )"]+, by induction on %ht , we can get

%ht " "

% . we first show an upper-bound and a lower

bound of!T

t=1 ln(Wh

t+1

Wht) for the sequence of selected It

at t = 1, ..., T .

T"

t=1

ln(Wh

t

Wht

) = ln(Wh

t

W1) = ln

M"

i=1

wT+1i,h ! lnM

( lnM"

i=1

ptiwT+1i,h ! lnM (

M"

i=1

ptim

T"

t=1

)rti,h ! lnM

m

=)

m

M"

i=1

pti"

t:i/!St

rti,h ! lnM

m. (8)

Since ) = $%m(%+m)M , %h

t " "% , we have )rti,h " 1.

According to the fact that ex " 1 + x + (e ! 2)x2 forx " 1, we have

Wht

Wht

="

i!M/Sht

wt+1i,h

Wht

+"

i!Sht

wt+1i,h

Wht

="

i!M/Sht

wti,h exp()r

ti,h)

Wht

+"

i!Sht

wti,h

Wht

""

i!M/Sht

wti,h

Wht

(1 + )rti,h + (e! 2)()rti,h)2) +

"

i!Sht

wti,h

Wht

=Wh

t

Wht

("

i!M/Sht

wti,h

Wht

+"

i!Sht

wti,h

Wht

) +Wh

t

Wht

"

i!Sht

wti,h

Wht

)rti,h

(9)

+Wh

t

Wht

(e! 2)()rti,h)2

"1 +1

m(1! ')

"

i!M/Sht

)rti,h +e! 2

m(1! ')

"

i!M/Sht

)2(rti,h)2.

Due to the fact that ln(1 + x) " x and pti rti,h =

rti,h " 1 + %ht , we can transform the above inequality

as follows:

lnWh

t+1

Wht

" )

m(1! ')

"

i!M/Sht

pti rti,h+

(e!2))2

m(1!')

"

i!M/Sht

(1+%ht )r

ti,h.

Simultaneously summing t on both sides of theabove inequality and combining with (8), we have

)

m

M"

i=1

pti"

t:i/!St

rti,h ! lnM

m

"T"

t=1

()

m(1! ')

"

i!M/St

pti rti,h +

(e! 2))2

m(1! ')

"

i!M/St

(1 + %t)rti,h).

(10)

Since pti = 1 for i # Sht $ It, we have

!Mi=1 p

ti

!

t:i!Shtrti,h

" 11"$

!Tt=1

!

i!Shtrti,h. Combining this inequality with

(10) and multiplying both sides with m' , we can get

T"

t=1

(rth)!pt !

m

)ln

M

m

"!T

t=1(rth)

!pt1! '

+(e! 2))

1! '

T"

t=1

M"

i=1

(1 + %ht )r

ti,h.

Taking expectation on both sides, we can get

E[T"

t=1

(rth)!pt !

1

1! '

T"

t=1

(rth)!pt]

"m

)ln

M

m+

(e! 2))

1! 'E[

T"

t=1

M"

i=1

(1 + %ht %

ht )r

ti,h]

"m

)ln

M

m+

(e! 2))

1! 'E[

T"

t=1

M"

i=1

(1 + %ht )(g

ti,h + %h

t uti,h)]

"m

)ln

M

m+

2(e! 2))M

1! 'T +

2(e! 2))M

1! '

T"

t=1

(%ht )

2,

where the last inequality holds due to the factthat gti , u

ti " 1 and (x + y)2 " 2x2 + 2y2, which means

E[!M

i=1(1+%ht )(g

ti+%h

t uti)] " M(1+%h

t )2 " 2M+2M(%h

t )2.

Lemma 2. In each sub-hypercube h, let fh,t(%) =%2%

2+%( (uth)

!pt

1"$ ), %ht+1 = [%h

t !),fh,t(%ht )]+, and %1 = 0.

Assuming ) = $%m%+m , (ut

h)!pt

1"$ ( ", we can get

E[*

2

T"

t=1

((%ht )

2 ! %2) +T"

t=1

(%ht ! %)(

(uth)

!pt

1! '! ")]

"%2

2)+ (m2 +

)mM

(1! ')2))T. (11)

9

Proof. %ht+1 = [%h

t !),fh,t(%ht )]+ = [(1!*))%h

t !)( (uth)

!pt

1"$ !")]+ " [(1! *))%h

t + )"]+. For arbitrary %, we have

(%ht+1 ! %) = [%h

t ! )(*%ht +

(uth)

!pt

1! '! ")! %]2

"(%ht ! %)2 + ()(*%h

t ! ") + )(ut

h)!pt

1! ')2

! 2(%ht ! %)(),fh,t(%

ht ))

"(%ht ! %)2 + 2)2"2 + 2)2(

(uth)

!pt

1! ')+2)(fh,t(%)! fh,t(%

ht )).

Then, by rearranging the terms we get

fh,t(%ht )! fh,t%

" 1

2)[(%h

t+1 ! %)2 ! (%ht ! %)2] + )("2 + (

(uth)

!pt

1! ')2)

" 1

2)[(%h

t+1 ! %)2!(%ht ! %)2]+)m2+

)m2

(1! ')2m

M"

i=1

(ptiuti,h)

2

" 1

2)[(%h

t+1 ! %)2!(%ht ! %)2] + )m2 +

)m

(1! ')2

M"

i=1

uti,h.

Taking expectation over!T

t=1[fh,t%ht !fh,t(%)], then

Lemma 2 holds.

Applying rth = gth + %h

t uth to (7), and combining

(7) and (11) together givesT"

t=1

(gth)

!pt !1

1! 'E[

T"

t=1

(gth)

!pt] + E[%T"

t=1

("! (gth)

!pt

1! ')]

! E[(*T

2+

1

2))%2]

"m

)ln

M

m+

2(e! 2))MT

1! '+ )m2T +

)mMT

(1! ')2

+(2(e! 2))M

1! '! *

2)

T"

t=1

(%ht )

2 + E[T"

t=1

%ht ("!

(uth)

!pt

1! ')].

Since ) = $%m(%+m)M and * = 4(e"2)$m

1"$ ( 4(e"2)$m1"$ !m,

thus we have 2(e"2)'M1"$ ! %

2 " 0. Also, (uth)

!pt

1"$ ( ".Multiplying both sides with (1! '), we have

(1! ')T"

t=1

(gth)

!pt ! E[T"

t=1

(gth)

!pt]! E[(*T

2+

1

2))%2]

+ E[%T"

t=1

((1! ')"! (gth)

!pt)]

"(1! ')T"

t=1

(gth)

!pt!E[T"

t=1

(gth)

!pt]!(1! ')E[(*T

2+

1

2))%2]

+ E[%T"

t=1

(1! ')("! (gth)

!pt)]

"(1! ')m

)ln

M

m+2(e! 2))MT+(1! '))m2T+

)mMT

(1! ')

"m

)ln

M

m+ 2(e! 2))MT + )m2T +

)mMT

(1! ').

Let % =!T

t=1((1"$)""(uth)

!pt)%T+1/' . By taking maximiza-

tion over pt, we have

max(ut

h)!pt&"

T"

t=1

(gth)

!pt ! E[T"

t=1

(gth)

!pt]

+ E{[!T

t=1((1! ')"! (uth)

!pt)]+2

2(*T + 1))}

"m

)ln

M

m+ 2(e! 2))MT + )m2T +

)mMT

(1! '). (12)

The above inequation (12) is the upper-bound forRxh(T )+%xh(T )(Vxh(T ))2 (without considering the im-pact of delay) when we substitute all the points insub-hypercube h with the center point xh. Then weconsider the Context Gap, which is the di"erence be-tween the original point xj,t # h and center pointxh in context sub-hypercube h # Ht. We utilizemax{DX(xht

j1, xhtj2)} = max{L||xj1 ! xj2||&} = L(

*dX

nT)&

to denote the deviation in the context sub-hypercubeh # Ht, and let nT = &T (' " 2T (, 0 < , < 1

dX, thus we

can get the upper-bound of Rh(T )+%h(T )(Vh(T ))2 foreach sub-hypercube h # PT (without delay):

maxu

!

t pt&"

T"

t=1

g!

t pt!E[T"

t=1

g!

t pt]+E{[!T

t=1((1! ')"! u!

t pt)+]2

2(*T + 1))}

" L(

-dXnT

)&(m

)ln

M

m+ 2(e! 2))MT + )m2T +

)mMT

(1! '))

" Ld!2

XT"&([m(* +m)M

'*mln

M

m

+'*m

(* +m)M(2(e! 2)MT +m2T +

mMT

(1! '))].

Summing both sides over (nT )dX sub-hypercubes,we can derive the bound of the modified regret func-tion in the form of (4) without delay, B(T ):

max(ut

h)!pt&"

T"

t=1

"

h!Ht

g!

t pt ! E[T"

t=1

"

h!Ht

g!

t pt]

+ E{[!T

t=1

!

h!Ht((1! ')"! u!

t pt)]+2

2(*T + 1))}

"2dXLd!2

XT (dX"&)([m(* +m)M

'*mln

M

m

+'*m

(* +m)M(2(e! 2)MT +m2T +

mMT

(1! '))]. (13)

The performance of BPG with non-delayed feed-back has been discussed above. However, the non-delayed feedback assumption can be easily violatedin application since the source mmBS can only ob-serve the feedback of predictions after one time slot.Therefore, we analyze the performance of BPG withdelayed feedback. Since we split the time slot se-quence {1, 2, ..., T} into &K(T )' consecutive segments

10

{[t#n, t#n+2)}$K(T )%n=1 , n # N+. Such that for any t # [t#n, t

#n+2),

we have &K(t)' = n, K(T ) = t2

3+dX log(t). That is, werun in parallel d+1 (in our model d = 1) instances ofthe BPG for the standard (no delay) setting, and ateach time step t = 2r+ s, s # {0, 1}, for each r = 1, 2, ..,use instance s + 1 for the current play. Hence, thenon-delayed bound applies to every new compoundslot and we obtain BD(T ) "

!21 B(T2 ), where BD(T )

denotes the bound of (4) in delayed setting. There-fore, we obtain BD(T ) as follows:

maxu

!

t pt&"

T"

t=1

"

h!Ht

g!

t pt ! E[T"

t=1

"

h!Ht

g!

t pt]

+ E{[!T

t=1

!

h!Ht((1! ')"! u!

t pt)]+2

2(*T + 1))}

"2

"

n=1

2dXLd!2

X(T

2)(dX"&)([

m(* +m)M

'*mln

M

m

+'*m

(* +m)M(2(e! 2)M(

T

2) +m2(

T

2) +

mM

(1! ')

T

2)]

= 2&(+dX(1"()Ld!2

X [2m(* +m)M

'*mln

M

mT (dX"&)(

+'*m

(* +m)M

&

2(e! 2)M +m2 +mM

(1! ')

'

T 1+(dX"&)(].

Let G(T ) = 2&(+dX(1"()Ld!2

X [2m(%+m)M$%m ln M

m T (dX"&)(

+ $%m(%+m)M

&

2(e! 2)M +m2 + mM(1"$)

'

T 1+(dX"&)(]. Let , =1

9(dX) , we have results in the form of (5): R(T ) " G(T )

and V (T ) "#

2(G(T ) +mT )(*T + 1/)). Since ' =

min(1,%

2(e"2)M+Mmm ln(M/m)T 2/3 ) = %(T" 1

3 ) and * = 4(e"2)$m1"$ =

%(T" 13 ), we have ) = %( 1

M T" 23 ). Finally, we have

sub-linear bounds for the regret and the violation:R(T ) " O(Ld

!2

XmM ln(M)T56" !

6dX ),

V (T ) " O(L12 d

!4

Xm12M

12T

56 ).

5. Performance Evaluation

Simulation Settings. Our simulation study is con-ducted based on real-world mobility traces of taxicabs in Rome [9]. We observe the tra!c patternof a specific street (see Fig. 4), where there are asource mmBS and a target mmBS. We set the dis-tance between them to be 100m [4]. We consider twokinds of blockages in our simulations: static block-ages and moving blockages. Static blockages are thebuildings and trees along the road; Moving blockagesare the randomly appeared objects with large size,i.e., trunks and buses. To better simulate the movingblockage scenario, we randomly select some of thesetaxi cabs to be trucks and buses. We set the vehicles’

speeds between 40 km/h and 80 km/h. In our sim-ulation, a time slot is defined as the maximal timein which a vehicle is within the coverage of sourcemmBS, and we set t = 10s here.

target mmBSsource mmBS

Figure 4: The map of the simulation environment, where thebuildings are static blockages and trucks are moving blockages.

We normalize the throughput and blockage-freeprobability to [0,1]. Even for the same beam, thethroughput and blockage-free probability varies whenfacing di"erent types of vehicles. Follow the simi-lar settings in [14][4], we assume the target mmBShas 16 orthogonal beams, each beam’s width variesfrom 10% to 40% for omni-directional coverage. Weset vehicles’ beam width to 30%. The number of si-multaneously selected beams m is 4 by default. Weset transmission power to be 30 dBm, and systembandwidth to be 1GHz. The noise figure of mmBSis 4 dB, and 7dB for vehicles. The thermal noise is-174dBm/Hz. The path loss model, which is scaledin dB, is 32.4 + 17.3log10d(m) + 20log10(fc(GHz)) + $, $

" N (0, *), * = 1.1dB [29].Algorithms for comparison. We compare our on-line algorithm BPG with four benchmark algorithms:

• Optimum: We assume that this algorithm hasa prior knowledge of the whole system. Thenit can make the best beam predictions sinceit knows the blockage situation and expectedbeam performance in advance.

• FML: We modify the Fast Machine Learningalgorithm [14] to fit our prediction setting. Themodified FML aims to maximize the total com-pound throughput, i.e.,

!

t![T ]

!

j!Nt

!

i![M ] gti,xj,t

,and it receives the feedback with one slot delay.

• Adapted-UCB (AUCB): This is the variant ofthe classic bandit algorithm UCB [30]. The al-gorithm maintains a set of UCB bonus, which is

11

1000 2000 3000 4000 5000

Training Time slot

30

35

40

45

50

55

Pe

r-tim

e-s

lot

Re

ceiv

ed

Da

ta

Optimum

BPG

FML

AUCB

RAN

Figure 5: Prediction accuracy of five algo-rithms after several times of training.

1000 2000 3000 4000 5000

Training time slot

0.8

1

1.2

1.4

Pe

r-tim

e-s

lot

Vio

latio

n

BPGFMLAUCBRAN

Figure 6: Violation of four algorithms afterseveral times of training.

1 1.5 2 2.5 3

Training time slot

0

50

100

150

200

Perf

orm

ance

BPG

FML

AUCB

RAN

Figure 7: Performance of four algorithmsunder di!erent QoS weights.

2 2.5 3 3.5 4

Performance Guarantee Threshold

0

200

400

600

800

1000

1200

1400

Vio

lation

BPGFMLAUCBRAN

Figure 8: Violation under di!erent value of!.

2 2.5 3 3.5 4

Performance Guarantee Threshold

-6000

-5000

-4000

-3000

-2000

-1000

0

1000

2000

3000

Regre

t

BPGFMLAUCBRAN

Figure 9: Regret under di!erent value of !.

2 4 6 80.5

1

1.5

2

2.5

3

3.5

4

4.5

5

To

tal R

ece

ive

d D

ata

104

OptimumBPGFMLAUCBRAN

2 4 6 8

Number of Selected Beams m

0

100

200

300

400

500

600

700

800

Vio

latio

n

BPGFMLAUCBRAN

Figure 10: Impact of the number of selectedbeams (m).

used to characterize each beam’s performance.In each round, source mmBS predicts the top-m beams with the highest UCB indices gti +#

1.5 log(t)/(Ni(t)). Ni(t) is the number of timesthat beam i has been selected before t! 1.

• Ran: This algorithm randomly selects m beamsfor target mmBS in each round.

Performance Analysis. We collect the vehicu-lar system information within 5000 time slots. Wecompute the per-time-slot optimal beam predictionsolution accordingly. We utilize the collected sys-tem information as a set of training data and feedthem to four algorithms for training. We analyzethe per-time-slot throughput and the per-time-slotviolation after iterations of training (see Fig. 5 andFig. 6). We utilize the optimal solutions as standardsto verify the other four algorithms’ performance. Asseen in Fig. 5, the accuracy of BPG improves withthe increase of training time. During the first 1500time slots, the throughput obtained by our algorithmis much smaller than the optimum, since BPG maymake bad choices in the exploration phrase. Thethroughput of BPG approaches the optimum aftert = 3500, and we utilize the obtained model for the

following simulations. Note that the throughput ofFML and AUCB are always larger than the value ofoptimum, since these two algorithms merely tend tomaximize the total throughput without consideringthe system QoS constraints, leading to great viola-tion in the long term.

Fig. 6 shows that our algorithm’s violation is thesmallest, which means BPG’s prediction shows thebest QoS performance and guarantees the stabilityof vehicular communication. To better understandthe performance of each algorithm under vehicularcommunication system, we first define the QoS ofthe system as QoS = 1

V iolation . We then use TotalReceived Data · (QoS weight · QoS) to evaluate thealgorithm performance. As shown in Fig. 7, our algo-rithm has the best performance under di"erent QoSweights. Thus, It keeps a good balance between alarge throughput and a low violation. With the in-crease of QoS weight, the gap between BPG and otheralgorithms becomes larger.

We then investigate the impact of the performanceguarantee threshold " on the regret and violation ofeach algorithm. In Fig. 8 and Fig. 9, we can see thatwhen we increase the value of ", all the algorithms’violations and regrets increase. However, our algo-

12

0 a.m 4 a.m 8 a.m 12 p.m 4 p.m 8 p.m 12 a.m

Time

0

5

10

15

20

25

30

35

40

45

Nu

mb

er

of

ve

hic

les f

or

ea

ch

typ

e

Type1Type2Type3

Figure 11: Tra"c pattern for 24 hours atthe chosen street in Rome [9].

0 a.m 4 a.m 8 a.m 12 p.m 4 p.m 8 p.m 12 a.m

Time

0

20

40

60

80

100

120

140

Tota

l Rece

ived D

ata

BPGFMLAUCBRAN

Figure 12: Throughput under dynamictra"c pattern for 24 hours.

0 a.m 4 a.m 8 a.m 12 p.m 4 p.m 8 p.m 12 a.m

Time

0

20

40

60

80

100

Vio

lation

BPGFMLAUCBRAN

Figure 13: Violation of QoS under dynamictra"c pattern for 24 hours.

rithm BPG keeps the lowest regret level. We cansee from Fig. 10, when the number of selected beamsincreases, each algorithm’s throughput increases too.This is due to the increased coverage areas of the sys-tem. Also, if the value of m increases, the violation ofeach algorithm decreases to zero since it gets easier tosatisfy the minimum performance guarantee. The dif-ference between BPG and FML, AUCB gets smallerdue to the easily satisfied performance requirement.

We plot the dynamic tra!c pattern of one specificstreet for 24 hours in Rome based on the real-worlddata source [9] in Fig. 11, containing three types ofvehicles. Under this tra!c pattern, we compare BPGwith three other benchmark algorithms. As shown inFig. 12 and Fig. 13, the throughput of BPG, FMLand AUCB are nearly the same, however, our algo-rithm BPG always has the lowest violation. There-fore, BPG shows the best performance since we strikea good balance between maximizing the throughputwhile keeping a lowest violation.

4 a.m 8 p.m 12 p.m 4 p.m 8 p.m 12 p.m

Time

0

5

10

15

20

25

30

35

40

45

Nu

mb

er

of

veh

icle

s fo

r e

ach

typ

e Type1

Type2

Type3

Type4

Figure 14: Tra!fic pattern with congestion.

In order to deal with specific conditions such astra!c congestion and speeds starting from 0 km/h,we make tra!c congestion assumptions based on the

realistic congestion situations in Rome [31][32]. Wedivide the vehicles into four categories according totheir speeds. That is, we classify vehicles with speedsof 60-80km/h as type-1, 40-60km/h as type-2, 20-40km/h as type-3 and 0-20km/h as type-4. The num-ber of type-4 vehicles reflects the tra!c congestionsituations. Especially at rush hours during a day, allvehicles’ speeds are slow and the number of type-4cars increases. The new tra!c pattern are shown inFig. 14. As can be seen in Fig. 15 and Fig. 16,BPG almost receives the largest communication dataand the lowest violation even with tra!c congestion.Therefore, our algorithm BPG still shows the bestperformance under this tra!c pattern.

4 a.m 8 a.m 12 p.m 4 p.m 8 p.m 12 a.m

Time

0

10

20

30

40

50

60

70

To

tal R

ece

ive

d D

ata

BPG

FML

AUCB

RAN

Figure 15: Throughput under tra"c pattern with congestion.

As for the computation overhead, we compare therunning time of our algorithm BPG with others. Weapply the tic and toc functions in MATLAB to mea-sure the running time. We run 20 tests on MacBookAir (I1.4GHz inter Core i5/4GB /1600 MHz DDR3)and present the average result in Table 2, where TSstands for time consumption. Although BPG con-sumes longer time to compute each round, the dif-ference is not significant. Considering the near-to-

13

4 a.m 8 a.m 12 p.m 4 p.m 8 p.m 12 a.m

Time

0

10

20

30

40

50

60V

iola

tion

BPG

FML

AUCB

RAN

Figure 16: Violation under tra"c pattern with congestion.

optimal solution obtained by BPG, our algorithm stillshows a good performance compared with others.

Table 2: Algorithm Overhead.

Alg Optimal BPG FML UCB RANTS (ms) 1.401 0.018 0.012 0.006 0.002

6. Concluding Remarks

This work proposes an online learning algorithmthat makes close-to-optimal beam selection for a tar-get mmBS. Our algorithm, BPG, makes predictionsunder the consideration of vehicle contexts and sys-tem QoS constraints. We aim to strike a balancebetween maximizing the total throughput and satis-fying the minimum performance guarantee. Throughvalid theoretical analysis, we prove that our onlinelearning method achieves sub-linear bounds for re-gret and violation. We also verify our algorithm’sgood performance as compared to other benchmarkalgorithms through simulation studies.

We have mainly focused on the situation that thetarget mmBS selects beams completely according tothe predictions from the source mmBS. Our modelcan be extended to more complex settings, where theprediction policy is utilized to guide the beam se-lection strategy of the target mmBS. Our predictionpolicy will reduce the training overhead and increasethe selection accuracy. It can also reduce the beamrearrangement latency, which contributes to betterperformance of vehicular communication. Note thatour beam prediction policy can be adapted to handleadditional constraints in more realistic settings: (i)non-orthogonality can be formulated as an additionalconstraint, i.e., overlapping beams cannot be used si-multaneously; (ii) reflection from surrounding objects

can be seen as an interference, thus we can add aconstraint to keep the interference under a threshold.We will continue studying the multi-level constraintsvehicular communication model in our future work.

References

[1] M. R. Hafner, D. Cunningham, L. Caminiti, D. Del Vec-chio, Cooperative collision avoidance at intersections: Al-gorithms and experiments., IEEE trans. intelligent trans-portation systems 14 (3) (2013) 1162–1175.

[2] Leading the world to 5G: C-V2X technologies,https://dwz.cn/tAuj7Fp7.

[3] Y. Niu, Y. Li, D. Jin, L. Su, A. V. Vasilakos, A survey ofmillimeter wave communications (mmwave) for 5g: oppor-tunities and challenges, Wireless Networks 21 (8) (2015)2657–2676.

[4] D. Yuan, H.-Y. Lin, J. Widmer, M. Hollick, Optimal jointrouting and scheduling in millimeter-wave cellular net-works, in: Proc. of IEEE INFOCOM, 2018.

[5] Y. Wang, M. Narasimha, R. W. Heath, Mmwave beamprediction with situational awareness: A machine learningapproach, arXiv preprint arXiv:1805.08912 (2018) 1–5.

[6] S. Chen, Z. Jiang, S. Zhou, Z. Niu, Time-sequence channelinference for beam alignment in vehicular networks, arXivpreprint arXiv:1812.01220 (2018).

[7] V. Va, T. Shimizu, G. Bansal, R. W. Heath Jr, Onlinelearning for position-aided millimeter wave beam training,arXiv preprint arXiv:1809.03014 (2018).

[8] G. E. Garcia, G. Seco-Granados, E. Karipidis,H. Wymeersch, Transmitter beam selection in millimeter-wave mimo with in-band position-aiding, IEEE Trans-actions on Wireless Communications (TWC) PP (99)(2017) 1–1.

[9] L. Bracciale, M. Bonola, P. Loreti, G. Bianchi,R. Amici, A. Rabu", CRAWDAD datasetroma/taxi (v. 2014-07-17), Downloaded fromhttps://crawdad.org/roma/taxi/20140717 (Jul. 2014).doi:10.15783/C7QC7M.

[10] P. Kela, M. Costa, J. Turkka, M. Koivisto, J. Werner,A. Hakkarainen, M. Valkama, R. Jantti, K. Leppanen, Lo-cation based beamforming in 5g ultra-dense networks, in:2016 IEEE VTC, 2016.

[11] P. Kela, M. Costa, K. Leppanen, R. Jantti, Location-aware beamformed downlink control channel for ultra-dense networks, in: 2017 IEEE CSCN, 2017.

[12] I. Mavromatis, A. Tassi, R. J. Piechocki, A. Nix, mmwavesystem for future its: A mac-layer approach for v2x beamsteering, in: 2017 IEEE VTC, 2017.

[13] A. Ali, N. Gonzlez-Prelcic, J. R. W. Heath, Millime-ter wave beam-selection using out-of-band spatial infor-mation, IEEE Transactions on Wireless Communications(TWC) 17 (2) (2018) 1038–1052.

[14] A. Asadi, S. Muller, G. H. Sim, A. Klein, M. Hollick, Fml:Fast machine learning for 5g mmwave vehicular commu-nications, in: Proc. of IEEE INFOCOM, 2018.

[15] A. Alkhateeb, S. Alex, P. Varkey, Y. Li, Q. Qu,D. Tujkovic, Deep learning coordinated beamforming forhighly-mobile millimeter wave systems, arXiv preprintarXiv:1804.10334 (2018).

14

[16] I. Mavromatis, A. Tassi, R. J. Piechocki, A. Nix, Mmwavesystem for future its: A mac-layer approach for v2x beamsteering, in: Proc. of VTC, 2018.

[17] W. Chen, Y. Wang, Y. Yuan, Combinatorial multi-armedbandit: General framework and applications, in: Proc. ofICML, 2013.

[18] J. Komiyama, J. Honda, H. Nakagawa, Optimal re-gret analysis of thompson sampling in stochastic multi-armed bandit problem with multiple plays, arXiv preprintarXiv:1506.00779 (2015).

[19] A. Slivkins, Contextual bandits with similarity informa-tion, Journal of Machine Learning Research 15 (1) (2014)2533–2568.

[20] L. Li, W. Chu, J. Langford, R. E. Schapire, A contextual-bandit approach to personalized news article recommen-dation, in: Proc. of ACM WWW, 2010.

[21] S. Mller, O. Atan, M. V. D. Schaar, A. Klein, Context-aware proactive content caching with service di!erentia-tion in wireless networks, IEEE Transactions on WirelessCommunications (TWC) 16 (2) (2017) 1024–1036.

[22] G. Neu, A. Antos, A. Gyorgy, C. Szepesvari, Onlinemarkov decision processes under bandit feedback, in:Proc. of NIPS, 2010.

[23] P. Joulani, A. Gyorgy, C. Szepesvari, Online learning un-der delayed feedback, in: Proc. of ICML, 2013.

[24] L. Chen, J. Xu, Task o#oading and replication for vehic-ular cloud computing: A multi-armed bandit approach,arXiv preprint arXiv:1812.04575 (2018).

[25] M. Mahdavi, T. Yang, R. Jin, Online decision making un-der stochastic constraints, in: Proc. of NIPS workshop onDiscrete Optimization in Machine Learning, 2012.

[26] K. Cai, X. Liu, Y. Z. J. Chen, J. C. Lui, An online learningapproach to network application optimization with guar-antee, in: Proc. of IEEE INFOCOM, 2018.

[27] R. Gandhi, S. Khuller, S. Parthasarathy, A. Srinivasan,Dependent rounding and its applications to approximationalgorithms, Journal of the ACM (JACM) 53 (3) (2006)324–360.

[28] P. Auer, N. Cesa-Bianchi, Y. Freund, R. E. Schapire, Thenonstochastic multiarmed bandit problem, SIAM journalon computing 32 (1) (2002) 48–77.

[29] 3GPP, Technical specification group radio access network:Study on channel model for frequencies from 0.5 to 100ghz, in: Tech. Rep., Mar, 2017.

[30] P. Auer, N. Cesa-Bianchi, P. Fischer, Finite-time analysisof the multiarmed bandit problem, Machine learning 47 (2-3) (2002) 235–256.

[31] Rush hour in Rome, https://dwz.cn/ZGycfEfM.[32] Real-time tra"c information in Rome,

https://dwz.cn/wVxJCIbi.

15

smart vehicular communication via 5g...

Documents