3D Massive MIMO and Artificial Intelligence for Next Generation

Wireless Networks

Rubayet Shafin

Dissertation submitted to the Faculty of the

Virginia Polytechnic Institute and State University

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in

Electrical Engineering

Lingjia Liu, Chair

Jeffrey H. Reed

Harpreet S. Dhillon

Yang (Cindy) Yi

Zhenyu "James" Kong

February 25, 2020

Blacksburg, Virginia

Keywords: 3D Massive MIMO, Channel Estimation, Machine Learning for Wireless

Copyright 2020, Rubayet Shafin


(ABSTRACT)

3-dimensional (3D) massive multiple-input-multiple-output (MIMO)/full dimensional (FD)

MIMO and the application of artificial intelligence are two main driving forces for next generation

wireless systems. This dissertation focuses on aspects of channel estimation and precoding

for 3D massive MIMO systems and application of deep reinforcement learning (DRL) for

MIMO broadcast beam synthesis. To be specific, downlink (DL) precoding and power allo-

cation strategies are identified for a time-division-duplex (TDD) multi-cell multi-user massive

FD-MIMO network. Utilizing channel reciprocity, DL channel state information (CSI) feed-

back is eliminated and the DL multi-user MIMO precoding is linked to the uplink (UL)

direction of arrival (DoA) estimation through estimation of signal parameters via rotational

invariance technique (ESPRIT). Assuming non-orthogonal/non-ideal spreading sequences of

the UL pilots, the performance of the UL DoA estimation is analytically characterized and

the characterized DoA estimation error is incorporated into the corresponding DL precoding

and power allocation strategy. Simulation results verify the accuracy of our analytical char-

acterization of the DoA estimation and demonstrate that the introduced multi-user MIMO

precoding and power allocation strategy outperforms existing zero-forcing based massive

MIMO strategies.

In 3D massive MIMO systems, especially in TDD mode, a base station (BS) relies on the

uplink sounding signals from mobile stations to obtain the spatial information for downlink

MIMO processing. Accordingly, multi-dimensional parameter estimation of the MIMO channel

becomes crucial for such systems to realize the predicted capacity gains. In this work, we also

study the joint estimation of elevation and azimuth angles as well as the delay parameters

for 3D massive MIMO orthogonal frequency division multiplexing (OFDM) systems under

a parametric channel model. We introduce a matrix-based joint parameter estimation

method, and analytically characterize its performance for massive MIMO OFDM systems.

Results show that antenna array configuration at the BS plays a critical role in determining

the underlying channel estimation performance, and the characterized MSEs match well with

the simulated ones. Also, the joint parametric channel estimation outperforms the MMSE-

based channel estimation in terms of the correlation between the estimated channel and the

real channel.

Beamforming in MIMO systems is one of the key technologies for modern wireless communication.

Creating wide common beams is essential for enhancing the coverage of cellular

networks and for improving the broadcast operation for control signals. However, in order to

maximize the coverage, patterns for broadcast beams need to be adapted based on the users’

movement over time. In this dissertation, we present a MIMO broadcast beam optimization

framework using deep reinforcement learning. Our proposed solution can autonomously and

dynamically adapt the MIMO broadcast beam parameters based on users’ distribution in the

network. Extensive simulation results show that the introduced algorithm can achieve the

optimal coverage, and converge to the oracle solution for both single-cell and multi-cell

environments and for both periodic and Markov mobility patterns.


(GENERAL AUDIENCE ABSTRACT)

Multiple-input-multiple-output (MIMO) is a technology where a transmitter with multiple

antennas communicates with one or multiple receivers having multiple antennas.

3-dimensional (3D) massive MIMO is a recently developed technology where a base station

(BS) or cell tower with a large number of antennas placed in a two dimensional array com-

municates with hundreds of user terminals simultaneously. 3D massive MIMO/full dimensional

(FD) MIMO and the application of artificial intelligence are two main driving forces for

next generation wireless systems. This dissertation focuses on aspects of channel estimation

and precoding for 3D massive MIMO systems and application of deep reinforcement learn-

ing (DRL) for MIMO broadcast beam synthesis. To be specific, downlink (DL) precoding

and power allocation strategies are identified for a time-division-duplex (TDD) multi-cell

multi-user massive FD-MIMO network. Utilizing channel reciprocity, DL channel state in-

formation (CSI) feedback is eliminated and the DL multi-user MIMO precoding is linked

to the uplink (UL) direction of arrival (DoA) estimation through estimation of signal pa-

rameters via rotational invariance technique (ESPRIT). Assuming non-orthogonal/non-ideal

spreading sequences of the UL pilots, the performance of the UL DoA estimation is analyt-

ically characterized and the characterized DoA estimation error is incorporated into the

corresponding DL precoding and power allocation strategy. Simulation results verify the

accuracy of our analytical characterization of the DoA estimation and demonstrate that the

introduced multi-user MIMO precoding and power allocation strategy outperforms existing

zero-forcing based massive MIMO strategies.

In 3D massive MIMO systems, especially in TDD mode, a BS relies on the uplink sounding

signals from mobile stations to obtain the spatial information for downlink MIMO processing.

Accordingly, multi-dimensional parameter estimation of the MIMO channel becomes crucial

for such systems to realize the predicted capacity gains. In this work, we also study the joint

estimation of elevation and azimuth angles as well as the delay parameters for 3D massive

MIMO orthogonal frequency division multiplexing (OFDM) systems under a parametric

channel model. We introduce a matrix-based joint parameter estimation method, and

analytically characterize its performance for massive MIMO OFDM systems. Results show

that antenna array configuration at the BS plays a critical role in determining the underlying

channel estimation performance, and the characterized MSEs match well with the simulated

ones. Also, the joint parametric channel estimation outperforms the MMSE-based channel

estimation in terms of the correlation between the estimated channel and the real channel.

Beamforming in MIMO systems is one of the key technologies for modern wireless communication.

Creating wide common beams is essential for enhancing the coverage of cellular

networks and for improving the broadcast operation for control signals. However, in order to

maximize the coverage, patterns for broadcast beams need to be adapted based on the users’

movement over time. In this dissertation, we present a MIMO broadcast beam optimization

framework using deep reinforcement learning. Our proposed solution can autonomously and

dynamically adapt the MIMO broadcast beam parameters based on users’ distribution in the

network. Extensive simulation results show that the introduced algorithm can achieve the

optimal coverage, and converge to the oracle solution for both single-cell and multi-cell

environments and for both periodic and Markov mobility patterns.

To my parents, my sister, and my brother

Acknowledgments

First and foremost, I owe my deepest gratitude to my advisor, Dr. Lingjia Liu for his

unwavering support during my PhD studies. I would like to sincerely thank him for always

believing in me. His tremendous patience and the time and effort he dedicated to me made

my PhD journey extremely rewarding and joyful. His supervision and priceless

advice helped me grow not only as a researcher but also as a human being. I am also

thankful to my PhD dissertation committee members, Dr. Jeff Reed, Dr. Harpreet Dhillon,

Dr. Yang Yi, and Dr. James Kong, for their valuable comments and suggestions that helped me

improve the quality of this dissertation. I am grateful to all my lab members for their help

and support. Special thanks to Hao Chen, who helped me a lot in settling down during my

early years of PhD studies. I have always tried to emulate his great work ethic and dedication

to research. I am also thankful to all my friends in Wireless@VT for their support during

my PhD. Last but not least, I am grateful to my parents, my brother, and my sister for

their encouragement and unconditional love. Without their constant support, I could

never have finished this dissertation.

Contents

List of Figures xii

List of Tables xvi

1 Introduction 1

1.1 Massive MIMO as an Enabling Technology . . . . . . . . . . . . . . . . . . . 1

1.2 AI – The New Wireless Frontier . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.4 Organization of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 Multi-Cell Multi-User Massive FD-MIMO 12

2.1 System and Channel Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2 Uplink Channel Characterization Through DoA Estimation . . . . . . . . . . 15

2.2.1 UL DoA Estimation through Unitary ESPRIT . . . . . . . . . . . . . 15

2.2.2 RMSE Characterization . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.2.3 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.3 Downlink Precoding and Achievable Rate Analysis . . . . . . . . . . . . . . 29

2.3.1 Optimum Precoding for Sum-rate Maximization . . . . . . . . . . . . 29

2.3.2 Large-Antenna System Analysis . . . . . . . . . . . . . . . . . . . . . 32

2.3.3 Precoding Complexity Analysis . . . . . . . . . . . . . . . . . . . . . 37

2.4 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3 Joint Parameter Estimation for 3D Massive MIMO 47

3.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.2 Parameter Estimation Framework . . . . . . . . . . . . . . . . . . . . . . . . 50

3.2.1 Joint Angle and Delay Estimation Using Standard ESPRIT . . . . . 50

3.2.2 Parameter Pairing and Channel Gains Estimation . . . . . . . . . . . 52

3.3 RMSE Characterization of the Joint Angle-Delay Estimation . . . . . . . . . 54

3.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4 Superimposed Pilot for Massive FD-MIMO Systems 62

4.1 Motivation and Literature Review . . . . . . . . . . . . . . . . . . . . . . . . 62

4.2 System and Channel Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.3 Uplink Channel Estimation and Performance Characterization . . . . . . . . 68

4.3.1 Uplink DoA Estimation using Unitary ESPRIT . . . . . . . . . . . . 68

4.3.2 RMSE Characterization . . . . . . . . . . . . . . . . . . . . . . . . . 71

4.4 Achievable Rate Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

4.4.1 Uplink Rate Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

4.4.2 Optimum Downlink Precoding . . . . . . . . . . . . . . . . . . . . . . 81

4.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

4.6 Summary of Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

5 MIMO Broadcast-Beam Optimization Through DRL 106

5.1 Network Model and Problem Statement . . . . . . . . . . . . . . . . . . . . . 106

5.2 Learning Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

5.2.1 Beam Learning Framework . . . . . . . . . . . . . . . . . . . . . . . . 110

5.2.2 Offline Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

5.3 DRL for Broadcast Beam Optimization . . . . . . . . . . . . . . . . . . . . . 114

5.3.1 Background of DRL . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

5.3.2 Broadcast beam optimization for dynamic environment . . . . . . . . 119

5.4 Simulation Results and Performance Analysis . . . . . . . . . . . . . . . . . 123

5.4.1 Results for single sector dynamic environment: . . . . . . . . . . . . 123

5.4.2 Results for multiple sector dynamic environment: . . . . . . . . . . . 130

5.4.3 Multi-sectors environment with Markovian mobility pattern . . . . . 135

5.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

6 Conclusion 139

Appendices 141

Appendix A Proofs for Chapter 2, Chapter 3 142

A.1 Proof of Theorem 2.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142


A.3 Proof of Theorem 2.12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149








Bibliography 162

List of Figures

2.1 Network Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2 Elevation Angle Estimation for 64 Antennas. . . . . . . . . . . . . . . . . . . 39

2.3 Azimuth Angle Estimation for 64 Antennas. . . . . . . . . . . . . . . . . . . 40

2.4 Elevation Angle Estimation for 256 Antennas. . . . . . . . . . . . . . . . . . 42

2.5 Azimuth Angle Estimation for 256 Antennas. . . . . . . . . . . . . . . . . . 43

2.6 Average Achievable Sum-Rate Comparison. . . . . . . . . . . . . . . . . . . 44

2.7 Computational Complexity Comparison for DoA Estimation Algorithms. . . 45

2.8 Computational Complexity Comparison for Precoding Methods. . . . . . . . 46

3.1 Performance of Delay Estimation. . . . . . . . . . . . . . . . . . . . . . . . . 58

3.2 Elevation Angle Estimation Performance. . . . . . . . . . . . . . . . . . . . 59

3.3 Azimuth Angle Estimation Performance. . . . . . . . . . . . . . . . . . . . . 60

3.4 Correlation Between Underlying True Channel and Estimated Channel. . . . 61

4.1 Uplink Transmission Phases in Superimposed Pilot System. . . . . . . . . . . 75

4.2 Elevation Angle Estimation for 64 Antennas. . . . . . . . . . . . . . . . . . . 83

4.3 Azimuth Angle Estimation for 64 Antennas. . . . . . . . . . . . . . . . . . . 84

4.4 Angle Estimation for 16× 4 Antenna Array. . . . . . . . . . . . . . . . . . . 86

4.5 Elevation Angle Estimation for 256 Antenna Elements. . . . . . . . . . . . . 87

4.6 Azimuth Angle Estimation for 256 Antenna Elements. . . . . . . . . . . . . 88

4.7 Uplink Rate CDF when δs = 0.1 and δd = 0.9, and SNR= -5 dB. . . . . . . . 89

4.8 Uplink Rate CDF when δs = 0.9 and δd = 0.1, and SNR= -5 dB. . . . . . . . 90

4.9 Uplink Rate CDF when δs = 0.1 and δd = 0.9, and SNR= 20 dB. . . . . . . 92

4.10 Uplink Rate CDF when δs = 0.9 and δd = 0.1, and SNR= 20 dB. . . . . . . 93

4.11 Uplink Rate vs SNR when δs = 0.1 and δd = 0.9 . . . . . . . . . . . . . . . . 95

4.12 Uplink Rate vs SNR when δs = 0.9 and δd = 0.1 . . . . . . . . . . . . . . . . 96

4.13 Downlink Rate when the uplink SNR=20 dB . . . . . . . . . . . . . . . . . . 97

4.14 Downlink Rate when the uplink SNR=5 dB . . . . . . . . . . . . . . . . . . 98

4.15 Total Rate when δs = 0.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100




4.19 Uplink Transmission Phases in Superimposed Pilot System. . . . . . . . . . . 105

5.1 Offline training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

5.2 Reinforcement Learning Framework for Beam Optimization . . . . . . . . . . 117

5.3 DRL State Representation for Beam Optimization Problem . . . . . . . . . . 117

5.4 Replay Buffer architecture for multiple sector case . . . . . . . . . . . . . . . 121

5.5 Neural Network architecture for multiple sector case . . . . . . . . . . . . . . 122

5.6 Periodic Change in Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

5.7 Beam pattern corresponding to a typical RL action. . . . . . . . . . . . . . . 128

5.8 Results for periodic mobility pattern in a single sector dynamic environment:

(a) average squared difference (ASD) between reward achieved by DRL agent

and the reward obtained by Oracle; (b) average mismatch (AM) between

actions taken by the DRL agents and the Oracle. . . . . . . . . . . . . . . . 129

5.9 Users’ Distribution Patterns for 2 Scenarios. . . . . . . . . . . . . . . . . . . 130

5.10 Results for periodic mobility pattern in a multiple sector dynamic environ-

ment: (a) average squared difference (ASD) between reward achieved by DRL

agent and the reward obtained by Oracle; (b) average mismatch (AM) between

actions taken by DRL agents for each sector and the corresponding Oracles. 131

5.11 Instantaneous rewards (a) and instantaneous actions (b) at convergence for

multiple sectors environment and periodic user-mobility pattern. . . . . . . . 132

5.12 Results for Global solution for periodic mobility pattern in a multiple sector

dynamic environment: (a) average squared difference (ASD) between reward

achieved by DRL agent and the reward obtained by Oracle; (b) average mis-

match (AM) between actions taken by DRL agents for each sector and the

corresponding Oracles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

5.13 Results for average squared difference (ASD) in reward between the DRL

agent and the Oracle for periodic mobility pattern in a single sector dynamic

environment. ASDs for different size of action space have been plotted in

figures (a) - (e). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

5.14 Results for ASD in reward between DRL agent and the Oracle for periodic

mobility pattern in a multiple sector dynamic environment. ASDs for different

size of action space have been plotted in figures (a) and (b). . . . . . . . . . 135

5.15 State Transition Diagram for Markov Mobility. . . . . . . . . . . . . . . . . 136

5.16 Results for Markov mobility pattern in a multiple sector dynamic environment:

(a) average squared difference (ASD) between reward achieved by DRL agent

and the reward obtained by Oracle; (b) average mismatch (AM) between

actions taken by DRL agent for each sector and the corresponding Oracles. . 137

5.17 Instantaneous reward (a) and instantaneous actions (b) at convergence for

multiple sectors environment and Markov user mobility pattern. . . . . . . . 137

List of Tables

5.1 Notation for System Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5.2 Simulation Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

Chapter 1

Introduction

Massive MIMO, or large-scale MIMO, has generated significant interest in both academia [1]

and industry [2]. Because it promises to meet future throughput demand via aggressive

spatial multiplexing, massive MIMO is considered one of the key enabling technologies for

next generation wireless networks. On the other hand, artificial intelligence (AI) or machine

learning/deep learning techniques are envisioned as the game changer for future wireless

communication. Hence, both massive MIMO and AI will play a critical role in the design of

Beyond 5G and 6G cellular networks.

1.1 Massive MIMO as an Enabling Technology

Due to form factor limitations at the base station (BS), three dimensional (3D) massive

MIMO/Full Dimension MIMO (FD-MIMO) systems have been introduced in 3GPP to deploy

active antenna elements in a two dimensional (2D) antenna array enabling the exploitation

of the degrees of freedom in both azimuth and elevation domains. Due to the availability

of the huge spectrum in the millimeter wave (mmWave) band, mmWave communication

is considered another enabling technology for future cellular networks: 5G and beyond.

However, because the path loss at mmWave frequencies is significantly higher than in the microwave channel, it

is extremely challenging to establish effective communication for outdoor channels using

mmWave bands. This challenge can be tackled using beamforming techniques where the base

station serves multiple users with narrower beams. This is possible if a large number of

antennas are deployed at the base station in order to realize the narrow beams. As a result,

massive MIMO is a natural counterpart for the mmWave cellular network.
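The link between array size and beam sharpness can be illustrated with a minimal sketch. It assumes a uniform linear array with half-wavelength spacing steered to broadside, which is a simplification of the 2D arrays considered in this dissertation; the function names and the scan step are hypothetical choices made for illustration only.

```python
import cmath
import math


def array_factor(num_antennas, theta_rad, spacing_wavelengths=0.5):
    """Normalized broadside array-factor magnitude of a uniform linear array."""
    # Phase difference between adjacent elements for a plane wave from theta.
    phase_step = 2.0 * math.pi * spacing_wavelengths * math.sin(theta_rad)
    total = sum(cmath.exp(1j * phase_step * n) for n in range(num_antennas))
    return abs(total) / num_antennas


def half_power_beamwidth_deg(num_antennas, step_deg=0.01):
    """Scan away from broadside until the power gain falls below one half."""
    theta_deg = 0.0
    while (theta_deg < 90.0
           and array_factor(num_antennas, math.radians(theta_deg)) ** 2 >= 0.5):
        theta_deg += step_deg
    # The main lobe is symmetric around broadside, so double the one-sided width.
    return 2.0 * theta_deg


# Beamwidth (in degrees) for a few illustrative array sizes.
beamwidths = {n: half_power_beamwidth_deg(n) for n in (8, 16, 64)}
```

Under these assumptions the half-power beamwidth shrinks roughly in inverse proportion to the number of elements, which is why narrow beams call for large arrays.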

Since the benefits of massive MIMO or massive FD-MIMO are limited by the accuracy of

the downlink (DL) channel state information (CSI) available at the base station, it is critical

for the BS to obtain the corresponding DL CSI. In general, the BS can obtain

CSI in one of two ways: 1) DL CSI feedback, where the CSI is

fed back from mobile stations (MSs), and 2) DL/UL channel reciprocity, where the BS estimates

the uplink (UL) CSI and infers the DL CSI through channel reciprocity. Note that

DL CSI feedback is heavily used in frequency-division-duplex (FDD) systems where only a

few bits of the corresponding DL CSI information are fed back to the BS [3] to achieve a

good tradeoff between DL MIMO performance and UL feedback overhead/reliability. To

utilize the DL/UL channel reciprocity, the critical point becomes estimating the UL channel

at the BS. Based on UL pilots/reference signals sent from MSs, there are generally two

methods to estimate the UL channel. The first is to estimate the corresponding channel transfer

function (e.g., the UL channel matrix). Alternatively, the UL direction of arrival (DoA) can be

estimated at the BS using the ESPRIT algorithm [4, 5]. Even though the DoA only provides

partial information on the UL channel, it is shown in [6, 7, 8, 9] that it can be directly

linked to DL MIMO precoding in TDD systems. It is important to note that the DoA-based

MIMO precoding strategy has also been introduced to FDD systems, demonstrating

significant performance benefits in practice [10].
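The shift-invariance idea that ESPRIT exploits can be sketched in a few lines. The example below is not the unitary ESPRIT scheme developed in this dissertation: it assumes a single path, a single snapshot, and a half-wavelength uniform linear array, so that the received snapshot itself spans the signal subspace. All function names, the noise level, and the array size are hypothetical.

```python
import cmath
import math
import random


def steering_vector(num_antennas, doa_rad, spacing_wavelengths=0.5):
    """Array response of a uniform linear array for a plane wave from doa_rad."""
    mu = 2.0 * math.pi * spacing_wavelengths * math.sin(doa_rad)
    return [cmath.exp(1j * mu * n) for n in range(num_antennas)]


def estimate_doa_single_path(snapshot, spacing_wavelengths=0.5):
    """Least-squares solution of the shift-invariance equation for one path."""
    upper = snapshot[:-1]  # first N-1 elements (first subarray)
    lower = snapshot[1:]   # last N-1 elements (second subarray)
    # Solve upper * psi = lower in the least-squares sense; psi ~ exp(j * mu).
    psi = (sum(u.conjugate() * v for u, v in zip(upper, lower))
           / sum(abs(u) ** 2 for u in upper))
    mu = cmath.phase(psi)
    sin_doa = mu / (2.0 * math.pi * spacing_wavelengths)
    # Clamp to guard against rounding pushing the value outside [-1, 1].
    return math.asin(max(-1.0, min(1.0, sin_doa)))


rng = random.Random(0)
true_doa_deg = 20.0
clean = steering_vector(64, math.radians(true_doa_deg))
# Additive complex Gaussian noise with an illustrative per-component std of 0.05.
noisy = [x + complex(rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05)) for x in clean]
estimated_deg = math.degrees(estimate_doa_single_path(noisy))
```

With several paths, the scalar `psi` becomes a matrix whose eigenvalues carry the spatial frequencies, and the signal subspace has to be estimated from multiple snapshots; that general case is what subspace-based ESPRIT handles.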

Despite promising better performance, non-linear precoding schemes, such as dirty paper

coding (DPC) or vector perturbation, are not practical for MIMO systems due to their high

implementation complexity. In recent years, simple linear processing techniques have been

shown to offer significant performance gains for multi-user massive MIMO scenarios where

the base stations employ a large number of antennas [1]. Hence, most prior work

in the massive MIMO literature has focused on maximum ratio transmission (MRT) and zero

forcing (ZF)-based methods for DL MIMO precoding [11, 12]. However, for mmWave massive

FD-MIMO systems, it is possible to design low-complexity precoders with better performance

than the conventional ZF/MRT-based precoders.
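For reference, the two linear baselines can be written in their standard textbook form. The notation below is an assumption made for illustration and is not taken from later chapters: $K$ single-antenna users, $M \ge K$ BS antennas, a downlink channel matrix $\mathbf{H}$ with full row rank, and scalars $\alpha$ that enforce the transmit power constraint.

```latex
% Assumed notation (illustrative): H \in \mathbb{C}^{K \times M}, K users, M antennas.
\begin{align}
  \mathbf{W}_{\mathrm{MRT}} &= \alpha_{\mathrm{MRT}}\, \mathbf{H}^{H}, \\
  \mathbf{W}_{\mathrm{ZF}}  &= \alpha_{\mathrm{ZF}}\, \mathbf{H}^{H}
      \left( \mathbf{H}\mathbf{H}^{H} \right)^{-1},
  \qquad \text{so that } \mathbf{H}\mathbf{W}_{\mathrm{ZF}} = \alpha_{\mathrm{ZF}}\, \mathbf{I}_{K}.
\end{align}
```

MRT maximizes the received signal power of each user but leaves inter-user interference, whereas ZF removes that interference at the cost of a $K \times K$ matrix inversion.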

Massive MIMO, with a large number of antennas deployed at the BS, promises a dramatic

increase in spectral efficiency compared to the traditional small-scale MIMO systems, and

is considered as a candidate technology for the next generation cellular networks (5G and

beyond). However, realizing the throughput-gains promised by massive MIMO is contingent

upon the availability of accurate channel state information (CSI) at the BS for downlink

precoding. CSI can be obtained by estimating the transfer function of the channel between

transmitter and receiver. However, because of the large dimensionality, this traditional

transfer function based estimation may not yield expected performance for massive MIMO

systems [13, 14]. Alternatively, channel estimation can be performed by estimating channel

parameters such as directions of arrival, delays, and channel gains [15]. When the system is

well-calibrated, the parametric approach of channel estimation can offer great performance

gains over the simple unstructured interpolation approaches [16].

An efficient method for estimating the angles and delays of multiple paths of a known signal

is presented in [17]; however, its complexity is prohibitively high due to its

iterative nature. Analytical results on the performance of standard ESPRIT are derived in [18];

however, these results are tied to the distribution of the eigenvectors of the sample covariance

matrix. With the assumption of small additive perturbation, [19] provides an explicit first

order expression of the signal subspace. Nevertheless, the authors in [19] only consider

the 1D parameter estimation problem. A rather general framework for MSE analysis was

presented in [20]; however, these results are very complicated and can only be simplified

in the single path case. [6] simplifies the analytical results for 3D massive MIMO systems,

but only considers the angle estimation. [7] shows the simplified results for both angle and

delay estimation, but assumes a single carrier system.

1.2 AI – The New Wireless Frontier

Cellular data traffic has witnessed an exponential growth over the last few years primarily

due to the widespread use of mobile devices and novel application services. The Cisco Visual

Networking Index (VNI) forecast predicts a threefold increase in global IP traffic, from 122

exabytes (EB) per month in 2017 to 396 EB per month in 2022. In order to handle this massive data flow and ensure

superior quality of experience (QoE) to the end users, wireless cellular networks are also be-

coming extremely complicated. With the coexistence of different types of networks, managing

networks efficiently has become a critical issue for 5G [21, 22, 23] and beyond systems. AI is

regarded as one of the frontiers of beyond 5G and 6G wireless systems [24, 25, 26, 27, 28, 29].

In order to reduce network management complexity and operational cost, the self-organizing

network (SON) has been introduced in the Third Generation Partnership Project (3GPP) as

one of the enabling technologies for advanced mobile networks [30, 31]. SON aims to achieve

autonomous functionalities within the Radio Access Network (RAN). These self-X functionalities

include self-configuration, self-optimization, and self-healing [32, 33]. Self-optimization

within SON refers to the process of self-tuning of network parameters for achieving optimum

performance in terms of any predefined metric of interest. The idea is to dynamically update

the cellular radio resource parameters based on the changes in propagation characteristics,

traffic pattern or network deployment scenarios. User distribution in a wireless cellular network

changes dynamically over time. These changes are the result of users’ mobility behavior.

For instance, during the daytime, users are more densely concentrated in commercial areas,

whereas at night, users are primarily clustered in residential areas. Users’ large time-scale

movement also depends on the specific time within the week (workdays and weekends) or year

(holidays). Accordingly, to maximize the overall throughput and coverage of the wireless

networks, cell-specific cellular radio parameters should also be updated taking into account

the changes in users’ distribution.

Multiple-input multiple-output (MIMO) technology [3] is one of the backbones of current and

next generation cellular networks. Massive MIMO [1], where a large number of antennas are

deployed at the base stations (BS), is envisioned as a key enabler for 5G systems. Beamform-

ing refers to a MIMO technique for coherently combining the signals generated by multiple

antennas in the MIMO arrays. 3-dimensional (3D) massive MIMO/full-dimension (FD)

MIMO [2, 6, 34] promises tremendous throughput gain by enabling simultaneous beamforming

in both the elevation and azimuth domains. With a large antenna array, it is possible to create

sharp beams towards desired users, and hence reduce the interference significantly [35]; this

beamforming is used to improve users’ throughput and is therefore user-specific. Cellular

networks, on the other hand, also need cell-specific wide beams. In fact, sectorization

can be viewed as a special case of a wide beam where a separate wide beam is used

to cover a separate sector belonging to the same cell site. In practice, wide beams are essential

for connecting as many users as possible. This essentially provides the coverage for cellular

networks. Another important application of wide beams is broadcasting

the wireless control and access signals as prescribed by LTE and LTE-Advanced

systems. As a result, generating accurate wide beam patterns that cover the maximum

number of users in the network is critical.

Unfortunately, most of the work in the MIMO literature focuses on maximizing MIMO

throughput or increasing the reliability of the data plane. Meanwhile, widebeam parameters

are set manually in modern cellular networks. A group of network engineers performs drive tests

and physically visits each base station site to fix the parameters controlling the shape, tilt,

and beamwidth of the widebeam. Once fixed, these widebeam parameters are not changed until

some major fault or complaint shows up. In other words, these parameters remain unchanged for

a long period of time, often years. As a result, these parameters currently cannot be updated

based on users’ movement or changes in their distribution. Accordingly, this fixed parameter setup

results in a strictly suboptimal solution in terms of overall network coverage.

In cellular networks, user locations change dynamically. To maximize the coverage

area, the wide-beam parameters need to be dynamically updated based on user

movement. Reinforcement learning (RL) has been shown to be a useful tool for dynamic spectrum

access (DSA) as well as small cell networks. A Q-learning based framework has been in-

troduced in [36] for managing cumulative interference, originated from multiple cognitive

radios, at the primary users’ receivers in wireless regional area networks (WRANs). The

introduced RL system is shown to autonomously learn a policy that handles the cumulative

interference at the primary users and keeps the interference level at the primary protection contour

below a predefined threshold. An RL-based power control strategy has been developed

in [37] for cognitive femtocell networks, and it has been shown that RL can enhance the

capacity of femtocells while ensuring a minimum quality of service (QoS) to macrocells. In

a similar setup, [38] proposes an RL framework for interference management in small-cell

networks. The problem of dynamic channel assignment (DCA) has been addressed in [39] by

utilizing a real-time RL-based approach. A multirate transmission control (MTC) strategy

has been proposed in [40] using the Q-learning algorithm for wideband code division multiple

access (CDMA) systems.

Recently, deep reinforcement learning (DRL) [41, 42] has been shown to be capable of learning

human-level control policies on a variety of Atari games [43]. DRL agents learn to

estimate the Q-values used to select the best possible action from the current state of the video

game. However, compared to traditional Q-learning, in a deep Q-network the

Q-values are approximated using a deep neural network instead of being stored for all

state-action pairs in tabular form. As a result, DRL can approximate the

Q-values even for very large state and action spaces. Our recent work [44] shows that DRL-based

resource allocation can help improve the performance of a DSA network.
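To make the tabular baseline concrete, the sketch below runs Q-learning on a toy five-state chain. The environment, reward, learning rate, discount factor, and the uniformly random behavior policy are all hypothetical choices for illustration and are unrelated to the beam optimization problem studied later. A deep Q-network would replace `q_table` with a neural network that maps a state to its Q-values.

```python
import random

NUM_STATES = 5      # toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9         # discount factor (illustrative)
ALPHA = 0.5         # learning rate (illustrative)


def step(state, action):
    """Deterministic chain dynamics with reward 1 on reaching the last state."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == NUM_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train_q_table(num_episodes=500, max_steps=200, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy (off-policy)."""
    rng = random.Random(seed)
    q_table = [[0.0 for _ in ACTIONS] for _ in range(NUM_STATES)]
    for _ in range(num_episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next-state value.
            target = reward if done else reward + GAMMA * max(q_table[next_state])
            q_table[state][action] += ALPHA * (target - q_table[state][action])
            state = next_state
            if done:
                break
    return q_table


q_table = train_q_table()
# Greedy action for every non-terminal state.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(NUM_STATES - 1)]
```

The table needs one entry per state-action pair, which is manageable for five states but quickly becomes impractical as the state and action spaces grow; this is the scaling problem that function approximation addresses.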

1.3 Contribution

In this dissertation, we first characterize the optimal/near-optimal DL MIMO precoding and

power allocation strategies for a TDD multi-cell multi-user mmWave massive FD-MIMO network.

An ESPRIT-based UL DoA estimation scheme will be introduced, and the performance of

the DoA estimation will be analytically characterized assuming non-orthogonal spreading

sequences used for UL pilots/reference signals. DL multi-user MIMO precoding and power

allocation strategies will be identified based on the UL DoA estimation and its corresponding

error performance. Performance evaluation will be conducted to illustrate the benefits of the

introduced MIMO precoding and power allocation strategy over our previous scheme [6] as

well as popular zero forcing (ZF)-based precoding [11, 12].

Next, we propose a parametric channel estimation framework for 3D millimeter wave massive

MIMO OFDM systems. To be specific, we jointly estimate the angle and delay parameters,

and based on the estimated angles and delays, we formulate a maximum likelihood based

estimator for estimating the complex path gains. Moreover, we analytically characterize the


root mean squared error (RMSE) of the estimation of delays, and elevation and azimuth

angles, and simplify the results for massive MIMO systems. Finally, using simulations,

we study the performance of the proposed joint estimation algorithm through the correlation

between the estimated channel and the true channel.

Finally, we present a DRL-based framework for MIMO broadcast beam optimization to

maximize the coverage instead of the throughput. This will be an important step towards

realizing the potential of SON.

The detailed contributions of this dissertation can be summarized as follows:

• First, we present a unitary ESPRIT-based uplink DoA estimation method for a multi-cell

multi-user mmWave massive FD-MIMO OFDM network. Unlike the majority of existing

work, our scheme considers a more realistic scenario where non-orthogonal spreading

sequences are used as UL pilots for both intra-cell and inter-cell users. As a result,

due to the non-zero correlation coefficients among users’ spreading sequences, the UL

DoA/channel estimation is subject to intra-cell interference, inter-cell interference, and

the so-called pilot contamination.

• Second, we analytically characterize the mean square error (MSE) of unitary ESPRIT-

based UL DoA estimation for the corresponding multi-cell multi-user FD-MIMO net-

work. Our analytical results show how different perturbation components, namely

noise elements, and intra-cell and inter-cell interferences, affect the UL DoA/chan-

nel estimation performance. The MSE has been related to key physical parameters

such as number of antennas, BS array geometry, complex path gains, and correlation

coefficients between users’ spreading sequences.

• Third, we derive the sum-rate maximizing DL precoding and power allocation strategy

for our FD-MIMO system. Furthermore, we perform a large antenna array regime analysis

for DL precoding and identify the optimum power allocation under both perfect

and imperfect DoA estimation scenarios.

• Fourth, regarding MU-MIMO precoding, we validate our algorithms and analytical

results through extensive simulations. The evaluation results demonstrate that the

simulated MSEs for different antenna numbers and antenna array geometries match

the analytical expressions well for both elevation and azimuth estimation in

the high-SNR regime. Moreover, we show that the introduced sum-rate maximization

precoding strategy outperforms both eigenbeamforming and ZF-based precoding over

all SNR regimes.

• Fifth, we propose a novel framework for superimposed pilot based channel estimation

and downlink processing for 3D massive MIMO systems. We demonstrate that DoA can

be used for uplink channel estimation and downlink precoder design for superimposed

pilot systems and can offer significant performance gain over traditional orthogonal

pilot strategies under certain conditions.

• Sixth, we propose a novel parametric channel estimation framework for jointly estimating

directions of arrival and delays for 3D massive FD-MIMO systems, and analytically

characterize the estimation performance.

• Seventh, we propose a double DQN-based framework [45] for dynamically optimizing

MIMO broadcast beams for cellular networks. The proposed learning-based

algorithm can autonomously update the beam patterns based on changes in the user

distribution [46].

• Finally, we propose beam optimization algorithms for both single-cell and multi-cell environments. For the multi-cell environment, we propose a novel neural network architecture for computing the Q-values while keeping the computational complexity growing only linearly with the number of BSs in the network. We present extensive simulations to validate the proposed solution. We consider both periodic and Markov mobility patterns, and show that the proposed DRL-based algorithm can achieve perfect convergence with the Oracle for both single-cell and multi-cell environments and for any user distribution.

1.4 Organization of the Dissertation

The rest of the dissertation is organized as follows. Chapter 2 presents the channel estimation

and precoding framework for the multi-cell massive MU-MIMO network. Section 2.1 describes

the system model and the channel model for the underlying multi-cell multi-user massive

FD-MIMO network. Section 2.2 presents the ESPRIT-based UL DoA estimation method and

the performance characterization of DoA estimation for the multi-cell massive MU-MIMO network.

The achievable sum-rate analysis under both perfect and imperfect DoA estimation as well

as the optimal MIMO precoding and power allocation strategies are contained in Section

2.3. Simulation results for massive MU-MIMO network are presented in Section 2.4.

Chapter 3 presents our work on joint parameter estimation for massive MIMO systems. Section 3.1 describes the system model, Section 3.2 presents the framework for joint parameter estimation, Section 3.3 studies the RMSE characterization for the joint parameter estimation, and Section 3.4 presents the corresponding simulation results.

Chapter 4 presents our work on superimposed pilot based framework for 3D massive MIMO

systems. Section 4.1 provides the background and motivation of the superimposed pilot sys-

tem for FD-MIMO, and highlights our contribution in this dissertation. Section 4.2 presents

the system and channel model for superimposed pilot framework. Section 4.3 presents the

DoA estimation strategy for superimposed pilot and characterizes the uplink DoA estimation


performance. Section 4.4 characterizes both uplink and downlink achievable rates for super-

imposed pilot system. Section 4.5 presents the simulation results and Section 4.6 summarizes

the chapter.

Our work on MIMO broadcast beam optimization using deep reinforcement learning (DRL)

is presented in Chapter 5. Section 5.1 presents the network model and problem statement;

Section 5.2 presents the beam learning framework; Section 5.3 introduces the DRL-based op-

timization strategies for both single cell and multiple cell environments; Section 5.4 presents

the simulation work.

Finally, we conclude the dissertation in Chapter 6.

Chapter 2

Multi-Cell Multi-User Massive

FD-MIMO

2.1 System and Channel Model


Figure 2.1: Network Model.

We consider a multi-cell multi-user MIMO-OFDM system consisting of G BSs as depicted

in Figure 2.1. Each BS has Nr antennas and supports J mobile stations

(MSs), each having Nt transmit antennas. After appending the cyclic prefix (CP), the


resulting time-domain transmit signal at each MS is first passed through a parallel-to-serial converter followed by a digital-to-analog converter (DAC), resulting in the baseband OFDM signal. The baseband signal is then up-converted and sent through a frequency-selective fading channel, which is assumed to remain time-invariant during an OFDM symbol duration. Note that we assume the same number of antennas at all MSs for clarity of exposition. However, the algorithm and analysis presented in this work are not restricted by this assumption: all the results can be straightforwardly extended to the scenario where users in the cell have different numbers of antennas.

In the UL, each MS sends $N_t$ spreading sequences of length $Q$ as pilots/reference signals, one on each transmit antenna. Accordingly, the $N_r \times Q$ frequency-domain received signal for the $k$-th subcarrier at the $i$-th BS can be written as

$$\mathbf{Z}_i(k) = \sum_{g=0}^{G-1}\sum_{j=0}^{J-1} \sqrt{\Lambda_{jg,i}}\,\mathbf{H}_{jg,i}(k)\,\mathbf{X}_{jg}(k) + \mathbf{W}_i(k), \tag{2.1}$$

where $\mathbf{H}_{jg,i}(k)$ is the $N_r \times N_t$ channel matrix for the channel between the $i$-th BS and the $j$-th MS in the $g$-th cell at the $k$-th subcarrier, and $\Lambda_{jg,i}$ is the corresponding large-scale fading coefficient, which is independent of the subcarrier frequency; $\mathbf{X}_{jg}(k)$ is the $N_t \times Q$ frequency-domain transmit signal from the $j$-th MS in the $g$-th cell for the $k$-th subcarrier, and $\mathbf{W}_i(k)$ is the corresponding $N_r \times Q$ noise matrix. Note that each row vector of $\mathbf{X}_{jg}(k)$ is a length-$Q$ spreading sequence. The channel transfer function, $\mathbf{H}_{jg,i}(k)$, can be written as

$$\mathbf{H}_{jg,i}(k) = \sum_{\ell=0}^{L_{jg,i}-1} \mathbf{C}_{jg,i}(\ell)\, e^{-j\frac{2\pi k \ell}{N_c}}, \tag{2.2}$$

where $\mathbf{C}_{jg,i}(\ell)$ is the $N_r \times N_t$ channel impulse response (CIR) for the $\ell$-th tap of the channel between the $i$-th BS and the $j$-th MS in the $g$-th cell, and $N_c$ denotes the total number of subcarriers.
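As a small numerical illustration of (2.2), the sketch below evaluates the per-subcarrier channel matrix from a list of tap matrices. The helper name `channel_transfer`, the 2 × 2 toy taps, and the choice of four subcarriers are assumptions made only for this example.

```python
import cmath


def channel_transfer(taps, k, n_c):
    """Return H(k) = sum_l C(l) * exp(-j*2*pi*k*l/n_c) for a list of tap matrices."""
    n_r, n_t = len(taps[0]), len(taps[0][0])
    h = [[0j] * n_t for _ in range(n_r)]
    for ell, c in enumerate(taps):
        # Per-tap phase rotation on subcarrier k.
        phase = cmath.exp(-2j * cmath.pi * k * ell / n_c)
        for r in range(n_r):
            for t in range(n_t):
                h[r][t] += c[r][t] * phase
    return h


# Hypothetical two-tap channel with N_r = N_t = 2 and N_c = 4 subcarriers.
taps = [
    [[1 + 0j, 0j], [0j, 1 + 0j]],    # C(0)
    [[0.5 + 0j, 0j], [0j, 0.5j]],    # C(1)
]
h0 = channel_transfer(taps, k=0, n_c=4)
h1 = channel_transfer(taps, k=1, n_c=4)
```

On subcarrier 0 all phase factors equal one, so the result is simply the sum of the taps; on other subcarriers each tap is rotated according to its delay index.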


Here, we assume that the channel, which can be represented by an equivalent discrete-time

linear channel impulse response, has a finite number (Ljg,i) of non-zero taps.

Using the geometric channel model for mmWave frequencies, the impulse response for the $\ell$-th tap of the channel between the $i$-th BS and the $j$-th MS in the $g$-th cell can be represented by [6, 7, 47]

$$\mathbf{C}_{jg,i}(\ell) = \sum_{p=0}^{P_{jg,i,\ell}-1} \alpha_{jg,i}(\ell,p)\,\mathbf{e}_{r,jg,i}(\ell,p)\,\mathbf{e}^{H}_{t,jg,i}(\ell,p), \tag{2.3}$$

where $\alpha_{jg,i}(\ell,p)$, $\mathbf{e}_{r,jg,i}(\ell,p)$, and $\mathbf{e}_{t,jg,i}(\ell,p)$ are, respectively, the channel gain, the $N_r \times 1$ receive antenna array response, and the $N_t \times 1$ transmit antenna array response for the $p$-th sub-path within the $\ell$-th tap of the channel between the $i$-th BS and the $j$-th MS in the $g$-th cell; $P_{jg,i,\ell}$ is the total number of sub-paths within the $\ell$-th tap of the channel; and $(\cdot)^H$ denotes the Hermitian transpose operation. In the FD-MIMO network of interest, a 1D uniform linear array (ULA) is assumed at each MS. The corresponding transmit antenna array response can be described using the Vandermonde structure

$$\mathbf{e}_{t,jg,i}(\ell,p) = \left[1 \;\; e^{j\omega_{jg,i,\ell,p}} \;\; \ldots \;\; e^{j(N_t-1)\omega_{jg,i,\ell,p}}\right]^{T},$$

where $\omega_{jg,i,\ell,p} = (2\pi\Delta_t/\lambda)\cos\Omega_{jg,i,\ell,p}$, $\Delta_t$ is the spacing between adjacent transmit antenna elements, $\Omega_{jg,i,\ell,p}$ is the transmit angle (DoD) for the $p$-th sub-path within the $\ell$-th tap of the channel between the $i$-th base station and the $j$-th user in the $g$-th cell, and $\lambda$ is the carrier wavelength.

On the other hand, for FD-MIMO networks the antenna array at the BS is a 2D planar array placed in the X-Z plane, with $M_1$ and $M_2$ antenna elements in the vertical and horizontal directions, respectively. Accordingly, the total number of receive antenna elements at the base station is $N_r = M_1 \times M_2$. Therefore, the receive antenna array response for the $p$-th sub-path within the $\ell$-th tap can be expressed as $\mathbf{e}_{r,jg,i}(\ell,p) = \mathbf{a}(v_{jg,i,\ell,p}) \otimes \mathbf{a}(u_{jg,i,\ell,p})$, where $\otimes$ represents the Kronecker product, and

$$\mathbf{a}(u_{jg,i,\ell,p}) = \left[1 \;\; e^{ju_{jg,i,\ell,p}} \;\; \ldots \;\; e^{j(M_1-1)u_{jg,i,\ell,p}}\right]^{T}, \qquad \mathbf{a}(v_{jg,i,\ell,p}) = \left[1 \;\; e^{jv_{jg,i,\ell,p}} \;\; \ldots \;\; e^{j(M_2-1)v_{jg,i,\ell,p}}\right]^{T}$$

can be treated as the receive steering vectors in the elevation and azimuth domains, respectively. Here, $u_{jg,i,\ell,p} = \frac{2\pi\Delta_r}{\lambda}\cos\theta_{jg,i,\ell,p}$ and $v_{jg,i,\ell,p} = \frac{2\pi\Delta_r}{\lambda}\sin\theta_{jg,i,\ell,p}\cos\phi_{jg,i,\ell,p}$ are the two receive spatial frequencies at the BS, $\Delta_r$ is the spacing between adjacent antenna elements in the receive antenna array, and $\theta_{jg,i,\ell,p}$ and $\phi_{jg,i,\ell,p}$ are the elevation and azimuth DoAs for the $p$-th sub-path within the $\ell$-th tap of the channel between the $i$-th BS and the $j$-th MS in the $g$-th cell, respectively. In this work, we do not consider user mobility or scheduler impact on the system performance; we will address these important issues in our future work.
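The Kronecker structure of the BS array response described above can be sketched in a few lines. The helper names (`steering`, `kron`, `bs_array_response`), the half-wavelength element spacing, and the example angles are assumptions chosen for illustration only.

```python
import cmath
import math


def steering(spatial_freq, m):
    """Vandermonde steering vector [1, e^{jw}, ..., e^{j(m-1)w}]."""
    return [cmath.exp(1j * n * spatial_freq) for n in range(m)]


def kron(a, b):
    """Kronecker product of two vectors stored as flat lists."""
    return [x * y for x in a for y in b]


def bs_array_response(theta, phi, m1, m2, spacing_over_lambda=0.5):
    """Receive response a(v) kron a(u) of an M1 x M2 planar array."""
    u = 2 * math.pi * spacing_over_lambda * math.cos(theta)
    v = 2 * math.pi * spacing_over_lambda * math.sin(theta) * math.cos(phi)
    return kron(steering(v, m2), steering(u, m1))


# Hypothetical 4 x 8 array, elevation 60 degrees, azimuth 30 degrees.
e_r = bs_array_response(theta=math.radians(60), phi=math.radians(30), m1=4, m2=8)
```

The resulting vector has $N_r = M_1 M_2$ unit-modulus entries, with the elevation spatial frequency varying fastest along the vector.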

2.2 Uplink Channel Characterization Through DoA Es-

timation

In this section, we present the UL DoA estimation procedure for the multi-cell multi-user massive FD-MIMO network assuming non-orthogonal/non-ideal spreading sequences, and characterize the corresponding estimation performance. We choose an ESPRIT-based method for DoA estimation over other high-resolution DoA estimation methods, such as MUSIC, since ESPRIT offers better resolvability and unbiased estimates with lower variance. Most importantly, ESPRIT provides significant computational advantages in terms of faster processing speed, lower storage requirements, and indifference to knowledge of the precise array geometry.

2.2.1 UL DoA Estimation through Unitary ESPRIT

Let the n-th MS in the i-th cell be the target user, which communicates with the i-th

BS. In massive FD-MIMO networks, the number of scheduled users may be quite large, and

hence due to limited availability of orthogonal spreading codes, it may not be possible to


assign orthogonal sequences to all scheduled users. With this in mind, in this work, we

assume a more realistic scenario that only the spreading sequences used by the same MS

are orthogonal while spreading sequences for different MSs within a cell are non-orthogonal.

Furthermore, we assume that the same pool of spreading sequences is reused across all cells

as UL pilots, complying with the 3GPP LTE/LTE-Advanced standards [48].

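Let the correlation among the spreading sequences from different MSs be denoted as $\rho_1$. Now, for estimating the UL channel of the $n$-th MS in the $i$-th cell at the $i$-th BS, the $N_r \times Q$ received signal at the $k$-th subcarrier, $\mathbf{Z}_i(k)$, is first correlated with the spreading sequences of the $n$-th MS. Hence, after correlating the received signal with the target user's sequence, from (2.1), we have

$$\mathbf{Z}_i(k)\mathbf{X}^{H}_{ni}(k) = \sum_{g=0}^{G-1}\sum_{j=0}^{J-1} \sqrt{\Lambda_{jg,i}}\,\mathbf{H}_{jg,i}(k)\,\mathbf{X}_{jg}(k)\,\mathbf{X}^{H}_{ni}(k) + \mathbf{W}'_i(k), \tag{2.4}$$

where $\mathbf{W}'_i(k) = \mathbf{W}_i(k)\mathbf{X}^{H}_{ni}(k)$ is the equivalent noise element. Now, we can rewrite (2.4) as

$$\mathbf{Z}_i(k)\mathbf{X}^{H}_{ni}(k) = \sqrt{\Lambda_{ni,i}}\,\mathbf{H}_{ni,i}(k) + \sum_{\substack{g=0\\ g\neq i}}^{G-1} \sqrt{\Lambda_{ng,i}}\,\mathbf{H}_{ng,i}(k) + \sum_{\substack{j=0\\ j\neq n}}^{J-1} \sqrt{\Lambda_{ji,i}}\,\mathbf{H}_{ji,i}(k)\,\rho_1 \mathbf{1}_{N_t} + \sum_{\substack{g=0\\ g\neq i}}^{G-1}\sum_{\substack{j=0\\ j\neq n}}^{J-1} \sqrt{\Lambda_{jg,i}}\,\mathbf{H}_{jg,i}(k)\,\rho_1 \mathbf{1}_{N_t} + \mathbf{W}'_i(k), \tag{2.5}$$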

where $\mathbf{1}_{N_t}$ denotes an $N_t \times N_t$ matrix with each element being unity. In (2.5), the first term, $\sqrt{\Lambda_{ni,i}}\,\mathbf{H}_{ni,i}(k)$, represents the target user's UL channel; the first summation term represents the inter-cell interference caused by users in other cells whose spreading sequences are exactly the same as that of the target user (pilot contamination); the second summation term represents the intra-cell interference; and the third summation term represents the inter-cell interference caused by users in other cells whose spreading sequences are different from (and non-orthogonal to) that of the target user. In realistic wireless networks such as LTE/LTE-Advanced networks, there exists a nonzero correlation between different pilot sequences. For example, Zadoff-Chu sequences are used because different Zadoff-Chu sequences of the same prime length have constant cross-correlation [49]. To reflect this practical constraint as well as to provide system design insights, the correlation coefficient $\rho_1$ in (2.5) is introduced as a system design parameter that captures the tradeoff between the training sequence length and the corresponding system performance. Note that $\rho_1 = 0$ results in the special case where all the users in a cell are assigned orthogonal codes. It is also important to note that the value of $\rho_1$ depends on the length of the spreading sequences the system designer chooses, which in turn depends on the coherence time of the channel. This provides a way to investigate the impact of channel coherence on the network performance: a smaller coherence time leads to shorter spreading sequences, resulting in higher values of $\rho_1$.
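The dependence of the correlation on the sequence length can be illustrated with Zadoff-Chu sequences. The sketch below uses the common odd-length definition $x_r[n] = e^{-j\pi r n(n+1)/N}$; the roots (1 and 2) and the prime lengths (31 and 127) are illustrative assumptions and do not necessarily match the pilot design used in this work.

```python
import cmath
import math


def zadoff_chu(root, length):
    """Zadoff-Chu sequence of odd length: x[n] = exp(-j*pi*root*n*(n+1)/length)."""
    return [cmath.exp(-1j * math.pi * root * n * (n + 1) / length)
            for n in range(length)]


def normalized_correlation(x, y):
    """Zero-lag correlation between two sequences, normalized by their length."""
    return sum(a * b.conjugate() for a, b in zip(x, y)) / len(x)


# Two different roots of the same prime length, for a short and a long sequence.
rho_short = abs(normalized_correlation(zadoff_chu(1, 31), zadoff_chu(2, 31)))
rho_long = abs(normalized_correlation(zadoff_chu(1, 127), zadoff_chu(2, 127)))
```

For prime length $N$ the normalized cross-correlation magnitude equals $1/\sqrt{N}$, so the shorter sequence yields the larger correlation, in line with the remark that shorter sequences lead to higher values of $\rho_1$.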

Because of the large path loss, which is manifested by the large-scale fading coefficients, the overall gains of the inter-cell interference channels are relatively small compared to that of the target user's channel. Furthermore, because of the small correlation coefficient $\rho_1$, the intra-cell interference terms can also be considered relatively small compared to the term for the target user's channel. Hence, during the ESPRIT-based parameter estimation phase, we can treat the interference and noise elements together, and (2.5) can be written as

$$\hat{\mathbf{H}}_{ni,i}(k) = \mathbf{Z}_i(k)\mathbf{X}^{H}_{ni}(k) = \sqrt{\Lambda_{ni,i}}\,\mathbf{H}_{ni,i}(k) + \mathbf{W}''_i(k), \tag{2.6}$$

where $\mathbf{W}''_i(k)$ is the equivalent noise-plus-interference matrix. Now, using (2.2) and (2.3), we can write $\mathbf{H}_{ni,i}(k)$ as

$$\mathbf{H}_{ni,i}(k) = \sum_{\ell=0}^{L_{ni,i}-1}\sum_{p=0}^{P_{ni,i,\ell}-1} \alpha_{ni,i}(\ell,p)\,\mathbf{e}_{r,ni,i}(\ell,p)\,\mathbf{e}^{H}_{t,ni,i,k}(\ell,p), \tag{2.7}$$

where $\mathbf{e}_{t,ni,i,k}(\ell,p) = \mathbf{e}_{t,ni,i}(\ell,p)\,e^{-j\frac{2\pi k \ell}{N_c}}$. In order to jointly estimate the elevation and azimuth angles of the uplink channel between the $i$-th base station and the $n$-th user in the $i$-th cell, we can now apply a low-complexity DoA estimation algorithm based on unitary ESPRIT.

High-frequency channels, especially millimeter-wave channels, usually have a small number of scattering clusters [50]. In this work, we focus on the simple case where each scattering cluster contributes a single propagation path. This is a reasonable assumption for the analysis of FD-MIMO systems [6, 51, 52]. Hence, for clarity of exposition and notational convenience, we drop the sub-path index, $p$, from $\alpha_{jg,i}(\ell,p)$, $\mathbf{e}_{r,jg,i}(\ell,p)$, and $\mathbf{e}^{H}_{t,jg,i,k}(\ell,p)$. However, our results also hold for the multiple sub-path scenario, because ESPRIT can distinguish sub-paths as long as the spatial resolvability of the array is higher than the angular spread between two sub-paths [53]. Note that in this work, instead of Standard ESPRIT, we utilize Unitary ESPRIT [54] for DoA estimation, which provides superior estimation performance when the sub-paths within the same cluster are highly correlated. Moreover, because of Forward-Backward Averaging (FBA), Unitary ESPRIT can still estimate the DoAs of two sub-paths that are completely correlated (coherent). It is also noteworthy that more than two completely coherent sub-paths are unlikely in the mmWave propagation channel, based on the 3GPP mmWave channel model [55] and the seminal work in [56]. Therefore, the introduced algorithm is applicable to most general mmWave channels. For the very special case where more than two sub-paths are completely coherent and all such sub-path DoAs must be estimated, the spatial smoothing technique can be applied in conjunction with FBA to de-correlate the corresponding signals [57]. This special case is outside the scope of the current work and will be considered in our future work. Now, (2.7) can be written as

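$$\mathbf{H}_{ni,i}(k) = \mathbf{A}_{ni,i}\,\mathbf{D}_{ni,i}\,\mathbf{B}^{H}_{ni,i}(k), \tag{2.8}$$

where $\mathbf{A}_{ni,i} = \left[\mathbf{e}_{r,ni,i}(0) \;\ldots\; \mathbf{e}_{r,ni,i}(L_{ni,i}-1)\right]$, $\mathbf{D}_{ni,i} = \mathrm{diag}\left[\alpha_{ni,i}(0) \;\ldots\; \alpha_{ni,i}(L_{ni,i}-1)\right]$, and $\mathbf{B}_{ni,i}(k) = \left[\mathbf{e}_{t,ni,i,k}(0) \;\ldots\; \mathbf{e}_{t,ni,i,k}(L_{ni,i}-1)\right]$. Hence, from (2.6), the estimated channel matrix, $\hat{\mathbf{H}}_{ni,i}(k)$, can be written as

$$\hat{\mathbf{H}}_{ni,i}(k) = \sqrt{\Lambda_{ni,i}}\,\mathbf{A}_{ni,i}\,\mathbf{D}_{ni,i}\,\mathbf{B}^{H}_{ni,i}(k) + \mathbf{W}''_i(k). \tag{2.9}$$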

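By converting all the complex matrices to real matrices, Unitary ESPRIT performs the computations in real, instead of complex, numbers from the beginning to the end of the algorithm, and hence reduces the computational complexity significantly. Since we are only interested in estimating UL DoAs, the noisy channel from (2.9) can be expressed as

$$\hat{\mathbf{H}}_{ni,i}(k) = \mathbf{A}_{ni,i}\,\mathbf{S}_{ni,i}(k) + \mathbf{W}''_i(k), \tag{2.10}$$

where $\mathbf{S}_{ni,i}(k) = \sqrt{\Lambda_{ni,i}}\,\mathbf{D}_{ni,i}\,\mathbf{B}^{H}_{ni,i}(k)$. In order to perform unitary ESPRIT, we need to use forward-backward averaging on the received signal:

$$\hat{\mathbf{H}}^{\mathrm{fba}}_{ni,i}(k) = \left[\hat{\mathbf{H}}_{ni,i}(k) \;\;\; \boldsymbol{\Pi}_{N_r}\hat{\mathbf{H}}^{*}_{ni,i}(k)\boldsymbol{\Pi}_{N_t}\right] = \left[\mathbf{A}_{ni,i}\mathbf{S}_{ni,i}(k) \;\;\; \boldsymbol{\Pi}_{N_r}\mathbf{A}^{*}_{ni,i}\mathbf{S}^{*}_{ni,i}(k)\boldsymbol{\Pi}_{N_t}\right] + \left[\mathbf{W}''_i(k) \;\;\; \boldsymbol{\Pi}_{N_r}\mathbf{W}''^{*}_i(k)\boldsymbol{\Pi}_{N_t}\right], \tag{2.11}$$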


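where $\mathbf{A}^{*}$ represents the complex conjugate of $\mathbf{A}$, and $\boldsymbol{\Pi}_p$ denotes the $p \times p$ exchange matrix with ones on its antidiagonal and zeros elsewhere. The subspace decomposition of the signal space of the received signal through singular value decomposition can then be written as

$$\left[\mathbf{A}_{ni,i}\mathbf{S}_{ni,i}(k) \;\;\; \boldsymbol{\Pi}_{N_r}\mathbf{A}^{*}_{ni,i}\mathbf{S}^{*}_{ni,i}(k)\boldsymbol{\Pi}_{N_t}\right] = \left[\mathbf{U}^{\mathrm{sig}}_{ni,i} \;\;\; \mathbf{U}^{\mathrm{noise}}_{ni,i}\right] \begin{bmatrix} \boldsymbol{\Sigma}^{\mathrm{sig}}_{ni,i} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{V}^{\mathrm{sig}\,H}_{ni,i} \\ \mathbf{V}^{\mathrm{noise}\,H}_{ni,i} \end{bmatrix}. \tag{2.12}$$

From this step onward, we can follow our line of work [6] in order to apply ESPRIT-based techniques on (2.12). Hence, the details are not repeated here due to page limitation.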
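The forward-backward averaging step in (2.11) can be sketched without any linear algebra library, since multiplying by an exchange matrix only reverses rows or columns. The helper names and the 3 × 2 toy matrix below are assumptions for illustration. The final check verifies the centro-Hermitian structure of the averaged matrix, which is the property that allows Unitary ESPRIT to work with real-valued matrices.

```python
def exchange_rows(m):
    """Left-multiply by the exchange matrix: reverse the row order."""
    return m[::-1]


def exchange_cols(m):
    """Right-multiply by the exchange matrix: reverse the column order."""
    return [row[::-1] for row in m]


def conj(m):
    """Element-wise complex conjugate of a matrix stored as nested lists."""
    return [[x.conjugate() for x in row] for row in m]


def forward_backward_average(h):
    """Return the augmented matrix [H, Pi * conj(H) * Pi] as in (2.11)."""
    backward = exchange_cols(exchange_rows(conj(h)))
    return [fwd + bwd for fwd, bwd in zip(h, backward)]


# Hypothetical noisy channel estimate with N_r = 3 and N_t = 2.
h = [[1 + 2j, 3 - 1j],
     [0 + 1j, 2 + 2j],
     [4 - 3j, -1 + 0j]]
h_fba = forward_backward_average(h)

# Centro-Hermitian check: Pi * conj(H_fba) * Pi equals H_fba.
is_centro_hermitian = exchange_cols(exchange_rows(conj(h_fba))) == h_fba
```

The averaged matrix has twice as many columns as the original estimate, which is why the subsequent SVD operates on an $N_r \times 2N_t$ matrix.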

2.2.2 RMSE Characterization

The theoretical performance of the unitary ESPRIT-based UL DoA estimation can be characterized with the root mean squared error (RMSE) of the estimation serving as the performance metric. Let $\hat{v}_{ni,i,\ell}$ denote the estimated spatial frequency for the $\ell$-th tap of the target user's channel, i.e., the channel between the $i$-th BS and the $n$-th MS in the $i$-th cell; the estimation error is then given by $\Delta v_{ni,i,\ell} = \hat{v}_{ni,i,\ell} - v_{ni,i,\ell}$. Similarly, $\Delta u_{ni,i,\ell} = \hat{u}_{ni,i,\ell} - u_{ni,i,\ell}$. It has been shown in [20] that the unitary transformation does not affect the MSE of the ESPRIT methods; however, the statistics of the noise and the signal subspace are changed due to the forward-backward averaging performed in (2.11). To be specific, the covariance and complementary covariance matrices for the equivalent noise-plus-interference matrix $\mathbf{W}''_i(k)$ in (2.6) become, respectively [20]:

$$\mathbf{R}^{(\mathrm{fba})}_i(k) = \begin{bmatrix} \mathbf{R}_i(k) & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Pi}_{N_rN_t}\mathbf{R}^{*}_i(k)\boldsymbol{\Pi}_{N_rN_t} \end{bmatrix}; \qquad \mathbf{C}^{(\mathrm{fba})}_i(k) = \begin{bmatrix} \mathbf{0} & \mathbf{R}_i(k)\boldsymbol{\Pi}_{N_rN_t} \\ \boldsymbol{\Pi}_{N_rN_t}\mathbf{R}^{*}_i(k) & \mathbf{0} \end{bmatrix}, \tag{2.13}$$

where $\mathbf{R}_i(k) = \mathbb{E}_{\alpha,\theta,\phi,\psi}\left\{\mathrm{vec}\{\mathbf{W}''_i(k)\}\,\mathrm{vec}\{\mathbf{W}''_i(k)\}^{H}\right\}$, and the expectation, $\mathbb{E}_{\alpha,\theta,\phi,\psi}$, is taken with respect to different channel realizations (i.e., with respect to the channel gains, the DoAs, both azimuth and elevation, and the DoDs of the interference channels). Now the expression of the covariance matrix, $\mathbf{R}_i(k)$, can be simplified using the following lemma:

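Lemma 2.1. The covariance matrix, $\mathbf{R}_i(k)$, of the equivalent noise-plus-interference matrix, $\mathbf{W}''_i$, is given by:

$$\mathbf{R}_i(k) = \mathbf{R}_{i,1}(k) + \mathbf{R}_{i,2}(k) + \mathbf{R}_{i,3}(k) + \mathbf{R}_{i,4}(k), \tag{2.14}$$

where $\mathbf{R}_{i,4}(k) = \sigma^2\mathbf{I}_{N_rN_t}$, $\sigma^2$ is the noise variance, and

$$\mathbf{R}_{i,1}(k) = \mathbb{E}_{\alpha,\theta,\phi,\psi}\left\{\sum_{\substack{g=0\\ g\neq i}}^{G-1} \left(\sqrt{\Lambda_{ng,i}}\right)^2 \mathbf{R}_{ng,i}(k)\right\}, \tag{2.15}$$

$$\mathbf{R}_{i,2}(k) = \mathbb{E}_{\alpha,\theta,\phi,\psi}\left\{\rho_1^2 \sum_{\substack{j=0\\ j\neq n}}^{J-1} \left(\sqrt{\Lambda_{ji,i}}\right)^2 \mathbf{X}_{t,r}\,\mathbf{R}_{ji,i}(k)\,\mathbf{X}_{t,r}\right\}, \tag{2.16}$$

$$\mathbf{R}_{i,3}(k) = \mathbb{E}_{\alpha,\theta,\phi,\psi}\left\{\rho_1^2 \sum_{\substack{g=0\\ g\neq i}}^{G-1}\sum_{\substack{j=0\\ j\neq n}}^{J-1} \left(\sqrt{\Lambda_{jg,i}}\right)^2 \mathbf{X}_{t,r}\,\mathbf{R}_{jg,i}(k)\,\mathbf{X}_{t,r}\right\}, \tag{2.17}$$

where $\mathbf{X}_{t,r} = (\mathbf{1}_{N_t} \otimes \mathbf{I}_{N_r})$, and $\mathbf{R}_{pq,r}(k) = \mathbf{P}_{pq,r}(k)\mathbf{P}^{H}_{pq,r}(k)$ with $\mathbf{P}_{pq,r}(k) = \left(\mathbf{B}^{*}_{pq,r}(k) \otimes \mathbf{A}_{pq,r}\right)\mathrm{vec}\{\mathbf{D}_{pq,r}\}$.

Proof Sketch. Lemma 2.1 can be proved using the properties of matrix vectorization, together with the assumption that the channel gains are independent.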

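It is noteworthy that in Lemma 2.1, $\mathbf{R}_{i,1}(k)$, $\mathbf{R}_{i,2}(k)$, $\mathbf{R}_{i,3}(k)$, and $\mathbf{R}_{i,4}(k)$ correspond, respectively, to the effects of pilot contamination, intra-cell interference, inter-cell interference, and the noise element of the noise-plus-interference signal. Now, the first-order approximation of the mean square estimation error of $v_{ni,i,\ell}$ for Unitary ESPRIT is given by [20]:

$$\mathbb{E}\left\{(\Delta v_{ni,i,\ell})^2\right\} = \frac{1}{2}\left(\mathbf{r}^{(v)H}_{ni,i,\ell}\cdot\mathbf{W}^{*}_{ni,i,\mathrm{mat}}\cdot\mathbf{R}^{(\mathrm{fba})T}_i\cdot\mathbf{W}^{T}_{ni,i,\mathrm{mat}}\cdot\mathbf{r}^{(v)}_{ni,i,\ell} - \mathrm{Re}\left\{\mathbf{r}^{(v)T}_{ni,i,\ell}\cdot\mathbf{W}_{ni,i,\mathrm{mat}}\cdot\mathbf{C}^{(\mathrm{fba})}_i\cdot\mathbf{W}^{T}_{ni,i,\mathrm{mat}}\cdot\mathbf{r}^{(v)}_{ni,i,\ell}\right\}\right), \tag{2.18}$$

where

$$\mathbf{r}^{(v)}_{ni,i,\ell} = \mathbf{q}_\ell \otimes \left(\left[\left(\mathbf{J}^{(v)}_1\mathbf{U}^{\mathrm{sig}}_{ni,i}\right)^{+}\left(\mathbf{J}^{(v)}_2/e^{jv_{ni,i,\ell}} - \mathbf{J}^{(v)}_1\right)\right]^{T}\mathbf{p}_\ell\right), \tag{2.19}$$

$$\mathbf{W}_{ni,i,\mathrm{mat}} = \left(\boldsymbol{\Sigma}^{\mathrm{sig}\,-1}_{ni,i}\,\mathbf{V}^{\mathrm{sig}\,T}_{ni,i}\right) \otimes \left(\mathbf{U}^{\mathrm{noise}}_{ni,i}\,\mathbf{U}^{\mathrm{noise}\,H}_{ni,i}\right). \tag{2.20}$$

Here, $\mathbf{J}^{(v)}_1 = [\mathbf{I}_{M_2-1} \;\; \mathbf{0}] \otimes \mathbf{I}_{M_1}$ and $\mathbf{J}^{(v)}_2 = [\mathbf{0} \;\; \mathbf{I}_{M_2-1}] \otimes \mathbf{I}_{M_1}$ are the selection matrices for the first and second subarrays, respectively, for the spatial frequency $v_{ni,i,\ell}$; $\mathbf{T}_{ni,i}$ is the transformation matrix, $\mathbf{q}_\ell$ is the $\ell$-th column of $\mathbf{T}_{ni,i}$, and $\mathbf{p}^{T}_\ell$ is the $\ell$-th row of $\mathbf{T}^{-1}_{ni,i}$; $\mathbf{R}^{(\mathrm{fba})}_i$ and $\mathbf{C}^{(\mathrm{fba})}_i$ are the covariance and complementary covariance matrices of the noise-plus-interference, respectively. Now, let us consider the following lemma: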


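Lemma 2.2. The covariance and complementary covariance matrices of the forward-backward averaged signal can be decomposed as

$$\mathbf{R}^{(\mathrm{fba})}_i(k) = \mathbf{R}^{(\mathrm{fba})}_{i,1}(k) + \mathbf{R}^{(\mathrm{fba})}_{i,2}(k) + \mathbf{R}^{(\mathrm{fba})}_{i,3}(k) + \mathbf{R}^{(\mathrm{fba})}_{i,4}(k), \tag{2.21}$$

$$\mathbf{C}^{(\mathrm{fba})}_i(k) = \mathbf{C}^{(\mathrm{fba})}_{i,1}(k) + \mathbf{C}^{(\mathrm{fba})}_{i,2}(k) + \mathbf{C}^{(\mathrm{fba})}_{i,3}(k) + \mathbf{C}^{(\mathrm{fba})}_{i,4}(k), \tag{2.22}$$

where

$$\mathbf{R}^{(\mathrm{fba})}_{i,m}(k) = \begin{bmatrix} \mathbf{R}_{i,m}(k) & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Pi}_{N_rN_t}\mathbf{R}^{*}_{i,m}(k)\boldsymbol{\Pi}_{N_rN_t} \end{bmatrix}; \qquad \mathbf{C}^{(\mathrm{fba})}_{i,m}(k) = \begin{bmatrix} \mathbf{0} & \mathbf{R}_{i,m}(k)\boldsymbol{\Pi}_{N_rN_t} \\ \boldsymbol{\Pi}_{N_rN_t}\mathbf{R}^{*}_{i,m}(k) & \mathbf{0} \end{bmatrix}, \tag{2.23}$$

for $m = 1, \ldots, 4$, where the $\mathbf{R}_{i,m}(k)$ are given by Lemma 2.1.

Proof Sketch. This lemma can be proved by substituting (2.14) into (2.13), and by utilizing the definitions of the $\mathbf{R}_{i,m}(k)$ from Lemma 2.1.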

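Using Lemma 2.2, we can separately investigate the effects of the different elements of the noise-plus-interference signal on the DoA estimation performance, and hence can write (2.18) as

$$\mathbb{E}\left\{(\Delta v_{ni,i,\ell})^2\right\} = \sum_{m=1}^{4} \mathbb{E}\left\{(\Delta v_{ni,i,\ell})^2_m\right\},$$

where

$$\mathbb{E}\left\{(\Delta v_{ni,i,\ell})^2_m\right\} = \frac{1}{2}\left(\mathbf{r}^{(v)H}_{ni,i,\ell}\cdot\mathbf{W}^{*}_{ni,i,\mathrm{mat}}\cdot\mathbf{R}^{(\mathrm{fba})T}_{i,m}\cdot\mathbf{W}^{T}_{ni,i,\mathrm{mat}}\cdot\mathbf{r}^{(v)}_{ni,i,\ell} - \mathrm{Re}\left\{\mathbf{r}^{(v)T}_{ni,i,\ell}\cdot\mathbf{W}_{ni,i,\mathrm{mat}}\cdot\mathbf{C}^{(\mathrm{fba})}_{i,m}\cdot\mathbf{W}^{T}_{ni,i,\mathrm{mat}}\cdot\mathbf{r}^{(v)}_{ni,i,\ell}\right\}\right). \tag{2.24}$$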

Now, (2.24) depends on the singular value decomposition (SVD) of the noiseless received signal, which can be difficult to obtain at the BS. However, for massive MIMO systems, the expression becomes tractable because the steering vectors become asymptotically orthogonal. We consider the following lemma [6] to facilitate the derivation of the MSE expression for massive MIMO systems:

Lemma 2.3. If the elevation and azimuth angles are both drawn independently from a continuous distribution, the normalized array response vectors become orthogonal asymptotically, that is, $\bar{\mathbf{e}}_{r,jg,i}(m) \perp \mathrm{span}\left\{\bar{\mathbf{e}}_{r,j'g',i'}(n) \mid \forall (j,g,i,m) \neq (j',g',i',n)\right\}$ when the number of antennas at the base station grows large, where $\bar{\mathbf{e}}_{r,jg,i}(m) = \frac{1}{\sqrt{N_r}}\,\mathbf{e}_{r,jg,i}(m)$.
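The asymptotic orthogonality can be checked numerically for a single array dimension; the planar-array case then follows because the response is a Kronecker product of two such vectors. In the sketch below, the function names and the two spatial frequencies (0.4 and 0.7 rad) are arbitrary assumptions used only for illustration.

```python
import cmath


def normalized_steering(spatial_freq, m):
    """Unit-norm steering vector of an m-element uniform linear array."""
    scale = m ** -0.5
    return [scale * cmath.exp(1j * n * spatial_freq) for n in range(m)]


def inner_product_magnitude(u1, u2, m):
    """Magnitude of the inner product between two normalized steering vectors."""
    a = normalized_steering(u1, m)
    b = normalized_steering(u2, m)
    return abs(sum(x.conjugate() * y for x, y in zip(a, b)))


# Same pair of spatial frequencies, small versus large array.
small = inner_product_magnitude(0.4, 0.7, m=8)
large = inner_product_magnitude(0.4, 0.7, m=256)
```

For distinct spatial frequencies the magnitude is bounded by a term that decays as $1/M$, so the two directions become nearly orthogonal as the array grows.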

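Using this property, we can analytically characterize the effect of each individual perturbation element on the DoA estimation performance. To be specific, the MSE of UL DoA estimation due to pilot contamination is given by the following theorem:

Theorem 2.4. For the massive MIMO network, the MSE of the unitary ESPRIT-based UL DoA estimation due to pilot contamination is given by

$$\mathbb{E}_{\theta,\phi,\psi}\left\{(\Delta v_{ni,i,\ell})^2_1\right\} = \frac{1}{8|\alpha_{ni,i}(\ell)|^2 N_t^2 \Lambda_{ni,i}(M_2-1)^2 M_1^2} \times \sum_{\substack{g=0\\ g\neq i}}^{G-1} \Lambda_{ng,i}\,X_{ng,i} \sum_{m=0}^{L_{ng,i}-1} |\alpha_{ng,i}(m)|^2 \times \left(Y_{ng,i} + Y'_{ng,i} - 2\,\mathrm{Re}\left\{e^{j\Phi}\,\tilde{Y}_{ng,i}\right\}\right), \tag{2.25}$$

where $\Phi = \left((M_1-1)u_{ni,i,\ell} + (M_2-1)v_{ni,i,\ell}\right)$, and $X_{ng,i}$ and $Y_{ng,i}$ are given by

$$X_{ng,i} = \mathbb{E}_{\psi}\left|1 + e^{-j(\omega_{ni,i,\ell}-\omega_{ng,i,m})} + \ldots + e^{-j(N_t-1)(\omega_{ni,i,\ell}-\omega_{ng,i,m})}\right|^2, \tag{2.26}$$

$$Y_{ng,i} = \mathbb{E}_{\theta,\phi}\left|\left(1 + e^{j(u_{ni,i,\ell}-u_{ng,i,m})} + \ldots + e^{j(M_1-1)(u_{ni,i,\ell}-u_{ng,i,m})}\right)\left(e^{j(M_2-1)(v_{ni,i,\ell}-v_{ng,i,m})} - 1\right)\right|^2, \tag{2.27}$$

and $Y'_{ng,i}$ and $\tilde{Y}_{ng,i}$ are given by

$$Y'_{ng,i} = \mathbb{E}_{\theta,\phi}\left|\left(e^{j(M_1-1)u_{ng,i,m}} + e^{ju_{ni,i,\ell}}e^{j(M_1-2)u_{ng,i,m}} + \ldots + e^{j(M_1-1)u_{ni,i,\ell}}\right)\left(e^{j(M_2-1)v_{ni,i,\ell}} - e^{j(M_2-1)v_{ng,i,m}}\right)\right|^2, \tag{2.28}$$

$$\tilde{Y}_{ng,i} = \mathbb{E}_{\theta,\phi}\Big[\left(e^{-j(M_1-1)u_{ng,i,m}} + e^{-ju_{ni,i,\ell}}e^{-j(M_1-2)u_{ng,i,m}} + \ldots + e^{-j(M_1-1)u_{ni,i,\ell}}\right)\left(e^{j(M_2-1)(v_{ng,i,m}-v_{ni,i,\ell})} - 1\right) \times \left(1 + \ldots + e^{j(M_1-1)(u_{ng,i,m}-u_{ni,i,\ell})}\right) \times \left(e^{-j(M_2-1)v_{ni,i,\ell}} - e^{-j(M_2-1)v_{ng,i,m}}\right)\Big] \tag{2.29}$$

for $m = 0, \ldots, L_{ng,i}-1$, and $\mathbb{E}_{\psi}$ and $\mathbb{E}_{\theta,\phi}$ denote, respectively, expectations with respect to the DoD and the DoAs of the interference channel.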

Proof. See Appendix A.1.

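Remark 2.5. Based on the Jacobian, the MSEs of the elevation and azimuth angles can be obtained from the MSEs of the spatial frequencies as follows [58]:

$$\mathbb{E}_{\theta,\phi,\psi}\left\{(\Delta\theta_\ell)^2\right\} = \mathbb{E}_{\theta,\phi,\psi}\left\{(\Delta u_\ell)^2\right\}\frac{1}{\pi^2\sin^2(\theta_\ell)}, \tag{2.30}$$

$$\mathbb{E}_{\theta,\phi,\psi}\left\{(\Delta\phi_\ell)^2\right\} = \frac{\mathbb{E}_{\theta,\phi,\psi}\left\{(\Delta u_\ell)^2\right\}\cot^2(\theta_\ell)\cot^2(\phi_\ell)}{\pi^2\sin^2(\theta_\ell)} + \frac{\mathbb{E}_{\theta,\phi,\psi}\left\{(\Delta v_\ell)^2\right\}}{\pi^2\sin^2(\theta_\ell)\sin^2(\phi_\ell)}. \tag{2.31}$$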


Remark 2.6. The expectation expressed in (2.25) of Theorem 2.4 is NOT taken with respect

to time, rather, it is taken with respect to the DoAs or DoDs of interference users due to

random locations of those interference users.
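The Jacobian-based mapping of Remark 2.5 can be written as a small helper. This is only a sketch: the function name and the example values are assumptions, and the factor $\pi^2$ is taken to correspond to half-wavelength element spacing, which is inferred from the form of (2.30) and (2.31) rather than stated explicitly there.

```python
import math


def angle_mse_from_spatial_mse(mse_u, mse_v, theta, phi):
    """Map spatial-frequency MSEs to elevation/azimuth MSEs, following (2.30)-(2.31)."""
    # Elevation: u depends on theta only.
    mse_theta = mse_u / (math.pi ** 2 * math.sin(theta) ** 2)
    cot_theta = math.cos(theta) / math.sin(theta)
    cot_phi = math.cos(phi) / math.sin(phi)
    # Azimuth: v depends on both theta and phi, so the elevation error propagates.
    mse_phi = (mse_theta * cot_theta ** 2 * cot_phi ** 2
               + mse_v / (math.pi ** 2 * math.sin(theta) ** 2 * math.sin(phi) ** 2))
    return mse_theta, mse_phi


# Hypothetical example: equal spatial-frequency MSEs at broadside in both angles.
mse_theta, mse_phi = angle_mse_from_spatial_mse(
    mse_u=1e-4, mse_v=1e-4, theta=math.pi / 2, phi=math.pi / 2)
```

At broadside the cotangent terms vanish and both angle MSEs reduce to the spatial-frequency MSE divided by $\pi^2$; near the array end-fire directions the sine terms in the denominators make the angle estimates much more sensitive to errors in the spatial frequencies.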

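Similarly, the effects of intra-cell interference, inter-cell interference, and noise elements on the MSE performance are characterized in Theorem 2.7, Theorem 2.8, and Theorem 2.9, respectively:

Theorem 2.7. For the massive MIMO network, the MSE of the unitary ESPRIT-based UL DoA estimation due to intra-cell interference is given by

$$\mathbb{E}_{\theta,\phi,\psi}\left\{(\Delta v_{ni,i,\ell})^2_2\right\} = \frac{\rho_1^2\,|X''_{ni,i,\ell}|^2}{8|\alpha_{ni,i}(\ell)|^2 N_t^2 \Lambda_{ni,i}(M_2-1)^2 M_1^2} \times \sum_{\substack{j=0\\ j\neq n}}^{J-1} \Lambda_{ji,i}\,X_{ji,i} \sum_{m=0}^{L_{ji,i}-1} |\alpha_{ji,i}(m)|^2 \times \left(Y_{ji,i} + Y'_{ji,i} - 2\,\mathrm{Re}\left\{e^{j\Phi}\,\tilde{Y}_{ji,i}\right\}\right), \tag{2.32}$$

where $X''_{ni,i,\ell} = \sum_{m=0}^{N_t-1} e^{jm\omega_{ni,i,\ell}}$.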



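Theorem 2.8. For the massive MIMO network, the MSE of the unitary ESPRIT-based UL DoA estimation due to inter-cell interference is given by

$$\mathbb{E}_{\theta,\phi,\psi}\left\{(\Delta v_{ni,i,\ell})^2_3\right\} = \frac{\rho_1^2\,|X''_{ni,i,\ell}|^2}{8|\alpha_{ni,i}(\ell)|^2 N_t^2 \Lambda_{ni,i}(M_2-1)^2 M_1^2} \times \sum_{\substack{g=0\\ g\neq i}}^{G-1}\sum_{\substack{j=0\\ j\neq n}}^{J-1} \Lambda_{jg,i}\,X_{jg,i} \sum_{m=0}^{L_{jg,i}-1} |\alpha_{jg,i}(m)|^2 \times \left(Y_{jg,i} + Y'_{jg,i} - 2\,\mathrm{Re}\left\{e^{j\Phi}\,\tilde{Y}_{jg,i}\right\}\right). \tag{2.33}$$

Proof. The proof is similar to the proof of Theorem 2.7.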


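Theorem 2.9. For the massive MIMO network, the MSE of the unitary ESPRIT-based UL DoA estimation due to the noise element is given by

$$\mathbb{E}\left\{(\Delta v_{ni,i,\ell})^2_4\right\} = \frac{\sigma^2}{2|\alpha_{ni,i}(\ell)|^2 N_t \Lambda_{ni,i}(M_2-1)^2 M_1},$$

where $\sigma^2$ is the noise variance.

Proof. This theorem can be proved following the line of proof for Theorem 1 in [6].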
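To see how the noise-only MSE expression above scales with the array dimensions, it can be evaluated directly. The function name and the numerical values below are illustrative assumptions, not simulation parameters from this work.

```python
def noise_mse_v(sigma2, alpha_abs2, n_t, large_scale, m1, m2):
    """Noise-induced MSE of the azimuth spatial frequency (closed-form expression above)."""
    return sigma2 / (2 * alpha_abs2 * n_t * large_scale * (m2 - 1) ** 2 * m1)


# Hypothetical unit-gain, unit-noise setting with an 8 x 8 BS array and N_t = 2.
base = noise_mse_v(sigma2=1.0, alpha_abs2=1.0, n_t=2, large_scale=1.0, m1=8, m2=8)
more_rows = noise_mse_v(sigma2=1.0, alpha_abs2=1.0, n_t=2, large_scale=1.0, m1=16, m2=8)
more_cols = noise_mse_v(sigma2=1.0, alpha_abs2=1.0, n_t=2, large_scale=1.0, m1=8, m2=15)
```

Doubling $M_1$ halves the noise-induced MSE, whereas the MSE falls quadratically in $(M_2 - 1)$, since $M_2$ is the array dimension associated with the spatial frequency $v$.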

Similarly, we can also obtain the MSE expressions for the elevation spatial frequency, $\mathbb{E}\left\{(\Delta u_{ni,i,\ell})^2\right\}$. Accordingly, based on Jacobian matrices, we can characterize the MSE expressions for the UL elevation and azimuth DoAs from the MSEs of the spatial frequencies.

Remark 2.10. From Theorem 2.7 and Theorem 2.8 we can observe that non-zero correlation among the spreading sequences of different MSs does cause intra- and inter-cell interference for UL DoA estimation, and the corresponding MSEs of the estimation are directly affected by the correlation coefficient, $\rho_1$. On the other hand, as we can see from Theorem 2.4, the MSE due to pilot contamination does not depend on the correlation coefficient.

Furthermore, these four theorems suggest that our original work in [6] may yield strictly suboptimal solutions, since that work only considers the DoA estimation error due to noise elements. This observation will be verified through performance evaluation in Section 2.4.

2.2.3 Complexity Analysis

In this subsection, we discuss the computational complexity of the unitary ESPRIT-based DoA estimation procedure presented in Section 2.2.1. We first summarize the computational complexity of some basic operations in terms of floating point operations (FLOPs). Computing the product of two matrices of sizes $m \times n$ and $n \times p$ requires $2(n-1)mp$ FLOPs. Inverting a positive definite matrix of size $n \times n$ requires $(n^3 + n^2 + n)$ FLOPs; the number of FLOPs required for the SVD of an $m \times n$ matrix is $(4m^2n + 22n^3)$, and the complexity of finding the eigenvalues of an $n \times n$ matrix is $n^3$. Now, we can describe the computational complexity of each step of the presented algorithm. For the complexity analysis, we assume that all the channels have $L$ resolvable paths. Correlating the received signal with the training symbol matrix in (2.4) requires $C_a = 2(Q-1)N_rN_t$ FLOPs. The forward-backward averaging in (2.11) requires $C_b = 2N_rN_t(N_r + N_t - 2)$ FLOPs. The number of FLOPs required for the SVD of the forward-backward averaged received signal in (2.12) is $C_c = (8N_r^2N_t + 176N_t^3)$. For solving the shift-invariance equations for the elevation and azimuth spatial frequencies, the numbers of FLOPs required are, respectively, $C_d = 2[M_2(M_1-1)-1]L^2 + [L^3 + L^2 + L] + [2(L-1)M_2L(M_1-1)]$ and $C_e = 2[M_1(M_2-1)-1]L^2 + [L^3 + L^2 + L] + [2(L-1)M_1L(M_2-1)]$. Finally, calculating the eigenvalues of the two shift-invariance operator matrices requires $C_f = 2L^3$ FLOPs. Hence, the total computational complexity of our ESPRIT-based DoA estimation method can be written as $C_{\mathrm{ESPRIT}} = C_a + C_b + C_c + C_d + C_e + C_f$. Next, for comparison, we compute the computational complexity of the MUSIC algorithm. For computing the covariance matrix of the received signal, the number of FLOPs required is $D_a = (Q+1)N_r^2$. Next, computing the SVD of the covariance matrix requires $D_b = 26N_r^3$ FLOPs. Let $N_g$ denote the number of grid points for the candidate DoA search. Hence, the total number of FLOPs required for extracting the eigenvectors corresponding to the noise subspace is $D_c = N_g\left([2N_r(N_r - L)] + [2N_r - 3]\right)$. Hence, the computational cost of the MUSIC algorithm is $C_{\mathrm{MUSIC}} = D_a + D_b + D_c$.


2.3 Downlink Precoding and Achievable Rate Analysis

2.3.1 Optimum Precoding for Sum-rate Maximization

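In the DL, at the $i$-th BS, the $N_s \times 1$ information symbol vector intended for the $n$-th MS in the $i$-th cell on the $k$-th subcarrier can be expressed as $\mathbf{s}^{\mathrm{dl}}_{ni}[k] = \left[s^{\mathrm{dl}}_{ni,0}[k], \ldots, s^{\mathrm{dl}}_{ni,N_s-1}[k]\right]$, where $s^{\mathrm{dl}}_{ni,p}[k]$ is the $p$-th information symbol intended for the $n$-th MS. Accordingly, the $N_r \times 1$ downlink frequency domain transmit signal from the $i$-th BS can be written as

$$\mathbf{x}^{\mathrm{dl}}_{i}[k] = \sum_{j=0}^{J-1}\mathbf{x}^{\mathrm{dl}}_{ji}[k] = \sum_{j=0}^{J-1}\mathbf{V}_{ji}[k]\,\mathbf{s}^{\mathrm{dl}}_{ji}[k], \tag{2.34}$$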

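where $\mathbf{x}^{\mathrm{dl}}_{ji}[k] = \mathbf{V}_{ji}[k]\mathbf{s}^{\mathrm{dl}}_{ji}[k]$, and $\mathbf{V}_{ji}[k]$ is the $N_r \times N_s$ precoding matrix for the $j$-th MS in the $i$-th cell on the $k$-th subcarrier. Now, the $N_t \times 1$ received signal at the $n$-th MS in the $i$-th cell on the $k$-th subcarrier, $\mathbf{y}^{\mathrm{dl}}_{ni}[k]$, can be written as

$$\begin{aligned}
\mathbf{y}^{\mathrm{dl}}_{ni}[k] &= \sum_{g=0}^{G-1}\sqrt{\Lambda_{ni,g}}\,\mathbf{H}^{\mathrm{dl}}_{ni,g}[k]\,\mathbf{x}^{\mathrm{dl}}_{g}[k] + \mathbf{n}^{\mathrm{dl}}_{ni}[k] = \sum_{g=0}^{G-1}\sum_{j=0}^{J-1}\sqrt{\Lambda_{ni,g}}\,\mathbf{H}^{\mathrm{dl}}_{ni,g}[k]\,\mathbf{V}_{jg}[k]\,\mathbf{s}^{\mathrm{dl}}_{jg}[k] + \mathbf{n}^{\mathrm{dl}}_{ni}[k] \\
&= \sqrt{\Lambda_{ni,i}}\,\mathbf{H}^{\mathrm{dl}}_{ni,i}[k]\,\mathbf{V}_{ni}[k]\,\mathbf{s}^{\mathrm{dl}}_{ni}[k] + \sum_{\substack{j=0 \\ j\neq n}}^{J-1}\sqrt{\Lambda_{ni,i}}\,\mathbf{H}^{\mathrm{dl}}_{ni,i}[k]\,\mathbf{V}_{ji}[k]\,\mathbf{s}^{\mathrm{dl}}_{ji}[k] \\
&\quad + \sum_{\substack{g=0 \\ g\neq i}}^{G-1}\sum_{j=0}^{J-1}\sqrt{\Lambda_{ni,g}}\,\mathbf{H}^{\mathrm{dl}}_{ni,g}[k]\,\mathbf{V}_{jg}[k]\,\mathbf{s}^{\mathrm{dl}}_{jg}[k] + \mathbf{n}^{\mathrm{dl}}_{ni}[k],
\end{aligned} \tag{2.35}$$

where $\mathbf{H}^{\mathrm{dl}}_{ni,g}[k]$ is the $N_t \times N_r$ downlink channel between the $g$-th BS and the $n$-th MS in the $i$-th cell on the $k$-th subcarrier, and $\mathbf{n}^{\mathrm{dl}}_{ni}[k]$ is the corresponding $N_t \times 1$ noise vector at the receiver with $\mathbb{E}\{\mathbf{n}^{\mathrm{dl}}_{ni}[m]\,\mathbf{n}^{\mathrm{dl}H}_{ni}[n]\} = \sigma^2\mathbf{I}_{N_t}\delta(m-n)$. In (2.35), the first term is the desired signal, while the second and third terms represent the intra-cell and inter-cell interference, respectively. The inter-cell sum runs over all $J$ users of each interfering cell, in agreement with (2.44).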


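Now, the rate for the $n$-th MS in the $i$-th cell is given by

$$I_{ni}[k] = \log_2\det\Bigg(\mathbf{I} + \Lambda_{ni,i}\mathbf{H}^{\mathrm{dl}}_{ni,i}[k]\mathbf{V}_{ni}[k]\mathbf{V}^{H}_{ni}[k]\mathbf{H}^{\mathrm{dl}H}_{ni,i}[k] \times \Big(\sum_{(n,i)\neq(j,g)}\Lambda_{ni,g}\mathbf{H}^{\mathrm{dl}}_{ni,g}[k]\mathbf{V}_{jg}[k]\mathbf{V}^{H}_{jg}[k]\mathbf{H}^{\mathrm{dl}H}_{ni,g}[k] + \sigma^2\mathbf{I}\Big)^{-1}\Bigg). \tag{2.36}$$

Accordingly, the sum-rate maximization (SRM) problem can be expressed as

$$\max_{\{\mathbf{V}_{ji}[k]\}}\ \sum_{j=0}^{J-1} I_{ji}[k] \quad \text{s.t.} \quad \sum_{j=0}^{J-1}\mathrm{Tr}\left(\mathbf{V}_{ji}[k]\mathbf{V}_{ji}[k]^{H}\right) \le P_t, \tag{2.37}$$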

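where $P_t$ is the total power available at the BS for each subcarrier. In general, it is challenging to solve the problem in (2.37) since it is highly non-convex. Alternatively, sum-MSE (mean square error) minimization is another popular utility maximization problem for DL multi-user MIMO systems. Let $\mathbf{T}_{ni}$ be the DL receive processing matrix for the $n$-th MS. The estimated received symbol vector can then be written as $\hat{\mathbf{s}}^{\mathrm{dl}}_{ni}[k] = \mathbf{T}^{H}_{ni}\mathbf{y}^{\mathrm{dl}}_{ni}[k]$. Now, the MSE matrix of the $n$-th MS can be defined as

$$\begin{aligned}
\mathbf{E}_{ni}[k] &= \mathbb{E}\left[\left(\hat{\mathbf{s}}^{\mathrm{dl}}_{ni}[k] - \mathbf{s}^{\mathrm{dl}}_{ni}[k]\right)\left(\hat{\mathbf{s}}^{\mathrm{dl}}_{ni}[k] - \mathbf{s}^{\mathrm{dl}}_{ni}[k]\right)^{H}\right] \\
&= \left(\mathbf{I} - \sqrt{\Lambda_{ni,i}}\,\mathbf{T}^{H}_{ni}\mathbf{H}^{\mathrm{dl}}_{ni,i}[k]\mathbf{V}_{ni}[k]\right)\left(\mathbf{I} - \sqrt{\Lambda_{ni,i}}\,\mathbf{T}^{H}_{ni}\mathbf{H}^{\mathrm{dl}}_{ni,i}[k]\mathbf{V}_{ni}[k]\right)^{H} \\
&\quad + \sum_{(n,i)\neq(j,g)}\Lambda_{ni,g}\mathbf{T}^{H}_{ni}\mathbf{H}^{\mathrm{dl}}_{ni,g}[k]\mathbf{V}_{jg}[k]\mathbf{V}^{H}_{jg}[k]\mathbf{H}^{\mathrm{dl}H}_{ni,g}[k]\mathbf{T}_{ni} + \sigma^2\mathbf{T}^{H}_{ni}\mathbf{R}_{ni}.
\end{aligned} \tag{2.38}$$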


Accordingly, the sum-MSE minimization problem can be defined as

$$\min_{\{\mathbf{V}_{ji}[k]\}}\ \sum_{j=0}^{J-1}\varepsilon_{ji}[k] \quad \text{s.t.} \quad \sum_{j=0}^{J-1}\mathrm{Tr}\left(\mathbf{V}_{ji}[k]\mathbf{V}_{ji}[k]^{H}\right) \le P_t, \tag{2.39}$$

where $\varepsilon_{ni}[k] = \mathrm{Tr}\{\mathbf{E}_{ni}[k]\}$. The relationship between the problems in (2.37) and (2.39) can be established by the following lemma [59]:

Lemma 2.11. The sum-rate maximization problem in (2.37) and the sum-MSE minimization problem in (2.39) are equivalent in the sense that the optimal solutions, $\{\mathbf{V}_{ji}[k]\}_{j=0}^{J-1}$, for both problems are identical.

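In this work, we assume that no coordination is available among BSs, which is a typical scenario in TDD-based FD-MIMO networks. Hence, the problem in (2.39) can be written as

$$\min_{\{\mathbf{V}_{ji}[k]\}}\ \left\|\mathbf{T}^{H}_{i}\mathbf{H}^{\mathrm{dl}}_{i,i}[k]\mathbf{V}_{i}[k] - \mathbf{I}\right\|_F^2 \quad \text{s.t.} \quad \sum_{j=0}^{J-1}\mathrm{Tr}\left(\mathbf{V}_{ji}[k]\mathbf{V}_{ji}[k]^{H}\right) \le P_t, \tag{2.40}$$

where $\mathbf{T}^{H}_{i} = \mathrm{blkdiag}\{\mathbf{T}^{H}_{0i}, \mathbf{T}^{H}_{1i}, \ldots, \mathbf{T}^{H}_{(J-1)i}\}$, $\mathbf{V}_{i}[k] = \left[\mathbf{V}_{0i}[k], \mathbf{V}_{1i}[k], \ldots, \mathbf{V}_{(J-1)i}[k]\right]$, and $\mathbf{H}^{\mathrm{dl}}_{i,i}[k] = \left[\sqrt{\Lambda_{0i,i}}\,\mathbf{H}^{\mathrm{dl}T}_{0i,i}[k], \sqrt{\Lambda_{1i,i}}\,\mathbf{H}^{\mathrm{dl}T}_{1i,i}[k], \ldots, \sqrt{\Lambda_{(J-1)i,i}}\,\mathbf{H}^{\mathrm{dl}T}_{(J-1)i,i}[k]\right]^{T}$. Now, using the channel reciprocity property, the downlink channel can be written in terms of the uplink channel:

$$\mathbf{H}^{\mathrm{dl}}_{ni,i}[k] = \mathbf{H}^{T}_{ni,i}[k] = \mathbf{B}^{*}_{ni,i}[k]\mathbf{D}_{ni,i}\mathbf{A}^{T}_{ni,i} = \mathbf{B}^{*}_{ni,i}\mathbf{D}_{ni,i}\mathbf{A}^{T}_{ni,i}[k], \tag{2.41}$$

where $\mathbf{A}_{ni,i}[k] = \left[\mathbf{e}_{r,ni,i,k}(0)\ \ldots\ \mathbf{e}_{r,ni,i,k}(L_{ni,i}-1)\right]$ with $\mathbf{e}_{r,ni,i,k}(\ell) = \mathbf{e}_{r,ni,i}(\ell)\,e^{-j2\pi k\ell/N_c}$, and $\mathbf{B}_{ni,i} = \left[\mathbf{e}_{t,ni,i}(0)\ \ldots\ \mathbf{e}_{t,ni,i}(L_{ni,i}-1)\right]$. Assuming each MS will only use its own DL CSI for receive processing, we have $\mathbf{T}^{H}_{ni}\mathbf{H}^{\mathrm{dl}}_{ni,i}[k] = \mathbf{D}_{ni,i}\mathbf{A}^{T}_{ni,i}[k]$. Accordingly, the problem in (2.40) can be expressed as

$$\min_{\{\mathbf{V}_{ji}[k]\}}\ \left\|\bar{\mathbf{D}}_{i,i}\mathbf{A}^{T}_{i,i}[k]\mathbf{V}_{i}[k] - \mathbf{I}\right\|_F^2 \quad \text{s.t.} \quad \sum_{j=0}^{J-1}\mathrm{Tr}\left(\mathbf{V}_{ji}[k]\mathbf{V}_{ji}[k]^{H}\right) \le P_t, \tag{2.42}$$

where $\bar{\mathbf{D}}_{i,i} = \mathrm{blkdiag}\{\bar{\mathbf{D}}_{0i,i}, \bar{\mathbf{D}}_{1i,i}, \ldots, \bar{\mathbf{D}}_{(J-1)i,i}\}$, $\mathbf{A}_{i,i}[k] = \left[\mathbf{A}_{0i,i}[k], \mathbf{A}_{1i,i}[k], \ldots, \mathbf{A}_{(J-1)i,i}[k]\right]$, and $\bar{\mathbf{D}}_{ni,i} = \sqrt{\Lambda_{ni,i}}\,\mathbf{D}_{ni,i}$ accounts for both the large-scale and small-scale fading effects. The solution to this problem is given by the following theorem: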

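Theorem 2.12. Let $\bar{\mathbf{D}}_{i,i}\mathbf{A}^{T}_{i,i}[k] = \mathbf{U}_{i,i}\left[\boldsymbol{\Lambda}_{i,i}, \mathbf{0}\right]\mathbf{W}^{H}_{i,i}$ be the SVD of the effective channel, $\bar{\mathbf{D}}_{i,i}\mathbf{A}^{T}_{i,i}[k]$, where $\boldsymbol{\Lambda}_{i,i} = \mathrm{diag}\{\lambda_{0i,i}, \lambda_{1i,i}, \ldots, \lambda_{(JN_s-1)i,i}\}$. Then the optimal precoding matrix for the problem in (2.42) is given by $\mathbf{V}_{i}[k] = \mathbf{W}_{i,i}\left[\boldsymbol{\Xi}_{i,i}, \mathbf{0}\right]^{T}\mathbf{U}^{H}_{i,i}$, where $\boldsymbol{\Xi}_{i,i} = \mathrm{diag}\{\xi_{0i,i}, \xi_{1i,i}, \ldots, \xi_{(JN_s-1)i,i}\}$ and $\xi_{mi,i} = \lambda_{mi,i}/(\lambda^{2}_{mi,i} + \eta)$, with the smallest $\eta \ge 0$ satisfying $\sum_{m=0}^{JN_s-1}|\xi_{mi,i}|^2 \le P_t$.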


From Theorem 2.12, it can be seen that the optimal precoder that minimizes the sum-MSE, and hence maximizes the sum-rate, can be constructed from the estimated UL DoAs as well as the path gains. In this paper, we assume that the BS has perfect knowledge of the path gains. However, the path gains can also be estimated using the maximum likelihood (ML) method once the DoAs have been estimated. Our work on this aspect can be found in [60].
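The construction in Theorem 2.12 can be sketched numerically as follows. This is a minimal illustration, assuming the effective channel is available as a full-row-rank matrix `H_eff`; the function name, the use of a thin SVD in place of the $[\boldsymbol{\Lambda}, \mathbf{0}]$ partition, and the bisection search for the smallest feasible $\eta$ are choices made here for illustration and are not taken from the text.

```python
import numpy as np


def regularized_inverse_precoder(H_eff, P_t, rel_tol=1e-10, max_iter=200):
    """Return (V, eta) with V = W diag(xi) U^H and xi_m = lam_m / (lam_m^2 + eta).

    H_eff : (r, N) effective channel, assumed to have full row rank (r <= N).
    P_t   : power budget, so that sum(|xi_m|^2) = Tr(V V^H) <= P_t.
    """
    U, lam, Wh = np.linalg.svd(H_eff, full_matrices=False)  # H_eff = U diag(lam) Wh

    def xi(eta):
        return lam / (lam**2 + eta)

    def power(eta):
        return float(np.sum(xi(eta) ** 2))  # decreasing in eta for lam > 0

    if power(0.0) <= P_t:
        eta = 0.0  # plain pseudo-inverse already meets the budget
    else:
        lo, hi = 0.0, 1.0
        while power(hi) > P_t:  # grow the bracket until the budget is met
            hi *= 2.0
        for _ in range(max_iter):  # bisection keeps power(hi) <= P_t < power(lo)
            mid = 0.5 * (lo + hi)
            if power(mid) > P_t:
                lo = mid
            else:
                hi = mid
            if hi - lo <= rel_tol * hi:
                break
        eta = hi

    V = Wh.conj().T @ np.diag(xi(eta)) @ U.conj().T
    return V, eta
```

When the power budget is loose, $\eta = 0$ and the precoder reduces to the pseudo-inverse of the effective channel, so that the product of the effective channel and the precoder is the identity; a tight budget forces $\eta > 0$ and the power constraint is met with near equality.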

2.3.2 Large-Antenna System Analysis

In this section, we present the achievable rate analysis and a simplified precoding strategy for massive FD-MIMO systems. Our discussion in this subsection is based on asymptotic analysis. This can be viewed as the special case of Section 2.3.1 where the number of antennas at the base station grows asymptotically large.

Achievable Rate under Perfect Channel Estimation

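In this case, (2.35) can be written as

$$\mathbf{y}^{\mathrm{dl}}_{ni}[k] = \mathbf{B}^{*}_{ni,i}\bar{\mathbf{D}}_{ni,i}\mathbf{A}^{T}_{ni,i}[k]\mathbf{V}_{ni}[k]\mathbf{s}^{\mathrm{dl}}_{ni}[k] + \sum_{\substack{j=0 \\ j\neq n}}^{J-1}\mathbf{B}^{*}_{ni,i}\bar{\mathbf{D}}_{ni,i}\mathbf{A}^{T}_{ni,i}[k]\mathbf{V}_{ji}[k]\mathbf{s}^{\mathrm{dl}}_{ji}[k] + \mathbf{n}'_{ni}[k], \tag{2.43}$$

where

$$\mathbf{n}'_{ni}[k] = \sum_{\substack{g=0 \\ g\neq i}}^{G-1}\sum_{j=0}^{J-1}\sqrt{\Lambda_{ni,g}}\,\mathbf{H}^{\mathrm{dl}}_{ni,g}[k]\mathbf{V}_{jg}[k]\mathbf{s}^{\mathrm{dl}}_{jg}[k] + \mathbf{n}^{\mathrm{dl}}_{ni}[k] \tag{2.44}$$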

is the equivalent noise-plus-inter-cell-interference vector. As the number of antennas grows large, the right singular matrix, $\mathbf{W}_{i,i}$, in Theorem 2.12 can be approximated by the DoA matrix, $\mathbf{A}^{*}_{i,i}$. In other words, for massive FD-MIMO systems, the eigen directions align with the directions of arrival, which is also validated in [6] and [7]. From Lemma 2.3, the array steering vectors for different MSs become orthogonal as the number of antennas grows large, i.e., we have $(1/N_r)\mathbf{A}^{H}_{ji,i}[k]\mathbf{A}_{j'i,i}[k] \to \mathbf{0}$ as $N_r \to \infty$ for all $j \neq j'$. Hence, for massive MIMO systems, beamforming in the DoA directions nullifies the intra-cell interference. Therefore, the optimum eigen-beamformer under perfect DoA estimation is

$$\mathbf{V}^{\mathrm{eig}}_{ni}[k] = \frac{1}{N_r}\mathbf{A}^{*}_{ni,i}[k], \tag{2.45}$$

and accordingly, the received signal in (2.43) can be written as

$$\mathbf{y}^{\mathrm{dl}}_{ni}[k] = \mathbf{B}^{*}_{ni,i}\bar{\mathbf{D}}_{ni,i}\mathbf{s}^{\mathrm{dl}}_{ni}[k] + \mathbf{n}'_{ni}[k]. \tag{2.46}$$
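The asymptotic orthogonality used above can be checked numerically. The sketch below, under the assumption of the Kronecker steering-vector form $\mathbf{a}(v) \otimes \mathbf{a}(u)$ used in this chapter, evaluates the normalized correlation $(1/N_r)|\mathbf{e}^{H}(\ell)\mathbf{e}(\ell')|$ between two paths with distinct spatial frequencies for a small and a large array. The function names and the example spatial frequencies are hypothetical.

```python
import numpy as np


def ula_response(M, w):
    """Vandermonde response [1, e^{jw}, ..., e^{j(M-1)w}] of an M-element ULA."""
    return np.exp(1j * w * np.arange(M))


def ura_response(M1, M2, u, v):
    """URA steering vector a(v) kron a(u) with M1 elevation and M2 azimuth elements."""
    return np.kron(ula_response(M2, v), ula_response(M1, u))


def normalized_correlation(M1, M2, path_a, path_b):
    """|e_a^H e_b| / N_r for two paths given as (u, v) spatial-frequency pairs."""
    e_a = ura_response(M1, M2, *path_a)
    e_b = ura_response(M1, M2, *path_b)
    return abs(np.vdot(e_a, e_b)) / (M1 * M2)  # np.vdot conjugates its first argument


if __name__ == "__main__":
    path_a, path_b = (0.2, 0.1), (1.1, 0.8)  # example (u, v) pairs, chosen arbitrarily
    for M in (4, 8, 16, 32):
        print(M * M, normalized_correlation(M, M, path_a, path_b))
```

For these example paths the normalized correlation drops from roughly 0.4 with a 4 x 4 array to below 0.01 with a 32 x 32 array, while the correlation of a steering vector with itself stays at 1.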

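Now, the signal in (2.46), under the optimal receive processing, results in

$$\tilde{\mathbf{y}}^{\mathrm{dl}}_{ni}[k] = \bar{\mathbf{D}}_{ni,i}\mathbf{s}^{\mathrm{dl}}_{ni}[k] + \tilde{\mathbf{n}}'_{ni}[k], \tag{2.47}$$

where $\tilde{\mathbf{y}}^{\mathrm{dl}}_{ni}[k] = \left(\mathbf{B}^{T}_{ni,i}\mathbf{B}^{*}_{ni,i}\right)^{-1}\mathbf{B}^{T}_{ni,i}\mathbf{y}^{\mathrm{dl}}_{ni}[k]$ and $\tilde{\mathbf{n}}'_{ni}[k] = \left(\mathbf{B}^{T}_{ni,i}\mathbf{B}^{*}_{ni,i}\right)^{-1}\mathbf{B}^{T}_{ni,i}\mathbf{n}'_{ni}[k]$. Accordingly, the achievable rate for the $n$-th user in the $i$-th cell, $I_{ni}[k]$, can be expressed as

$$I_{ni}[k] = \log_2\det\Bigg(\mathbf{I}_{L_{ni,i}} + \bar{\mathbf{D}}_{ni,i}\mathbf{Q}^{\mathrm{dl}}_{ni}[k]\bar{\mathbf{D}}^{H}_{ni,i}\Big(\sum_{\substack{g=0 \\ g\neq i}}^{G-1}\sum_{j=0}^{J-1}\Lambda_{ni,g}\bar{\mathbf{B}}^{H}_{ni,i}\mathbf{H}^{\mathrm{dl}}_{ni,g}[k]\mathbf{V}_{jg}[k]\mathbf{Q}^{\mathrm{dl}}_{jg}[k]\mathbf{V}^{H}_{jg}[k]\mathbf{H}^{\mathrm{dl}H}_{ni,g}[k]\bar{\mathbf{B}}_{ni,i} + \sigma^2\mathbf{I}\Big)^{-1}\Bigg), \tag{2.48}$$

where $\bar{\mathbf{B}}^{H}_{ni,i} = \left(\mathbf{B}^{T}_{ni,i}\mathbf{B}^{*}_{ni,i}\right)^{-1}\mathbf{B}^{T}_{ni,i}$, and $\mathbf{Q}^{\mathrm{dl}}_{ni}[k] = \mathbb{E}\{\mathbf{s}^{\mathrm{dl}}_{ni}[k]\mathbf{s}^{\mathrm{dl}H}_{ni}[k]\}$ is the covariance matrix of the transmit symbol vector from the $i$-th BS intended for the $n$-th MS on the $k$-th subcarrier. Now, (2.48) can succinctly be written as

$$I_{ni}[k] = \log_2\det\left(\mathbf{I}_{L_{ni,i}} + \bar{\mathbf{D}}_{ni,i}\mathbf{Q}^{\mathrm{dl}}_{ni}[k]\bar{\mathbf{D}}^{H}_{ni,i}\mathbf{R}'^{-1}_{ni}[k]\right), \tag{2.49}$$

where the inter-cell interference-plus-noise covariance matrix, $\mathbf{R}'_{ni}[k]$, is defined as

$$\mathbf{R}'_{ni}[k] = \sum_{\substack{g=0 \\ g\neq i}}^{G-1}\sum_{j=0}^{J-1}\Lambda_{ni,g}\bar{\mathbf{B}}^{H}_{ni,i}\mathbf{H}^{\mathrm{dl}}_{ni,g}[k]\mathbf{V}_{jg}[k]\mathbf{Q}^{\mathrm{dl}}_{jg}[k]\mathbf{V}^{H}_{jg}[k]\mathbf{H}^{\mathrm{dl}H}_{ni,g}[k]\bar{\mathbf{B}}_{ni,i} + \sigma^2\mathbf{I}. \tag{2.50}$$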

Let us now consider the following lemma:


Lemma 2.13. Assuming all BSs apply the same precoding strategy, the equivalent inter-cell interference-plus-noise covariance matrix, $\mathbf{R}'_{ni}[k]$, is approximated by

$$\mathbf{R}'_{ni}[k] \approx (\zeta_{ni} + \sigma^2)\,\mathbf{I}_{L_{ni,i}}, \tag{2.51}$$

where $\zeta_{ni} = J(G-1)\,\mathbb{E}\{\Lambda_{ni,g}\,p_{jg,\ell}[k]\,|\alpha_{ni,g}(\ell)|^2\}$, and $p_{jg,\ell}[k]$ is the power allocated on the $\ell$-th symbol for the $j$-th user in the $g$-th cell on the $k$-th subcarrier.

Proof. This lemma can be proved by substituting (2.41) in (2.50), and by utilizing the orthogonality property from Lemma 2.3. Details are omitted due to page limitation.

Accordingly, (2.49) results in

$$I_{ni}[k] = \log_2\det\left(\mathbf{I}_{L_{ni,i}} + \frac{1}{\zeta_{ni} + \sigma^2}\bar{\mathbf{D}}_{ni,i}\mathbf{Q}^{\mathrm{dl}}_{ni}[k]\bar{\mathbf{D}}^{H}_{ni,i}\right). \tag{2.52}$$

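Assuming a Gaussian input signal, $\mathbf{Q}^{\mathrm{dl}}_{ni}[k] = \mathbb{E}\{\mathbf{s}^{\mathrm{dl}}_{ni}[k]\mathbf{s}^{\mathrm{dl}H}_{ni}[k]\} = \mathrm{diag}\{p_{ni,0}[k], \ldots, p_{ni,L-1}[k]\}$, where $p_{ni,\ell}[k]$ is the power to be allocated on the $\ell$-th information symbol on the $k$-th subcarrier for the target user. Now, using the Hadamard inequality, (2.52) can be rewritten as

$$I_{ni}[k] = \log_2\prod_{\ell}\left(1 + \frac{\Lambda_{ni,i}|\alpha_{ni,i}(\ell)|^2\,p_{ni,\ell}[k]}{\zeta_{ni} + \sigma^2}\right) = \sum_{\ell=0}^{L_{ni,i}-1}\log_2\left(1 + \gamma_{ni,\ell}\,p_{ni,\ell}[k]\right), \tag{2.53}$$

where $\gamma_{ni,\ell} = \Lambda_{ni,i}|\alpha_{ni,i}(\ell)|^2/(\zeta_{ni} + \sigma^2)$. Accordingly, the optimal power allocation under perfect DoA estimation is the well-known water-filling solution, which can be expressed as

$$p_{ni,\ell}[k] = \left[\mu_{ni,\ell}[k] - 1/\gamma_{ni,\ell}\right]^{\diamond}, \tag{2.54}$$

where $[x]^{\diamond}$ denotes the function with $[x]^{\diamond} = 0$ when $x < 0$ and $[x]^{\diamond} = x$ when $x \ge 0$, and $\mu_{ni,\ell}[k]$ is the corresponding Lagrange multiplier.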
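The water-filling rule in (2.54) can be illustrated with a short sketch. It assumes a single total power budget shared by the $L$ symbols of one user on one subcarrier and therefore a common water level $\mu$; the budget `total_power` and the function name are assumptions made for this illustration.

```python
import numpy as np


def water_filling(gamma, total_power):
    """Allocate p_l = max(mu - 1/gamma_l, 0) such that sum(p_l) = total_power.

    gamma       : positive effective gains gamma_l, one per information symbol.
    total_power : positive power budget shared by all symbols (assumed here).
    Returns (p, mu), where mu is the common water level.
    """
    inv = 1.0 / np.asarray(gamma, dtype=float)
    floors = np.sort(inv)  # weakest floors first
    # Try to serve the k strongest symbols, starting from all of them.
    for k in range(len(floors), 0, -1):
        mu = (total_power + floors[:k].sum()) / k
        if mu > floors[k - 1]:  # the k-th symbol receives positive power
            break
    p = np.maximum(mu - inv, 0.0)
    return p, mu


if __name__ == "__main__":
    p, mu = water_filling([2.0, 1.0, 0.1], total_power=1.0)
    print(p, mu)
```

In the example, the weakest symbol (gain 0.1) lies above the water level and receives no power, while the other two symbols share the budget.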


System Achievable Rate under DoA Estimation Errors

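In this case, since the BS does not have perfect DoA estimation, the array steering matrix for the $n$-th MS in the $i$-th cell, in the presence of DoA estimation error, can be expressed in the form of

$$\hat{\mathbf{A}}_{ni,i}[k] = \left[\hat{\mathbf{e}}_{r,ni,i,k}(0)\ \ \hat{\mathbf{e}}_{r,ni,i,k}(1)\ \ldots\ \hat{\mathbf{e}}_{r,ni,i,k}(L_{ni,i}-1)\right],$$

where $\hat{\mathbf{e}}_{r,ni,i,k}(\ell) = e^{-j2\pi k\ell/N_c}\,\mathbf{a}(v_{ni,i,\ell} + \Delta v_{ni,i,\ell}) \otimes \mathbf{a}(u_{ni,i,\ell} + \Delta u_{ni,i,\ell})$, and $\Delta u_{ni,i,\ell}$ and $\Delta v_{ni,i,\ell}$ represent the DoA estimation errors in the azimuth and elevation spatial frequencies for the $\ell$-th path of the channel between the $i$-th BS and the $n$-th MS in the $i$-th cell. Now, let us consider the following lemma:

Lemma 2.14. For the massive FD-MIMO OFDM system, the normalized steering vectors $\bar{\mathbf{e}}_{r,jg,i,k}(\ell) = (1/\sqrt{N_r})\,\mathbf{e}_{r,jg,i,k}(\ell)$ and $\bar{\hat{\mathbf{e}}}_{r,j'g',i',k}(\ell') = (1/\sqrt{N_r})\,\hat{\mathbf{e}}_{r,j'g',i',k}(\ell')$, $\forall (j, g, i, \ell) \neq (j', g', i', \ell')$, become orthonormal asymptotically as the number of antennas $N_r \to \infty$.

Proof. A similar lemma is proved in [6], and hence the proof is omitted here.

Using Lemma 2.14, we have $\frac{1}{N_r}\mathbf{A}^{T}_{ni,i}[k]\hat{\mathbf{A}}^{*}_{jg,i}[k] = \mathbf{0}$, $\forall (n, i) \neq (j, g)$, and $\frac{1}{N_r}\mathbf{A}^{T}_{ni,i}[k]\hat{\mathbf{A}}^{*}_{ni,i}[k] = \frac{1}{N_r}\mathrm{diag}\{\mathbf{e}^{T}_{r,ni,i,k}(0)\hat{\mathbf{e}}^{*}_{r,ni,i,k}(0), \ldots, \mathbf{e}^{T}_{r,ni,i,k}(L_{ni,i}-1)\hat{\mathbf{e}}^{*}_{r,ni,i,k}(L_{ni,i}-1)\}$ for $(n, i) = (j, g)$. Hence, for massive MIMO systems, we can express the optimum eigen-beamformer under imperfect DoA estimation as

$$\mathbf{V}^{\mathrm{eig}}_{ni}[k] = \frac{1}{N_r}\hat{\mathbf{A}}^{*}_{ni,i}[k]. \tag{2.55}$$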

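Accordingly, the achievable rate in the presence of UL DoA estimation error can be written as

$$I_{ni}[k] = \mathbb{E}\left\{\sum_{\ell=0}^{L_{ni,i}-1}\log\left(1 + \hat{\gamma}_{ni,\ell}\left|\mathbf{e}^{T}_{r,ni,i,k}(\ell)\,\hat{\mathbf{e}}^{*}_{r,ni,i,k}(\ell)\right|^2\hat{p}_{ni,\ell}[k]\right)\right\}, \tag{2.56}$$

where $\hat{p}_{ni,\ell}[k]$ denotes the power to be allocated in the presence of DoA estimation error, $\hat{\gamma}_{ni,\ell} = \Lambda_{ni,i}|\alpha_{ni,i}(\ell)|^2/(N_r^2(\sigma^2 + \zeta_{ni}))$, and the expectation is taken with respect to the estimation error. Using the method of Lagrange multipliers, the optimal expected power allocation on the $\ell$-th information symbol for the $n$-th MS in the $i$-th cell on the $k$-th subcarrier is given by

$$\mathbb{E}\{\hat{p}_{ni,\ell}[k]\} = \left[\mu_{ni,\ell}[k] - \frac{1}{\hat{\gamma}_{ni,\ell}\,\mathbb{E}\left|\mathbf{e}^{T}_{r,ni,i,k}(\ell)\,\hat{\mathbf{e}}^{*}_{r,ni,i,k}(\ell)\right|^2}\right]^{\diamond}, \tag{2.57}$$

where $\mu_{ni,\ell}[k]$ is the corresponding Lagrange multiplier. Finally, (2.57) can be simplified as [6]:

$$\mathbb{E}\{\hat{p}_{ni,\ell}[k]\} = \left[\mu_{ni,\ell}[k] - \frac{1}{\hat{\gamma}_{ni,\ell}M_1^2M_2^2}\left(1 + \frac{M_1^2\,\mathbb{E}[(\Delta v_{ni,i,\ell})^2]}{12}\right)\left(1 + \frac{M_2^2\,\mathbb{E}[(\Delta u_{ni,i,\ell})^2]}{12}\right)\right]^{\diamond}. \tag{2.58}$$

It can be observed that, in the absence of DoA estimation error, the optimal power allocation in (2.58) converges to the water-filling solution in (2.54), since $\hat{\gamma}_{ni,\ell}M_1^2M_2^2 = \gamma_{ni,\ell}$ when $N_r = M_1M_2$. It is also to be noted here that both power allocations in (2.54) and (2.58) take into account the effects of inter-cell interference, unlike the single-user eigen-beamforming presented in [6].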
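The effect of the DoA error terms in (2.58) can be made concrete with a small sketch that evaluates the quantity subtracted from the water level. The function name and the example error variances are hypothetical; the expression itself is a direct transcription of the bracketed term in (2.58).

```python
def inverse_gain_with_doa_error(gamma_hat, M1, M2, var_dv, var_du):
    """Term subtracted from the water level in (2.58).

    gamma_hat : effective gain including the 1/N_r^2 normalisation.
    var_dv    : E[(delta v)^2], mean squared error of the spatial frequency v.
    var_du    : E[(delta u)^2], mean squared error of the spatial frequency u.
    """
    inflation = (1 + M1**2 * var_dv / 12) * (1 + M2**2 * var_du / 12)
    return inflation / (gamma_hat * M1**2 * M2**2)


if __name__ == "__main__":
    M1, M2 = 16, 4
    gamma = 2.0                        # gain of (2.54), example value
    gamma_hat = gamma / (M1 * M2)**2   # gain of (2.56) with N_r = M1 * M2
    print(inverse_gain_with_doa_error(gamma_hat, M1, M2, 0.0, 0.0))    # equals 1/gamma
    print(inverse_gain_with_doa_error(gamma_hat, M1, M2, 1e-3, 1e-3))  # slightly larger
```

With zero error variance the term equals $1/\gamma_{ni,\ell}$, which recovers (2.54); a nonzero variance raises the term and therefore lowers the power assigned to that path for a given water level.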

2.3.3 Precoding Complexity Analysis

In this subsection, we briefly discuss the computational complexity of the proposed DoA-based precoding strategy presented in Theorem 2.12. As in Section 2.2.3, for the computational complexity analysis we again assume that all the channels have $L$ resolvable paths. Forming the effective channel, $\bar{\mathbf{D}}_{i,i}\mathbf{A}^{T}_{i,i}[k]$, requires $E_a = 2(JL-1)JLN_r$ FLOPS. Taking the SVD of the effective channel requires $E_b = 4J^2L^2N_r + 22N_r^3$ FLOPS. Constructing the final precoder requires $E_c = [2(N_r-1)N_rJL] + 2(JL-1)N_rJL$ FLOPS. Hence, the total number of FLOPS required for our DoA-based precoder can be written as $C_{\mathrm{DoA}} = E_a + E_b + E_c$. Next, for comparison, we calculate the computational cost of conventional block diagonalization (BD) based precoding. Let $L_j$ denote the rank of the matrix $\left[\mathbf{H}^{\mathrm{dl}T}_{0i,i}, \ldots, \mathbf{H}^{\mathrm{dl}T}_{(j-1)i,i}, \mathbf{H}^{\mathrm{dl}T}_{(j+1)i,i}, \ldots, \mathbf{H}^{\mathrm{dl}T}_{(J-1)i,i}\right]^{T}$. For the complexity analysis, we assume that $L_j = L,\ \forall j$. Hence, following the steps of conventional block diagonalization precoding, the total number of FLOPS required for BD is

$$C_{\mathrm{BD}} = J\left([4(J-1)^2N_t^2N_r + 22N_r^3] + [2(N_r-1)N_t(N_r-L)] + [4N_t^2(N_r-L) + 22(N_r-L)^3]\right).$$
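The two precoding costs can be compared numerically. The sketch below transcribes $C_{\mathrm{DoA}}$ and $C_{\mathrm{BD}}$ as given above; the function names and the example system size are illustrative assumptions.

```python
def doa_precoder_flops(J, L, Nr):
    """C_DoA = E_a + E_b + E_c for the DoA-based precoder."""
    Ea = 2 * (J * L - 1) * J * L * Nr                              # effective channel
    Eb = 4 * J**2 * L**2 * Nr + 22 * Nr**3                         # SVD of effective channel
    Ec = 2 * (Nr - 1) * Nr * J * L + 2 * (J * L - 1) * Nr * J * L  # final precoder
    return Ea + Eb + Ec


def bd_precoder_flops(J, L, Nr, Nt):
    """C_BD for conventional block diagonalization, assuming L_j = L for all j."""
    per_user = ((4 * (J - 1)**2 * Nt**2 * Nr + 22 * Nr**3)
                + 2 * (Nr - 1) * Nt * (Nr - L)
                + (4 * Nt**2 * (Nr - L) + 22 * (Nr - L)**3))
    return J * per_user


if __name__ == "__main__":
    # Example: 10 users, 4 paths, 8 MS antennas, 16 x 16 BS array (Nr = 256).
    print(doa_precoder_flops(10, 4, 256), bd_precoder_flops(10, 4, 256, 8))
```

For this assumed configuration the BD count is dominated by the two cubic SVD terms that are repeated for each of the $J$ users, whereas the DoA-based precoder needs a single SVD.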

2.4 Performance Evaluation

In this section, we evaluate the ESPRIT-based UL DoA estimation for multi-cell multi-user massive FD-MIMO OFDM networks through simulation. For the simulation, we consider seven hexagonal cells with MSs uniformly distributed in each cell. Without loss of generality, we assume that the number of co-scheduled MSs in each cell is 10. An $M_1 \times M_2$ (antenna elements in the elevation direction by antenna elements in the azimuth direction) rectangular antenna array is assumed at the BS, whereas the mobile device has a uniform linear array. Different MSs use non-orthogonal spreading sequences as UL pilots, and the same pool of sequences is reused in all seven cells. Therefore, in the UL, the target BS is subject to intra-cell interference as well as interference from MSs in the six neighboring cells for the purpose of DoA estimation. The cell radius is set to 1000 meters. The system is assumed to operate in the mmWave band with a 28 GHz carrier frequency. Four dominant clusters are assumed for each UL channel from the MS to the BS, and each cluster contributes one resolvable path. The antenna spacing for both the receive and transmit antenna arrays is assumed to be $0.5\lambda$. The number of transmit antennas at each MS is set to 8. In this paper, we invoke the far-field assumption, and the wavefront impinging on the antenna array is assumed to be planar. The transmission medium is assumed to be isotropic and linear.

[Figure: RMSE for angle estimation (in degrees) versus SNR (dB); legend: simulation result and analytical result, elevation, 16 × 4 array.]

Figure 2.2: Elevation Angle Estimation for 64 Antennas.

The estimation performance of elevation and azimuth angles for 8 × 8, 4 × 16, and 16 × 4 antenna arrays is shown in Fig. 2.2 and Fig. 2.3, respectively, where the RMSE of the DoA estimation is used as the performance metric, and the correlation coefficient of the spreading sequences, $\rho_1$, is chosen to be 0.1. As the figures suggest, the analytical results of the DoA estimation match the empirical results well as the SNR increases. Furthermore, the antenna array geometry has a significant impact on estimation performance.


[Figure: RMSE for angle estimation (in degrees) versus SNR (dB); legend: simulation result and analytical result, azimuth, 16 × 4 array.]

Figure 2.3: Azimuth Angle Estimation for 64 Antennas.

Fig. 2.2 clearly suggests that the 16× 4 antenna array performs better than 8× 8 and 4× 16

arrays in elevation angle estimation. However, 8×8 array configuration may outperform the

4× 16 configuration in azimuth angle estimation as shown in Fig. 2.3. This is quite counter-

intuitive since the 4×16 array has more elements in the azimuth domain. The reason mainly

comes from the fact that the azimuth DoA estimation is actually coupled with elevation DoA

estimation. For the 4×16 array, the performance of the elevation DoA estimation may be so

bad that it affects the azimuth DoA estimation performance. This dependence is manifested

through Jacobians (see Remark 2.5), which, in fact, results from the underlying physics/


coordinate system of the 3D MIMO model. On the other hand, elevation estimation is

not dependent on azimuth estimation, and hence, 16 × 4 array geometry still outperforms

8× 8 array in elevation angle estimation. These observations can provide important design

intuitions for FD-MIMO networks that adopt subspace-based channel estimation methods.

The elevation and azimuth angle estimation results for 16× 16, 8× 32, and 32× 8 antenna

arrays are shown in Fig. 2.4 and Fig. 2.5, respectively. Comparing the results with those

presented in Fig. 2.2 and Fig. 2.3, we can observe that as the total number of antennas

increases, the DoA estimation accuracy accordingly increases, which is also evident from our

analytical results.

In Fig. 2.6, the average achievable sum-rates for different precoding strategies are compared for multi-cell multi-user massive FD-MIMO networks. Five schemes are compared: the introduced scheme presented in Theorem 2.12; the block-diagonalization based zero forcing (BD-ZF) precoding method [11, 12] assuming full CSI at the BS; and three eigen-beamforming schemes based on the large antenna system analysis. To be specific, Scheme A is the single-user eigen-beamforming introduced in [6]. This scheme uses the eigen-beamformer in (2.55) and applies the modified water-filling power allocation presented in [6], taking into account the DoA estimation error due to the noise. However, Scheme A does not consider the effects of intra/inter-cell interference in the power allocation. In Scheme B, (2.55) is used as the beamformer and the traditional water-filling in (2.54) is used as the power allocation, assuming ideal DoA estimation. Scheme C uses the same beamformer as Schemes A and B; however, it utilizes the power allocation in (2.58), which considers the DoA estimation error due to intra/inter-cell interference of the network. Fig. 2.6 clearly suggests that the scheme introduced in Theorem 2.12 achieves the best performance among all precoding strategies over the entire SNR regime of interest. Even assuming full CSI at the BS, the BD-ZF scheme performs worst in the medium to high SNR regime. This suggests that the BD-ZF based precoding strategy may yield strictly suboptimal performance for massive FD-MIMO networks. It is to be noted here that even though BD uses full channel state information, the performance gain of the DoA-based method over the BD method comes from the fact that the DoA-based method utilizes the structure of the underlying MIMO channel, whereas the BD method does not take it into account.

[Figure: RMSE for angle estimation (in degrees) versus SNR (dB).]

Figure 2.4: Elevation Angle Estimation for 256 Antennas.

For the three eigen-beamforming schemes based on the large antenna system analysis, we have the following observation: Scheme C outperforms both Schemes A and B over the entire SNR regime since Scheme C considers the comprehensive characterization of the DoA estimation for power allocation as discussed in Remark 1. Scheme A performs better than Scheme B at low SNRs, indicating the importance of incorporating the DoA estimation error. Since the DoA estimation error decreases as the SNR increases, both Schemes A and B approach Scheme C asymptotically. This is because the power allocation in (2.58) converges to the water-filling solution in (2.54) with increasing SNR.

[Figure: RMSE for angle estimation (in degrees) versus SNR (dB).]

Figure 2.5: Azimuth Angle Estimation for 256 Antennas.

[Figure: sum-rate (b/s/Hz) versus SNR (dB); legend: Sum-rate-maximizing Precoding (Theorem 5), Eigenbeamforming: Scheme-B, Eigenbeamforming: Scheme C, BD-ZF with Full CSI, Eigenbeamforming: Scheme A [5].]

Figure 2.6: Average Achievable Sum-Rate Comparison.

In Fig. 2.7, we compare the computational complexity of our ESPRIT-based DoA estimation method with that of the widely used MUSIC algorithm for a square BS antenna array (i.e., $M_1 = M_2$). The number of transmit antennas is 8, and the number of paths in the channel is 4. For the MUSIC algorithm, the number of grid points for the candidate DoA search is 360, which is a typical number. We can observe that as the number of antennas increases, the complexity of the MUSIC algorithm increases much faster than the complexity of the ESPRIT algorithm. Hence, for massive MIMO systems, MUSIC-based DoA estimation will incur a significantly higher computational burden than the ESPRIT method. In Fig. 2.8, the computational complexity of our proposed DoA-based precoding scheme (Theorem 2.12) is compared with the complexity of traditional Block Diagonalization (BD) precoding. We can observe that for small and mid-size arrays, the complexities of both algorithms are similar. However, as the number of antennas increases, the DoA-based precoder requires fewer FLOPS than the BD method.

[Figure: computational complexity (in FLOPS, ×10^6) versus $M_1 = M_2$; legend: ESPRIT, MUSIC.]

Figure 2.7: Computational Complexity Comparison for DoA Estimation Algorithms.

[Figure: computational complexity (in FLOPS, ×10^8) versus $M_1 = M_2$; legend: DoA-based Precoding, BD Precoding.]

Figure 2.8: Computational Complexity Comparison for Precoding Methods.

Chapter 3

Joint Parameter Estimation for 3D

Massive MIMO

3.1 System Model

Consider a MIMO-OFDM system in the uplink with $N_r$ receive antennas at the base station (BS) and a single transmit antenna at the user equipment (UE). The BS antenna array is in the form of a uniform rectangular array (URA) so that the estimation degrees of freedom from both the vertical and horizontal dimensions can be achieved. At the UE, the high-rate information symbols to be transmitted are grouped into blocks of length $N_c$. The $i$-th such block at the transmitter can be represented as $\mathbf{x}_i = [x_i(0), x_i(1), \ldots, x_i(N_c-1)]^{T}$, where $x_i(k)$ denotes the information symbol on the $k$-th subcarrier within the $i$-th OFDM block at the transmitter. The continuous-time received signal at time instant $t$ for the $i$-th OFDM symbol can then be represented as

$$\mathbf{y}_i(t) = \sum_{\ell=0}^{L-1}\alpha_i(\ell)\,\mathbf{e}_r(\ell)\,s_i(t - \tau_\ell) + \mathbf{w}_i(t), \tag{3.1}$$

where $s_i(t) = \sum_{k=0}^{N_c-1}x_i(k)\,e^{j2\pi kt/T}$, $\alpha_i(\ell)$ is the complex channel gain for the $\ell$-th resolvable path during the $i$-th OFDM symbol, $\mathbf{e}_r(\ell)$ and $\tau_\ell$ are, respectively, the $N_r \times 1$ receive antenna array response and the path delay for the $\ell$-th tap, $T$ is the OFDM symbol duration, and $\mathbf{w}_i(t)$ is the corresponding noise element. It is noteworthy here that DoAs and delays are relatively long-term statistics of the channel and change much more slowly than the channel gains.

The BS antenna array is placed in the X-Z plane, with $M_1$ and $M_2$ antenna elements in the vertical and horizontal directions, respectively. Accordingly, the total number of receive antenna elements at the base station is $N_r = M_1M_2$. Since the antenna elements are placed in a 2D plane, for each resolvable path there will be an azimuth DoA and an elevation DoA. Therefore, the receive antenna array response can be expressed as $\mathbf{e}_r(\ell) = \mathbf{a}(v_\ell) \otimes \mathbf{a}(u_\ell)$, where $\otimes$ represents the Kronecker product. The vectors $\mathbf{a}(u_\ell) = \left[1\ \ e^{ju_\ell}\ \ldots\ e^{j(M_1-1)u_\ell}\right]^{T}$ and $\mathbf{a}(v_\ell) = \left[1\ \ e^{jv_\ell}\ \ldots\ e^{j(M_2-1)v_\ell}\right]^{T}$ can be viewed as the receive steering vectors for the elevation and azimuth angles, respectively. Here, $u_\ell = \frac{2\pi d}{\lambda}\cos\theta_\ell$ and $v_\ell = \frac{2\pi d}{\lambda}\sin\theta_\ell\cos\phi_\ell$ are the two receive spatial frequencies at the base station, $d$ is the spacing between adjacent antenna elements, $\lambda$ is the carrier wavelength, and $\theta_\ell$ and $\phi_\ell$ are the elevation and azimuth DoAs of the $\ell$-th path, respectively.

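Now, after sampling, the discrete-time received signal at the $n$-th time sample can be written as

$$\mathbf{y}_i[n] = \sum_{\ell=0}^{L-1}\sum_{k=0}^{N_c-1}x_i(k)\,e^{j2\pi kn/N_c}\,e^{-j2\pi k\Delta f\tau_\ell}\,\alpha_i(\ell)\,\mathbf{e}_r(\ell) + \mathbf{w}_i[n], \tag{3.2}$$

where $\Delta f$ denotes the OFDM subcarrier spacing. After taking the FFT, the frequency domain received signal at the $k$-th subcarrier can therefore be written as

$$\mathbf{y}_i[k] = x_i(k)\sum_{\ell=0}^{L-1}\alpha_i(\ell)\,\mathbf{e}_r(\ell)\,e^{-j2\pi k\Delta f\tau_\ell} + \mathbf{w}_i[k]. \tag{3.3}$$

Accordingly, after correlating with the transmit symbol $x_i(k)$, the received signal at the $k$-th subcarrier can be denoted as $\tilde{\mathbf{y}}_i[k] = \frac{x_i^{*}(k)}{|x_i(k)|^2}\,\mathbf{y}_i[k]$. Now, stacking the correlated received signals for all subcarriers into columns, from (3.3), we have

$$\mathbf{Y}_i = \mathbf{A}\mathbf{D}_i\mathbf{B} + \mathbf{W}_i, \tag{3.4}$$

where $\mathbf{Y}_i = [\tilde{\mathbf{y}}_i[0], \tilde{\mathbf{y}}_i[1], \ldots, \tilde{\mathbf{y}}_i[N_c-1]]$ is the $N_r \times N_c$ received signal matrix, $\mathbf{A} = [\mathbf{e}_r(0), \mathbf{e}_r(1), \ldots, \mathbf{e}_r(L-1)]$ is the $N_r \times L$ array steering matrix, $\mathbf{D}_i = \mathrm{diag}\{\alpha_i(0), \alpha_i(1), \ldots, \alpha_i(L-1)\}$ is the $L \times L$ diagonal matrix containing the complex channel gains for all the paths, $\mathbf{W}_i$ is the corresponding $N_r \times N_c$ noise matrix, and $\mathbf{B}$ is the $L \times N_c$ delay-manifold matrix containing the path delays, given by

$$\mathbf{B} = \begin{bmatrix} 1 & e^{j\omega_0} & \cdots & e^{j(N_c-1)\omega_0} \\ 1 & e^{j\omega_1} & \cdots & e^{j(N_c-1)\omega_1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & e^{j\omega_{L-1}} & \cdots & e^{j(N_c-1)\omega_{L-1}} \end{bmatrix}, \tag{3.5}$$

where $\omega_\ell = 2\pi(\Delta f)\tau_\ell$ is the temporal frequency corresponding to the delay $\tau_\ell$.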
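The signal model in (3.4) and (3.5) can be assembled in a few lines. The following sketch builds the spatial frequencies, the array steering matrix, the delay manifold as written in (3.5), and one received block; all function names, the default half-wavelength spacing, and the complex Gaussian noise model are assumptions made for illustration.

```python
import numpy as np


def spatial_frequencies(theta, phi, d_over_lambda=0.5):
    """u = 2*pi*(d/lambda)*cos(theta), v = 2*pi*(d/lambda)*sin(theta)*cos(phi)."""
    theta, phi = np.asarray(theta, dtype=float), np.asarray(phi, dtype=float)
    u = 2 * np.pi * d_over_lambda * np.cos(theta)
    v = 2 * np.pi * d_over_lambda * np.sin(theta) * np.cos(phi)
    return u, v


def array_steering_matrix(u, v, M1, M2):
    """N_r x L matrix whose l-th column is a(v_l) kron a(u_l)."""
    cols = [np.kron(np.exp(1j * v_l * np.arange(M2)),
                    np.exp(1j * u_l * np.arange(M1)))
            for u_l, v_l in zip(u, v)]
    return np.stack(cols, axis=1)


def delay_manifold(tau, delta_f, Nc):
    """L x N_c matrix of (3.5) with omega_l = 2*pi*delta_f*tau_l."""
    omega = 2 * np.pi * delta_f * np.asarray(tau, dtype=float)
    return np.exp(1j * np.outer(omega, np.arange(Nc)))


def received_block(A, alpha, B, noise_std=0.0, rng=None):
    """Y_i = A diag(alpha) B + W_i with circular complex Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    shape = (A.shape[0], B.shape[1])
    W = noise_std * (rng.standard_normal(shape)
                     + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    return A @ np.diag(alpha) @ B + W
```

A typical use is to draw angles and delays for $L$ paths, form `A` and `B`, and generate one block per OFDM symbol with fresh gains `alpha`, since the gains vary much faster than the angles and delays.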


3.2 Parameter Estimation Framework

In this section, we will construct a space-time manifold through vectorization and jointly

estimate the delay and DoAs using ESPRIT-type algorithm.

3.2.1 Joint Angle and Delay Estimation Using Standard ESPRIT

For the joint angle and delay estimation (JADE) algorithm, the first step is to construct the manifold matrix, which involves all three parameters: delay, elevation angle, and azimuth angle. Taking the vectorization of the received signal in (3.4), we obtain

$$\mathbf{y}^{(i)}_{v} = \mathbf{A}(\tau, \theta, \phi)\,\mathbf{d}^{(i)} + \mathrm{vec}\{\mathbf{W}_i\}, \tag{3.6}$$

where $\mathbf{d}^{(i)} = \mathrm{diag}\{\mathbf{D}_i\} = [\alpha_i(0), \ldots, \alpha_i(L-1)]^{T}$, and $\mathbf{A}(\tau, \theta, \phi) = \mathbf{B}^{T} \diamond \mathbf{A}$ is the space-time array manifold matrix, where $\mathbf{B}$ and $\mathbf{A}$ are the time delay matrix and the array manifold matrix, respectively, both with Vandermonde structure, and $\diamond$ denotes the Khatri-Rao product, i.e., the column-wise Kronecker product. Now, collecting $\mathbf{y}^{(i)}_{v}$ for $K$ OFDM symbols, we have

$$\mathbf{Y}_v = \mathbf{A}(\tau, \theta, \phi)\,\mathbf{S} + \mathbf{W}_v, \tag{3.7}$$

where $\mathbf{S} = \left[\mathbf{d}^{(0)}\ \ \mathbf{d}^{(1)}\ \ldots\ \mathbf{d}^{(K-1)}\right]$ can be regarded as the equivalent transmit signal, and $\mathbf{W}_v = \left[\mathrm{vec}\{\mathbf{W}_0\}\ \ \mathrm{vec}\{\mathbf{W}_1\}\ \ldots\ \mathrm{vec}\{\mathbf{W}_{K-1}\}\right]$ is the corresponding noise matrix. From (3.7), we can observe that if we utilize the shift-invariance property of the highly structured manifold matrix, we can apply ESPRIT-type algorithms in order to jointly estimate the unknown parameters. Next, we briefly describe the steps for 3D joint parameter estimation through the standard ESPRIT method.
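The vectorization step behind (3.6) rests on the identity $\mathrm{vec}\{\mathbf{A}\mathbf{D}\mathbf{B}\} = (\mathbf{B}^{T} \diamond \mathbf{A})\,\mathrm{diag}\{\mathbf{D}\}$ for diagonal $\mathbf{D}$, with column-major stacking. The short sketch below checks it numerically on random matrices; the helper name `khatri_rao` is introduced here for illustration.

```python
import numpy as np


def khatri_rao(X, Y):
    """Column-wise Kronecker product of X (p x L) and Y (q x L), giving (p*q) x L."""
    return np.stack([np.kron(X[:, l], Y[:, l]) for l in range(X.shape[1])], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Nr, Nc, L = 6, 5, 3
    A = rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L))
    B = rng.standard_normal((L, Nc)) + 1j * rng.standard_normal((L, Nc))
    d = rng.standard_normal(L) + 1j * rng.standard_normal(L)

    lhs = (A @ np.diag(d) @ B).reshape(-1, order="F")  # vec{.}: stack the columns
    rhs = khatri_rao(B.T, A) @ d
    print(np.allclose(lhs, rhs))
```

Each column of the space-time manifold is therefore the Kronecker product of one row of $\mathbf{B}$ (a delay signature) with the matching column of $\mathbf{A}$ (an array signature), which is what gives the manifold its three nested shift-invariance structures.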

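In order to estimate $\omega_\ell$, we take the first and, respectively, the last $M_1M_2(N_c-1)$ rows of $\mathbf{Y}_v$ as two sub-matrices, while for $\theta_\ell$ estimation, we take the first and, respectively, the last $M_1-1$ rows of all $N_cM_2$ blocks. Similarly, for $\phi_\ell$ estimation, we select the first and, respectively, the last $M_2-1$ rows of all $N_cM_1$ blocks. Hence, we may define the selection matrices as follows:

$$\begin{aligned}
\mathbf{J}^{(1)}_{1} &= [\mathbf{I}_{N_c-1}\ \ \mathbf{0}] \otimes \mathbf{I}_{M_1M_2}, & \mathbf{J}^{(1)}_{2} &= [\mathbf{0}\ \ \mathbf{I}_{N_c-1}] \otimes \mathbf{I}_{M_1M_2}, \\
\mathbf{J}^{(2)}_{1} &= \mathbf{I}_{M_2N_c} \otimes [\mathbf{I}_{M_1-1}\ \ \mathbf{0}], & \mathbf{J}^{(2)}_{2} &= \mathbf{I}_{M_2N_c} \otimes [\mathbf{0}\ \ \mathbf{I}_{M_1-1}], \\
\mathbf{J}^{(3)}_{1} &= \mathbf{I}_{N_c} \otimes [\mathbf{I}_{M_2-1}\ \ \mathbf{0}] \otimes \mathbf{I}_{M_1}, & \mathbf{J}^{(3)}_{2} &= \mathbf{I}_{N_c} \otimes [\mathbf{0}\ \ \mathbf{I}_{M_2-1}] \otimes \mathbf{I}_{M_1},
\end{aligned}$$

where $\mathbf{J}^{(r)}_{1}$ and $\mathbf{J}^{(r)}_{2}$ are the two selection matrices for the $r$-th parameter mode, with $r = 1, 2,$ and $3$ for path delay, elevation angle, and azimuth angle, respectively. Through the shift-invariance property of the space-time manifold matrix, we have the following shift relations:

$$\begin{aligned}
\mathbf{J}^{(1)}_{1}\mathbf{A}(\tau, \theta, \phi)\,\boldsymbol{\Omega} &= \mathbf{J}^{(1)}_{2}\mathbf{A}(\tau, \theta, \phi), \\
\mathbf{J}^{(2)}_{1}\mathbf{A}(\tau, \theta, \phi)\,\boldsymbol{\Theta} &= \mathbf{J}^{(2)}_{2}\mathbf{A}(\tau, \theta, \phi), \\
\mathbf{J}^{(3)}_{1}\mathbf{A}(\tau, \theta, \phi)\,\boldsymbol{\Phi} &= \mathbf{J}^{(3)}_{2}\mathbf{A}(\tau, \theta, \phi),
\end{aligned} \tag{3.8}$$

where $\boldsymbol{\Omega} = \mathrm{diag}\{e^{j\omega_0}, \ldots, e^{j\omega_{L-1}}\}$, $\boldsymbol{\Theta} = \mathrm{diag}\{e^{ju_0}, \ldots, e^{ju_{L-1}}\}$, and $\boldsymbol{\Phi} = \mathrm{diag}\{e^{jv_0}, \ldots, e^{jv_{L-1}}\}$ are the corresponding diagonal matrices containing, respectively, the delay, elevation, and azimuth parameters for each path. Next, we perform subspace decomposition of the received signal in (3.7) through singular value decomposition (SVD). Let the signal subspace of the received signal, $\mathbf{Y}_v$, be denoted as $\mathbf{U}_s$. It can be observed that the columns of the space-time manifold matrix, $\mathbf{A}(\tau, \theta, \phi)$, span the same $L$-dimensional signal subspace, i.e., $\mathrm{Range}\{\mathbf{A}(\tau, \theta, \phi)\} = \mathrm{Range}\{\mathbf{U}_s\}$. Therefore, there exists a non-singular transformation matrix, $\mathbf{T}$, such that $\mathbf{U}_s = \mathbf{A}(\tau, \theta, \phi)\mathbf{T}$. Hence, we can write the shift-invariance equations in (3.8) in terms of the signal subspace:

$$\begin{aligned}
\mathbf{J}^{(1)}_{1}\mathbf{U}_s\boldsymbol{\Psi}_\tau &= \mathbf{J}^{(1)}_{2}\mathbf{U}_s, \\
\mathbf{J}^{(2)}_{1}\mathbf{U}_s\boldsymbol{\Psi}_\theta &= \mathbf{J}^{(2)}_{2}\mathbf{U}_s, \\
\mathbf{J}^{(3)}_{1}\mathbf{U}_s\boldsymbol{\Psi}_\phi &= \mathbf{J}^{(3)}_{2}\mathbf{U}_s,
\end{aligned} \tag{3.9}$$

where $\boldsymbol{\Psi}_\tau = \mathbf{T}^{-1}\boldsymbol{\Omega}\mathbf{T}$, $\boldsymbol{\Psi}_\theta = \mathbf{T}^{-1}\boldsymbol{\Theta}\mathbf{T}$, and $\boldsymbol{\Psi}_\phi = \mathbf{T}^{-1}\boldsymbol{\Phi}\mathbf{T}$ are the three shift-invariance operators for the path delay, elevation angle, and azimuth angle, respectively. From (3.9), we can solve for the shift-invariance operators using the least squares (LS) or total least squares (TLS) method. Let the eigenvalues of the matrices $\boldsymbol{\Psi}_\tau$, $\boldsymbol{\Psi}_\theta$, and $\boldsymbol{\Psi}_\phi$ be denoted as $\lambda_{\ell\tau}$, $\lambda_{\ell\theta}$, and $\lambda_{\ell\phi}$ for $\ell = 0, 1, \ldots, L-1$. The temporal frequency and the elevation and azimuth spatial frequencies can then be computed as $\omega_\ell = \mathrm{angle}(\lambda_{\ell\tau})$, $u_\ell = \mathrm{angle}(\lambda_{\ell\theta})$, and $v_\ell = \mathrm{angle}(\lambda_{\ell\phi})$. Accordingly, the path delays and the elevation and azimuth angles can be found by simple parameter transformation.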
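The steps above (selection matrices, signal subspace, least-squares solution of (3.9), and eigenvalue phases) can be sketched as follows. This is a minimal noiseless illustration using the LS variant; the function names, the dictionary keys, and the helper that synthesizes the manifold are assumptions, and no parameter pairing is performed here.

```python
import numpy as np


def selection_pair(n_before, n, n_after):
    """J1 = I kron [I_{n-1} 0] kron I and J2 = I kron [0 I_{n-1}] kron I."""
    first = np.eye(n)[:-1]  # [I_{n-1} 0]: drop the last sample of this mode
    last = np.eye(n)[1:]    # [0 I_{n-1}]: drop the first sample of this mode
    left, right = np.eye(n_before), np.eye(n_after)
    return np.kron(np.kron(left, first), right), np.kron(np.kron(left, last), right)


def esprit_frequencies(Yv, L, Nc, M1, M2):
    """LS-ESPRIT estimates of (omega, u, v) from Yv of size (Nc*M1*M2) x K."""
    U, _, _ = np.linalg.svd(Yv, full_matrices=False)
    Us = U[:, :L]  # signal subspace
    modes = {
        "delay": (1, Nc, M1 * M2),       # J^(1)
        "elevation": (Nc * M2, M1, 1),   # J^(2)
        "azimuth": (Nc, M2, M1),         # J^(3)
    }
    estimates = {}
    for name, (n_before, n, n_after) in modes.items():
        J1, J2 = selection_pair(n_before, n, n_after)
        Psi = np.linalg.lstsq(J1 @ Us, J2 @ Us, rcond=None)[0]  # J1 Us Psi = J2 Us
        estimates[name] = np.angle(np.linalg.eigvals(Psi))
    return estimates


def space_time_manifold(omega, u, v, Nc, M1, M2):
    """Columns b(omega_l) kron a(v_l) kron a(u_l), used only to synthesize test data."""
    cols = [np.kron(np.exp(1j * w_l * np.arange(Nc)),
                    np.kron(np.exp(1j * v_l * np.arange(M2)),
                            np.exp(1j * u_l * np.arange(M1))))
            for w_l, u_l, v_l in zip(omega, u, v)]
    return np.stack(cols, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    omega, u, v = [0.5, -1.0], [0.3, 1.2], [-0.7, 0.9]
    A_st = space_time_manifold(omega, u, v, Nc=4, M1=3, M2=3)
    S = rng.standard_normal((2, 6)) + 1j * rng.standard_normal((2, 6))
    print(esprit_frequencies(A_st @ S, L=2, Nc=4, M1=3, M2=3))
```

Because each operator is diagonalized independently, the three lists of frequencies come out in arbitrary order relative to one another, which is why a pairing step is needed afterwards.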

It is to be noted here that an advantage of jointly estimating the angle and delay parameters is that the method can work even when the number of paths exceeds the number of antennas ($L > M_1M_2$). In order to estimate all the paths in the underlying channel, our proposed formulation only needs the space-time manifold matrix to be tall ($M_1M_2N_c > L$), which can easily be satisfied even for a relatively large number of paths.

3.2.2 Parameter Pairing and Channel Gains Estimation

After estimating the path delays and the elevation and azimuth angles, we need to pair up the respective path parameters. We can apply simultaneous Schur decomposition (SSD) in order to couple the parameters. However, the computational complexity of the SSD for large antenna systems and for 3D parameter pairing is very high, and it may not be feasible for practical cellular systems. Alternatively, we can obtain the correct pairing by correlating the eigenvectors of the shift-invariance operators. Let the matrices containing the eigenvectors of the shift-invariance operators $\boldsymbol{\Psi}_\tau$, $\boldsymbol{\Psi}_\theta$, and $\boldsymbol{\Psi}_\phi$ be denoted as $\mathbf{Q}_\tau$, $\mathbf{Q}_\theta$, and $\mathbf{Q}_\phi$, respectively. Since the delay and angle parameters stem from the same signal subspace, the products $\mathbf{Q}_\theta^{-1}\mathbf{Q}_\tau$ and $\mathbf{Q}_\phi^{-1}\mathbf{Q}_\tau$ should be close to (scaled) permutation matrices. These permutation matrices, in essence, indicate how the order of the eigenvalues of the matrices $\boldsymbol{\Psi}_\theta$ and $\boldsymbol{\Psi}_\phi$ is changed with respect to the order of the eigenvalues of $\boldsymbol{\Psi}_\tau$. Hence, after reordering the eigenvalues, we can obtain the correct pairing of the estimated parameters.
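The eigenvector-correlation pairing can be sketched as below. It assumes the two operators share the same eigenvectors up to ordering and scaling, as is the case for $\mathbf{T}^{-1}\boldsymbol{\Omega}\mathbf{T}$ and $\mathbf{T}^{-1}\boldsymbol{\Theta}\mathbf{T}$ with distinct eigenvalues; the function name and the argmax rule for reading off the permutation are illustrative choices.

```python
import numpy as np


def pair_by_eigenvectors(Psi_ref, Psi_other):
    """Reorder the eigenvalues of Psi_other to match those of Psi_ref.

    Returns (lam_ref, lam_other_paired), where entry k of both arrays
    belongs to the same path.
    """
    lam_ref, Q_ref = np.linalg.eig(Psi_ref)
    lam_other, Q_other = np.linalg.eig(Psi_other)
    # Q_other^{-1} Q_ref is close to a scaled permutation matrix.
    C = np.abs(np.linalg.solve(Q_other, Q_ref))
    order = np.argmax(C, axis=0)  # row index of the dominant entry in each column
    return lam_ref, lam_other[order]


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    omega = np.array([0.3, -1.0, 2.0])
    u = np.array([1.5, 0.2, -0.4])
    Psi_tau = np.linalg.solve(T, np.diag(np.exp(1j * omega)) @ T)
    Psi_theta = np.linalg.solve(T, np.diag(np.exp(1j * u)) @ T)
    lam_tau, lam_theta = pair_by_eigenvectors(Psi_tau, Psi_theta)
    print(np.angle(lam_tau), np.angle(lam_theta))
```

The same call with $\boldsymbol{\Psi}_\phi$ as the second argument pairs the azimuth frequencies with the delays. With noisy operators the matrix `C` is only approximately a permutation, so the argmax rule may assign one eigenvalue twice; resolving such ties is outside the scope of this sketch.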

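After pairing the delays and angles, we estimate the complex channel gains using the maximum likelihood (ML) estimator for each OFDM symbol. Let $\hat{\mathbf{A}}$ and $\hat{\mathbf{B}}$ denote the estimated array steering matrix and delay manifold matrix, respectively. Now, the ML estimate of the diagonal channel gain matrix, $\mathbf{D}_i$, for the $i$-th OFDM symbol can be written as

$$\hat{\mathbf{D}}_i = \arg\max_{\mathbf{D}_i}\ p(\mathbf{Y}_i \mid \mathbf{D}_i) = \arg\min_{\mathbf{D}_i}\ \left\|\mathbf{Y}_i - \hat{\mathbf{A}}\mathbf{D}_i\hat{\mathbf{B}}\right\|_F^2, \tag{3.10}$$

where $\|\mathbf{X}\|_F$ denotes the Frobenius norm of the matrix $\mathbf{X}$. Now,

$$\left\|\mathbf{Y}_i - \hat{\mathbf{A}}\mathbf{D}_i\hat{\mathbf{B}}\right\|_F^2 = \mathrm{Tr}\left\{\left(\mathbf{Y}_i - \hat{\mathbf{A}}\mathbf{D}_i\hat{\mathbf{B}}\right)\left(\mathbf{Y}_i - \hat{\mathbf{A}}\mathbf{D}_i\hat{\mathbf{B}}\right)^{H}\right\}. \tag{3.11}$$

Taking the derivative of (3.11) with respect to $\mathbf{D}_i$, and setting the derivative equal to zero, we obtain

$$\hat{\mathbf{A}}^{T}\mathbf{Y}_i^{*}\hat{\mathbf{B}}^{T} = \hat{\mathbf{A}}^{T}\hat{\mathbf{A}}^{*}\mathbf{D}_i\hat{\mathbf{B}}^{*}\hat{\mathbf{B}}^{T}. \tag{3.12}$$

Accordingly, the ML estimate of the complex channel gain matrix can be calculated as

$$\hat{\mathbf{D}}_i = \left(\hat{\mathbf{A}}^{H}\hat{\mathbf{A}}\right)^{-1}\hat{\mathbf{A}}^{T}\mathbf{Y}_i^{*}\hat{\mathbf{B}}^{T}\left(\hat{\mathbf{B}}\hat{\mathbf{B}}^{H}\right)^{-1}. \tag{3.13}$$

The uplink channel can then be reconstructed by utilizing all the estimated parameters.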
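The minimization in (3.10) can also be carried out numerically on the vectorized model (3.6), where it becomes an ordinary linear least-squares problem in the $L$ path gains. The sketch below takes that route rather than evaluating the closed form (3.13); the function name is hypothetical and the manifold estimates are assumed to be given.

```python
import numpy as np


def estimate_path_gains(Y, A_hat, B_hat):
    """Least-squares minimizer of ||Y - A_hat diag(d) B_hat||_F over the vector d.

    Uses vec{A diag(d) B} = (B^T khatri-rao A) d with column-major stacking.
    """
    L = A_hat.shape[1]
    M = np.stack([np.kron(B_hat[l, :], A_hat[:, l]) for l in range(L)], axis=1)
    d_hat, *_ = np.linalg.lstsq(M, Y.reshape(-1, order="F"), rcond=None)
    return d_hat


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    A = rng.standard_normal((6, 2)) + 1j * rng.standard_normal((6, 2))
    B = rng.standard_normal((2, 5)) + 1j * rng.standard_normal((2, 5))
    alpha = np.array([0.8 - 0.1j, -0.2 + 0.6j])
    Y = A @ np.diag(alpha) @ B
    print(estimate_path_gains(Y, A, B))  # close to alpha in the noiseless case
```

Under white Gaussian noise this least-squares solution coincides with the ML estimate of the gains for the given manifold estimates, and it restricts the unknown to the diagonal of $\mathbf{D}_i$ by construction.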

3.3 RMSE Characterization of the Joint Angle-Delay

Estimation

In this section, we present the theoretical analysis of the root mean square error (RMSE) of joint angle-delay estimation for massive MIMO OFDM systems. For notational simplicity, we denote $\mu_\ell^{(1)} = \omega_\ell$, $\mu_\ell^{(2)} = u_\ell$, and $\mu_\ell^{(3)} = v_\ell$. Let us also denote the estimated temporal and spatial frequencies as $\hat{\mu}_\ell^{(1)} = \hat{\omega}_\ell$, $\hat{\mu}_\ell^{(2)} = \hat{u}_\ell$, and $\hat{\mu}_\ell^{(3)} = \hat{v}_\ell$, respectively. Define the estimation error as $\Delta\mu_\ell^{(r)} = \mu_\ell^{(r)} - \hat{\mu}_\ell^{(r)}$, for r = 1, 2, and 3. Subspace decomposition of the received signal in (3.7) can be performed through the SVD. Let $U_s$ and $V_s$ denote, respectively, the left and right singular matrices corresponding to the signal subspace, and let $\Sigma_s$ denote the diagonal matrix containing the corresponding singular values. The first-order approximation of the mean squared error of the $\ell$-th path in the r-th mode is given by [20]:

\[ E\big\{ (\Delta\mu_\ell^{(r)})^2 \big\} = \frac{1}{2} \Big( r_\ell^{(r)H} \cdot W_{\mathrm{mat}}^{*} \cdot R_{nn}^{T} \cdot W_{\mathrm{mat}}^{T} \cdot r_\ell^{(r)} - \mathrm{Re}\big\{ r_\ell^{(r)T} \cdot W_{\mathrm{mat}} \cdot C_{nn} \cdot W_{\mathrm{mat}}^{T} \cdot r_\ell^{(r)} \big\} \Big), \quad r \in \{1, 2, 3\}. \tag{3.14} \]

The vector $r_\ell^{(r)}$ and the matrix $W_{\mathrm{mat}}$ are given by

\[ r_\ell^{(r)} = q_\ell \otimes \Big( \Big[ \big( J_1^{(r)} U_s \big)^{+} \big( J_2^{(r)} / e^{j \mu_\ell^{(r)}} - J_1^{(r)} \big) \Big]^{T} p_\ell \Big), \tag{3.15} \]

\[ W_{\mathrm{mat}} = \big( \Sigma_s^{-1} V_s^{T} \big) \otimes \big( U_n U_n^{H} \big), \tag{3.16} \]

where $q_\ell$ is the $\ell$-th column of the transformation matrix $T$, $p_\ell$ is the $\ell$-th row of the matrix $T^{-1}$, and $R_{nn}$ and $C_{nn}$ are the noise covariance and complementary covariance matrices, respectively.

Now, let $\bar{a}_{\tau\theta\phi}(\ell)$ denote the normalized space-time steering vector corresponding to the $\ell$-th path of the channel, i.e., $\bar{a}_{\tau\theta\phi}(\ell) = \frac{1}{\sqrt{M_1 M_2 N_c}} a_{\tau\theta\phi}(\ell)$, where $a_{\tau\theta\phi}(\ell)$ is the $\ell$-th column of the space-time manifold matrix, $A(\tau, \theta, \phi)$. Accordingly, for massive MIMO systems, we have the following lemma [61]:

Lemma 3.1. If the elevation and azimuth angles are both drawn independently from any continuous distribution, the normalized space-time steering vectors are orthogonal, that is, $\bar{a}_{\tau\theta\phi}(i) \perp \mathrm{span}\{ \bar{a}_{\tau\theta\phi}(j) \mid \forall i \neq j \}$ when $M_1 M_2$ is large and the number of paths $L = o(M_1 M_2)$.

It is apparent that (3.14) relies on the singular value decomposition of the noiseless received signal, which is difficult to obtain at the base station. In fact, it is very challenging to simplify such a complicated result in the multipath scenario. Fortunately, for massive MIMO systems, the result can be significantly simplified due to the orthogonality of the steering vectors. Specifically, using standard ESPRIT for massive MIMO OFDM systems, we obtain the simplified RMSE of the 3D parameter estimation as follows:

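Theorem 3.2. For parameter estimation based on a uniform planar array of $M_1 \times M_2$ elements, the root mean square errors of estimation of the delay, elevation angle, and azimuth angle for the 3D massive MIMO OFDM system are given by:

\[ \mathrm{RMSE}_{\tau_\ell} = \frac{\sigma}{2\pi (\Delta f)(N_c - 1)} \sqrt{ \frac{R_{ss}^{-1}(\ell, \ell)}{K M_1 M_2} }, \tag{3.17} \]

\[ \mathrm{RMSE}_{\theta_\ell} = \frac{\sigma}{\pi \sin(\theta_\ell)(M_1 - 1)} \sqrt{ \frac{R_{ss}^{-1}(\ell, \ell)}{K M_2 N_c} }, \tag{3.18} \]

\[ \mathrm{RMSE}_{\phi_\ell} = \frac{\sigma}{\pi \sin(\theta_\ell)} \sqrt{ \frac{R_{ss}^{-1}(\ell, \ell)}{K N_c} } \times \sqrt{ \frac{\cot^2(\theta_\ell) \cot^2(\phi_\ell)}{(M_1 - 1)^2 M_2} + \frac{1}{\sin^2(\phi_\ell)(M_2 - 1)^2 M_1} }, \tag{3.19} \]

where $R_{ss}$ is the covariance matrix of the equivalent transmit signal, $S$, and $R_{ss}(\ell, \ell)$ denotes its $\ell$-th diagonal element, $K$ is the number of OFDM symbols, $\Delta f$ is the subcarrier spacing, and $\sigma^2$ is the noise variance.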
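The closed-form expressions of Theorem 3.2 are straightforward to evaluate numerically. The sketch below is only an illustration: the argument `rss_inv_ll` stands for the term $R_{ss}^{-1}(\ell, \ell)$, angles are assumed to be in radians, and the numerical values (including the 15 kHz subcarrier spacing) are hypothetical.

```python
import numpy as np

def rmse_delay(sigma, delta_f, n_sub, n_sym, m1, m2, rss_inv_ll):
    """Evaluate (3.17)."""
    return sigma / (2 * np.pi * delta_f * (n_sub - 1)) * np.sqrt(rss_inv_ll / (n_sym * m1 * m2))

def rmse_elevation(sigma, theta, n_sub, n_sym, m1, m2, rss_inv_ll):
    """Evaluate (3.18); theta in radians."""
    return sigma / (np.pi * np.sin(theta) * (m1 - 1)) * np.sqrt(rss_inv_ll / (n_sym * m2 * n_sub))

def rmse_azimuth(sigma, theta, phi, n_sub, n_sym, m1, m2, rss_inv_ll):
    """Evaluate (3.19); theta and phi in radians."""
    lead = sigma / (np.pi * np.sin(theta)) * np.sqrt(rss_inv_ll / (n_sym * n_sub))
    cot2_theta = 1.0 / np.tan(theta) ** 2
    cot2_phi = 1.0 / np.tan(phi) ** 2
    geometry = (cot2_theta * cot2_phi / ((m1 - 1) ** 2 * m2)
                + 1.0 / (np.sin(phi) ** 2 * (m2 - 1) ** 2 * m1))
    return lead * np.sqrt(geometry)

theta, phi = np.deg2rad(75.0), np.deg2rad(60.0)      # hypothetical path angles
base_delay = rmse_delay(0.1, 15e3, 64, 50, 8, 8, 1.0)
base_elev = rmse_elevation(0.1, theta, 64, 50, 8, 8, 1.0)
base_azim = rmse_azimuth(0.1, theta, phi, 64, 50, 8, 8, 1.0)
```

Evaluating the functions for several array shapes with the same total number of elements shows how the split between $M_1$ and $M_2$ shifts accuracy between the elevation and azimuth estimates.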


3.4 Simulation Results

In this section, we evaluate the RMSE of delay and angle estimation for the 3D massive MIMO OFDM system, and verify the accuracy of our analytical results through extensive simulations. To evaluate the performance of the DoA estimation, we assume there are 4 resolvable paths, which is a typical number for outdoor millimeter-wave communication systems at both 28 GHz and 73 GHz [62]. The number of subcarriers of the OFDM system is 64, and the antenna spacing for both the receive and transmit antennas is assumed to be 0.5λ. The elevation and azimuth DoAs are drawn randomly from the uniform distributions U[65, 90] and U[0, 180] (in degrees), respectively. We invoke the far-field assumption, so the wavefront impinging on the antenna array is assumed to be planar. The number of OFDM symbols is taken to be 50. All the path gains are normalized, i.e., $\sum_{\ell=0}^{L-1} |\alpha_i(\ell)|^2 = 1$, $\forall i = 0, \ldots, K-1$, where $K$ is the number of OFDM symbols and $L$ is the number of resolvable paths in the channel. Finally, the total available transmit power is assumed to be unity, and the SNR is defined as the ratio of the received signal power to the noise power, i.e., SNR = $10 \log_{10}(1/\sigma^2)$. The performance of delay, elevation angle, and azimuth angle estimation for different antenna arrays is shown in Figures 3.1, 3.2, and 3.3, respectively, where the analytical RMSE results are compared with the simulation results. We can observe that as the number of antennas increases, the estimation performance also improves. Moreover, in all cases, our analytical results match the simulation results asymptotically. At low SNR, however, the gap between the analytical and simulation results is larger. This is because our analytical results are based on a first-order perturbation expansion [63], which yields a linear approximation of the perturbed subspace that is accurate mainly in the high-SNR (low-perturbation) regime. Therefore, at moderate to high SNR, the first-order perturbation expansion becomes accurate, and the empirical and analytical results converge.

In Figure 3.4, we compare the performance of minimum mean squared error (MMSE)-based channel estimation with parametric channel estimation. The channel estimation quality is measured by the correlation between the underlying and the estimated channel. Higher correlation is expected to result in better system-level performance. We can observe that in the low and medium SNR regimes, parametric channel estimation yields significantly better performance than MMSE-based channel estimation, and as the SNR increases, both approach a correlation value of 1.
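One common way to compute such a correlation is the normalized inner product between the true and estimated channel matrices. The sketch below assumes this definition, since the exact formula is not spelled out here; the function name and the toy channel are illustrative.

```python
import numpy as np

def channel_correlation(h_true, h_est):
    """Magnitude of the normalized inner product between two channel arrays."""
    # np.vdot flattens its inputs and conjugates the first argument.
    num = np.abs(np.vdot(h_true, h_est))
    return num / (np.linalg.norm(h_true) * np.linalg.norm(h_est))

rng = np.random.default_rng(4)
h = rng.standard_normal((8, 2)) + 1j * rng.standard_normal((8, 2))
same_up_to_scale = channel_correlation(h, 3j * h)    # scaling or phase rotation does not matter
orthogonal = channel_correlation(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Under this definition a value of 1 means the estimate is perfect up to a complex scalar, which is the quantity that matters for beamforming direction.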


[Figure: RMSE of delay estimation (in symbol durations, logarithmic scale) versus SNR (dB, from -4 to 20); curves: simulation and analytical results for an 8 × 8 array.]

Figure 3.1: Performance of Delay Estimation.


[Figure: RMSE of elevation angle estimation (in degrees, logarithmic scale) versus SNR (dB, from -4 to 20).]

Figure 3.2: Elevation Angle Estimation Performance.


[Figure: RMSE of azimuth angle estimation (in degrees, logarithmic scale) versus SNR (dB, from -4 to 20).]

Figure 3.3: Azimuth Angle Estimation Performance.


[Figure: correlation value (from 0.1 to 1) versus SNR (dB, from -20 to 20).]

Figure 3.4: Correlation Between Underlying True Channel and Estimated Channel.

Chapter 4

Superimposed Pilot for Massive FD-MIMO Systems

4.1 Motivation and Literature Review

Massive-MIMO, also known as large-scale MIMO, is regarded as one of the key enabling tech-

nologies for 5G cellular networks. First introduced in [1], massive MIMO has created new

research trends not only in academia but also in industry [2]. Placing a large antenna array

at the base station (BS), 3-dimensional (3D) massive MIMO/FD-MIMO systems promise significant gains in spectral efficiency by coherently communicating with tens of mobile stations (MSs) through aggressive spatial multiplexing. Constrained by the BS form-factor limitation, FD-MIMO systems employ active antenna elements placed in a 2D antenna array, and hence exploit the degrees of freedom in both the elevation and azimuth domains [2]. On the other hand, due to the large available spectrum, communication at millimeter wave (mmWave) frequencies is considered another key enabler for 5G and beyond-5G systems.

To overcome the high path loss of mmWave channels, it is extremely important to deploy


appropriate beamforming strategies. Equipped with a large antenna array at the BS, mas-

sive FD-MIMO can form very narrow 3D beams and hence can compensate for the mmWave

path loss by focusing more energy in the desired direction. In this way, massive FD-MIMO

comes as a natural partner for mmWave systems.

To harness the benefits of massive FD-MIMO, it is critical for the BSs to have accurate

downlink (DL) channel state information (CSI). For time division duplex (TDD) systems,

it is possible to obtain the DL CSI directly from the estimated uplink (UL) channel using

channel reciprocity. For mmWave channels, it has been shown in our previous work [6, 7]

that the directions of arrival (DoAs) estimated in the UL can be directly linked to the MIMO

precoding in DL to completely avoid DL CSI feedback. Our recent work in [64, 65] extended

the analysis to the case where MSs are assigned non-orthogonal pilot sequences due to the

large number of co-scheduled MSs in massive FD-MIMO networks. Note that the strategy

of conducting DL MIMO precoding based on UL DoA has also been introduced to frequency

division duplex systems showing tremendous performance gains for practical networks [66].

In modern cellular networks, dedicated time/frequency resources are assigned to UL pilot

sequences. This approach simplifies channel estimation procedures while introducing pilot

overhead and corresponding rate-loss for the UL. Meanwhile, this UL rate-loss can be poten-

tially alleviated by superimposing pilot symbols with data symbols in the UL [67, 68, 69, 70].

Note that there is a clear trade-off between UL pilot overhead and DL throughput. A higher

UL pilot overhead will reduce UL throughput while increasing DL throughput by improving

channel estimation. Therefore, to provide a systematic view of the network throughput, in this chapter, we use the overall network achievable rate, $R_{\mathrm{overall}} = \kappa_{\mathrm{UL}} R_{\mathrm{UL}} + \kappa_{\mathrm{DL}} R_{\mathrm{DL}}$, as the performance metric. Here, $R_{\mathrm{UL}}$ and $R_{\mathrm{DL}}$ are the UL and DL achievable rates, respectively, and $\kappa_{\mathrm{UL}}$ and $\kappa_{\mathrm{DL}}$ are the respective weights for the UL and DL. Note that the UL achievable rate has two components: 1) the UL achievable rate during the channel estimation phase, with superimposed pilot and data, and 2) the UL achievable rate during the data-only transmission phase.

The detailed contributions of this chapter can be summarized as follows:

• In this work, we present a novel superimposed pilot framework for mmWave 3D massive

MIMO/FD-MIMO systems [71]. Most of the works in the superimposed pilot litera-

ture, including [72, 73? ] consider Rayleigh fading channels for system performance

analysis. While this assumption often makes the analysis simpler, it is only applica-

ble for rich scattering channels. On the other hand, higher frequency channels such

as millimeter wave (mmWave) channels only have a few resolvable paths, and hence,

Rayleigh fading models do not accurately portray the characteristics of mmWave chan-

nel. In the analysis of our superposition pilot system, we adopt a parametric channel

modeling approach where the channel is represented in terms of some parameters (such

as DoA, DoD, and channel gains) corresponding to each resolvable path. This para-

metric channel modeling approach is more appropriate for mmWave channels [6, 7, 47].

All derivations and analyses carried out in this chapter are based on parametric channel

modeling, and the results provide new perspectives on the design of superimposed pilot

systems for the mmWave channels.

• We present a UL DoA estimation method for mmWave FD-MIMO OFDM systems under superimposed pilots. The majority of the work in the literature on superimposed pilot systems adopts least squares (LS) or linear minimum mean squared error (LMMSE)-based channel estimation methods [68, 72, 74]. In this work, based on parametric channel modeling, we estimate channel parameters, namely the directions of arrival (DoAs) of the resolvable multipath components, instead of estimating the channel transfer function, and demonstrate how the DoAs estimated in the uplink can be utilized in both uplink and downlink processing. The introduced DoA-based strategy is especially suitable for superimposed pilots to reduce UL overhead. To reflect practical conditions, non-orthogonal scrambling/spreading sequences are assumed in the UL.

• We analytically characterize the root mean squared error (RMSE) of uplink angle estimation for superimposed pilot-based 3D massive MIMO systems. The performance of angle estimation is also connected to important physical parameters, namely the number of antennas at the BS, the array geometry at the receiver, the channel gains, the correlation coefficients among the MSs' scrambling/spreading sequences, and the power split between the superimposed pilot and data in the UL. Note that, compared to our prior works [6, 7, 64], characterizing the DoA estimation performance is much more involved and non-trivial for superimposed pilot systems.

• We characterize the overall network throughput for the superimposed pilot-based massive FD-MIMO system. To be specific, we jointly consider the DL achievable rate, where DL precoders are designed based on the DoAs estimated in the UL, and the UL achievable rates in both the channel estimation phase and the data-only transmission phase. The impacts of imperfect DoA estimation on the uplink rate and the overall rate are also investigated. In contrast to the majority of works in the literature, where uplink transmission is assumed to consist only of a superimposed pilot and data transmission phase, we consider a practical and rather general framework for superimposed pilot systems in which uplink transmission occurs in two phases: first, a phase where both pilot and data symbols are transmitted; second, a phase where only data symbols are transmitted. This general framework provides the system designer with additional control parameters to tune for performance optimization.

• Most of the works in the superimposed pilot literature use matched filtering (MF) or maximum ratio combining (MRC) for uplink processing, which are based on complete knowledge of the channel transfer functions. In contrast, our proposed approach uses only partial channel information for uplink symbol detection in superimposed pilot systems. To be specific, we utilize DoAs for uplink processing, and demonstrate how DoA estimation error affects the uplink rate as well as the overall system rate. This uplink processing strategy with only partial channel information incurs significantly less computational complexity than traditional MF- or MRC-based symbol detection.

• Finally, we validate the analytical results through comprehensive simulations, and identify important system-level design intuitions. We provide design insights that are novel compared to results in the existing literature on superimposed pilot systems. In fact, because we use a generalized framework, we are able to identify which strategies are optimal under which conditions.

4.2 System and Channel Model

We consider a 3D massive MIMO OFDM network consisting of $G$ BSs. Each BS has $N_r$ antennas and serves $J$ users, each of which has $N_t$ antennas. In the uplink, the time-domain transmit signal from each mobile station, after up-conversion, is transmitted through a frequency-selective fading channel, which remains time-invariant for one OFDM symbol duration. In our feedback-free system, each MS superimposes a scrambling/spreading sequence on top of its UL data. Assume that we collect the received signal over $Q$ symbol periods. Accordingly, in the UL at the i-th BS, the $N_r \times Q$ received signal at the k-th subcarrier can be expressed as

\[ Z_i(k) = \sum_{g=0}^{G-1} \sum_{j=0}^{J-1} \sqrt{\Lambda_{jg,i}} \, H_{jg,i}(k) \big( X_{jg}(k) + S_{jg}(k) \big) + W_i(k), \tag{4.1} \]

where $H_{jg,i}(k)$ denotes the $N_r \times N_t$ channel transfer function at the k-th subcarrier for the channel between the i-th BS and the j-th user in the g-th cell; $\Lambda_{jg,i}$ represents the corresponding large-scale fading parameter, and is independent of the subcarrier index; $X_{jg}(k)$ denotes the $N_t \times Q$ frequency-domain UL data matrix from the j-th MS in the g-th cell at the k-th subcarrier; $S_{jg}(k)$ is the corresponding scrambling/spreading matrix; and $W_i(k)$ is the $N_r \times Q$ noise matrix. Note that each scrambling/spreading sequence is of length $Q$ and each MS utilizes $N_t$ orthogonal sequences for uplink data transmission. Now, the channel, $H_{jg,i}(k)$, can be expressed as

\[ H_{jg,i}(k) = \sum_{\ell=0}^{L_{jg,i}-1} C_{jg,i}(\ell) \, e^{-j 2\pi k \ell / N_c}, \tag{4.2} \]

where $C_{jg,i}(\ell)$ denotes the $N_r \times N_t$ channel impulse response (CIR) corresponding to the $\ell$-th tap for the channel between the j-th user in the g-th cell and the i-th BS. The CIR is assumed to have a finite number ($L_{jg,i}$) of non-zero taps; $N_c$ is the number of subcarriers.
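Equation (4.2) is a discrete Fourier transform across the tap index, so the per-subcarrier channel matrices can be obtained with a zero-padded FFT. The following is a minimal sketch with randomly generated taps; the array layout `(taps, receive antennas, transmit antennas)` and the function name are illustrative choices.

```python
import numpy as np

def taps_to_frequency_response(taps, n_sub):
    """Map CIR taps of shape (L, Nr, Nt) to H of shape (n_sub, Nr, Nt).

    H[k] = sum_l taps[l] * exp(-j 2 pi k l / n_sub), as in (4.2).
    """
    # A length-n_sub FFT along the tap axis zero-pads the L taps automatically.
    return np.fft.fft(taps, n=n_sub, axis=0)

rng = np.random.default_rng(2)
n_taps, n_rx, n_tx, n_sub = 3, 4, 2, 64               # toy dimensions (assumed)
taps = rng.standard_normal((n_taps, n_rx, n_tx)) + 1j * rng.standard_normal((n_taps, n_rx, n_tx))
H = taps_to_frequency_response(taps, n_sub)

# Direct evaluation of (4.2) at one subcarrier for comparison.
k = 5
H_direct = sum(taps[l] * np.exp(-2j * np.pi * k * l / n_sub) for l in range(n_taps))
```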

Using parametric channel modeling for mmWave frequencies, the CIR, $C_{jg,i}(\ell)$, can be expressed as [6, 7, 47]

\[ C_{jg,i}(\ell) = \sum_{p=0}^{P_{jg,i,\ell}-1} \alpha_{jg,i}(\ell, p) \, e_{r,jg,i}(\ell, p) \, e_{t,jg,i}^{H}(\ell, p), \tag{4.3} \]

where $\alpha_{jg,i}(\ell, p)$, $e_{r,jg,i}(\ell, p)$, and $e_{t,jg,i}(\ell, p)$ denote, respectively, the complex path gain, the $N_r \times 1$ receive antenna array response, and the $N_t \times 1$ transmit antenna array response corresponding to the p-th sub-path in the $\ell$-th cluster; the Hermitian operation is denoted as $(\cdot)^H$.

We assume the antennas at each MS are arranged in a uniform linear array (ULA). Accordingly, we can express the transmit array response in a Vandermonde structure: $e_{t,jg,i}(\ell, p) = \big[ 1 \;\; e^{j\omega_{jg,i,\ell,p}} \;\; \ldots \;\; e^{j(N_t-1)\omega_{jg,i,\ell,p}} \big]^T$, where $\omega_{jg,i,\ell,p} = (2\pi\Delta_t/\lambda) \cos\Omega_{jg,i,\ell,p}$ denotes the transmit spatial frequency, $\lambda$ denotes the carrier wavelength, $\Omega_{jg,i,\ell,p}$ represents the direction of departure (DoD), and $\Delta_t$ denotes the distance between adjacent transmit antenna elements. Each BS is equipped with a 2-dimensional (2D) antenna array placed in the X-Z plane, and the numbers of antennas in the vertical and horizontal directions are represented by $M_1$ and $M_2$, respectively. Hence, the total number of receive antennas at each BS is $N_r = M_1 M_2$. The receive-side array response vector, $e_{r,jg,i}(\ell, p)$, can accordingly be written as $e_{r,jg,i}(\ell, p) = a(v_{jg,i,\ell,p}) \otimes a(u_{jg,i,\ell,p})$, where $a(u_{jg,i,\ell,p}) = \big[ 1 \;\; e^{ju_{jg,i,\ell,p}} \;\; \ldots \;\; e^{j(M_1-1)u_{jg,i,\ell,p}} \big]^T$ and $a(v_{jg,i,\ell,p}) = \big[ 1 \;\; e^{jv_{jg,i,\ell,p}} \;\; \ldots \;\; e^{j(M_2-1)v_{jg,i,\ell,p}} \big]^T$ represent the receive array responses in the elevation and azimuth domains, and $\otimes$ denotes the Kronecker product. The elevation and azimuth spatial frequencies are given by $u_{jg,i,\ell,p} = \frac{2\pi\Delta_r}{\lambda} \cos\theta_{jg,i,\ell,p}$ and $v_{jg,i,\ell,p} = \frac{2\pi\Delta_r}{\lambda} \sin\theta_{jg,i,\ell,p} \cos\phi_{jg,i,\ell,p}$, respectively, where $\theta_{jg,i,\ell,p}$ and $\phi_{jg,i,\ell,p}$ denote the corresponding elevation and azimuth angles. Finally, $\Delta_r$ denotes the spacing between two adjacent receive antenna elements.
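The array responses defined above can be generated in a few lines. This is a minimal sketch in which the antenna spacing is passed as a fraction of the wavelength (0.5 is assumed as the default), angles are in radians, and the function names are illustrative.

```python
import numpy as np

def ula_response(n, spatial_freq):
    """Vandermonde response [1, e^{jw}, ..., e^{j(n-1)w}] of an n-element ULA."""
    return np.exp(1j * spatial_freq * np.arange(n))

def spatial_frequencies(theta, phi, spacing=0.5):
    """Elevation (u) and azimuth (v) spatial frequencies; spacing = Delta_r / lambda."""
    u = 2 * np.pi * spacing * np.cos(theta)
    v = 2 * np.pi * spacing * np.sin(theta) * np.cos(phi)
    return u, v

def upa_response(m1, m2, theta, phi, spacing=0.5):
    """Receive response e_r = a(v) kron a(u) of an M1 x M2 planar array."""
    u, v = spatial_frequencies(theta, phi, spacing)
    return np.kron(ula_response(m2, v), ula_response(m1, u))

m1, m2 = 4, 3                                          # toy array size (assumed)
theta, phi = np.deg2rad(75.0), np.deg2rad(40.0)        # hypothetical arrival angles
u, v = spatial_frequencies(theta, phi)
e_r = upa_response(m1, m2, theta, phi)
```

With this Kronecker ordering, entry `i2 * m1 + i1` of `e_r` equals `exp(1j * (i2 * v + i1 * u))`, i.e. the elevation index runs fastest.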

4.3 Uplink Channel Estimation and Performance Characterization

4.3.1 Uplink DoA Estimation using Unitary ESPRIT

In this work, we adopt a parametric approach for channel estimation. In traditional channel estimation methods, for example, least squares (LS) or linear minimum mean squared error (LMMSE)-based techniques, the channel transfer function is estimated explicitly. Hence, for this approach, as the dimension of the channel matrix increases, the estimation overhead also increases accordingly. For massive MIMO systems, because of the large number of antennas at the BS, the number of channel coefficients that need to be estimated using traditional methods grows with the product of the numbers of transmit and receive antennas. On the other hand, in the parametric channel estimation approach, the number of parameters that need to be estimated does not grow with the number of antennas at either the transmitter or the receiver, and therefore, the estimation overhead is independent of the channel dimension. Moreover, it has been shown that the parametric channel estimation approach outperforms approaches based on channel transfer function estimation (LS/LMMSE) for large antenna systems [6, 7, 60]. This makes parametric channel estimation an attractive solution for massive MIMO or full-dimensional MIMO (FD-MIMO) systems, especially for mmWave channels, where the number of paths in the channel is quite limited compared to sub-6 GHz channels.

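Let the n-th user in the i-th cell be the target user, communicating with the i-th BS. Denote the correlation between the scrambling sequences of different users as $\rho_1$. At the i-th BS, after correlating with the scrambling sequence of the target user, we have

\[ Z_i(k) S_{ni}^{H}(k) = \sum_{g=0}^{G-1} \sum_{j=0}^{J-1} \sqrt{\Lambda_{jg,i}} \, H_{jg,i}(k) \big( X_{jg}(k) + S_{jg}(k) \big) S_{ni}^{H}(k) + W'_i(k), \tag{4.4} \]

where $W'_i(k) = W_i(k) S_{ni}^{H}(k)$ denotes the equivalent noise matrix.

Now, (4.4) can be re-written as

\[
\begin{aligned}
Z_i(k) S_{ni}^{H}(k) ={}& \sqrt{\Lambda_{ni,i}} \, H_{ni,i}(k) \big( X_{ni}(k) S_{ni}^{H}(k) + \gamma I \big)
+ \sum_{\substack{g=0 \\ g \neq i}}^{G-1} \sqrt{\Lambda_{ng,i}} \, H_{ng,i}(k) \big( X_{ng}(k) S_{ni}^{H}(k) + \gamma I \big) \\
&+ \sum_{\substack{j=0 \\ j \neq n}}^{J-1} \sqrt{\Lambda_{ji,i}} \, H_{ji,i}(k) \big( X_{ji}(k) S_{ni}^{H}(k) + \rho_1 \gamma \mathbf{1}_{N_t} \big) \\
&+ \sum_{\substack{g=0 \\ g \neq i}}^{G-1} \sum_{\substack{j=0 \\ j \neq n}}^{J-1} \sqrt{\Lambda_{jg,i}} \, H_{jg,i}(k) \big( X_{jg}(k) S_{ni}^{H}(k) + \rho_1 \gamma \mathbf{1}_{N_t} \big) + W'_i(k),
\end{aligned}
\tag{4.5}
\]

where $\mathbf{1}_{N_t}$ denotes an $N_t \times N_t$ matrix with each element being unity, and $\gamma$ is the portion of power allocated to the pilot symbols; hence, the portion of power allocated to the data symbols is $(1 - \gamma)$.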

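Now, (4.5) can be expressed as

\[ \tilde{H}_{ni,i}(k) = Z_i(k) S_{ni}^{H}(k) = \sqrt{\Lambda_{ni,i}} \, H_{ni,i}(k) \big( X_{ni}(k) S_{ni}^{H}(k) + \gamma I \big) + W''_i(k), \tag{4.6} \]

where $W''_i(k)$ denotes the equivalent noise-plus-interference matrix. Using (4.2) and (4.3), $H_{ni,i}(k)$ can now be written as

\[ H_{ni,i}(k) = \sum_{\ell=0}^{L_{ni,i}-1} \sum_{p=0}^{P_{ni,i,\ell}-1} \alpha_{ni,i}(\ell, p) \, e_{r,ni,i}(\ell, p) \, e_{t,ni,i,k}^{H}(\ell, p), \tag{4.7} \]

where $e_{t,ni,i,k}(\ell, p) = e_{t,ni,i}(\ell, p) \, e^{-j 2\pi k \ell / N_c}$.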

In this work, we consider millimeter wave channels, and assume that each cluster contributes one propagation path [6, 50, 51, 52]. Hence, for notational convenience and clarity of exposition, we drop the subpath index. Accordingly, (4.7) becomes

\[ H_{ni,i}(k) = A_{ni,i} D_{ni,i} B_{ni,i}^{H}(k), \tag{4.8} \]

where $A_{ni,i} = \big[ e_{r,ni,i}(0) \;\; \ldots \;\; e_{r,ni,i}(L_{ni,i}-1) \big]$ is the receiver-side array steering matrix, $D_{ni,i} = \mathrm{diag}\big[ \alpha_{ni,i}(0) \;\; \ldots \;\; \alpha_{ni,i}(L_{ni,i}-1) \big]$ is the diagonal matrix whose diagonal elements are the complex path gains, and $B_{ni,i}(k) = \big[ e_{t,ni,i,k}(0) \;\; \ldots \;\; e_{t,ni,i,k}(L_{ni,i}-1) \big]$ denotes the transmitter-side array steering matrix. Accordingly, the noisy channel, $\tilde{H}_{ni,i}(k)$, from (4.6), can be expressed as

\[ \tilde{H}_{ni,i}(k) = \sqrt{\Lambda_{ni,i}} \, A_{ni,i} D_{ni,i} B_{ni,i}^{H}(k) \big( X_{ni}(k) S_{ni}^{H}(k) + \gamma I \big) + W''_i(k). \tag{4.9} \]


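Now, the noisy channel matrix in (4.9) can concisely be expressed as

\[ \tilde{H}_{ni,i}(k) = A_{ni,i} S_{ni,i}(k) + W''_i(k), \tag{4.10} \]

where $S_{ni,i}(k) = \sqrt{\Lambda_{ni,i}} \, D_{ni,i} B_{ni,i}^{H}(k) \big( X_{ni}(k) S_{ni}^{H}(k) + \gamma I \big)$. The forward-backward averaged received signal can be expressed as:

\[
\begin{aligned}
H^{\mathrm{fba}}_{ni,i}(k) &= \big[ \tilde{H}_{ni,i}(k) \;\;\; \Pi_{N_r} \tilde{H}^{*}_{ni,i}(k) \Pi_{N_t} \big] \\
&= \big[ A_{ni,i} S_{ni,i}(k) \;\;\; \Pi_{N_r} A^{*}_{ni,i} S^{*}_{ni,i}(k) \Pi_{N_t} \big] + \big[ W''_i(k) \;\;\; \Pi_{N_r} W''^{*}_i(k) \Pi_{N_t} \big],
\end{aligned}
\tag{4.11}
\]

where $A^{*}$ denotes the complex conjugate of $A$, and $\Pi_p$ represents the $p \times p$ exchange matrix.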

We can therefore apply Unitary ESPRIT on (4.11) for DoA estimation [6].
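The forward-backward averaging step in (4.11) can be sketched as follows. The example only builds the extended data matrix and checks its centro-Hermitian structure; it does not implement Unitary ESPRIT itself, and the matrix sizes and function names are illustrative.

```python
import numpy as np

def exchange(p):
    """p x p exchange matrix (ones on the anti-diagonal)."""
    return np.fliplr(np.eye(p))

def forward_backward_average(H_noisy):
    """Stack H and Pi_Nr H^* Pi_Nt side by side, as in (4.11)."""
    n_r, n_t = H_noisy.shape
    backward = exchange(n_r) @ H_noisy.conj() @ exchange(n_t)
    return np.hstack([H_noisy, backward])

rng = np.random.default_rng(3)
H_noisy = rng.standard_normal((6, 2)) + 1j * rng.standard_normal((6, 2))   # toy Nr x Nt estimate
H_fba = forward_backward_average(H_noisy)

# Centro-Hermitian check: Pi_Nr conj(H_fba) Pi_2Nt reproduces H_fba.
mirrored = exchange(6) @ H_fba.conj() @ exchange(4)
```

The centro-Hermitian property is what allows the subsequent subspace computations to be carried out with real-valued arithmetic.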

4.3.2 RMSE Characterization

Let $\hat{v}_{ni,i,\ell}$ and $\hat{u}_{ni,i,\ell}$ denote the estimated azimuth and elevation spatial frequencies for the $\ell$-th tap, respectively. The corresponding estimation errors are given by $\Delta v_{ni,i,\ell} = v_{ni,i,\ell} - \hat{v}_{ni,i,\ell}$ and $\Delta u_{ni,i,\ell} = u_{ni,i,\ell} - \hat{u}_{ni,i,\ell}$. Using Lemma 2 of [64], the total MSE can be written as $E\{(\Delta v_{ni,i,\ell})^2\} = \sum_{m=1}^{4} E\{(\Delta v_{ni,i,\ell})^2\}_m$, where m = 1, 2, 3, 4 correspond to the MSE due to pilot contamination, intra-cell interference, inter-cell interference, and noise, respectively. Now, to facilitate the derivation of the MSE expressions for the superimposed pilot-based massive FD-MIMO system, we consider the following lemma [6]:

Lemma 4.1. Normalized array steering vectors are asymptotically orthogonal as the number of antennas at the BS grows large, i.e., $\bar{e}_{r,jg,i}(m) \perp \mathrm{span}\{ \bar{e}_{r,j'g',i'}(n) \mid \forall (j, g, i, m) \neq (j', g', i', n) \}$, where $\bar{e}_{r,jg,i}(m) = \frac{1}{\sqrt{N_r}} e_{r,jg,i}(m)$.
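The asymptotic orthogonality stated in Lemma 4.1 can be checked numerically for a pair of directions. The sketch below assumes half-wavelength spacing and two arbitrarily chosen (hypothetical) angle pairs in degrees, and evaluates the magnitude of the inner product between the normalized steering vectors for growing square arrays.

```python
import numpy as np

def normalized_upa_response(m1, m2, theta_deg, phi_deg, spacing=0.5):
    """Unit-norm steering vector of an M1 x M2 planar array (angles in degrees)."""
    theta, phi = np.deg2rad(theta_deg), np.deg2rad(phi_deg)
    u = 2 * np.pi * spacing * np.cos(theta)                 # elevation spatial frequency
    v = 2 * np.pi * spacing * np.sin(theta) * np.cos(phi)   # azimuth spatial frequency
    a_u = np.exp(1j * u * np.arange(m1))
    a_v = np.exp(1j * v * np.arange(m2))
    return np.kron(a_v, a_u) / np.sqrt(m1 * m2)

def steering_overlap(m, angles_a=(70.0, 40.0), angles_b=(85.0, 120.0)):
    """|<e_a, e_b>| for an m x m array and two (elevation, azimuth) pairs."""
    e_a = normalized_upa_response(m, m, *angles_a)
    e_b = normalized_upa_response(m, m, *angles_b)
    return np.abs(np.vdot(e_a, e_b))

overlaps = {m: steering_overlap(m) for m in (4, 16, 64)}
self_overlap = steering_overlap(8, (70.0, 40.0), (70.0, 40.0))
```

For distinct directions the overlap is bounded by a term that decays inversely with the number of elements per dimension, which is why the interference terms in the following analysis simplify for large arrays.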


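Utilizing Lemma 4.1, it is possible to characterize the performance of DoA estimation for superimposed pilot-based 3D massive MIMO systems. Specifically, the following theorem analytically characterizes the MSE of the uplink angle estimation caused by pilot contamination:

Theorem 4.2. For superimposed pilot-based 3D massive MIMO systems, the MSE, $E\{(\Delta v_{ni,i,\ell})^2\}_1$, due to pilot contamination is given by

\[
\begin{aligned}
E\{(\Delta v_{ni,i,\ell})^2\}_1 ={}& \frac{1}{W_{ni,i,\ell}} \sum_{\substack{g=0 \\ g \neq i}}^{G-1} \Lambda_{ng,i} \, \alpha_{ng,i} \big( X_{ng,i} + \rho_2^2 \gamma(1-\gamma) X'_{ng,i} \big) \big( Y_{ng,i} + Y'_{ng,i} \big) \\
&- \frac{4 \rho_2^2 \gamma(1-\gamma)}{W_{ni,i,\ell}} \sum_{\substack{g=0 \\ g \neq i}}^{G-1} \Lambda_{ng,i} \, \alpha_{ng,i} \, X'_{ng,i} \, \mathrm{Re}\big\{ e^{j\Phi} \tilde{Y}_{ng,i} \big\},
\end{aligned}
\tag{4.12}
\]

where $\Phi = \big( (M_1 - 1) u_{ni,i,\ell} + (M_2 - 1) v_{ni,i,\ell} \big)$, $W_{ni,i,\ell} = 8 |\alpha_{ni,i}(\ell)|^2 N_t^2 \Lambda_{ni,i} (M_2 - 1)^2 M_1^2$, $\alpha_{ng,i} = \sum_{m=0}^{L_{ng,i}-1} |\alpha_{ng,i}(m)|^2$, and $X_{ng,i}$ and $X'_{ng,i}$ are given by

\[ X_{ng,i} = E_\psi \Big| \rho_2 \sqrt{\gamma(1-\gamma)} \, X_{\ell,1} X_{\ell,2} + \gamma X_{\ell,3} \Big|^2, \qquad X'_{ng,i} = E_\psi \Big| \big( \rho_2 \sqrt{\gamma(1-\gamma)} \, N_t + 1 \big) X_{\ell,1} X_{\ell,2} \Big|^2, \tag{4.13} \]

where $X_{\ell,1} = \big( 1 + e^{-j\omega_{ni,i,\ell}} + \ldots + e^{-j(N_t-1)\omega_{ni,i,\ell}} \big)$, $X_{\ell,2} = \big( 1 + e^{j\omega_{ni,i,\ell}} + \ldots + e^{j(N_t-1)\omega_{ni,i,\ell}} \big)$, and $X_{\ell,3} = \big( 1 + e^{-j(\omega_{ni,i,\ell} - \omega_{ng,i,m})} + \ldots + e^{-j(N_t-1)(\omega_{ni,i,\ell} - \omega_{ng,i,m})} \big)$; $\rho_2$ is the expected correlation between the data signal and the spreading sequences; and $Y_{ng,i}$, $Y'_{ng,i}$, and $\tilde{Y}_{ng,i}$ are given by

\[ Y_{ng,i} = E_{\theta,\phi} \Big| \big( 1 + e^{j(u_{ni,i,\ell} - u_{ng,i,m})} + \ldots + e^{j(M_1-1)(u_{ni,i,\ell} - u_{ng,i,m})} \big) \big( e^{jv_{ni,i,\ell}} e^{-jv_{ng,i,m}} - 1 \big) \Big|^2, \tag{4.14} \]

\[ Y'_{ng,i} = E_{\theta,\phi} \Big| \big( e^{j(M_1-1)u_{ng,i,m}} + e^{ju_{ni,i,\ell}} e^{j(M_1-2)u_{ng,i,m}} + \ldots + e^{j(M_1-1)u_{ni,i,\ell}} \big) \times \big( e^{j(M_2-1)v_{ni,i,\ell}} - e^{j(M_2-1)v_{ng,i,m}} \big) \Big|^2, \tag{4.15} \]

\[
\begin{aligned}
\tilde{Y}_{ng,i} = E_{\theta,\phi} \Big[ & \big( e^{-j(M_1-1)u_{ng,i,m}} + e^{-ju_{ni,i,\ell}} e^{-j(M_1-2)u_{ng,i,m}} + \ldots + e^{-j(M_1-1)u_{ni,i,\ell}} \big) \times \big( e^{-j(M_2-1)v_{ni,i,\ell}} - e^{-j(M_2-1)v_{ng,i,m}} \big) \\
& \times \big( 1 + \ldots + e^{j(M_1-1)(u_{ng,i,m} - u_{ni,i,\ell})} \big) \times \big( e^{j(M_2-1)(v_{ng,i,m} - v_{ni,i,\ell})} - 1 \big) \Big]
\end{aligned}
\tag{4.16}
\]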

for m = 0, . . . Lng,i − 1, and Eψ


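Similarly, the MSEs due to intra-cell interference, inter-cell interference, and noise are characterized in the following three theorems, respectively.

Theorem 4.3. For superimposed pilot-based 3D massive MIMO systems, the MSE, $E\{(\Delta v_{ni,i,\ell})^2\}_2$, due to intra-cell interference is given by

\[ E\{(\Delta v_{ni,i,\ell})^2\}_2 = \frac{\rho_1^2 \gamma^2 + \rho_2^2 \gamma(1-\gamma)}{W_{ni,i,\ell}} \sum_{\substack{j=0 \\ j \neq n}}^{J-1} \Lambda_{ji,i} \, X'_{ji,i} \, \alpha_{ji,i} \big( Y_{ji,i} + Y'_{ji,i} - 2 \, \mathrm{Re}\{ e^{j\Phi} \tilde{Y}_{ji,i} \} \big). \]

Theorem 4.4. For superimposed pilot-based 3D massive MIMO systems, the MSE, $E\{(\Delta v_{ni,i,\ell})^2\}_3$, due to inter-cell interference is given by

\[ E\{(\Delta v_{ni,i,\ell})^2\}_3 = \frac{\rho_1^2 \gamma^2 + \rho_2^2 \gamma(1-\gamma)}{W_{ni,i,\ell}} \sum_{\substack{g=0 \\ g \neq i}}^{G-1} \sum_{\substack{j=0 \\ j \neq n}}^{J-1} \Lambda_{jg,i} \, X'_{jg,i} \, \alpha_{jg,i} \big( Y_{jg,i} + Y'_{jg,i} - 2 \, \mathrm{Re}\{ e^{j\Phi} \tilde{Y}_{jg,i} \} \big). \]

Proof. This theorem can be proved similarly to Theorem 4.3.

Theorem 4.5. For superimposed pilot-based 3D massive MIMO systems, the MSE, $E\{(\Delta v_{ni,i,\ell})^2\}_4$, due to noise is given by

\[ E\{(\Delta v_{ni,i,\ell})^2\}_4 = \frac{\sigma^2 \big( \xi |X_{\ell,1}|^2 + \gamma^2 N_t \big)}{2 |\alpha_{ni,i}(\ell)|^2 N_t^2 \Lambda_{ni,i} (M_2 - 1)^2 M_1}, \]

where $\sigma^2$ is the noise variance, and $\xi = N_t \rho_2^2 \gamma(1-\gamma) + 2 \rho_2 \gamma \sqrt{\gamma(1-\gamma)}$.

Proof. This proof follows the line of proof for Theorem 1 in [6].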

Remark 4.6. Theorems 4.2-4.5 explicitly show how the MSE depends on a few physical parameters, namely the number of antennas, the channel gains, the correlation between pilot and data sequences, and the ratio of power allocated to pilot and data symbols. Moreover, these results clearly show how antenna configurations affect the estimation performance: placing the same total number of antennas in different orientations results in different DoA estimation performance for elevation and azimuth angles.

Remark 4.7. It can be observed from Theorem 4.2 that the MSE due to pilot contamination does not depend on $\rho_1$, the correlation between the scrambling sequences of different users. This is because pilot contamination results from users in other cells that use exactly the same spreading sequences as the target user. On the other hand, from Theorems 4.3 and 4.4, it is clear that the MSEs due to intra-cell and inter-cell interference are affected by $\rho_1$.
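To make the dependence on the physical parameters concrete, the noise-induced MSE of Theorem 4.5 can be evaluated directly. This is a minimal sketch; the argument names (`alpha_abs2` for $|\alpha_{ni,i}(\ell)|^2$, `large_scale` for $\Lambda_{ni,i}$, `omega` for the transmit spatial frequency) and the numerical values are illustrative assumptions.

```python
import numpy as np

def noise_mse(sigma2, gamma, rho2, omega, n_t, alpha_abs2, large_scale, m1, m2):
    """Noise-induced MSE of the spatial frequency estimate (Theorem 4.5)."""
    # X_{l,1} = 1 + e^{-j w} + ... + e^{-j (Nt - 1) w}
    x1 = np.sum(np.exp(-1j * omega * np.arange(n_t)))
    xi = n_t * rho2**2 * gamma * (1 - gamma) + 2 * rho2 * gamma * np.sqrt(gamma * (1 - gamma))
    numerator = sigma2 * (xi * np.abs(x1) ** 2 + gamma**2 * n_t)
    denominator = 2 * alpha_abs2 * n_t**2 * large_scale * (m2 - 1) ** 2 * m1
    return numerator / denominator

# Pilot-only special case (gamma = 1, rho2 = 0) with toy values.
pilot_only = noise_mse(1.0, 1.0, 0.0, 0.3, 2, 1.0, 1.0, 8, 9)

# Same total number of antennas (64), arranged in two different shapes.
tall = noise_mse(0.1, 0.5, 0.1, 0.3, 2, 1.0, 1.0, 16, 4)
wide = noise_mse(0.1, 0.5, 0.1, 0.3, 2, 1.0, 1.0, 4, 16)
```

Comparing `tall` and `wide` illustrates the observation in Remark 4.6 that the array shape, not only the total number of elements, determines the estimation accuracy.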


[Figure: the uplink frame consists of a "Pilot + Data" phase of $Q_s$ symbols followed by a "Data-Only" phase of $Q_d$ symbols.]

Figure 4.1: Uplink Transmission Phases in Superimposed Pilot System.

4.4 Achievable Rate Analysis

4.4.1 Uplink Rate Analysis

We consider a practical scenario, where uplink transmission occurs in two phases (see

Fig. 4.1). First, uplink channel estimation phase with superimposed pilots and data, and

second, uplink data-only transmission phase. Unlike the conventional channel estimation

approaches, where only pilots are transmitted during channel estimation phase, the super-

imposed pilot-based approaches can have non-zero data rate during uplink channel estimation

phase. We assume the number of symbols used for superimposed pilot and data transmission

is $Q_s$, and the number of symbols used for data-only transmission is $Q_d$. Let $\delta_s$ and $\delta_d$ denote the fractions of symbols used during the channel estimation and data-only transmission phases, respectively, i.e., $\delta_s = Q_s/(Q_s + Q_d)$ and $\delta_d = Q_d/(Q_s + Q_d) = 1 - \delta_s$. Accordingly, the average uplink spectral efficiency for the i-th cell can be written as

\[ I_i^{\mathrm{UL}} = \delta_s I_i^{\mathrm{ul,sd}} + (1 - \delta_s) I_i^{\mathrm{ul,dd}}, \tag{4.17} \]

where $I_i^{\mathrm{ul,sd}}$ and $I_i^{\mathrm{ul,dd}}$ are the uplink rates corresponding to the superimposed pilot-based channel estimation phase and the dedicated data-only transmission phase, respectively.
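The time-sharing relation in (4.17) amounts to a weighted average of the two per-phase rates. A minimal sketch, with hypothetical symbol counts and rates:

```python
def uplink_spectral_efficiency(q_s, q_d, rate_sd, rate_dd):
    """Average uplink spectral efficiency of (4.17).

    q_s, q_d: numbers of symbols in the pilot+data and data-only phases.
    rate_sd, rate_dd: per-phase uplink rates (same units, e.g. bits/s/Hz).
    """
    delta_s = q_s / (q_s + q_d)              # fraction of symbols carrying superimposed pilots
    return delta_s * rate_sd + (1.0 - delta_s) * rate_dd

mixed = uplink_spectral_efficiency(2, 6, 1.0, 3.0)        # 0.25 * 1.0 + 0.75 * 3.0
pilot_phase_only = uplink_spectral_efficiency(8, 0, 1.0, 3.0)
```

Sweeping `q_s` for a fixed frame length exposes the trade-off discussed in this chapter: a longer superimposed phase lowers the average uplink rate but improves the DoA estimates used in both link directions.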

In this work, we assume that the path gains are known a priori to the BS. In the uplink, the DoAs are the only channel parameters that the BS needs to estimate. The BS utilizes the DoAs estimated in the uplink for uplink data detection. At the UE side, each UE knows its own DoDs and precodes the uplink data with the DoD steering vectors. In the downlink, with TDD operation, the BS utilizes the DoAs estimated in the uplink for downlink precoding, while the UEs use their own DoD steering matrices for downlink receive processing. Note that in order to detect symbols in the uplink or precode in the downlink, the BSs do not need the uplink DoD information.

Note that even though we assume in this work that the path gains are known a priori, in practice, the path gains can also be estimated from the estimated angle information using the maximum likelihood (ML) method [60].

Uplink Rate for Channel Estimation Phase

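From (4.1), the $N_r \times 1$ uplink received signal at the i-th base station for the k-th subcarrier at the q-th symbol, $z_i^q(k)$, can be written as

\[ z_i^q(k) = \sum_{g=0}^{G-1} \sum_{j=0}^{J-1} \sqrt{\Lambda_{jg,i}} \, H_{jg,i}(k) \big( x_{jg}^q(k) + s_{jg}^q(k) \big) + w_i^q(k), \tag{4.18} \]

where $x_{jg}^q(k)$ and $s_{jg}^q(k)$ are the $N_t \times 1$ transmitted data and pilot signal vectors, respectively, from the j-th user in the g-th cell for the k-th subcarrier at the q-th symbol. Let us assume that the n-th user in the i-th cell is the target user whose signal the i-th base station wants to detect, and that the mobile devices in the uplink use their own DoDs to precode the information symbols. Now, from (4.18), we have

\[
\begin{aligned}
z_i^q(k) ={}& \sqrt{\Lambda_{ni,i}} \, A_{ni,i}(k) D_{ni,i} \big( x_{ni}^q(k) + s_{ni}^q(k) \big) + \sum_{\substack{j=0 \\ j \neq n}}^{J-1} \sqrt{\Lambda_{ji,i}} \, A_{ji,i}(k) D_{ji,i} \big( x_{ji}^q(k) + s_{ji}^q(k) \big) \\
&+ \sum_{\substack{g=0 \\ g \neq i}}^{G-1} \sum_{j=0}^{J-1} \sqrt{\Lambda_{jg,i}} \, A_{jg,i}(k) D_{jg,i} B_{jg,i}^{H} \big( B_{jg,g}^{H} \big)^{+} \big( x_{jg}^q(k) + s_{jg}^q(k) \big) + w_i^q(k),
\end{aligned}
\tag{4.19}
\]

where $x_{ni}^q(k)$ and $s_{ni}^q(k)$ are the $N_s \times 1$ unprecoded data and pilot information symbol vectors in the superimposed channel estimation phase, respectively, and $(\cdot)^{+}$ denotes the matrix pseudo-inverse operation. To detect the data from the n-th user in the i-th cell, the i-th base station first subtracts the corresponding pilot signal from the received signal by utilizing the DoAs estimated in the uplink:

\[
\begin{aligned}
z_i^q(k) - \sqrt{\Lambda_{ni,i}} \, \hat{A}_{ni,i}(k) D_{ni,i} s_{ni}^q(k) ={}& \sqrt{\Lambda_{ni,i}} \, A_{ni,i}(k) D_{ni,i} x_{ni}^q(k) + \sqrt{\Lambda_{ni,i}} \, A_{ni,i}(k) D_{ni,i} s_{ni}^q(k) - \sqrt{\Lambda_{ni,i}} \, \hat{A}_{ni,i}(k) D_{ni,i} s_{ni}^q(k) \\
&+ \sum_{\substack{j=0 \\ j \neq n}}^{J-1} \sqrt{\Lambda_{ji,i}} \, A_{ji,i}(k) D_{ji,i} \big( x_{ji}^q(k) + s_{ji}^q(k) \big) \\
&+ \sum_{\substack{g=0 \\ g \neq i}}^{G-1} \sum_{j=0}^{J-1} \sqrt{\Lambda_{jg,i}} \, A_{jg,i}(k) D_{jg,i} B_{jg,i}^{H} \big( B_{jg,g}^{H} \big)^{+} \big( x_{jg}^q(k) + s_{jg}^q(k) \big) + w_i^q(k).
\end{aligned}
\tag{4.20}
\]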

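Now, after correlating with the target user's estimated array steering matrix, we have

\[
\begin{aligned}
\tilde{z}_{ni}^q(k) &= \frac{1}{N_r} \hat{A}_{ni,i}^{H}(k) \Big( z_i^q(k) - \sqrt{\Lambda_{ni,i}} \, \hat{A}_{ni,i}(k) D_{ni,i} s_{ni}^q(k) \Big) \\
&= \frac{\sqrt{\Lambda_{ni,i}}}{N_r} \hat{A}_{ni,i}^{H}(k) A_{ni,i}(k) D_{ni,i} x_{ni}^q(k) + p^{\mathrm{self}}_{ni,q}(k) + p^{\mathrm{intra}}_{ni,q}(k) + p^{\mathrm{inter}}_{ni,q}(k) + \tilde{w}_i^q(k),
\end{aligned}
\tag{4.21}
\]

where $\tilde{w}_i^q(k) = \frac{1}{N_r} \hat{A}_{ni,i}^{H}(k) w_i^q(k)$ is the noise term, and $p^{\mathrm{self}}_{ni,q}(k)$, $p^{\mathrm{intra}}_{ni,q}(k)$, and $p^{\mathrm{inter}}_{ni,q}(k)$ are the self-interference, intra-cell interference, and inter-cell interference, respectively, given by

\[ p^{\mathrm{self}}_{ni,q}(k) = \frac{\sqrt{\Lambda_{ni,i}}}{N_r} \Big( \hat{A}_{ni,i}^{H}(k) \big( A_{ni,i}(k) - \hat{A}_{ni,i}(k) \big) D_{ni,i} s_{ni}^q(k) \Big), \tag{4.22} \]

\[ p^{\mathrm{intra}}_{ni,q}(k) = \sum_{\substack{j=0 \\ j \neq n}}^{J-1} \frac{\sqrt{\Lambda_{ji,i}}}{N_r} \hat{A}_{ni,i}^{H}(k) A_{ji,i}(k) D_{ji,i} \big( x_{ji}^q(k) + s_{ji}^q(k) \big), \tag{4.23} \]

\[ p^{\mathrm{inter}}_{ni,q}(k) = \sum_{\substack{g=0 \\ g \neq i}}^{G-1} \sum_{j=0}^{J-1} \frac{\sqrt{\Lambda_{jg,i}}}{N_r} \hat{A}_{ni,i}^{H}(k) A_{jg,i}(k) D_{jg,i} B_{jg,i}^{H} \big( B_{jg,g}^{H} \big)^{+} \big( x_{jg}^q(k) + s_{jg}^q(k) \big). \tag{4.24} \]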

Note that the uplink self-interference, $p^{\mathrm{self}}_{ni,q}(k)$, during the uplink data detection phase is caused by the mismatch between the estimated and the true uplink DoAs. The following theorem characterizes the uplink rate during the channel estimation phase assuming perfect channel estimation at the BS.

Theorem 4.8. For superimposed pilot-based 3D massive MIMO systems, the uplink achievable rate corresponding to the n-th user at the k-th subcarrier during the superimposed pilot+data transmission phase and with perfect CSI acquisition is given by

\[ I^{\mathrm{ul,sd}}_{ni}[k] = \sum_{\ell=0}^{L_{ni,i}-1} \log_2 \big( 1 + \gamma_{ni,\ell} \, p^{\mathrm{ul,sd}}_{ni,\ell}[k] \big), \tag{4.25} \]

where $\gamma_{ni,\ell} = \Lambda_{ni,i} |\alpha_{ni,i}(\ell)|^2 / \sigma^2$, $L_{ni,i}$ denotes the number of symbols, and $p^{\mathrm{ul,sd}}_{ni,\ell}[k]$ is the uplink power allocated to the $\ell$-th superimposed data symbol during the channel estimation phase.
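Expression (4.25) is a sum of per-path rates and can be evaluated directly. A minimal sketch with hypothetical gains and power allocation:

```python
import numpy as np

def ul_rate_perfect_csi(large_scale, path_gains, powers, sigma2):
    """Uplink rate of (4.25) in bits/s/Hz for one user and one subcarrier."""
    # gamma_l = Lambda * |alpha(l)|^2 / sigma^2
    snr_per_path = large_scale * np.abs(np.asarray(path_gains)) ** 2 / sigma2
    return float(np.sum(np.log2(1.0 + snr_per_path * np.asarray(powers))))

# Two paths with |alpha|^2 = 1 and 3, unit power on each stream, unit noise variance.
rate = ul_rate_perfect_csi(1.0, [1.0, np.sqrt(3.0)], [1.0, 1.0], 1.0)
```

For these toy values the two terms are $\log_2 2$ and $\log_2 4$, so the rate is 3 bits/s/Hz.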


Remark 4.9. Note that the rate expressions here are based on large antenna approximation.


Hence, in the strict sense, the rate in (4.25) is not the exact achievable rate. However,

following the tradition in massive MIMO literature, where the large antenna assumption is

very common in rate analysis, we stick to the convention and call it ’achievable rate’.

In the presence of DoA estimation error, the uplink rate can degrade. The next theorem relates the uplink rate to the DoAs estimated at the BS.

Theorem 4.10. For a superimposed pilot system, the uplink achievable rate of the $n$-th user at the $k$-th subcarrier during the superimposed pilot+data transmission phase, in the presence of DoA estimation error, is given by

$$I^{ul,sd}_{ni}[k] = \mathbb{E}\left[\sum_{\ell=0}^{L_{ni,i}-1} \log\left(1 + \frac{\frac{1}{N_r^2}\,\gamma_{ni,\ell}\left|\hat{\mathbf{e}}^H_{r,ni,i,k}(\ell)\,\mathbf{e}_{r,ni,i,k}(\ell)\right|^2 p^{ul,sd}_{ni,\ell}[k]}{\sigma^2 + \gamma_{ni,\ell}\left|\frac{1}{N_r}\hat{\mathbf{e}}^H_{r,ni,i,k}(\ell)\,\mathbf{e}_{r,ni,i,k}(\ell) - 1\right|^2 p^{ul,sp}_{ni,\ell}[k]}\right)\right], \tag{4.26}$$

where the expectation is taken with respect to the DoA estimation error, $\gamma_{ni,\ell} = \Lambda_{ni,i}|\alpha_{ni,i}(\ell)|^2$, and $p^{ul,sd}_{ni,\ell}[k]$ and $p^{ul,sp}_{ni,\ell}[k]$ are the transmit powers during the uplink channel estimation phase allocated to the data and pilot symbols, respectively.


Remark 4.11. Theorem 4.10 shows that, unlike the perfect channel estimation case, under DoA estimation error the uplink rate is affected by the pilot symbols superimposed on the data. Placing more power on the superimposed pilot symbols improves the channel estimation quality, but it negatively affects the uplink rate during the channel estimation phase.
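The effect described in Remark 4.11 can be illustrated with a minimal sketch of the ratio inside the logarithm of (4.26). The sketch assumes a uniform linear array with half-wavelength spacing, which is a simplification of the two-dimensional arrays considered in this chapter; the function names and the numerical values are hypothetical.

```python
import numpy as np

def steering_vector(theta_rad, num_antennas, spacing=0.5):
    """Array response of a uniform linear array (assumed geometry).

    spacing is the element separation in wavelengths.
    """
    n = np.arange(num_antennas)
    return np.exp(2j * np.pi * spacing * n * np.sin(theta_rad))

def sinr_with_doa_error(theta, theta_hat, num_antennas,
                        gamma, p_data, p_pilot, noise_var):
    """Ratio inside the log of (4.26) for one term of the sum."""
    e = steering_vector(theta, num_antennas)
    e_hat = steering_vector(theta_hat, num_antennas)
    # (1/Nr) * e_hat^H e; np.vdot conjugates its first argument.
    rho = np.vdot(e_hat, e) / num_antennas
    signal = gamma * np.abs(rho) ** 2 * p_data
    self_interference = gamma * np.abs(rho - 1.0) ** 2 * p_pilot
    return signal / (noise_var + self_interference)

# With a small angle error, more pilot power lowers the ratio for fixed data power.
for p_pilot in (0.1, 0.5, 0.9):
    value = sinr_with_doa_error(0.30, 0.31, 64, gamma=2.0, p_data=0.5,
                                p_pilot=p_pilot, noise_var=0.1)
    print(f"pilot power {p_pilot:.1f}: SINR term {value:.3f}")
```

When the estimated and true angles coincide, the self-interference term vanishes and the expression reduces to the perfect-CSI form of Theorem 4.8.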

Uplink Rate for Data-only Transmission Phase

In this subsection, we discuss the uplink achievable rate for the data-only transmission phase. The $N_r \times 1$ received signal at the $k$-th subcarrier on the $q'$-th symbol during the uplink data-only transmission phase can be written as

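$$\mathbf{z}^{q'}_i(k) = \sqrt{\Lambda_{ni,i}}\,\mathbf{A}_{ni,i}(k)\mathbf{D}_{ni,i}\mathbf{x}^{q'}_{ni}(k) + \sum_{\substack{j=0 \\ j\neq n}}^{J-1} \sqrt{\Lambda_{ji,i}}\,\mathbf{A}_{ji,i}(k)\mathbf{D}_{ji,i}\mathbf{x}^{q'}_{ji}(k) + \sum_{\substack{g=0 \\ g\neq i}}^{G-1}\sum_{j=0}^{J-1} \mathbf{H}_{jg,i}\left(\mathbf{B}^H_{jg,g}\right)^{+}\mathbf{x}^{q'}_{jg}(k) + \mathbf{w}^{q'}_i(k), \tag{4.27}$$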

where the first and second summation terms in (4.27) represent the intra- and inter-cell interference during the uplink data transmission phase, respectively, and $\mathbf{w}^{q'}_i(k)$ denotes the corresponding noise element. The uplink rate for the data-only transmission phase is characterized in the following theorem, assuming perfect CSI at the BS.

Theorem 4.12. For a superimposed pilot system, the uplink achievable rate of the $n$-th user at the $k$-th subcarrier during the data-only transmission phase, with perfect CSI, is given by

$$I^{ul,dd}_{ni}[k] = \sum_{\ell=0}^{L_{ni,i}-1} \log_2\left(1 + \gamma_{ni,\ell}\, p^{ul,dd}_{ni,\ell}[k]\right), \tag{4.28}$$

where $p^{ul,dd}_{ni,\ell}[k]$ is the power allocated to the $\ell$-th symbol during the data-only transmission phase.


It can be observed from Theorem 4.8 and Theorem 4.12 that, assuming perfect DoA estimation, the uplink rates for the channel estimation phase and the data-only transmission phase have the same form and differ only in the data power allocated during the two phases. The following theorem characterizes the uplink rate during the data-only transmission phase, taking DoA estimation error into account.


Theorem 4.13. For a superimposed pilot system, the uplink achievable rate of the $n$-th user at the $k$-th subcarrier during the data-only transmission phase, in the presence of DoA estimation error, is given by

$$I^{ul,dd}_{ni}[k] = \mathbb{E}\left[\sum_{\ell=0}^{L_{ni,i}-1} \log\left(1 + \frac{\frac{1}{N_r^2}\,\gamma_{ni,\ell}\left|\hat{\mathbf{e}}^H_{r,ni,i,k}(\ell)\,\mathbf{e}_{r,ni,i,k}(\ell)\right|^2 p^{ul,dd}_{ni,\ell}[k]}{\sigma^2}\right)\right], \tag{4.29}$$

where $p^{ul,dd}_{ni,\ell}[k]$ is the corresponding power allocated to the $\ell$-th symbol during the data-only transmission, and the expectation is taken with respect to the uplink DoA estimation error.


4.4.2 Optimum Downlink Precoding

At the $i$-th BS, for downlink transmission, the $N_s \times 1$ frequency-domain information vector intended for the $n$-th MS can be denoted as $\mathbf{s}^{dl}_{ni}[k] = \left[s^{dl}_{ni,0}[k], \ldots, s^{dl}_{ni,N_s-1}[k]\right]^T$, where $s^{dl}_{ni,p}[k]$ is the $p$-th downlink information symbol at the $k$-th subcarrier for the $n$-th MS in the $i$-th cell. Hence, we can express the transmitted $N_r \times 1$ downlink signal from the $i$-th BS as $\mathbf{x}^{dl}_i[k] = \sum_{j=0}^{J-1} \mathbf{x}^{dl}_{ji}[k]$, where $\mathbf{x}^{dl}_{ji}[k] = \mathbf{V}^{dl}_{ji}[k]\,\mathbf{s}^{dl}_{ji}[k]$, and $\mathbf{V}^{dl}_{ji}[k]$ is the $N_r \times N_s$ downlink precoder corresponding to the $j$-th MS in the $i$-th cell at the $k$-th subcarrier. At the $n$-th MS in the $i$-th cell, the $N_t \times 1$ downlink received signal at the $k$-th subcarrier, $\mathbf{y}^{dl}_{ni}[k]$, can be expressed as

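$$\begin{aligned}
\mathbf{y}^{dl}_{ni}[k] &= \sum_{g=0}^{G-1} \sqrt{\Lambda_{ni,g}}\,\mathbf{H}^{dl}_{ni,g}[k]\,\mathbf{x}^{dl}_g[k] + \mathbf{n}^{dl}_{ni}[k] = \sum_{g=0}^{G-1}\sum_{j=0}^{J-1} \sqrt{\Lambda_{ni,g}}\,\mathbf{H}^{dl}_{ni,g}[k]\,\mathbf{V}^{dl}_{jg}[k]\,\mathbf{s}^{dl}_{jg}[k] + \mathbf{n}^{dl}_{ni}[k] \\
&= \sqrt{\Lambda_{ni,i}}\,\mathbf{H}^{dl}_{ni,i}[k]\,\mathbf{V}^{dl}_{ni}[k]\,\mathbf{s}^{dl}_{ni}[k] + \mathbf{I}^{intra,dl}_{ni} + \mathbf{I}^{inter,dl}_{ni} + \mathbf{n}^{dl}_{ni}[k],
\end{aligned} \tag{4.30}$$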

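where $\mathbf{H}^{dl}_{ni,g}[k]$ is the $N_t \times N_r$ channel transfer function at the $k$-th subcarrier corresponding to the downlink channel between the $n$-th MS in the $i$-th cell and the $g$-th BS; $\mathbf{I}^{intra,dl}_{ni} = \sum_{j=0, j\neq n}^{J-1} \sqrt{\Lambda_{ni,i}}\,\mathbf{H}^{dl}_{ni,i}[k]\,\mathbf{V}^{dl}_{ji}[k]\,\mathbf{s}^{dl}_{ji}[k]$ and $\mathbf{I}^{inter,dl}_{ni} = \sum_{g=0, g\neq i}^{G-1}\sum_{j=0}^{J-1} \sqrt{\Lambda_{ni,g}}\,\mathbf{H}^{dl}_{ni,g}[k]\,\mathbf{V}^{dl}_{jg}[k]\,\mathbf{s}^{dl}_{jg}[k]$ represent, respectively, the intra- and inter-cell interference; and $\mathbf{n}^{dl}_{ni}[k]$ is the $N_t \times 1$ noise vector with $\mathbb{E}\{\mathbf{n}^{dl}_{ni}[m]\,\mathbf{n}^{dl,H}_{ni}[n]\} = \sigma^2\mathbf{I}_{N_t}\delta(m-n)$. The first term in (4.30) is the desired signal for the $n$-th user in the $i$-th cell.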

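The $n$-th user's downlink rate can accordingly be written as [64]:

$$I^{dl}_{ni} = \log_2\det\left(\mathbf{I}_{L_{ni,i}} + \bar{\mathbf{D}}_{ni,i}\,\mathbf{Q}^{dl}_{ni}[k]\,\bar{\mathbf{D}}^H_{ni,i}\,\mathbf{R}'^{-1}_{ni}[k]\right), \tag{4.31}$$

where $\bar{\mathbf{D}}_{ni,i} = \sqrt{\Lambda_{ni,i}}\,\mathbf{D}_{ni,i}$, $\mathbf{Q}^{dl}_{ni}[k] = \mathbb{E}\{\mathbf{s}^{dl}_{ni}[k]\,\mathbf{s}^{dl,H}_{ni}[k]\}$ is the covariance matrix of the transmit symbol vector from the $i$-th BS intended for the $n$-th MS on the $k$-th subcarrier, and $\mathbf{R}'_{ni}[k]$ denotes the inter-cell interference-plus-noise covariance matrix.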
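The log-determinant expression in (4.31) can be evaluated numerically as in the minimal sketch below. The function and argument names are hypothetical, and the diagonal example matrices are assumptions chosen so that the result is easy to check by hand; they do not come from [64] or from the simulations of this chapter.

```python
import numpy as np

def downlink_rate(d_scaled, q_cov, r_int_noise):
    """Evaluate (4.31): log2 det(I + D Q D^H R'^{-1}).

    d_scaled    : sqrt(Lambda) * D, the scaled gain matrix (L x L)
    q_cov       : transmit covariance Q (L x L)
    r_int_noise : inter-cell interference-plus-noise covariance R' (L x L)
    """
    dim = d_scaled.shape[0]
    inner = np.eye(dim) + d_scaled @ q_cov @ d_scaled.conj().T @ np.linalg.inv(r_int_noise)
    # slogdet is numerically safer than log(det(...)) for larger matrices.
    _, log_abs_det = np.linalg.slogdet(inner)
    return float(log_abs_det / np.log(2.0))

# Hypothetical two-path example with equal power split and white interference.
d_example = np.diag([2.0, 1.0]).astype(complex)
q_example = 0.5 * np.eye(2)
r_example = 0.1 * np.eye(2)
print(f"downlink rate: {downlink_rate(d_example, q_example, r_example):.3f} bits/s/Hz")
```

For diagonal inputs the determinant factorizes, so the rate is the sum of per-path terms $\log_2(1 + |d_\ell|^2 q_\ell / r_\ell)$, which gives a quick sanity check on the implementation.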

Hence, under the total power constraint, $P_t$, for each subcarrier, the downlink sum-rate for the $i$-th BS at the $k$-th subcarrier can be expressed as

$$I^{dl}_i[k] = \sum_{j=0}^{J-1} I^{dl}_{ji}[k]. \tag{4.32}$$

Note that the precoders $\mathbf{V}^{dl}_{jg}$, where $j = 1, \ldots, J$ and $g = 1, \ldots, G$, in (4.32) are constructed from the uplink DoAs estimated at the BS. The precoder and power allocation that maximize the downlink rate in (4.32), taking the uplink DoA estimation error into account, are presented in [64]. For completeness of the exposition, in Section 4.5 we use the theoretical results from [64] to investigate the downlink and overall rate performance of superimposed pilot-based massive MIMO systems. For the downlink rate evaluation, the BS is assumed to have perfect knowledge of the complex channel gains. However, once the uplink DoAs are estimated, the channel gains can be estimated based on the estimated DoAs. Our work on this aspect is presented in [60].

Figure 4.2: Elevation Angle Estimation for 64 Antennas. [Plot omitted: RMSE for angle estimation (in degrees) versus SNR (dB); simulation and analytical results for elevation, 8×8 array with data power = 0.3.]

4.5 Performance Evaluation

In this section, we present the performance evaluation for superimposed pilot based 3D

massive MIMO systems. We consider seven hexagonal cells with 10 users in each cell. We

assume each channel has four dominant paths, and antennas at both BS and user sides are

0.5 wavelengths apart.


Figure 4.3: Azimuth Angle Estimation for 64 Antennas. [Plot omitted: RMSE for angle estimation (in degrees) versus SNR (dB); simulation and analytical results for azimuth, 8×8 array with data power = 0.3.]


The RMSE of DoA estimation for an 8×8 antenna array is presented in Fig. 4.2 and Fig. 4.3 for elevation and azimuth angles, respectively. The number of antennas at each user is 8. The correlation coefficient of the spreading sequences, ρ1, is taken as 0.3, and the correlation between data and pilot vectors is 0.1. We can observe from the figures that the simulation and analytical results match very closely. Results for a 16×4 antenna array are shown in Fig. 4.4, which, compared with the 8×8 case, indicates that the antenna array geometry plays a critical role in DoA estimation performance. Results corresponding to 16×16 and 32×8 arrays are presented in Fig. 4.5 and Fig. 4.6, respectively. We can clearly see that DoA estimation performance improves significantly with the number of antennas at the BS. In our analysis, the array response vectors become orthogonal asymptotically as the number of antennas at the BS grows large. In practice, this asymptotic assumption holds even for relatively small antenna arrays, for example, with a total of 64 antennas (8×8, 4×16, or 16×4) [6, 7].
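The asymptotic orthogonality of array response vectors mentioned above can be checked numerically with a minimal sketch. It assumes a uniform linear array with the 0.5-wavelength spacing used in this section, which simplifies the two-dimensional arrays of the chapter; the function names and the two test angles are hypothetical.

```python
import numpy as np

def array_response(theta_rad, num_antennas, spacing=0.5):
    """Array response vector of a uniform linear array (assumed geometry)."""
    n = np.arange(num_antennas)
    return np.exp(2j * np.pi * spacing * n * np.sin(theta_rad))

def normalized_correlation(theta_a, theta_b, num_antennas):
    """|a(theta_a)^H a(theta_b)| / N, equal to 1 for identical angles."""
    a = array_response(theta_a, num_antennas)
    b = array_response(theta_b, num_antennas)
    return float(np.abs(np.vdot(a, b)) / num_antennas)

# Correlation between two distinct directions for growing array sizes.
for size in (8, 64, 256):
    corr = normalized_correlation(0.2, 0.6, size)
    print(f"N = {size:3d}: normalized correlation = {corr:.4f}")
```

For two fixed, distinct angles the magnitude of the inner product stays bounded while the normalization grows with the array size, so the normalized correlation decays on the order of 1/N.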

We next investigate the effects of superimposed pilots on the uplink achievable rate. From (4.17), the uplink rate can be written in terms of the rates achieved during the superimposed pilot-based channel estimation phase and the uplink data-only transmission phase:

$$I_{UL} = \delta_s I^{ul,sd} + \delta_d I^{ul,dd}, \tag{4.33}$$

where, again, δs and δd are the fractions of symbols used for the channel estimation phase and the data transmission phase, respectively.

Figure 4.7 presents the cumulative distribution function (CDF) of the uplink rate, where the SNR is fixed at -5 dB and the values of δs and δd are chosen to be 0.1 and 0.9, respectively. For the uplink channel estimation phase, the sum of the powers allocated to the pilot and data symbols is assumed to be unity. The figure shows the curves as we vary the pilot power during the channel estimation phase from 0.1 to 0.9 (i.e., vary the data power from 0.9 to 0.1). We can observe that as the data power during the channel estimation phase is increased, the rate performance actually worsens. This is quite counter-intuitive. The reason is as follows. Figure 4.7 depicts the scenario where most of the data symbols are reserved for data-only transmission, while only 10% of the symbols are used for the channel estimation phase. To decode the data symbols successfully, accurate DoA estimation is required, since the array steering matrices are used as the uplink receive processing matrices. On the other hand, at -5 dB SNR, the uplink DoA estimation performance is not very good, as can be observed from Figures 4.2 to 4.6. Hence, most of the power during the superimposed channel estimation phase should be placed on the pilot symbols in order to improve the channel estimation quality. One can therefore see a trade-off: even though we sacrifice the 10% of data symbols that are superimposed with the pilots, it is still better to place most of the power on the pilot symbols for better channel estimation in order to secure the other 90% of the data symbols.

Figure 4.4: Angle Estimation for 16×4 Antenna Array. [Plot omitted: RMSE for angle estimation (in degrees) versus SNR (dB).]

Figure 4.5: Elevation Angle Estimation for 256 Antenna Elements. [Plot omitted: RMSE for angle estimation (in degrees) versus SNR (dB).]

Figure 4.6: Azimuth Angle Estimation for 256 Antenna Elements. [Plot omitted: RMSE for angle estimation (in degrees) versus SNR (dB).]

Figure 4.7: Uplink Rate CDF when δs = 0.1 and δd = 0.9, and SNR = -5 dB. [Plot omitted: CDF versus data rate (bits/s/Hz); recovered legend entries: orthogonal pilot transmission; data power = 0.1, pilot power = 0.9.]

Figure 4.8: Uplink Rate CDF when δs = 0.9 and δd = 0.1, and SNR = -5 dB. [Plot omitted: CDF of the uplink rate.]

Figure 4.8 shows the uplink rate CDF where δs = 0.9 and δd = 0.1, and the SNR is fixed at the same -5 dB. It can be observed that as the pilot power is increased from 0.1 to 0.5, the rate increases. However, if we further increase the pilot power (i.e., decrease the superimposed data power), the rate starts decreasing. This is quite different from what we observed in Figure 4.7. The reason is that, unlike in Figure 4.7, in Figure 4.8 most of the data symbols are sent during the superimposed channel estimation phase, and only 10% of the data symbols are sent during the data-only transmission phase. Hence, once a certain level of channel estimation quality is ensured, the rest of the power should be placed on the superimposed data during channel estimation.

In Fig. 4.7 and Fig. 4.8, we also plot the traditional orthogonal pilot transmission scheme, which corresponds to the case where data power = 0 and pilot power = 1. We can see from Fig. 4.7 that superimposed pilot strategies (non-zero data power during phase 1 transmission) do not provide much advantage over the traditional orthogonal pilot transmission scheme. This observation is consistent with that in [74]. On the other hand, Fig. 4.8 depicts the scenario for δs = 0.9 and δd = 0.1. In contrast to Fig. 4.7, it can be seen from Fig. 4.8 that a balanced power allocation between data and pilot symbols is important for obtaining a higher uplink rate. In this scenario, the superimposed pilot strategy clearly outperforms the traditional orthogonal pilot scheme in all SNR regimes.

Figure 4.9: Uplink Rate CDF when δs = 0.1 and δd = 0.9, and SNR = 20 dB. [Plot omitted: CDF of the uplink rate.]

The uplink rate CDFs for SNR = 20 dB are shown in Figures 4.9 and 4.10. Figure 4.9 depicts the scenario where δs = 0.1 and δd = 0.9, i.e., most of the data symbols are reserved for data-only transmission. Comparing it with Figure 4.7, we can observe that the performance is worst if the power allocated to the pilots is too low, such as pilot power = 0.1 in this figure. Hence, a certain level of power needs to be placed on the superimposed pilot symbols to obtain better channel estimation. However, as we keep increasing the pilot power allocation, in contrast to Figure 4.7, the achievable rate performance becomes very similar. The reason is that at high SNR the DoA estimation performance is quite good, as can also be observed in Figures 4.2 to 4.6, and not much pilot power is required to attain good channel estimation. Figure 4.10, on the other hand, presents the uplink rate CDFs for the case where δs = 0.9 and δd = 0.1. This depicts the scenario where the majority of the data symbols are sent during the channel estimation phase, superimposed with the pilot symbols. In contrast to the low-SNR case of Figure 4.8, Figure 4.10 shows that in the high-SNR regime the rate increases as we keep increasing the power on the superimposed data symbols during the channel estimation phase. Figure 4.11 shows the uplink rate versus SNR for different data-pilot power allocation ratios and for the case where δs = 0.1 and δd = 0.9. We can observe that the performance improves as we put more power on the superimposed pilots. However, as the SNR increases, the performance gap among the different power splitting ratios gradually decreases. Figure 4.12 presents the uplink rate versus SNR for δs = 0.9 and δd = 0.1. We can observe that as the SNR increases the performance also improves. In this scenario, it can be clearly seen that in the low-SNR regime, while it is detrimental to invest too little power in the superimposed pilot symbols, it also hurts the data rate if too much power is invested in the pilots. This observation provides important design intuition for system architects. In the high-SNR regime, as is evident from Figure 4.12, it is prudent to place more power on the superimposed data symbols.

Figure 4.10: Uplink Rate CDF when δs = 0.9 and δd = 0.1, and SNR = 20 dB. [Plot omitted: CDF of the uplink rate.]

The downlink rate CDFs are shown in Figures 4.13 and 4.14. Figure 4.13 depicts the case where the symbols are split equally between the channel estimation phase and the data transmission phase, i.e., δs = 0.5 and δd = 0.5. The downlink SNR is fixed at 10 dB, while the uplink SNR during the channel estimation phase is kept at 20 dB. We can observe that if the uplink pilot power during channel estimation is too low, the downlink data rate suffers. The reason is that the downlink precoder is based on the estimated uplink DoAs. With too little power on the uplink superimposed pilot symbols, the uplink channel estimation quality degrades, and the downlink rate is affected accordingly. On the other hand, at 20 dB uplink SNR, once a certain power level is ensured, further increasing the allocation for the uplink superimposed pilot symbols does not cause much variation in the downlink rate. Figure 4.14 shows the case of a lower uplink SNR (5 dB). The symbol fractions are kept the same as in Figure 4.13, i.e., δs = 0.5 and δd = 0.5, whereas the downlink SNR is fixed at 20 dB. In this low uplink SNR case, we can observe that putting more power on the pilot symbols increases the downlink rate, since at low uplink SNR more pilot power is needed to achieve good DoA estimation, which, again, directly affects the downlink rate through the downlink precoder design. Finally, comparing Figures 4.13 and 4.14, one can see that as the downlink SNR is increased, the downlink rate also increases, as expected.

Figure 4.11: Uplink Rate vs SNR when δs = 0.1 and δd = 0.9. [Plot omitted: rate (bits/s/Hz) versus SNR (dB).]

Figure 4.12: Uplink Rate vs SNR when δs = 0.9 and δd = 0.1. [Plot omitted: rate (bits/s/Hz) versus SNR (dB).]

Figure 4.13: Downlink Rate when the uplink SNR=20 dB. [Plot omitted: downlink rate CDF for δs = 0.5, δd = 0.5, uplink SNR = 20 dB, downlink SNR = 10 dB.]

Figure 4.14: Downlink Rate when the uplink SNR=5 dB. [Plot omitted: downlink rate CDF for δs = 0.5, δd = 0.5, uplink SNR = 5 dB, downlink SNR = 20 dB.]

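The overall achievable rate, $I_{overall}$, can be expressed in terms of the uplink achievable rate, $I_{UL}$, and the downlink achievable rate, $I_{DL}$, as follows:

$$I_{overall} = \kappa_{UL} I_{UL} + \kappa_{DL} I_{DL} = \kappa_{UL}\left(\delta_s I^{ul,sd} + \delta_d I^{ul,dd}\right) + \kappa_{DL} I_{DL}, \tag{4.34}$$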

where κUL and κDL represent the weights (priorities) for the uplink and downlink rates, respectively, and κUL + κDL = 1. In Figure 4.15, we present the overall rate versus the power allocated to the superimposed data during channel estimation. Plots for different uplink-downlink priorities are shown for the scenario where δs = 0.1 and δd = 0.9. We can observe that as we increase the downlink priority, the overall rate increases. Moreover, as we increase the power allocated to the superimposed data, the rate initially increases slowly. However, after a certain point, the overall rate keeps dropping. This is the point where the uplink channel estimation quality becomes quite poor, since too little power is left for the superimposed pilot symbols. The overall rate versus superimposed data power for the case where δs = 0.9 and δd = 0.1 is shown in Figure 4.16. Comparing with Figure 4.15, we can see that the overall rate in this case is worse than that of Figure 4.15 for all priority levels. Also, similar to Figure 4.15, the rate drops after the superimposed data power goes above a threshold (0.8 in this case). Finally, the overall rate versus uplink priority level is shown in Figures 4.17 and 4.18 for the cases where δs = 0.1 and δs = 0.9, respectively. We can observe that as the uplink priority increases (i.e., the downlink priority decreases), the overall rate decreases. Moreover, increasing the power on the superimposed data symbols from 0.1 to 0.7 increases the overall data rate.

Figure 4.15: Total Rate when δs = 0.1. [Plot omitted: total rate (bits/s/Hz) versus superimposed data power for δs = 0.1 and δd = 0.9; curves for (κUL, κDL) = (0.9, 0.1), (0.5, 0.5), and (0.1, 0.9).]

Figure 4.16. [Caption not recovered. Plot omitted: total rate (bits/s/Hz) versus superimposed data power for δs = 0.9 and δd = 0.1; curves for (κUL, κDL) = (0.9, 0.1), (0.5, 0.5), and (0.1, 0.9).]
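The weighted combinations in (4.33) and (4.34) can be written as two small helper functions, sketched below. The function names and the example rate values are hypothetical and serve only to show how the symbol fractions and the priority weights enter the overall rate.

```python
def uplink_rate(rate_sd, rate_dd, delta_s, delta_d):
    """Evaluate (4.33): weighted sum of the two uplink phases."""
    return delta_s * rate_sd + delta_d * rate_dd

def overall_rate(rate_ul, rate_dl, kappa_ul):
    """Evaluate (4.34) with kappa_dl = 1 - kappa_ul."""
    if not 0.0 <= kappa_ul <= 1.0:
        raise ValueError("kappa_ul must lie in [0, 1]")
    return kappa_ul * rate_ul + (1.0 - kappa_ul) * rate_dl

# Hypothetical per-phase rates in bits/s/Hz (assumptions for illustration).
ul = uplink_rate(rate_sd=2.0, rate_dd=4.0, delta_s=0.1, delta_d=0.9)
for kappa in (0.1, 0.5, 0.9):
    print(f"kappa_UL = {kappa:.1f}: overall rate = {overall_rate(ul, 10.0, kappa):.2f}")
```

Whenever the downlink rate exceeds the uplink rate, as in this hypothetical example, raising the uplink priority lowers the overall rate, which is the qualitative trend reported for Figures 4.17 and 4.18.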

Finally, we compare the DoA-based strategy with traditional strategies for superimposed pilot systems. For such systems, a popular strategy in the literature is to estimate the channel using least squares (LS) and to use matched filtering (MF) for uplink receive processing. In Fig. 4.19, we compare the proposed DoA-based strategy with the conventional LS-MF based method in terms of uplink rate, where the data power and pilot power are fixed at 0.7 and 0.3, respectively. We can clearly observe that for both δs = 0.5 and δs = 0.8, the DoA-based strategy outperforms the conventional LS-MF based method for superimposed pilot-based 3D massive MIMO systems.
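For readers unfamiliar with the LS-MF baseline, the following minimal sketch shows a generic least-squares channel estimate followed by matched-filter detection. It is a textbook formulation under the assumed model Y = H P + N with a known pilot matrix P, not necessarily the exact baseline configuration used for Fig. 4.19; all names and dimensions are hypothetical.

```python
import numpy as np

def ls_channel_estimate(received, pilots):
    """LS estimate of H from Y = H P + N: H_ls = Y P^H (P P^H)^{-1}.

    received : (Nr, T) received pilot block
    pilots   : (K, T) known pilot matrix, one row per user
    """
    gram = pilots @ pilots.conj().T
    return received @ pilots.conj().T @ np.linalg.inv(gram)

def matched_filter_detect(h_est, y):
    """MF receive processing: x_hat_k = h_k^H y / ||h_k||^2 for each user k."""
    scale = np.sum(np.abs(h_est) ** 2, axis=0)
    return (h_est.conj().T @ y) / scale

# Noise-free toy example with two users and orthogonal pilots.
rng = np.random.default_rng(0)
h_true = rng.standard_normal((8, 2)) + 1j * rng.standard_normal((8, 2))
pilot_block = np.array([[1, 1, 1, 1], [1, -1, 1, -1]], dtype=complex)
h_ls = ls_channel_estimate(h_true @ pilot_block, pilot_block)
print("max LS estimation error:", np.max(np.abs(h_ls - h_true)))
```

In a superimposed pilot system the data symbols overlap the pilots, so the LS estimate would treat them as additional noise; the toy example omits this to keep the check exact.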

Figure 4.17. [Caption not recovered. Plot omitted: total rate (bits/s/Hz) versus uplink priority factor κUL for δs = 0.1; recovered legend entry: data power = 0.1, pilot power = 0.9.]

Figure 4.18. [Caption not recovered. Plot omitted: total rate (bits/s/Hz) versus uplink priority factor κUL for δs = 0.9.]

Figure 4.19: Uplink Transmission Phases in Superimposed Pilot System. [Plot omitted: uplink rate (bits/s/Hz) versus SNR (dB) with data power = 0.7 and pilot power = 0.3; DoA-based and LS-based curves for (δs, δd) = (0.8, 0.2) and (0.5, 0.5).]

4.6 Summary of Chapter 4

In this work, a superimposed pilot-based massive FD-MIMO network is introduced and the corresponding network performance is investigated. Both UL and DL achievable rates are considered in the analysis to reflect the impact of UL pilot overhead on the network performance. The performance of UL DoA estimation is analytically characterized for superimposed pilot-based massive FD-MIMO networks and is linked to both UL and DL achievable rate analysis. Both analytical and numerical evaluations suggest that the introduced superimposed pilot-based massive FD-MIMO network can significantly reduce the UL pilot overhead to achieve a good trade-off between UL and DL achievable rates.

Chapter 5

MIMO Broadcast-Beam Optimization

Through DRL

5.1 Network Model and Problem Statement

We consider a cellular network consisting of G BSs and K UEs. We assume that each BS can have one or multiple sectors and that there are M sectors in total in the network, where M ≥ G. Each sector is equipped with a two-dimensional (2D) antenna array whose phases can be configured so that the array beam-widths (in both the elevation and azimuth domains) and the elevation tilt (e-tilt) angle can be updated. A 2D antenna array enables the BSs to beamform in both the elevation and azimuth directions, which is essentially the setup for full-dimension (FD) MIMO systems [2]. The elevation beam-width, φ, azimuth beam-width, ψ, and e-tilt angle, ζ, constitute the parameter set for constructing the broadcast beams of each sector. In this work, we focus on optimizing the broadcast (sector-wide) beams for a cellular network. Let us denote the number of antenna elements in the elevation and azimuth directions by N1 and N2, respectively. Hence, a total of N = N1N2 antenna weights need to be tuned to generate the FD-MIMO broadcast beams. We can represent the N1 × N2 antenna weight matrix as an N × 1 weight vector, w, through a vectorization operation. Each choice of weight vector w corresponds to a specific choice of φ, ψ, and ζ. The notation used in this chapter is summarized in Table 5.1.

Table 5.1: Notation for System Variables

Variable                                              Notation
No. of BSs                                            $G$
No. of Sectors                                        $M$
No. of UEs                                            $K$
Elevation beam-width                                  $\phi$
Azimuth beam-width                                    $\psi$
E-tilt angle                                          $\zeta$
No. of antennas at the BSs in elevation direction     $N_1$
No. of antennas at the BSs in azimuth direction       $N_2$
Total no. of antennas                                 $N$
Broadcast signal from m-th BS                         $x_m$
Broadcast beamforming vector for m-th BS              $\mathbf{f}_m$
Received signal at k-th UE                            $y_k$
Channel between m-th BS and k-th UE                   $\mathbf{h}_{m,k}$
Beam-pool                                             $\mathcal{W}$
No. of possible beams in beam-pool                    $J$
j-th beam-weight vector in beam-pool                  $\mathbf{w}_j$
n-th antenna weight in j-th beam                      $w^j_n$
UEs' SINR threshold for connectivity                  $T$

Assuming each UE has a single antenna, the downlink broadcast signal received at the $k$-th UE under the $m$-th cell sector can be written as

$$y_k = \mathbf{h}^T_{m,k}\mathbf{f}_m x_m + \sum_{\substack{m'=1 \\ m'\neq m}}^{M} \mathbf{h}^T_{m',k}\mathbf{f}_{m'} x_{m'} + n_k, \tag{5.1}$$

where $\mathbf{h}_{m,k}$ is the $N \times 1$ channel vector between the $m$-th sector and the $k$-th UE, $x_m$ is the broadcast signal from the $m$-th sector, and $\mathbf{f}_m$ is the corresponding $N \times 1$ broadcast precoding vector for the $m$-th sector. It can be observed from (5.1) that the broadcast beams from one sector interfere with the beams from other sectors. Hence, in order to maximize the network coverage, selecting appropriate broadcast beams for all the sectors is critical.

In this work, we adopt a DRL-based approach in which an agent is responsible for selecting the proper antenna configurations for all sectors [75]. Each BS has, for its sectors, the same pool of possible antenna weight vectors, $\mathcal{W} = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_J\}$, where $J$ is the total number of beam-weight vectors in the pool; $\mathbf{w}_j = [w^j_1, w^j_2, \ldots, w^j_N]$ is the $j$-th vector in the beam pool, and $w^j_n$ is the antenna weight for the $n$-th antenna element of the $j$-th weight vector. Accordingly, each sector chooses its precoder from the beam pool, i.e., $\mathbf{f}_m \in \mathcal{W}$. Note again that each weight vector in the pool corresponds to a particular choice of elevation and azimuth beam-widths and e-tilt angle. The agent selects one out of the $J$ beam patterns for each sector based on the users' distribution and mobility patterns. These selections are referred to as actions in reinforcement learning.

All BSs in the network transmit sector-specific signals using the wide broadcast beams selected by the agent. UEs collect measurement results such as Reference Signal Received Power (RSRP) or Reference Signal Received Quality (RSRQ), and report them to the agent as observations of the mobile environment. Assuming that the $k$-th UE in the network is associated with the $m$-th sector, from (5.1), the received signal-to-interference-plus-noise ratio (SINR) for the $k$-th user can be expressed as

$$\mathrm{SINR}_k = \frac{\left|\mathbf{h}^T_{m,k}\mathbf{f}_m\right|^2}{\sum_{\substack{m'=1 \\ m'\neq m}}^{M} \left|\mathbf{h}^T_{m',k}\mathbf{f}_{m'}\right|^2 + \sigma^2}, \tag{5.2}$$

where $\sigma^2$ is the noise variance. In this work, we use the number of connected UEs as a metric to measure the cell coverage. The number of connected UEs in the network is defined as the number of UEs whose received SINR is above a predefined threshold, $T$. For any user distribution, the objective is hence to select the optimal beam pattern indices for all the sectors under all BSs that maximize the coverage, i.e., the total number of connected UEs in the network. The problem can formally be written as:

$$\max_{\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_M} \; \sum_{k=1}^{K} \mathbf{1}_{\mathrm{SINR}_k > T} \tag{5.3}$$

$$\text{s.t.} \quad \mathbf{f}_m \in \mathcal{W}, \quad 1 \le m \le M, \tag{5.4}$$

where the indicator function, $\mathbf{1}_{x>T}$, is defined as

$$\mathbf{1}_{x>T} := \begin{cases} 1, & \text{if } x > T \\ 0, & \text{if } x \le T. \end{cases} \tag{5.5}$$
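To make the objective in (5.2) to (5.5) concrete, the minimal sketch below computes the per-UE SINR, counts the connected UEs, and finds the best beam assignment by exhaustive search. The exhaustive search is only a reference baseline that is feasible for very small J and M; it is not the DRL-based method proposed in this chapter. Array shapes, function names, and the toy values are assumptions made for illustration.

```python
import itertools
import numpy as np

def sinr_per_ue(channels, beams, serving, noise_var):
    """SINR of each UE as in (5.2).

    channels : (M, K, N) array, channels[m, k] is h_{m,k}
    beams    : (M, N) array, beams[m] is f_m
    serving  : length-K integer array, serving sector of each UE
    """
    # Received power |h_{m,k}^T f_m|^2 from every sector m at every UE k.
    power = np.abs(np.einsum("mkn,mn->mk", channels, beams)) ** 2
    ue_index = np.arange(channels.shape[1])
    desired = power[serving, ue_index]
    interference = power.sum(axis=0) - desired
    return desired / (interference + noise_var)

def connected_ues(channels, beams, serving, noise_var, threshold):
    """Objective (5.3): number of UEs whose SINR exceeds the threshold T."""
    return int(np.sum(sinr_per_ue(channels, beams, serving, noise_var) > threshold))

def exhaustive_beam_search(channels, beam_pool, serving, noise_var, threshold):
    """Try all J**M assignments of pool beams to sectors (constraint (5.4))."""
    num_sectors = channels.shape[0]
    best_choice, best_count = None, -1
    for choice in itertools.product(range(len(beam_pool)), repeat=num_sectors):
        count = connected_ues(channels, beam_pool[list(choice)],
                              serving, noise_var, threshold)
        if count > best_count:
            best_choice, best_count = choice, count
    return best_choice, best_count
```

The search space grows as $J^M$, which is one motivation for replacing enumeration with a learned policy when the number of sectors or the beam pool is large.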

The user distribution changes over time, and hence the optimal beam patterns that maximize the number of connected UEs at time t1 may not be the same as those at time t2, where t1 ≠ t2. The agent, therefore, has to be able to identify the users' mobility pattern, and then dynamically and autonomously select the optimal beams for all the sectors in order to maximize network coverage. Note that we do not use the users' location information to optimize the beam patterns. In order to minimize the feedback from the network, the agent uses only the users' RSRP values for the optimization.

In this work, we consider both single-cell and multi-cell network scenarios. In the single-cell case, the agent optimizes the broadcast beam for one cell; this represents a noise-limited environment. In this case, the DRL agent only needs to learn the optimal beam according to the cell environment, including the UE mobility pattern. In the multi-cell case, on the other hand, the broadcast beams for all the cells need to be updated simultaneously; this represents an interference-limited environment. We address the challenges of these two scenarios for UEs that move according to one of two mobility patterns: first, the periodic case, where the users' movement changes in a periodic fashion, and second, the Markov case, where the users' mobility is governed by a transition probability matrix.
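The two mobility patterns can be mimicked with a minimal sketch in which the user distribution is abstracted as a discrete state index. The state abstraction, function names, and the example transition matrix are assumptions for illustration and are not the mobility models used in the experiments of this chapter.

```python
import numpy as np

def periodic_mobility(pattern, num_steps):
    """Periodic case: cycle through a fixed sequence of distribution states."""
    return [pattern[t % len(pattern)] for t in range(num_steps + 1)]

def markov_mobility(transition, start_state, num_steps, seed=0):
    """Markov case: the next state is drawn from the row of the current state."""
    transition = np.asarray(transition, dtype=float)
    if not np.allclose(transition.sum(axis=1), 1.0):
        raise ValueError("each row of the transition matrix must sum to 1")
    rng = np.random.default_rng(seed)
    states = [start_state]
    for _ in range(num_steps):
        current_row = transition[states[-1]]
        states.append(int(rng.choice(len(transition), p=current_row)))
    return states

# Hypothetical three-state example.
example_transition = [[0.8, 0.2, 0.0],
                      [0.1, 0.8, 0.1],
                      [0.0, 0.2, 0.8]]
print(periodic_mobility([0, 1, 2], num_steps=6))
print(markov_mobility(example_transition, start_state=0, num_steps=6))
```

In the periodic case the sequence of states is fully predictable, whereas in the Markov case only the transition statistics are fixed, so an agent has to infer them from observations.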

5.2 Learning Framework

In this section, we present the learning framework for MIMO broadcast beam optimization using DRL as a self-tuning sectorization mechanism. We first briefly describe the background of DRL, which sets up the foundation for the broadcast beam-learning strategy developed in subsequent sections.

5.2.1 Beam Learning Framework

Appropriate MIMO broadcast beam selection for cell sectorization is critical for wireless network performance optimization. Our objective here is to build a mechanism that automatically facilitates the selection of the best beams for all the sectors. Moreover, the sectors should autonomously update their beam parameters based on different scenarios or user distributions, and thereby realize self-tuning sectorization. To this end, our learning framework is described as follows:

Specification of design parameters: First, the network designer needs to decide on the objective function to be optimized [76]. For broadcast beam optimization, an important objective function is the network coverage, i.e., the total number of connected UEs in the network. The optimization parameters in this problem are the beam weights for each sector antenna element. It is necessary to select the optimal beam for each BS from a set of possible beams. Next, the system designer needs to decide which inputs, such as RSRP or RSRQ, are required from the UEs in order to learn their mobility behavior and optimize the beams. Finally, in order to avoid random broadcast beams during the deployment stage, a simulation platform based on ray-tracing data is built to train the DRL agent offline.

Learning Engine: An agent, or learning engine, has the task of learning the UE mobility pattern and selecting the best beam parameters for each scenario. It takes feedback from the UEs as input and suggests the optimal beam vectors for all sectors. Updating the beams based on the user distribution by autonomously identifying the underlying mobility pattern requires training. However, online training is often not desirable because of stringent network management requirements from the operators. Hence, the training needs to be done offline, and the training environment has to be as close as possible to the real cellular environment so that the optimal beams in the training stage are identical to the optimal beams in the deployment stage. The procedure is presented in detail in the next subsection.

Online Deployment and Occasional Re-training: Once the learning engine is trained offline, the learned agent is deployed for real-time operation. It enables the BSs to choose the optimal beams and update the selections based on the users' mobility pattern. Since the users' mobility patterns in the network do not change too frequently, the beam parameters learned offline can remain unchanged for a long period of time, on the order of weeks or months. Whenever there is a need to support new scenarios, or a change in mobility patterns is identified, the learning engine needs to be re-trained offline based on recent data. The newly learned beam parameters are then pushed to the respective BSs for updated operation.


5.2.2 Offline Training

Dynamically updating the broadcast beam patterns according to the cellular environment

and user distribution for all cells in real time is intrinsically a difficult problem. Directly

deploying a DRL agent and training it online is not only slow but also costly. During the

online training stage of the DRL, the agent may output some random beams according to

the ε-greedy exploration policy. Some of these random beams may not be acceptable

to operators because of degraded network performance. In order to address this issue, we

develop an offline training mechanism using ray-tracing data to train the DRL network before

real deployment. By providing azimuth angle of arrival, elevation angle of arrival, azimuth

angle of departure, elevation angle of departure, and path loss value of each path for each

location in a cell, ray-tracing data can capture the cellular environment well enough that the beams

learned on the offline training platform can match those needed in online deployment.

The offline training is focused on learning the UE distribution pattern from users’ location

history data. The location data includes UEs’ location and the corresponding time stamp.

The location history data contains the UEs' mobility pattern information. Together with the

ray-tracing data, which contains information about the signal propagation environment, the UEs'

location history data are used to train the DRL network so that the DRL agent could learn

the best broadcast beam according to both the cellular environment and UE distribution

pattern. Note that at each training time step, the BSs select one set of actions,

which moves the agent to a new state, for which the new reward is computed. Hence, at

each training step, the agent needs to access one set of ray-tracing data. After offline training, the agent

will be deployed to provide real-time broadcast beam selection results for all the BSs in the

cellular network. In the following, we describe the detailed steps of offline training.

According to the 3GPP standard on minimization of drive tests (MDT), a BS can configure its

UEs to report measurement results, time stamp, and location information [77]. Therefore,

we assume that UE location history information is available for a cellular network. During

one training step, a batch of time stamps is selected from the location history data, and

the corresponding UEs' location information is incorporated into the ray-tracing data for every

time stamp. In this way, the UE distribution at each selected time stamp is combined with the

ray-tracing data. We refer to the ray-tracing data with UE distribution information as

scenario-specific ray-tracing data, and to the UEs that report their measurement information at the

selected time stamp as selected UEs. Based on the BSs' current broadcast beams and the scenario-specific

ray-tracing data, the received power of the selected UEs can be calculated, and from it

the network coverage. A reward is then provided to the DRL agent based on the coverage,

and the DRL agent accordingly updates its selection of broadcast beams using the

selected optimizer. These offline training steps are repeated until the DRL

agent converges. After the DRL agent converges, it can be deployed in the cellular network

for real-time broadcast beam selection. Details of the DRL agent design are discussed in the next

section. The entire offline training process is depicted in Fig. 5.1 and Algorithm 1.

Figure 5.1: Offline training


Algorithm 1 Offline Training

Input:
1: UE location history data, ray-tracing data of a cellular network
Output:
2: trained DRL agent for broadcast beam selection
STEP 1: Initialization
3: Define a pool of candidate antenna patterns;
STEP 2: Learning Best Beams
4: while algorithm doesn't achieve convergence do
5:     Select a batch of UE location at different timestamps;
6:     incorporate UE location distribution to ray-tracing data to create scenario-specific ray-tracing data;
7:     calculate the received power for each UE in the scenario-specific ray-tracing data based on the current BSs' broadcast beam;
8:     calculate the network coverage, and calculate a total reward as a function of network coverage;
9:     DRL updates its neural weights based on the learning algorithm and reward
10: end while

5.3 DRL for Broadcast Beam Optimization

In this section, details on the design of DRL framework for self-tuning sectorization are

presented. The DRL network is utilized in order to track optimal beams during both the

offline training and online deployment. To be specific, a deep Q-network (DQN)-based

architecture has been introduced to select MIMO broadcast beams for all sectors in a dynamic

environment. For better stability of the results, we use DQN with experience replay [42, 78].

The agent (decision maker) interacts with the environment by selecting the best broadcast

beam parameters. The DRL has three main components: state, action and reward. The

dynamics between state, action and reward are shown in Fig. 5.2. The agent interacts with

the environment by observing the state of the network and taking the action that maximizes the

reward, i.e., the network performance metric.

5.3.1 Background of DRL

We consider a reinforcement learning framework where an agent or controller dynamically

interacts with an unknown environment, E , by taking sequential decisions or actions in

discrete time steps. At each time step, t, the agent interacting with the environment observes

a state, st ∈ S, selects an action, at, from a set of allowable actions, A, and receives an

immediate scalar reward, rt ∈ R(st, at). Based on its current action, the agent enters a

new state, st+1. The cumulative discounted reward, Rt, at time step, t, is defined as

$$R_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k}, \tag{5.6}$$

where γ ∈ (0, 1] is the reward discount factor, which balances between the impact of recent

rewards and earlier rewards. The learning objective is to maximize the expected cumulative

reward at each state, st. The Q-value, Qπ(s, a), for state-action pair, (s, a), is defined as

the expected cumulative discounted reward for taking action, a, in state, s, and following a

policy, π, onward, i.e.,

$$Q^{\pi}(s, a) = \mathbb{E}\left[ R_t \mid s, a \right], \tag{5.7}$$

where E[·] denotes expectation. Q-learning adopts a value iteration approach to find the

Q-values for each state-action pair, and optimal value function Q∗(s, a) is the one which

provides maximum action value for state, s, and action, a, achievable by following any

policy:

$$Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a). \tag{5.8}$$
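As a numerical illustration of the cumulative discounted reward in (5.6), the following minimal Python sketch evaluates the sum for a finite reward sequence. The function name `discounted_return` and the reward values are hypothetical and chosen only for illustration; the infinite sum is truncated to the length of the list.

```python
def discounted_return(rewards, gamma):
    """Finite-horizon version of (5.6): sum over k of gamma**k * r_{t+k}."""
    total = 0.0
    # Accumulate backwards so that each step needs one multiply and one add.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical rewards (numbers of connected UEs) at steps t, t+1, t+2.
example_rewards = [300, 320, 310]
print(discounted_return(example_rewards, 0.5))  # 300 + 0.5*320 + 0.25*310 = 537.5
```

With a discount factor close to zero, the result is dominated by the first (immediate) reward, which is the regime used later in the simulations.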


Using the Bellman equation [41], the optimal value function in (5.8) can be expressed as

$$Q^{*}(s, a) = \mathbb{E}_{s'}\left[ r_t + \gamma \max_{a'} Q^{*}(s', a') \,\middle|\, s, a \right]. \tag{5.9}$$

The value iteration algorithm can solve the Bellman equation, and the update rule is given

by

$$Q_{i+1}(s, a) \leftarrow \mathbb{E}_{s'}\left[ r_t + \gamma \max_{a'} Q_{i}(s', a') \,\middle|\, s, a \right]. \tag{5.10}$$
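The update rule in (5.10) can be sketched in tabular form as follows. This is only an illustrative toy example, not the procedure used in this work: the two-state, two-action transition table `P` and reward table `R` are hypothetical, and the function name `q_value_iteration` is introduced here for convenience.

```python
def q_value_iteration(P, R, gamma, sweeps=200):
    """Tabular value iteration following update (5.10).

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    immediate reward; both are hypothetical toy inputs.
    """
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        Q_next = [[0.0] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                # Expectation over s' of r + gamma * max_a' Q_i(s', a').
                Q_next[s][a] = sum(
                    p * (R[s][a] + gamma * max(Q[s2])) for p, s2 in P[s][a]
                )
        Q = Q_next
    return Q


# Toy example: two user distributions that alternate deterministically,
# and two candidate beams per state.
P = [[[(1.0, 1)], [(1.0, 1)]], [[(1.0, 0)], [(1.0, 0)]]]
R = [[10.0, 4.0], [3.0, 8.0]]
Q = q_value_iteration(P, R, gamma=0.5)
best_actions = [max(range(2), key=Q[s].__getitem__) for s in range(2)]
print(best_actions)
```

In this toy setting the greedy action in each state is simply the beam with the larger immediate reward, since the transitions do not depend on the action.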

In deep Q-learning, the value functions are approximated by deep neural network parame-

terized by the weights, ζ:

$$Q(s, a, \zeta) \approx Q^{\pi}(s, a). \tag{5.11}$$

This helps to estimate the Q-values even for very large state-action space, and reduces

the computational complexity. Next, we describe in detail how we model each of the three DRL

components, namely the state, action, and reward, in the DRL-based MIMO broadcast beam

optimization problem.

State: The state in the introduced RL framework is designed to reflect the network coverage

situation which can be obtained from UE measurements. To be specific, we can design

the state as the connection indicators of UEs in the network (a vector of 1/0s). Each UE

reports its status to its attached BS. If a UE’s SINR falls below a predefined threshold, T ,

a zero is placed at the element of the vector corresponding to that UE. Otherwise, a one is

placed. Accordingly, a ‘0’ in the state vector will represent that the corresponding UE has

poor connection, and a ‘1’ will indicate that the UE has good connection. The DRL state

representation adopted in this work is pictorially depicted in Fig. 5.3.
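The state construction described above can be sketched in a few lines of Python. The function names and the SINR values below are hypothetical; only the thresholding rule (zero if the SINR falls below T, one otherwise) is taken from the text. The count of ones is also shown, since it is the quantity used later as the reward.

```python
def connection_state(sinr_db, threshold_db):
    """Return the 1/0 connection vector: 0 if SINR < T, 1 otherwise."""
    return [0 if sinr < threshold_db else 1 for sinr in sinr_db]


def connected_ue_count(state):
    """Number of connected UEs, i.e., the number of ones in the state vector."""
    return sum(state)


# Hypothetical SINR reports (in dB) from five UEs and a hypothetical threshold.
reported_sinr = [3.2, -7.5, -6.0, 10.1, -12.0]
state = connection_state(reported_sinr, threshold_db=-6.0)
print(state, connected_ue_count(state))  # [1, 0, 1, 1, 0] 3
```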


[Diagram labels: Agent; Action (Beam Parameters); Observation/State; Reward (No. of Connected UEs); Environment (UEs)]

Figure 5.2: Reinforcement Learning Framework for Beam Optimization

[Diagram labels: Base Station/Cell Tower; a five-user example (User 1 to User 5) with status vector entries 1, 1, 1, 0, 0 (three connected UEs, two not connected); good connection: signal strength above threshold; poor connection: signal strength below threshold]

Figure 5.3: DRL State Representation for Beam Optimization Problem

Action: An action of the agent is defined as the selection of a beam index from a pool of

candidate beam patterns. The agent observes the state and the corresponding reward, and takes

the best possible action that maximizes the cumulative discounted future reward for the next

time step. At the beginning of training, the agent explores different actions in an attempt to

learn the best beams for different user distributions. However, once the training phase is

complete, the agent exploits the learned information and only selects the best known actions

that maximize the cumulative reward for each user distribution. We would like to highlight

that a continuous beam space would not be feasible for cellular network coverage

optimization since it can produce many beams which may not be practically realizable at BS

arrays. Hence, the beam pool should consist of a discrete set of beams, which needs to be judiciously

selected based on the particular network under consideration.

Reward: A reward in this work refers to any network performance metric. One way to

design the reward is to use the total number of connected UEs in the network, which depends on the

previous state and the action taken in it. Another approach is to design the reward as

a function of the measurement results, for example, a function of the SINR or RSRP

vector. In this work, we adopt the first approach for designing the reward. It is to be

noted here that maximizing total number of connected UEs in the network is equivalent to

maximizing the coverage of the cellular network.

The agent’s goal is to maximize the cumulative discounted future reward. The agent gathers

its experiences as tuples, (st, at, rt, st+1), where st denotes current UE connection state, at

denotes the action taken in state st; rt is the instantaneous reward obtained by taking

action at in state st; and st+1 is the next state. The agent stores the history of its

experiences in a memory called the experience replay memory [78], which stores the

tuples, (st, at, rt, st+1), for all time steps. The DRL agent randomly samples mini-batches

of experience from the replay memory for training, and selects an action based on ε-greedy

policy, i.e., with probability ε, it tries a random action, and with probability (1−ε), the agent

selects the best known action so far. The optimum action in a particular state is selected

based on maximum Q-values [41] corresponding to that state. In DQN-based reinforcement


learning, the Q-values are predicted using a deep neural network. The input to the neural network

is the UEs' connection vector representing the state of the RL environment, and the output is

the Q-values corresponding to all the possible actions, i.e., beam indices from the beam-pool.

In the following subsections, we detail the broadcast beam optimization strategies for both

single cell and multiple cell scenarios.
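The experience replay memory and the ε-greedy selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the class name `ReplayMemory`, the function name `epsilon_greedy`, and the finite `capacity` of the buffer are all hypothetical.

```python
import random
from collections import deque


class ReplayMemory:
    """Stores experience tuples (s_t, a_t, r_t, s_{t+1})."""

    def __init__(self, capacity):
        # A bounded buffer is an assumption; the oldest tuples are dropped first.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Uniform random mini-batch, which helps to decorrelate the samples.
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random beam index, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


memory = ReplayMemory(capacity=1000)
memory.store([1, 0, 1], 2, 2, [1, 1, 1])
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))  # 1
```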

5.3.2 Broadcast beam optimization for dynamic environment

In this subsection, we present the framework for dynamically optimizing MIMO broadcast

beams, where the RL agent needs to simultaneously control the beam parameters for all

the sectors based on different user distributions. For the single cell case, beam parameters

corresponding to only one sector need to be optimized. This could serve as an example where

a legacy LTE sector is replaced with one massive MIMO unit. The goal is to maximize the

number of connected UEs for different dynamic user distributions. The agent keeps a single

replay memory containing the agent's experience tuples, and randomly samples from it. This

random sampling from the experience replay memory helps to decorrelate the data [43]. However,

for the multiple-sector case, the RL framework needs some significant changes

compared to that for single-sector beam optimization. In the multiple-sector environment,

each sector has its own pool of beams, i.e., its own action set. Each sector can hence independently

select its own beam parameters. The setup is similar to that of a multi-agent system [79, 80].

The goal remains the same: to maximize the overall network coverage. This is a challenging

problem in terms of computational tractability. For an illustration, let us consider that

there are m sectors in the network, and each sector has j possible beam patterns (actions)

to select from. Hence, the total number of actions, i.e., all possible combinations of sectors'

beam patterns, becomes j^m, which increases exponentially with the total number of sectors.

If there are 40 base stations, and each has 5 possible actions to choose from, the total number of possible

combinations of beam patterns becomes 5^40 (approximately 9.1 × 10^27), which is an extraordinarily large number, making

it difficult to achieve an optimal solution within a reasonable time.
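The growth of the joint action space can be checked with a short calculation. The sketch below simply evaluates the number of joint beam combinations for m sectors with j beams each, and contrasts it with the product of j and m, i.e., the number of Q-value outputs needed if every sector is handled separately. The function names are hypothetical.

```python
def joint_action_count(j, m):
    """Number of joint beam combinations when m sectors each have j beams."""
    return j ** m


def per_sector_output_count(j, m):
    """Total number of outputs when each sector has its own j-way choice."""
    return j * m


# The example from the text: 40 base stations with 5 candidate beams each.
print(joint_action_count(5, 40))
print(per_sector_output_count(5, 40))  # 200
```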

One way to find the appropriate broadcast beams for multiple sectors simultaneously is to

use a single large neural network with a large number of output nodes that can be used to

predict the Q-values for all possible j^m actions. However, the total number of training samples

needed to train such a neural network would be extremely large, which may not be feasible

for any practical purpose. In other words, the learning algorithm can almost never

achieve convergence with this architecture for even a moderate-size cellular network.

To address this issue, we introduce a novel low-complexity algorithm for optimizing the

broadcast beams for multiple sectors where the action space grows only linearly, instead of

exponentially, with total number of sectors in the network. Let us again assume that there

are m sectors, and each sector has j possible actions (beam-weight set) to choose from.

Unlike the single cell case, for the multiple cell environment, we assume the agent maintains

a separate replay memory for each sector. Moreover, we use m different neural networks

to independently compute the Q-values for the m sectors. Each neural network is responsible

for predicting the optimum action for its corresponding sector only. With this architecture, the

number of actions increases only linearly, and we can still achieve complete convergence within a

reasonably short computation time, as demonstrated through extensive simulation in

Section 5.4. Note that the deep Q-learning algorithm proposed in [42] is designed to

create a single NN-based agent that can learn to play Atari games where the number of valid

actions for the player is quite limited. On the other hand, in terms of training methodology,

the architecture presented in this section for the multiple sector scenario is scalable with growing

action space, and in this sense, it provides an architectural generalization of the work in [42].

The details of the architectures for replay memory and neural networks for multiple sector

broadcast beam optimization are briefly described next.


Figure 5.4: Replay Buffer architecture for multiple sector case

Replay memory architecture: The replay memory architecture for multiple sectors

broadcast beam optimization is shown in Fig. 5.4. There is a separate buffer for each

sector. The same current state, reward, and next state are stored in all the replay

memories/buffers for the sectors. However, the replay memories differ in the actions taken (beam

indices chosen) by each sector. While all the sectors observe the same current state, st,

reward, rt, and next state, st+1, the actions stored are different: BS 1's action is stored

in buffer 1, BS 2’s action is stored in buffer 2, and so on. The rationale behind this buffer

architecture is that states and rewards are network specific, and same states and rewards

are observed by all sectors. On the other hand, each sector takes its own action, and their

joint actions regulate the overall network state and the corresponding reward.
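The buffer arrangement described above can be sketched as follows: the shared state, reward, and next state are written to every sector's buffer, while each buffer receives only its own sector's action. The function name `store_joint_experience` and the use of plain Python lists as buffers are assumptions made for illustration.

```python
def store_joint_experience(buffers, state, actions, reward, next_state):
    """Append (s_t, a_t^m, r_t, s_{t+1}) to the m-th buffer for every sector m.

    `buffers` is a list with one buffer per sector and `actions` holds the
    beam index chosen by each sector, in the same order.
    """
    if len(buffers) != len(actions):
        raise ValueError("one action per sector buffer is required")
    for buffer, action in zip(buffers, actions):
        buffer.append((state, action, reward, next_state))


# Two sectors: same state, reward and next state, different stored actions.
sector_buffers = [[], []]
store_joint_experience(sector_buffers, [1, 0, 1], [0, 3], 2, [1, 1, 1])
print(sector_buffers[0][0][1], sector_buffers[1][0][1])  # 0 3
```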

Neural Network architecture: For Q-value prediction, a deep convolutional neural

network is used in this work. To make the input suitable for a convolutional

neural network, we transform the (K × 1) UE connection vector into a (K/100 × 100) frame.

Four such frames are stacked together and fed as the input to the neural network for

computing the Q-values. We use three convolutional layers, all with rectified linear unit (ReLU)

activation functions. The first convolutional layer has 32 (8x8) filters. The second and third

convolutional layers have 64 (4x4) filters and 64 (3x3) filters, respectively. Finally, a dense layer with

a linear activation function is used as the output layer.
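The input pre-processing described above (reshaping the connection vector into a frame and stacking four frames) can be sketched in plain Python. This is only an illustration: the names `to_frame` and `FrameStack` are hypothetical, K is assumed to be a multiple of 100, and filling the stack with copies of the first frame at start-up is an assumption not stated in the text.

```python
from collections import deque


def to_frame(connection_vector, width=100):
    """Reshape a length-K vector into a (K/width x width) frame (list of rows)."""
    k = len(connection_vector)
    if k == 0 or k % width != 0:
        raise ValueError("K must be a positive multiple of the frame width")
    return [connection_vector[i:i + width] for i in range(0, k, width)]


class FrameStack:
    """Keeps the most recent `depth` frames as the neural network input."""

    def __init__(self, depth=4):
        self.frames = deque(maxlen=depth)

    def push(self, frame):
        if not self.frames:
            # Assumption: pad with copies of the first frame at start-up.
            self.frames.extend([frame] * self.frames.maxlen)
        else:
            self.frames.append(frame)
        return list(self.frames)


stack = FrameStack(depth=4)
frame = to_frame([1] * 300)          # hypothetical K = 300 UEs
nn_input = stack.push(frame)
print(len(nn_input), len(frame), len(frame[0]))  # 4 3 100
```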

Two such identical neural networks are used in predicting the Q-values. One is used for

computing the running Q-values; this neural network is called the evaluation network. The

other neural network, called the target network, is held fixed for some training duration,

say for P episodes, and every P episodes, the weights of the evaluation neural network are

transferred to the target neural network. It has been shown that this two-network

approach for Q-value prediction provides better stability of results at convergence

[43].

The neural network architecture for predicting the Q-values for multiple sectors is shown in

Fig. 5.5. The depiction is presented for the M-sector case, where M separate neural networks

are used for predicting the Q-values for the M sectors. The input to all neural networks is the

same state vector. The neural networks are identical in structure, and the number of outputs of each neural

network is J. Hence, the size of the action space is J × M, instead of J^M, i.e., the total number of actions

grows only linearly with the number of sectors. The optimal action predicted by the Q-values of

neural network 1 is stored in Buffer 1, which corresponds to sector 1. Similarly, the action

predicted by the Q-values of neural network 2 is stored in Buffer 2, which corresponds to

sector 2, and so on. It is to be noted that the neural networks do not share any weight information

during training, and each neural network independently predicts the optimal actions for the

corresponding sector. Hence, there is no additional signaling overhead among the neural

networks. The beam learning procedure for the multiple BS environment is presented in

Algorithm 2.

[Diagram labels: (K × 1) UE Connection Vector; NN for Sector 1's Q-values, ..., NN for Sector M's Q-values; Q-values for Act 1 of Sector 1, ..., Act J of Sector 1, ..., Act 1 of Sector M, ..., Act J of Sector M]

Figure 5.5: Neural Network architecture for multiple sector case
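Two steps of the per-sector procedure can be sketched compactly: the greedy choice of one beam index per sector from that sector's own Q-values, and the regression target used when training a sector's network. The sketch below is a simplified illustration in which the Q-values are given as plain lists rather than produced by neural networks; the function names are hypothetical.

```python
def greedy_joint_action(q_values_per_sector):
    """Pick, for each sector m, the beam index with the largest Q-value."""
    return [
        max(range(len(q)), key=q.__getitem__) for q in q_values_per_sector
    ]


def td_target(reward, next_q_values, gamma, terminal=False):
    """Target y = r for a terminal next state, else r + gamma * max_a Q(s', a)."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


# Two sectors with two candidate beams each (hypothetical Q-values).
print(greedy_joint_action([[0.1, 0.7], [0.9, 0.2]]))  # [1, 0]
print(td_target(5, [1.0, 3.0], gamma=0.5))             # 6.5
```

Because each sector only takes an argmax over its own J outputs, the cost of selecting a joint action grows with J × M rather than with J^M.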

5.4 Simulation Results and Performance Analysis

In this section, we present the simulation results and performance evaluation for self-tuning

sectorization mechanism through DRL-based MIMO broadcast beam optimization. We first

present the results for single sector environment followed by multiple sectors case. Both

periodic and Markov mobility patterns have been considered for the evaluation.

5.4.1 Results for single sector dynamic environment:

In this sub-section, we present the performance evaluation of our algorithm for the single sector

dynamic environment. The sector is equipped with a two dimensional (2D) antenna array

with 4 antenna elements in both elevation and azimuth directions. The horizontal distance

between BS antenna elements is 0.5 wave-length and the vertical distance between antenna

elements is 1.48 wave-length. We first consider two scenarios or user distributions, and

assume that users switch between Scenario-1 and Scenario-2 periodically every 8 time steps

(see Fig. 5.6). The BS is located at a height of 35 m from ground, and users are distributed

randomly in the cell. Based on users' X-, Y-, and Z-coordinates, two scenarios are defined

as follows: Scenario-1: X ≥ 2600 m, Z ≥ 10 m; Scenario-2: X ≤ 2700 m, Z ≤ 12 m. For

simulation, this partition is used as users' mobility pattern. The received power of each UE

is calculated based on ray-tracing data. Noise level is set as −95 dBm, and SINR threshold

level is kept at 0.1 dB. For a particular user, if the received SINR is above this threshold, we

consider the user to be connected; otherwise, we consider it to be not-connected. A set of

simulation parameters used in this work is summarized in Table 5.2.

Table 5.2: Simulation Parameters

RL Parameter                                Specification
Reward discount factor, γ                   0.0001
Learning rate, α                            0.001
Initial exploration probability, εmax       1.0
Final exploration probability, εmin         0.000001
Training batch size                         32
Optimizer                                   Adam

Network Parameter                           Specification
Antenna array at BSs                        4 × 4
Antenna separation in azimuth domain        1.48λ
Antenna separation in elevation domain      0.5λ
UEs' SINR threshold for connectivity, T     -6 dB
BSs height from the ground                  35 m

Algorithm 2 Broadcast Beam Optimization for Multiple Sectors

Input:
1: RSRP measurements from the UEs in the network
Output:
2: Optimum broadcast beam patterns for all sectors that maximize the number of connected UEs
STEP 1: Initialization
3: Define a pool of candidate antenna patterns;
4: Define the maximum exploration rate, εmax, minimum exploration rate, εmin, exploration decay rate, optimizer's learning rate, α, and reward discount factor, γ;
5: Initialize the replay memories, $D_m$, for all M sectors.
STEP 2: Optimization of Beam Weights
6: for episode = 1, 2, ..., Z do
7:     Initialize the state vector at time step 1 as $s_1$;
8:     for t = 1, 2, ..., T′ do
9:         Sample c from Uniform(0, 1)
10:        if c ≤ ε then
11:            Select an action (choose a beam index) for each sector randomly from the beam pool
12:        else
13:            for m = 1, 2, ..., M do
14:                Select the action for the m-th BS, $a_t^m = \arg\max_{a^m} Q_m^{*}(s_t, a^m; \theta_m)$
15:            end for
16:        end if
17:        Apply the selected beam patterns on the antenna arrays of the corresponding BSs
18:        Observe the resulting RL state, $s_{t+1}$, the UE connection vector.
19:        Pre-process the state vector into a frame before feeding it to the neural network
20:        Compute the reward, $r_t$, which is the number of connected UEs.
21:        for m = 1, 2, ..., M do
22:            Store the experience tuple for the m-th sector, $e_t^m = (s_t, a_t^m, r_t, s_{t+1})$, in the m-th replay memory, $D_m$.
23:            Sample random mini-batches of experience, $(s_j, a_j^m, r_j, s_{j+1})$, from $D_m$
24:            if $s_{j+1}$ is a terminal state then
25:                Set $y_j^m = r_j$
26:            else
27:                Set $y_j^m = r_j + \gamma \max_{a^m} Q_m(s_{j+1}, a^m; \theta_m)$
28:            end if
29:            Perform a gradient descent step on $\left( y_j^m - Q_m(s_j, a_j^m; \theta_m) \right)^2$
30:        end for
31:    end for
32: end for

The general rationale for selecting the hyper-parameters is described below:

Initial Exploration Rate: At the beginning of training, the agent needs to gather experiences,

and explore as much as possible. Accordingly, the initial exploration rate, εmax, is set to 1,

which corresponds to complete exploration and no exploitation.

Final Exploration Rate: Towards the end of training, the agent should have acquired enough

knowledge about the environment and the underlying user distributions. In this phase, rather


than exploration, the agent should focus on exploitation by taking the already known best

actions for different sectors. Hence, final exploration rate should be close to zero. However,

in order to avoid the situation where two rewards are very close to each other and the agent

is stuck with the slightly smaller reward, the final exploration rate, εmin, in practice, is not set to

exactly 0. In this work, we set εmin to a very small number, 0.000001, which corresponds to

an almost purely exploitative phase.

Exploration Decay Rate: The exploration decay rate is set based on the total number of

training samples available and the number of training samples used for the initial exploration

phase. Usually, the exploration rate is decreased at regular intervals. Denoting the number

of training samples dedicated to initial observation as Tobs, in this work, we followed the

algorithm below for decaying the exploration rate at time step, t:

    if t ≤ Tobs then
        εt = εmax
    else if t > Tobs and εt−1 > εmin then
        εt = εt−1 − (εmax − εmin)/Texpl
    else
        εt = εmin
    end if

Here, Texpl denotes a parameter > 1 controlling the speed of decay. In our work, we set Texpl = 5000, and

Tobs = 1000.
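The decay rule above can be sketched as a small Python function. The default values follow the settings stated in the text; the function name `exploration_rate` is hypothetical, and the final clamp to εmin is an added safeguard (an assumption) that prevents the last subtraction from overshooting below εmin.

```python
def exploration_rate(t, eps_prev, eps_max=1.0, eps_min=1e-6,
                     t_obs=1000, t_expl=5000):
    """Linear decay of the exploration rate after an observation phase."""
    if t <= t_obs:
        return eps_max
    if eps_prev > eps_min:
        # Clamp (assumption) so that the rate never drops below eps_min.
        return max(eps_min, eps_prev - (eps_max - eps_min) / t_expl)
    return eps_min


# Trace the schedule over 7000 time steps.
eps = 1.0
history = {}
for t in range(1, 7001):
    eps = exploration_rate(t, eps)
    history[t] = eps
print(history[1000], history[7000])  # 1.0 1e-06
```

With these settings the rate stays at εmax for the first 1000 steps and reaches εmin roughly 5000 steps later.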

Learning Rate: Learning rate, α, determines how fast information acquired from recent

experiences overrides the information from prior experiences. In practice, α is set between 0

and 1. A learning rate of 0 implies the Q-values are never updated, and hence no learning

takes place. On the other extreme, a learning rate of 1 means the agent only considers

the information from the most recent experience, and ignores any information previously

acquired. In this work, we start training with an initial learning rate, α = 0.001, and every

20000 training steps, we reduce our learning rate by a factor of 10.
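The step-wise learning rate reduction described above can be written as a one-line schedule. The function name `learning_rate` is hypothetical, and counting training steps from zero (so that the first reduction happens at step 20000) is an assumption.

```python
def learning_rate(step, initial=0.001, drop_every=20000, factor=10.0):
    """Divide the initial learning rate by `factor` every `drop_every` steps."""
    return initial / (factor ** (step // drop_every))


for step in (0, 19999, 20000, 40000):
    print(step, learning_rate(step))
```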


Reward Discount Factor: Reward discount factor, γ, indicates how the agent values the

future reward. In practice, the value of γ is set between 0 and 1. γ close to zero indicates

that immediate rewards are more valued than the distant future rewards. On the other hand,

γ close to 1 implies that long term cumulative future rewards are more important than the

current reward. Based on our DRL environment, we set the reward discount factor at 0.0001.

Figure 5.6: Periodic Change in Scenarios

At each time step, the RL agent has 10 actions to choose from, i.e., there are 10 different

beam weight vectors available for the agent. Each of the actions corresponds to a unique

beam pattern. As an illustration, one such beam pattern and the associated elevation and

azimuth cuts are shown in Fig. 5.7. Based on the change in user distribution, the agent

adaptively selects the beam that maximizes the total number of connected UEs. Figure 5.8a

shows the average squared difference (ASD) between the reward (total number of connected

UEs) obtained by the DRL agent and the reward predicted by Oracle:

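$$\mathrm{ASD} = \frac{1}{N'} \sum_{n=1}^{N'} \left( R_n^{\mathrm{Agent}} - R_n^{\mathrm{Oracle}} \right)^2, \tag{5.12}$$

where $R_n^{\mathrm{Agent}}$ and $R_n^{\mathrm{Oracle}}$ denote the instantaneous reward at the n-th time step obtained by the DRL

agent and the Oracle, respectively; N′ represents the number of time steps used for averaging.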

Oracle is defined as an entity which has the complete and perfect knowledge of the environ-

ment and user distribution; it is essentially an exhaustive search method in order to compute

the maximum attainable reward in any given scenario. Each point in Fig. 5.8a represents the

ASD over N′ = 200 time steps. In Fig. 5.8a, we have also shown the shaded error bar, which

represents the maximum difference from the mean value within every N′ time steps. It can

be observed that at the beginning of training, the ASD between the rewards obtained by the RL

agent and the Oracle is quite high. However, as time goes by, the ASD gradually decreases, and

finally, at the completion of training, the rewards from the RL agent converge completely to those

from the Oracle. This is due to the fact that at the beginning of training, the agent explores

different actions and collects the memory. During the exploration phase, the agent tries out

all available actions, and attempts to learn the optimal beam weights for different user

distributions. Over time, the exploration rate decreases, and exploitation increases, i.e., the agent

tends to choose more frequently the best known actions so far that maximize the reward.

[Figure 5.7 panels: (a) Beam Pattern; (b) Elevation and Azimuth Cuts: Azimuth Cut (elevation angle = 0.0°), power (dB) versus azimuth angle (degrees); Elevation Cut (azimuth angle = 0.0°), power (dB) versus elevation angle (degrees)]

Figure 5.7: Beam pattern corresponding to a typical RL action.

[Figure 5.8 panels: (a) ASD in reward from DRL agent and Oracle. (b) Average action mismatch with Oracle.]

Figure 5.8: Results for periodic mobility pattern in a single sector dynamic environment: (a) average squared difference (ASD) between reward achieved by DRL agent and the reward obtained by Oracle; (b) average mismatch (AM) between actions taken by the DRL agents and the Oracle.

Fig. 5.8b shows the results for average mismatch (AM) in actions (selected beam pattern)

taken by the DRL agent and the Oracle, respectively, where AM is defined as

$$\mathrm{AM} = \frac{1}{N'} \sum_{n=1}^{N'} \mathbb{1}\left( A_n^{\mathrm{Agent}} \neq A_n^{\mathrm{Oracle}} \right), \tag{5.13}$$

where $A_n^{\mathrm{Agent}}$ and $A_n^{\mathrm{Oracle}}$ denote the actions selected for the n-th time step by the DRL agent

and the Oracle, respectively, and the indicator function, $\mathbb{1}\left( A_n^{\mathrm{Agent}} \neq A_n^{\mathrm{Oracle}} \right)$, is defined as

$$\mathbb{1}\left( A_n^{\mathrm{Agent}} \neq A_n^{\mathrm{Oracle}} \right) = \begin{cases} 1, & \text{if } A_n^{\mathrm{Agent}} \neq A_n^{\mathrm{Oracle}}, \\ 0, & \text{if } A_n^{\mathrm{Agent}} = A_n^{\mathrm{Oracle}}. \end{cases} \tag{5.14}$$
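The two evaluation metrics, the ASD in (5.12) and the AM in (5.13) and (5.14), can be computed as in the following minimal sketch. The function names and the short example sequences are hypothetical; in the reported results each point is an average over N′ = 200 time steps.

```python
def average_squared_difference(agent_rewards, oracle_rewards):
    """ASD of (5.12): mean squared gap between agent and Oracle rewards."""
    n = len(agent_rewards)
    return sum(
        (a - o) ** 2 for a, o in zip(agent_rewards, oracle_rewards)
    ) / n


def average_mismatch(agent_actions, oracle_actions):
    """AM of (5.13): fraction of time steps where the chosen beams differ."""
    n = len(agent_actions)
    return sum(
        1 for a, o in zip(agent_actions, oracle_actions) if a != o
    ) / n


# Hypothetical window of four time steps.
print(average_squared_difference([300, 340, 350, 350], [310, 340, 350, 350]))  # 25.0
print(average_mismatch([1, 2, 2, 1], [1, 2, 1, 1]))                            # 0.25
```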

It can be observed that action mismatch is quite large at the start of the training because

of high exploration rate. However, at the end of training phase, actions taken by the DRL

agent and the Oracle converge completely, and average mismatch reduces to zero. It is to be

noted that the introduced DRL-based self-sectorization method is applicable for any discrete

number of actions. However, as the number of actions grows large, the difference between the optimal

numbers of connected users corresponding to different best beam combinations becomes

smaller. Further increasing the number of actions would then not provide much gain, but

may cause longer training time, especially for the multiple sector scenarios.


(a) Scenario 1 (b) Scenario 2

Figure 5.9: Users’ Distribution Patterns for 2 Scenarios.

5.4.2 Results for multiple sector dynamic environment:

In this sub-section, we present the simulation results for multiple sector dynamic environ-

ment. We consider two sectors, each at a height of 35 m from ground. Each sector has two

possible beam patterns to choose from. Two scenarios are considered similar to single sector

case in the previous sub-section. The scenarios with line of sight (LoS) and non-line of sight

(NLoS) UEs are shown in Fig. 5.9. We assume the scenarios periodically change every 8

time steps. The agent is responsible for simultaneously selecting the optimal beam patterns

for both sectors for maximizing the number of connected UEs in the network. The aver-

age squared difference in rewards achieved by the agent and the oracle for multiple sectors

scenario is shown in Fig. 5.10a. Similar to the single cell case, as training progresses, the overall

rewards attained by the agent and the oracle converge completely. In other words, the agent

is able to dynamically optimize the beam patterns for both sectors simultaneously in the

interference environment, and maximize the overall rewards from the network in all scenarios

or user distributions. In Fig. 5.10b, we show the average action mismatch for both sectors.

It can be observed that towards the end of exploration phase, average action mismatches be-


(a) ASD

0 5 10 15 20 25

Simulation Steps ( 200)

0

0.2

0.4

0.6

0.8

1

AM

fro

m O

racle

Sector-2 Actions

Sector-1 Actions

(b) AM

Figure 5.10: Results for periodic mobility pattern in a multiple sector dynamic environment:(a) average squared difference (ASD) between reward achieved by DRL agent and the rewardobtained by Oracle; (b) average mismatch (AM) between actions taken by DRL agents foreach sector and the corresponding Oracles.

tween the sectors and the corresponding Oracles reduce to zero. The instantaneous rewards

and actions at convergence of the algorithm are shown in Fig. 5.11, where, for clarity, we

zoom in for time steps between 4000 and 4030. We can observe that scenarios change every

8 time steps and maximum number of connected UEs are different for the two scenarios.

Optimal strategy for sector-1 is to select action 1 while in scenario 1, and select action 2

while in scenario 2. On the other hand, optimal strategy for sector-2 is to select action 2 for

both scenarios. In reinforcement learning, it is, in general, difficult to obtain convergence

if the reward values are too close. However, we can observe from Fig. 5.11 that the DRL

agent can completely converge with the oracle and take the corresponding best actions even

when the reward values for scenario 1 and scenario 2 differ. This indicates the accuracy of

self-tuning sectorization strategy developed in this work.
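The two convergence metrics reported in Fig. 5.10 can be sketched as follows. This is a minimal illustration, assuming that ASD and AM are averaged over non-overlapping windows of simulation steps. The function name `windowed_metrics`, the default window of 200 steps, and the absence of any normalization are assumptions made here, not details stated in the text.

```python
import numpy as np

def windowed_metrics(agent_rewards, oracle_rewards,
                     agent_actions, oracle_actions, window=200):
    """Per-window ASD of rewards and AM of actions (hypothetical helper)."""
    r_agent = np.asarray(agent_rewards, dtype=float)
    r_oracle = np.asarray(oracle_rewards, dtype=float)
    a_agent = np.asarray(agent_actions)
    a_oracle = np.asarray(oracle_actions)

    # Drop the incomplete trailing window so the reshape below is valid.
    n = (len(r_agent) // window) * window

    # ASD: mean of squared reward differences inside each window.
    sq_diff = ((r_agent[:n] - r_oracle[:n]) ** 2).reshape(-1, window)
    # AM: fraction of steps in each window where the actions disagree.
    mismatch = (a_agent[:n] != a_oracle[:n]).astype(float).reshape(-1, window)

    return sq_diff.mean(axis=1), mismatch.mean(axis=1)
```

Under this reading, both curves reaching zero means that, within a window, the agent collected the same reward and picked the same beam pattern as the Oracle at every step.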

Figure 5.11: Instantaneous rewards (a) and instantaneous actions (b) at convergence for multiple sectors environment and periodic user-mobility pattern. Panel (a) plots the number of connected UEs for the DRL agent and the Oracle over simulation steps 4000 to 4030; panel (b) plots the Sector-2 action index for the DRL agent and the Oracle over the same steps.

Fig. 5.10 and Fig. 5.11 are based on our introduced neural network architecture, where the Q-values corresponding to each sector are predicted by a separate neural network. For comparison, in Fig. 5.12, we present the global solution, where a single neural network is responsible

for predicting the Q-values for all sectors. Hence, if there are 4 actions available for each

sector, for a 2-sectors environment, the neural network needs to predict Q-values for 4^2 = 16

actions. We can observe from Fig. 5.12 that for the single NN-based architecture, it requires

more than 7500 time steps for the DRL to converge with Oracle. In comparison, for the in-

troduced NN-architecture in Fig. 5.10, the DRL agent can converge with Oracle within only

about 4000 time steps. These results demonstrate the training advantage of the introduced

neural network architecture, especially for large action space, over the traditional state of

the art DRL training method [42].
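The difference in output dimension between the two architectures can be made concrete with a short sketch. The helper names and the base-|A| indexing of joint actions below are illustrative assumptions, not the dissertation's implementation; only the counts (16 joint Q-values for 2 sectors with 4 actions each) come from the text.

```python
def q_output_sizes(num_sectors, actions_per_sector):
    """Number of Q-values to predict under each architecture."""
    joint = actions_per_sector ** num_sectors       # single global network
    per_sector = num_sectors * actions_per_sector   # one network per sector
    return joint, per_sector

def joint_to_per_sector(index, num_sectors, actions_per_sector):
    """Decode a joint action index into one action per sector.

    Treats the joint index as a number written in base `actions_per_sector`
    (an assumed convention, chosen only for illustration).
    """
    actions = []
    for _ in range(num_sectors):
        index, action = divmod(index, actions_per_sector)
        actions.append(action)
    return tuple(actions)
```

For 2 sectors and 4 beam patterns per sector this gives 16 joint outputs against 8 per-sector outputs, and the gap widens exponentially as sectors are added.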

Figure 5.12: Results for the global solution for periodic mobility pattern in a multiple sector dynamic environment: (a) average squared difference (ASD) between the reward achieved by the DRL agent and the reward obtained by the Oracle; (b) average mismatch (AM) between actions taken by the DRL agents for each sector and the corresponding Oracles.

In general, if the action space grows large, more training time is required for the algorithm to converge. However, exact training time required can be determined through experimentation. For example, Fig. 5.13 shows the results on average squared difference (ASD) in reward (number of connected UEs) obtained by the DRL agent and the Oracle for a single sector. In Fig. 5.13, for a fixed set of hyper-parameters, we provide a comparison on how the size of the action space affects the convergence time, where we vary the number of available actions (possible beam patterns) from 2 to 10. We can observe that for the single sector case, training for the action sizes from 2 to 10 all can converge within about 3000 time steps.

On the other hand, Fig. 5.14 presents ASD between the DRL agent and the Oracle for

two-cells dynamic environment where the comparison on convergence time is shown for the

numbers of actions 2 and 4. Unlike the performance on single cell cases in Fig. 5.13, for

the multiple sectors case, we can observe that as the number of actions doubles, from 2 to

4, it requires more time for the DRL agent to converge with the Oracle. However, from

these experiments, it is notable that even though the action space increases linearly, the

required convergence time doesn’t increase proportionally, i.e. training time doesn’t need to

be doubled with doubling the action space.


Figure 5.13: Results for average squared difference (ASD) in reward between the DRL agent and the Oracle for periodic mobility pattern in a single sector dynamic environment. ASDs for different sizes of action space have been plotted in panels (a) to (e): (a) 2 actions, (b) 4 actions, (c) 6 actions, (d) 8 actions, (e) 10 actions.

Figure 5.14: Results for ASD in reward between the DRL agent and the Oracle for periodic mobility pattern in a multiple sector dynamic environment. ASDs for different sizes of action space have been plotted in panels (a) and (b): (a) 2 actions, (b) 4 actions.

5.4.3 Multi-sectors environment with Markovian mobility pattern

In this sub-section, we present the performance analysis for DRL-based self-tuning beam-

forming in multiple sector environment and for the case where user distributions alternate

between two scenarios following a Markovian mobility pattern. It is to be noted here that,

in general, users’ mobility pattern has some intrinsic regularity. For example, users can be

clustered more in the commercial area during day time while they move to residential areas

in the evening. Hence, periodic mobility patterns considered in previous two sub-sections

rather closely depict the actual mobility pattern in cellular network. Nevertheless, in this

sub-section, we consider the Markovian mobility in order to verify the robustness of the de-

veloped self-tuning sectorization algorithm for the extreme case when users’ mobility pattern

doesn’t have any regularity and users move between different scenarios in random fashion.

Figure 5.15: State Transition Diagram for Markov Mobility. The diagram has two states, Scenario 1 and Scenario 2, with transition probabilities 0.9, 0.1, 0.6, and 0.4.

We consider two scenarios defined similarly to the ones in Section 5.4.1, and assume the users’ locations switch between these two scenarios with transition probability governed by the state transition diagram shown in Fig. 5.15. Moreover, we consider two sectors each having two possible beam patterns to choose from for each scenario. Fig. 5.16a shows the

average squared difference for rewards attained by the RL agent and the oracle for Markov

mobility pattern. We can observe that similarly to the periodic cases presented in previous

two subsections, RL agent does converge with the oracle even for probabilistic mobility, and

ASD goes to zero after the training phase. Average mismatch in actions between the sectors

and the corresponding oracles are shown in Fig. 5.16b. It can be seen that average mismatch

in actions for both sectors reduce to zero at the end of the training phase. Finally, the

instantaneous rewards achieved and the actions taken by the sectors at convergence of the

algorithm are shown in Fig. 5.17, which, again, indicates perfect convergence for Markov

mobility pattern in multiple cell environment.
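A two-state Markov mobility pattern of this kind can be simulated in a few lines. The sketch below is only an illustration: the text gives the four probabilities of Fig. 5.15 but not which transition each one labels, so the matrix `P` (Scenario 1 stays with probability 0.9, Scenario 2 stays with probability 0.6) is an assumed assignment, and the function names are hypothetical.

```python
import numpy as np

# Assumed assignment of the Fig. 5.15 probabilities:
# row s holds P(next scenario | current scenario s).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

def simulate_scenarios(P, steps, start=0, seed=0):
    """Draw a scenario index (0 or 1) for each time step."""
    rng = np.random.default_rng(seed)
    states = np.empty(steps, dtype=int)
    s = start
    for t in range(steps):
        states[t] = s
        s = rng.choice(len(P), p=P[s])  # sample the next scenario
    return states

def stationary_distribution(P):
    """Long-run fraction of time spent in each scenario."""
    vals, vecs = np.linalg.eig(P.T)
    # Left eigenvector of P for the eigenvalue closest to 1.
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()
```

Unlike the periodic case, the dwell time in each scenario is random here, so the agent cannot rely on a fixed switching period of 8 time steps.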


Figure 5.16: Results for Markov mobility pattern in a multiple sector dynamic environment: (a) average squared difference (ASD) between the reward achieved by the DRL agent and the reward obtained by the Oracle; (b) average mismatch (AM) between actions taken by the DRL agent for each sector and the corresponding Oracles. Panel (b) plots AM from Oracle for Sector-1 and Sector-2 actions.

Figure 5.17: Instantaneous reward (a) and instantaneous actions (b) at convergence for multiple sectors environment and Markov user mobility pattern. Panel (a) plots the reward from the DRL agent and from the Oracle over simulation steps 6000 to 6030; panel (b) plots the Sector-1 action index from the DRL agent and from the Oracle over the same steps.


5.5 Chapter Summary

In this work, we have developed a framework for self-tuning cell sectorization through MIMO

broadcast beam optimization using deep reinforcement learning. To be specific, we have in-

troduced learning strategies for both single sector and multiple sectors environment with

dynamic user distribution. The introduced solutions can autonomously and adaptively up-

date the RF parameters based on the changes in user distributions. Simulation results show

that the DRL-based method completely converges with the Oracle-suggested optimal solu-

tions for both periodic and Markovian user mobility patterns.

Chapter 6

Conclusion

Accurate DL CSI is critical for massive MIMO to realize the promised throughput gain. In

this dissertation, we introduced optimal DL MIMO precoding and power allocation strategies

for multi-cell multi-user massive FD-MIMO networks based on UL DoA estimation at the

BS. The UL DoA estimation error for such a network has been analytically characterized and

has been incorporated into the proposed MIMO precoding and power allocation strategy.

Simulation results suggested that the proposed strategy outperforms existing BD-ZF based

MIMO precoding strategies, which require full CSI at the BS. This work sheds light on

system design for massive FD-MIMO communications, which is critical for 5G and Beyond

5G cellular networks. Moreover, based on parametric channel modeling, we have proposed

a framework for estimating parameters for 3D massive MIMO OFDM systems and analytically

characterized the estimation performance. The empirical results on

parameter estimation match the analytical ones asymptotically. Moreover, we have

shown that parametric channel estimation outperforms MMSE-based channel estimation in

terms of correlation between the estimated channel and the underlying channel.

AI is the next frontier for future wireless cellular network. In this dissertation, we have

developed a framework for MIMO broadcast beam optimization using deep reinforcement

learning. To be specific, we have proposed learning strategies for both single cell and multiple

cell environment with dynamic user distribution. The proposed solutions can autonomously

and adaptively update the RF parameters based on the changes in user distributions. Sim-

ulation results show that the proposed DRL based method completely converges with the

Oracle-suggested optimal solutions for both periodic and Markovian user mobility patterns.

Appendices

Appendix A

Proofs for Chapter 2, Chapter 3

A.1 Proof of Theorem 2.4

Effect of pilot contamination on the MSE of DoA estimation is given by

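$$\mathbb{E}\left\{(\Delta v_{n_i,i,\ell})^2\right\}_1 = \frac{1}{2}\Big(\mathbf{r}^{(v)H}_{n_i,i,\ell}\,\mathbf{W}^{*}_{n_i,i,\mathrm{mat}}\,\mathbf{R}^{(fba)T}_{i,1}\,\mathbf{W}^{T}_{n_i,i,\mathrm{mat}}\,\mathbf{r}^{(v)}_{n_i,i,\ell} - \mathrm{Re}\left\{\mathbf{r}^{(v)T}_{n_i,i,\ell}\,\mathbf{W}_{n_i,i,\mathrm{mat}}\,\mathbf{C}^{(fba)}_{i,1}\,\mathbf{W}^{T}_{n_i,i,\mathrm{mat}}\,\mathbf{r}^{(v)}_{n_i,i,\ell}\right\}\Big). \tag{A.1}$$

Let us now denote

$$\boldsymbol{\beta}_{n_i,i,\ell} = \mathbf{V}^{\mathrm{sig}}_{n_i,i}\,\boldsymbol{\Sigma}^{\mathrm{sig}\,-1}_{n_i,i}\,\mathbf{q}_\ell, \tag{A.2}$$

$$\boldsymbol{\alpha}_{v,n_i,i,\ell} = \Big(\mathbf{p}^T_\ell\big(\mathbf{J}^{(v)}_1\mathbf{U}^{\mathrm{sig}}_{n_i,i}\big)^{+}\big(\mathbf{J}^{(v)}_2/e^{jv_{n_i,i,\ell}} - \mathbf{J}^{(v)}_1\big)\big(\mathbf{U}^{\mathrm{noise}}_{n_i,i}\mathbf{U}^{\mathrm{noise}\,H}_{n_i,i}\big)\Big)^T. \tag{A.3}$$

Using (2.19) and (2.20), we have $\mathbf{W}^T_{n_i,i,\mathrm{mat}}\mathbf{r}^{(v)}_{n_i,i,\ell} = \boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}$. The MSE in (A.1) becomes

$$\mathbb{E}\left\{(\Delta v_{n_i,i,\ell})^2\right\}_1 = \frac{1}{2}\Big(\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^H\mathbf{R}^{(fba)T}_{i,1}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) - \mathrm{Re}\left\{\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^T\mathbf{C}^{(fba)}_{i,1}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)\right\}\Big). \tag{A.4}$$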

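It can be easily verified that $\boldsymbol{\alpha}_{v,n_i,i,\ell}$ can be written as

$$\begin{aligned}
\boldsymbol{\alpha}^T_{v,n_i,i,\ell} &= \mathbf{c}^T_\ell\Big(\big(\mathbf{J}_{v,2}\mathbf{A}_{n_i,i}\big)^{+}\mathbf{J}_{v,2} - \big(\mathbf{J}_{v,1}\mathbf{A}_{n_i,i}\big)^{+}\mathbf{J}_{v,1}\Big) \\
&= \frac{1}{(M_2-1)M_1}\Big[-1, -e^{-ju_{n_i,i,\ell}}, \ldots, -e^{-j(M_1-1)u_{n_i,i,\ell}}, 0, \ldots, 0, \\
&\qquad e^{-j(M_2-1)v_{n_i,i,\ell}}, e^{-j((M_2-1)v_{n_i,i,\ell}+u_{n_i,i,\ell})}, \ldots, e^{-j((M_2-1)v_{n_i,i,\ell}+(M_1-1)u_{n_i,i,\ell})}\Big].
\end{aligned} \tag{A.5}$$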

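Next, in order to obtain the expression for $\boldsymbol{\beta}_{n_i,i,\ell}$, we need to perform the SVD of the perturbation-free signal in (2.11):

$$\sqrt{\Lambda_{n_i,i}}\Big[\mathbf{A}_{n_i,i}\mathbf{D}_{n_i,i}\mathbf{B}^H_{n_i,i}(k) \quad \boldsymbol{\Pi}_{N_r}\mathbf{A}^*_{n_i,i}\mathbf{D}^*_{n_i,i}\mathbf{B}^T_{n_i,i}(k)\boldsymbol{\Pi}_{N_t}\Big] = \sqrt{\Lambda_{n_i,i}}\,\mathbf{A}_{n_i,i}\,\mathrm{diag}\{\mathbf{b}_{n_i,i}\}\Big[\bar{\mathbf{B}}^H_{n_i,i}(k) \quad \boldsymbol{\Gamma}_{n_i,i}\bar{\mathbf{B}}^T_{n_i,i}(k)\boldsymbol{\Pi}_{N_t}\Big],$$

where

$$\boldsymbol{\Gamma}_{n_i,i} = \mathrm{diag}\Big[e^{-j((M_1-1)u_{n_i,i,0}+(M_2-1)v_{n_i,i,0})}, \ldots, e^{-j((M_1-1)u_{n_i,i,L_{n_i,i}-1}+(M_2-1)v_{n_i,i,L_{n_i,i}-1})}\Big],$$

$$\bar{\mathbf{B}}^H_{n_i,i}(k) = \mathrm{diag}\Big\{e^{j\phi'_{n_i,i,0}}, \ldots, e^{j\phi'_{n_i,i,L_{n_i,i}-1}}\Big\}\mathbf{B}^H_{n_i,i}(k), \quad\text{and}$$

$$\mathbf{b}_{n_i,i} = \big[b_{n_i,i,0}, \ldots, b_{n_i,i,L_{n_i,i}-1}\big],$$

where $b_{n_i,i,\ell}$ and $\phi'_{n_i,i,\ell}$ are the amplitude and the phase of the channel gain $\alpha_{n_i,i}(\ell)$, respectively. Accordingly, based on Lemma 2.3, we can obtain

$$\mathbf{U}^{\mathrm{sig}}_{n_i,i} = \frac{1}{\sqrt{N_r}}\mathbf{A}_{n_i,i},$$

$$\boldsymbol{\Sigma}^{\mathrm{sig}}_{n_i,i} = \sqrt{2N_rN_t}\sqrt{\Lambda_{n_i,i}}\,\mathrm{diag}\{\mathbf{b}_{n_i,i}\}, \quad\text{and}$$

$$\mathbf{V}^{\mathrm{sig}\,H}_{n_i,i} = \frac{1}{\sqrt{2N_t}}\Big[\bar{\mathbf{B}}^H_{n_i,i}(k) \quad \boldsymbol{\Gamma}_{n_i,i}\bar{\mathbf{B}}^T_{n_i,i}(k)\boldsymbol{\Pi}_{N_t}\Big].$$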

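In [19], the vector $\boldsymbol{\beta}_{n_i,i,\ell}$ is given as $\boldsymbol{\beta}_{n_i,i,\ell} = \mathbf{V}^{\mathrm{sig}}_{n_i,i}\boldsymbol{\Sigma}^{\mathrm{sig}\,-1}_{n_i,i}\mathbf{U}^{\mathrm{sig}\,H}_{n_i,i}\mathbf{A}_{n_i,i}\mathbf{c}_\ell$. Now substituting here the expressions of $\mathbf{U}^{\mathrm{sig}}_{n_i,i}$, $\boldsymbol{\Sigma}^{\mathrm{sig}}_{n_i,i}$, and $\mathbf{V}^{\mathrm{sig}}_{n_i,i}$, we obtain:

$$\boldsymbol{\beta}_{n_i,i,\ell} = \frac{1}{b_{n_i,i,\ell}\sqrt{\Lambda_{n_i,i}}\sqrt{2N_t}}\,\mathbf{V}^{\mathrm{sig}}_{n_i,i}\mathbf{c}_\ell. \tag{A.6}$$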

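Hence, the expression $(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell})$ in (A.4) can be written as

$$\begin{aligned}
\boldsymbol{\beta}_\ell\otimes\boldsymbol{\alpha}_{v,\ell} &= \frac{1}{b_{n_i,i,\ell}\,2N_t\sqrt{\Lambda_{n_i,i}}}\begin{bmatrix} e^{-j\phi'_{n_i,i,\ell}}\,\mathbf{e}_{t,n_i,i,k}(\ell) \\ e^{j\phi'_{n_i,i,\ell}}e^{j((M_1-1)u_{n_i,i,\ell}+(M_2-1)v_{n_i,i,\ell})}\boldsymbol{\Pi}_{N_t}\mathbf{e}^*_{t,n_i,i,k}(\ell)\end{bmatrix}\otimes\boldsymbol{\alpha}_{v,\ell} \\
&= \frac{1}{b_{n_i,i,\ell}\,2N_t\sqrt{\Lambda_{n_i,i}}}\begin{bmatrix} e^{-j\phi'_{n_i,i,\ell}}\,\mathbf{e}_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell} \\ e^{j\phi'_{n_i,i,\ell}}e^{j((M_1-1)u_{n_i,i,\ell}+(M_2-1)v_{n_i,i,\ell})}\boldsymbol{\Pi}_{N_t}\mathbf{e}^*_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\end{bmatrix}.
\end{aligned} \tag{A.7}$$

Now, using equation (2.23) and (A.7), the first term in (A.4) can be written as

$$\begin{aligned}
&\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^H\mathbf{R}^{(fba)T}_{i,1}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{1}{b^2_{n_i,i,\ell}\,4N_t^2\Lambda_{n_i,i}}\Big(\big(\mathbf{e}^H_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}^H_{v,n_i,i,\ell}\big)\mathbf{R}^T_{i,1}\big(\mathbf{e}_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\qquad+ \big(\mathbf{e}^T_{t,n_i,i,k}(\ell)\boldsymbol{\Pi}_{N_t}\otimes\boldsymbol{\alpha}^H_{v,n_i,i,\ell}\big)\boldsymbol{\Pi}_{N_rN_t}\mathbf{R}^H_{i,1}\boldsymbol{\Pi}_{N_rN_t}\big(\boldsymbol{\Pi}_{N_t}\mathbf{e}^*_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)\Big).
\end{aligned} \tag{A.8}$$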


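Now, using (2.15), and after some simplification, we have

$$\big(\mathbf{e}^H_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}^H_{v,n_i,i,\ell}\big)\mathbf{R}^T_{i,1}\big(\mathbf{e}_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) = \frac{1}{(M_2-1)^2M_1^2}\sum_{\substack{g=0\\ g\neq i}}^{G-1}\Big(\sqrt{\Lambda_{n_g,i}}\Big)^2 X_{n_g,i}Y_{n_g,i}\sum_{m=1}^{L}|\alpha_{n_g,i}(m)|^2, \tag{A.9}$$

where $X_{n_g,i}$ and $Y_{n_g,i}$ are given by

$$X_{n_g,i} = \mathbb{E}_\psi\Big|1 + e^{-j(\omega_{n_i,i,\ell}-\omega_{n_g,i,m})} + \ldots + e^{-j(N_t-1)(\omega_{n_i,i,\ell}-\omega_{n_g,i,m})}\Big|^2, \tag{A.10}$$

$$Y_{n_g,i} = \mathbb{E}_{\theta,\phi}\Big|\Big(1 + e^{j(u_{n_i,i,\ell}-u_{n_g,i,m})} + \ldots + e^{j(M_1-1)(u_{n_i,i,\ell}-u_{n_g,i,m})}\Big)\Big(e^{jv_{n_i,i,\ell}}e^{-jv_{n_g,i,m}} - 1\Big)\Big|^2, \tag{A.11}$$

for $m = 0, \ldots, L_{n_g,i}-1$, and $\mathbb{E}_\psi$ and $\mathbb{E}_{\theta,\phi}$ denote, respectively, expectations with respect to DoD and DoAs. Now, similarly to (A.9), we also have

$$\big(\mathbf{e}^T_{t,n_i,i,k}(\ell)\boldsymbol{\Pi}_{N_t}\otimes\boldsymbol{\alpha}^H_{v,n_i,i,\ell}\big)\boldsymbol{\Pi}_{N_rN_t}\mathbf{R}^H_{i,1}\boldsymbol{\Pi}_{N_rN_t}\big(\boldsymbol{\Pi}_{N_t}\mathbf{e}^*_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) = \frac{1}{(M_2-1)^2M_1^2}\sum_{\substack{g=0\\ g\neq i}}^{G-1}\Big(\sqrt{\Lambda_{n_g,i}}\Big)^2 X_{n_g,i}Y'_{n_g,i}\sum_{m=1}^{L}|\alpha_{n_g,i}(m)|^2, \tag{A.12}$$

where

$$Y'_{n_g,i} = \mathbb{E}_{\theta,\phi}\Big|\Big(e^{j(M_1-1)u_{n_g,i,m}} + e^{ju_{n_i,i,\ell}}e^{j(M_1-2)u_{n_g,i,m}} + \ldots + e^{j(M_1-1)u_{n_i,i,\ell}}\Big)\times\Big(e^{j(M_2-1)v_{n_i,i,\ell}} - e^{j(M_2-1)v_{n_g,i,m}}\Big)\Big|^2, \tag{A.13}$$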


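for $m = 0, \ldots, L_{n_g,i}-1$. Now, using (A.9) and (A.12), we can write (A.8) as

$$\begin{aligned}
&\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^H\mathbf{R}^{(fba)T}_{i,1}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{1}{b^2_{n_i,i,\ell}\,4N_t^2\Lambda_{n_i,i}}\,\frac{1}{(M_2-1)^2M_1^2}\sum_{\substack{g=0\\ g\neq i}}^{G-1}\Big(\sqrt{\Lambda_{n_g,i}}\Big)^2 X_{n_g,i}\sum_{m=0}^{L_{n_g,i}-1}|\alpha_{n_g,i}(m)|^2\big(Y_{n_g,i}+Y'_{n_g,i}\big).
\end{aligned} \tag{A.14}$$

Similarly, we can also have

$$\begin{aligned}
&\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^T\mathbf{C}^{(fba)}_{i,1}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{1}{b^2_{n_i,i,\ell}\,2N_t^2\Lambda_{n_i,i}}\,\frac{1}{(M_2-1)^2M_1^2}\,e^{j\Phi}\sum_{\substack{g=0\\ g\neq i}}^{G-1}\Big(\sqrt{\Lambda_{n_g,i}}\Big)^2 X_{n_g,i}Y_{n_g,i}\sum_{m=0}^{L_{n_g,i}-1}|\alpha_{n_g,i}(m)|^2,
\end{aligned} \tag{A.15}$$

where $\Phi = (M_1-1)u_{n_i,i,\ell} + (M_2-1)v_{n_i,i,\ell}$, and $Y_{n_g,i}$ is given in (2.29). Finally, plug the expressions from (A.14) and (A.15) into (A.4), and the proof is finished.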


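Similarly to (A.4), MSE due to intra-cell interference can be written as

$$\mathbb{E}\left\{(\Delta v_{n_i,i,\ell})^2\right\}_2 = \frac{1}{2}\Big(\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^H\mathbf{R}^{(fba)T}_{i,2}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) - \mathrm{Re}\left\{\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^T\mathbf{C}^{(fba)}_{i,2}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)\right\}\Big). \tag{A.16}$$

Using (2.23) for m = 2, the second term in (A.16) can be expressed as

$$\begin{aligned}
&\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^T\mathbf{C}^{(fba)}_{i,2}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{1}{b^2_{n_i,i,\ell}\,4N_t^2\Lambda_{n_i,i}}\,e^{j\Phi}\Big[\big(\mathbf{e}^H_{t,n_i,i,k}(\ell)\boldsymbol{\Pi}_{N_t}\otimes\boldsymbol{\alpha}^T_{v,n_i,i,\ell}\big)\boldsymbol{\Pi}_{N_rN_t}\mathbf{R}^*_{i,2}\big(\mathbf{e}_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\qquad+ \big(\mathbf{e}^T_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}^T_{v,n_i,i,\ell}\big)\mathbf{R}_{i,2}\boldsymbol{\Pi}_{N_rN_t}\big(\boldsymbol{\Pi}_{N_t}\mathbf{e}^*_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)\Big].
\end{aligned} \tag{A.17}$$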

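Now, using (2.16), and after some simplifications, we can write the first term in (A.17) as

$$\begin{aligned}
&\big(\mathbf{e}^H_{t,n_i,i,k}(\ell)\boldsymbol{\Pi}_{N_t}\otimes\boldsymbol{\alpha}^T_{v,n_i,i,\ell}\big)\boldsymbol{\Pi}_{N_rN_t}\mathbf{R}^*_{i,2}\big(\mathbf{e}_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \rho_1^2\,\mathbb{E}_{\alpha,\theta,\phi,\psi}\sum_{\substack{j=0\\ j\neq n}}^{J-1}\Big(\sqrt{\Lambda_{j_i,i}}\Big)^2\Big(\mathbf{e}^H_{t,n_i,i,k}(\ell)\mathbf{1}_{N_t}\mathbf{B}_{j_i,i}(k)\otimes\boldsymbol{\alpha}^T_{v,n_i,i,\ell}\boldsymbol{\Pi}_{N_r}\mathbf{A}^*_{j_i,i}\Big)\mathrm{vec}\{\mathbf{D}_{j_i,i}\}^* \\
&\qquad\times\mathrm{vec}\{\mathbf{D}_{j_i,i}\}^T\Big(\mathbf{B}^H_{j_i,i}(k)\mathbf{1}_{N_t}\mathbf{e}_{t,n_i,i,k}(\ell)\otimes\mathbf{A}^T_{j_i,i}\boldsymbol{\alpha}_{v,n_i,i,\ell}\Big).
\end{aligned} \tag{A.18}$$

It can be shown that

$$\begin{aligned}
&\mathbb{E}_\alpha\Big[\Big(\mathbf{e}^H_{t,n_i,i,k}(\ell)\mathbf{1}_{N_t}\mathbf{B}_{j_i,i}(k)\otimes\boldsymbol{\alpha}^T_{v,n_i,i,\ell}\boldsymbol{\Pi}_{N_r}\mathbf{A}^*_{j_i,i}\Big)\mathrm{vec}\{\mathbf{D}_{j_i,i}\}^*\,\mathrm{vec}\{\mathbf{D}_{j_i,i}\}^T\times\Big(\mathbf{B}^H_{j_i,i}(k)\mathbf{1}_{N_t}\mathbf{e}_{t,n_i,i,k}(\ell)\otimes\mathbf{A}^T_{j_i,i}\boldsymbol{\alpha}_{v,n_i,i,\ell}\Big)\Big] \\
&\quad= \big|X''_{n_i,i,\ell}\big|^2\sum_{m=0}^{L_{j_i,i}-1}|\alpha_{j_i,i}(m)|^2\,|X_{j_i,i}(m)|^2\,\boldsymbol{\alpha}^T_{v,n_i,i,\ell}\boldsymbol{\Pi}_{N_r}\mathbf{e}^*_{j_i,i}(m)\mathbf{e}^T_{j_i,i}(m)\boldsymbol{\alpha}_{v,n_i,i,\ell},
\end{aligned} \tag{A.19}$$

where $X''_{n_i,i,\ell} = \sum_{r=0}^{N_t-1}e^{jr\omega_{n_i,i,\ell}}$, and $X_{j_i,i}(m) = \sum_{r=0}^{N_t-1}e^{jr(\omega_{n_i,i,\ell}-\omega_{j_i,i,m})}$. After some tedious but straightforward calculations, we have

$$\mathbb{E}_{\theta,\phi}\Big[\boldsymbol{\alpha}^T_{v,n_i,i,\ell}\boldsymbol{\Pi}_{N_r}\mathbf{e}^*_{j_i,i}(m)\mathbf{e}^T_{j_i,i}(m)\boldsymbol{\alpha}_{v,n_i,i,\ell}\Big] = \frac{1}{(M_2-1)^2M_1^2}\,Y_{j_i,i}. \tag{A.20}$$

Now, using (A.19) and (A.20), we can simplify (A.18) as follows:

$$\begin{aligned}
&\big(\mathbf{e}^H_{t,n_i,i,k}(\ell)\boldsymbol{\Pi}_{N_t}\otimes\boldsymbol{\alpha}^T_{v,n_i,i,\ell}\big)\boldsymbol{\Pi}_{N_rN_t}\mathbf{R}^*_{i,2}\big(\mathbf{e}_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \rho_1^2\,\frac{1}{(M_2-1)^2M_1^2}\,\big|X''_{n_i,i,\ell}\big|^2\sum_{\substack{j=0\\ j\neq n}}^{J-1}\Big(\sqrt{\Lambda_{j_i,i}}\Big)^2 X_{j_i,i}Y_{j_i,i}\sum_{m=0}^{L_{j_i,i}-1}|\alpha_{j_i,i}(m)|^2.
\end{aligned} \tag{A.21}$$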

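Similarly, we can simplify the second term in (A.17) as follows:

$$\begin{aligned}
&\big(\mathbf{e}^T_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}^T_{v,n_i,i,\ell}\big)\mathbf{R}_{i,2}\boldsymbol{\Pi}_{N_rN_t}\big(\boldsymbol{\Pi}_{N_t}\mathbf{e}^*_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \rho_1^2\,\frac{1}{(M_2-1)^2M_1^2}\,\big|X''_{n_i,i,\ell}\big|^2\sum_{\substack{j=0\\ j\neq n}}^{J-1}\Big(\sqrt{\Lambda_{j_i,i}}\Big)^2 X_{j_i,i}Y_{j_i,i}\sum_{m=0}^{L_{j_i,i}-1}|\alpha_{j_i,i}(m)|^2.
\end{aligned} \tag{A.22}$$

Now, plugging the expressions from (A.21) and (A.22) into (A.17), we obtain

$$\begin{aligned}
&\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^T\mathbf{C}^{(fba)}_{i,2}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{1}{b^2_{n_i,i,\ell}\,4N_t^2\Lambda_{n_i,i}}\,\rho_1^2\,\frac{1}{(M_2-1)^2M_1^2}\,\big|X''_{n_i,i,\ell}\big|^2e^{j\Phi}\times\sum_{\substack{j=0\\ j\neq n}}^{J-1}\Big(\sqrt{\Lambda_{j_i,i}}\Big)^2\sum_{m=0}^{L_{j_i,i}-1}|\alpha_{j_i,i}(m)|^2X_{j_i,i}\big(2Y_{j_i,i}\big).
\end{aligned} \tag{A.23}$$

Following similar procedure, we can also obtain

$$\begin{aligned}
&\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^H\mathbf{R}^{(fba)T}_{i,2}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{1}{b^2_{n_i,i,\ell}\,4N_t^2\Lambda_{n_i,i}}\,\rho_1^2\,\frac{1}{(M_2-1)^2M_1^2}\,\big|X''_{n_i,i,\ell}\big|^2\times\sum_{\substack{j=0\\ j\neq n}}^{J-1}\Big(\sqrt{\Lambda_{j_i,i}}\Big)^2\sum_{m=0}^{L_{j_i,i}-1}|\alpha_{j_i,i}(m)|^2X_{j_i,i}\big(Y_{j_i,i}+Y'_{j_i,i}\big).
\end{aligned} \tag{A.24}$$

Now, plugging the expressions from (A.23) and (A.24) into (A.16), we obtain the desired result.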


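The problem in (2.42) is a convex quadratic optimization problem, and can be solved using the Lagrangian method. The Lagrange function for (2.42) can be written as

$$\mathcal{L}\big(\mathbf{V}_i[k], \mu_{ik}\big) = \mathrm{Tr}\Big\{\mathbf{R}^H_i\mathbf{H}_{i,i}[k]\mathbf{V}_i[k]\mathbf{V}^H_i[k]\mathbf{R}_i - \mathbf{R}^H_i\mathbf{H}_{i,i}[k]\mathbf{V}_i[k] - \mathbf{V}^H_i[k]\mathbf{R}_i + \mathbf{I}\Big\} + \mu_{ik}\Big(\mathrm{Tr}\big\{\mathbf{V}_i[k]\mathbf{V}^H_i[k]\big\} - P_t\Big), \tag{A.25}$$

where $\mu_{ik}$ is the corresponding Lagrange multiplier. Now, taking the derivative of the Lagrange function w.r.t. $\mathbf{V}_i[k]$ and setting the derivative equal to zero, we can obtain the desired result.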



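Proof. In the case of standard ESPRIT and for the circularly symmetric white noise, the complementary covariance matrix, $\mathbf{C}_{nn} = \mathbf{0}$ [? ]. Hence, we can write (3.14) as

$$\mathbb{E}\left\{\big(\Delta\mu^{(r)}_\ell\big)^2\right\} = \frac{1}{2}\Big(\mathbf{r}^{(r)H}_\ell\,\mathbf{W}^*_{\mathrm{mat}}\,\mathbf{R}^T_{nn}\,\mathbf{W}^T_{\mathrm{mat}}\,\mathbf{r}^{(r)}_\ell\Big). \tag{A.26}$$

Let us now denote $\boldsymbol{\beta}_\ell = \mathbf{V}_s\boldsymbol{\Sigma}^{-1}_s\mathbf{q}_\ell$, and

$$\boldsymbol{\alpha}^{(r)}_\ell = \Big(\mathbf{p}^T_\ell\big(\mathbf{J}^{(r)}_1\mathbf{U}_s\big)^{\dagger}\big(\mathbf{J}^{(r)}_2/e^{j\mu^{(r)}_\ell} - \mathbf{J}^{(r)}_1\big)\big(\mathbf{U}_n\mathbf{U}^H_n\big)\Big)^T. \tag{A.27}$$

Hence, we have:

$$\mathbf{W}^T_{\mathrm{mat}}\mathbf{r}^{(r)}_\ell = \boldsymbol{\beta}_\ell\otimes\boldsymbol{\alpha}^{(r)}_\ell. \tag{A.28}$$

Substituting (A.28) into (A.26) we obtain

$$\mathbb{E}\left\{\big(\Delta\mu^{(r)}_\ell\big)^2\right\} = \frac{1}{2}\big(\boldsymbol{\beta}_\ell\otimes\boldsymbol{\alpha}^{(r)}_\ell\big)^H\mathbf{R}^T_{nn}\big(\boldsymbol{\beta}_\ell\otimes\boldsymbol{\alpha}^{(r)}_\ell\big). \tag{A.29}$$

Now, the noise covariance matrix can be written as $\mathbf{R}_{nn} = \mathbb{E}\big\{\mathrm{vec}\{\mathbf{W}_v\}\,\mathrm{vec}\{\mathbf{W}_v\}^H\big\}$. If the noise is assumed to be circularly symmetric and white Gaussian, the covariance matrix can then be written as $\mathbf{R}_{nn} = \sigma^2\mathbf{I}_{N_rN_cK}$. Hence, we can succinctly write (A.29) as

$$\mathbb{E}\left\{\big(\Delta\mu^{(r)}_\ell\big)^2\right\} = \frac{\sigma^2}{2}\big(\boldsymbol{\beta}_\ell\otimes\boldsymbol{\alpha}^{(r)}_\ell\big)^H\big(\boldsymbol{\beta}_\ell\otimes\boldsymbol{\alpha}^{(r)}_\ell\big) = \frac{\sigma^2}{2}\big(\boldsymbol{\beta}^H_\ell\boldsymbol{\beta}_\ell\big)\otimes\big(\boldsymbol{\alpha}^{(r)H}_\ell\boldsymbol{\alpha}^{(r)}_\ell\big) = \frac{\sigma^2}{2}\,\|\boldsymbol{\beta}_\ell\|^2\,\|\boldsymbol{\alpha}^{(r)}_\ell\|^2. \tag{A.30}$$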
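The last step in (A.30) relies on the fact that the squared norm of a Kronecker product of two vectors factors into the product of the squared norms. A quick numerical check is sketched below; the function name and the random test vectors are illustrative assumptions, not part of the derivation.

```python
import numpy as np

def kron_norm_check(beta, alpha):
    """Return (||beta ⊗ alpha||^2, ||beta||^2 * ||alpha||^2)."""
    kron = np.kron(beta, alpha)
    lhs = np.vdot(kron, kron).real  # vdot conjugates its first argument
    rhs = np.vdot(beta, beta).real * np.vdot(alpha, alpha).real
    return lhs, rhs

# Arbitrary complex vectors, used only to exercise the identity.
rng = np.random.default_rng(0)
beta = rng.normal(size=5) + 1j * rng.normal(size=5)
alpha = rng.normal(size=7) + 1j * rng.normal(size=7)
lhs, rhs = kron_norm_check(beta, alpha)
```

The two returned values agree up to floating-point error, which is why the MSE separates into a factor depending only on the signal subspace and a factor depending only on the array geometry.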


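The vector $\boldsymbol{\alpha}^{(r)}_\ell$ can be written as [63]:

$$\boldsymbol{\alpha}^{(r)T}_\ell = \mathbf{p}^T_\ell\big(\mathbf{J}_1\mathbf{U}_s\big)^{\dagger}\big(\mathbf{J}_2/e^{jw_\ell} - \mathbf{J}_1\big)\big(\mathbf{U}_n\mathbf{U}^H_n\big) = \mathbf{e}^T_\ell\Big(\big(\mathbf{J}^{(r)}_2\mathbf{A}(\tau,\theta,\phi)\big)^{\dagger}\mathbf{J}^{(r)}_2 - \big(\mathbf{J}^{(r)}_1\mathbf{A}(\tau,\theta,\phi)\big)^{\dagger}\mathbf{J}^{(r)}_1\Big), \tag{A.31}$$

where $\mathbf{e}_\ell = [0 \ldots 1 \ldots 0]^T$ is the column selection vector with all zero elements except the $\ell$-th one.

Computing the pseudo-inverses in (A.31) can be very cumbersome. However, for massive MIMO systems, the pseudo-inverse of the selected signal can be significantly simplified. For mode r = 1, we have:

$$\begin{aligned}
\big(\mathbf{J}^{(1)}_1\mathbf{A}(\tau,\theta,\phi)\big)^{\dagger} &= \Big(\big(\mathbf{J}^{(1)}_1\mathbf{A}(\tau,\theta,\phi)\big)^H\big(\mathbf{J}^{(1)}_1\mathbf{A}(\tau,\theta,\phi)\big)\Big)^{-1}\big(\mathbf{J}^{(1)}_1\mathbf{A}(\tau,\theta,\phi)\big)^H \\
&= \frac{1}{M_1M_2(N_c-1)}\left(\frac{\big(\mathbf{J}^{(1)}_1\mathbf{A}(\tau,\theta,\phi)\big)^H\big(\mathbf{J}^{(1)}_1\mathbf{A}(\tau,\theta,\phi)\big)}{M_1M_2(N_c-1)}\right)^{-1}\big(\mathbf{J}^{(1)}_1\mathbf{A}(\tau,\theta,\phi)\big)^H \\
&\overset{(a)}{=} \frac{1}{M_1M_2(N_c-1)}\big(\mathbf{J}^{(1)}_1\mathbf{A}(\tau,\theta,\phi)\big)^H,
\end{aligned} \tag{A.32}$$

where (a) holds due to Lemma 2.3. Similarly, we have

$$\big(\mathbf{J}^{(1)}_2\mathbf{A}(\tau,\theta,\phi)\big)^{\dagger} = \frac{1}{M_1M_2(N_c-1)}\big(\mathbf{J}^{(1)}_2\mathbf{A}(\tau,\theta,\phi)\big)^H. \tag{A.33}$$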

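Using (A.32) and (A.33), and noting the definition of $\mathbf{J}^{(1)}_1$ and $\mathbf{J}^{(1)}_2$ from (3.8), and after some simplifications, we can write (A.31) for r = 1 as:

$$\begin{aligned}
\boldsymbol{\alpha}^{(1)T}_\ell &= \mathbf{e}^T_\ell\Big(\big(\mathbf{J}^{(1)}_2\mathbf{A}(\tau,\theta,\phi)\big)^{\dagger}\mathbf{J}^{(1)}_2 - \big(\mathbf{J}^{(1)}_1\mathbf{A}(\tau,\theta,\phi)\big)^{\dagger}\mathbf{J}^{(1)}_1\Big) \\
&= \frac{1}{M_1M_2(N_c-1)}\Big[-1, -e^{-ju_\ell}, \ldots, -e^{-j(M_1-1)u_\ell}, -e^{-jv_\ell}, \\
&\qquad \ldots, -e^{-j(M_2-1)v_\ell}e^{-j(M_1-1)u_\ell}, 0, \ldots, 0, e^{-j(N_c-1)\omega_\ell}, \\
&\qquad e^{-j(N_c-1)\omega_\ell}e^{-ju_\ell}, \ldots, e^{-j(N_c-1)\omega_\ell}e^{-j(M_2-1)v_\ell}e^{-j(M_1-1)u_\ell}\Big].
\end{aligned} \tag{A.34}$$

Accordingly, we have

$$\big\|\boldsymbol{\alpha}^{(1)}_\ell\big\|^2 = \boldsymbol{\alpha}^{(1)H}_\ell\boldsymbol{\alpha}^{(1)}_\ell = \frac{2}{M_1M_2(N_c-1)^2}. \tag{A.35}$$

Similarly, for the parameter modes r = 2, 3, we can obtain the following relations:

$$\big\|\boldsymbol{\alpha}^{(2)}_\ell\big\|^2 = \frac{2}{M_2N_c(M_1-1)^2}, \qquad \big\|\boldsymbol{\alpha}^{(3)}_\ell\big\|^2 = \frac{2}{M_1N_c(M_2-1)^2}.$$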
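The norm in (A.35) follows from counting entries: the vector in (A.34) has $M_1M_2$ unit-modulus entries at each end and zeros in between, all scaled by $1/(M_1M_2(N_c-1))$. The sketch below builds such a vector and checks the count numerically. The function name, the ordering of the spatial entries (with $u_\ell$ varying fastest), and the test values are assumptions made for illustration.

```python
import numpy as np

def alpha_mode1(M1, M2, Nc, u, v, omega):
    """Vector with the structure of (A.34), for Nc >= 2."""
    m1 = np.arange(M1)
    m2 = np.arange(M2)
    # Spatial phase terms e^{-j(m2*v + m1*u)}, flattened with m1 fastest.
    spatial = np.exp(-1j * (m2[:, None] * v + m1[None, :] * u)).ravel()

    vec = np.zeros(M1 * M2 * Nc, dtype=complex)
    vec[:M1 * M2] = -spatial                                   # leading block
    vec[-M1 * M2:] = np.exp(-1j * (Nc - 1) * omega) * spatial  # trailing block
    return vec / (M1 * M2 * (Nc - 1))

a = alpha_mode1(M1=4, M2=3, Nc=8, u=0.3, v=-0.7, omega=1.1)
norm_sq = np.vdot(a, a).real
closed_form = 2.0 / (4 * 3 * (8 - 1) ** 2)
```

Because every nonzero entry has unit modulus before scaling, the result does not depend on the particular values of $u_\ell$, $v_\ell$, or $\omega_\ell$.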

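Now, extending the results in [63] to the 3D parameter estimation problem, the vector $\boldsymbol{\beta}_\ell$ can be expressed as $\boldsymbol{\beta}_\ell = \mathbf{V}_s\boldsymbol{\Sigma}^{-1}_s\mathbf{U}^H_s\mathbf{A}(\tau,\theta,\phi)\mathbf{e}_\ell$. Since $\boldsymbol{\beta}_\ell$ is the $\ell$-th column of the matrix, $\|\boldsymbol{\beta}_\ell\|^2$ becomes the $\ell$-th diagonal element of the matrix $\mathbf{A}^H(\tau,\theta,\phi)\mathbf{U}_s\boldsymbol{\Sigma}^{-2}_s\mathbf{U}^H_s\mathbf{A}(\tau,\theta,\phi)$, and can be succinctly expressed as $\|\boldsymbol{\beta}_\ell\|^2 = R^{-1}_{SS}(\ell,\ell)/K$ [63], where $R_{SS}(\ell,\ell)$ is the $\ell$-th diagonal element of the equivalent transmit signal covariance matrix. Now, plugging the values of $\|\boldsymbol{\beta}_\ell\|^2$ and $\|\boldsymbol{\alpha}^{(r)}_\ell\|^2$ into (A.29), for r = 1, 2, 3, we can, respectively, obtain the MSEs of the temporal frequency, $\omega_\ell$, and spatial frequencies, $u_\ell$ and $v_\ell$, as follows:

$$\mathbb{E}\left\{(\Delta\omega_\ell)^2\right\} = \frac{R^{-1}_{SS}(\ell,\ell)}{K}\,\frac{\sigma^2}{M_1M_2(N_c-1)^2}, \tag{A.36}$$

$$\mathbb{E}\left\{(\Delta u_\ell)^2\right\} = \frac{R^{-1}_{SS}(\ell,\ell)}{K}\,\frac{\sigma^2}{M_2N_c(M_1-1)^2}, \tag{A.37}$$

$$\mathbb{E}\left\{(\Delta v_\ell)^2\right\} = \frac{R^{-1}_{SS}(\ell,\ell)}{K}\,\frac{\sigma^2}{M_1N_c(M_2-1)^2}. \tag{A.38}$$

Now, based on the Jacobian matrix, we have

$$\mathbb{E}\left\{(\Delta\tau_\ell)^2\right\} = \mathbb{E}\left\{(\Delta\omega_\ell)^2\right\}\frac{1}{4\pi^2(\Delta f)^2}, \tag{A.39}$$

$$\mathbb{E}\left\{(\Delta\theta_\ell)^2\right\} = \mathbb{E}\left\{(\Delta u_\ell)^2\right\}\frac{1}{\pi^2\sin^2(\theta_\ell)}, \tag{A.40}$$

$$\mathbb{E}\left\{(\Delta\phi_\ell)^2\right\} = \frac{\mathbb{E}\left\{(\Delta u_\ell)^2\right\}\cot^2(\theta_\ell)\cot^2(\phi_\ell)}{\pi^2\sin^2(\theta_\ell)} + \frac{\mathbb{E}\left\{(\Delta v_\ell)^2\right\}}{\pi^2\sin^2(\theta_\ell)\sin^2(\phi_\ell)}. \tag{A.41}$$

Recognizing that $\mathrm{RMSE}_{\tau_\ell} = \sqrt{\mathbb{E}\{(\Delta\tau_\ell)^2\}}$, $\mathrm{RMSE}_{\theta_\ell} = \sqrt{\mathbb{E}\{(\Delta\theta_\ell)^2\}}$, and $\mathrm{RMSE}_{\phi_\ell} = \sqrt{\mathbb{E}\{(\Delta\phi_\ell)^2\}}$, and after substituting (A.36), (A.37), and (A.38) into (A.39), (A.40), and (A.41), respectively, we obtain the desired result.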
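For reference, the chain from (A.36) to (A.41) can be evaluated numerically as sketched below. This is an illustrative helper only: the function name, argument names, and the example values in the usage line are assumptions, and the formulas are transcribed directly from the expressions above.

```python
import numpy as np

def parametric_rmse(sigma2, rss_inv_ll, K, M1, M2, Nc, delta_f, theta, phi):
    """RMSE of delay, elevation and azimuth for one path.

    rss_inv_ll stands for R_SS^{-1}(l, l); angles are in radians.
    """
    scale = rss_inv_ll * sigma2 / K
    mse_omega = scale / (M1 * M2 * (Nc - 1) ** 2)   # (A.36)
    mse_u = scale / (M2 * Nc * (M1 - 1) ** 2)       # (A.37)
    mse_v = scale / (M1 * Nc * (M2 - 1) ** 2)       # (A.38)

    sin2_theta = np.sin(theta) ** 2
    mse_tau = mse_omega / (4 * np.pi ** 2 * delta_f ** 2)   # (A.39)
    mse_theta = mse_u / (np.pi ** 2 * sin2_theta)           # (A.40)
    cot2 = 1.0 / (np.tan(theta) ** 2 * np.tan(phi) ** 2)
    mse_phi = (mse_u * cot2 / (np.pi ** 2 * sin2_theta)
               + mse_v / (np.pi ** 2 * sin2_theta * np.sin(phi) ** 2))  # (A.41)

    return np.sqrt(mse_tau), np.sqrt(mse_theta), np.sqrt(mse_phi)

# Example with made-up parameter values.
rmse_tau, rmse_theta, rmse_phi = parametric_rmse(
    sigma2=1.0, rss_inv_ll=1.0, K=10, M1=8, M2=8, Nc=64,
    delta_f=15e3, theta=np.pi / 3, phi=np.pi / 4)
```

All three RMSEs scale with the square root of the noise variance, and the angular errors grow without bound as $\theta_\ell$ approaches 0 or $\pi$, where the mapping from spatial frequency to angle becomes ill-conditioned.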


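From [64] and [6], MSE $\mathbb{E}\{(\Delta v_{n_i,i,\ell})^2\}_1$ can be expressed as

$$\mathbb{E}\left\{(\Delta v_{n_i,i,\ell})^2\right\}_1 = \frac{1}{2}\Big(\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^H\mathbf{R}^{(fba)T}_{i,1}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) - \mathrm{Re}\left\{\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^T\mathbf{C}^{(fba)}_{i,1}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)\right\}\Big), \tag{A.42}$$

where the expression $(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell})$ in (A.42) for the superimposed pilot system can be expressed as

$$\begin{aligned}
\boldsymbol{\beta}_\ell\otimes\boldsymbol{\alpha}_{v,\ell} &= \frac{1}{b_{n_i,i,\ell}\,2N_t\sqrt{\Lambda_{n_i,i}}}\begin{bmatrix} e^{-j\phi'_{n_i,i,\ell}}\,\mathbf{Y}^H_{n_i}(k)\mathbf{e}_{t,n_i,i,k}(\ell) \\ e^{j\phi'_{n_i,i,\ell}}e^{j((M_1-1)u_{n_i,i,\ell}+(M_2-1)v_{n_i,i,\ell})}\boldsymbol{\Pi}_{N_t}\mathbf{Y}^T_{n_i}(k)\mathbf{e}^*_{t,n_i,i,k}(\ell)\end{bmatrix}\otimes\boldsymbol{\alpha}_{v,\ell} \\
&= \frac{1}{b_{n_i,i,\ell}\,2N_t\sqrt{\Lambda_{n_i,i}}}\begin{bmatrix} e^{-j\phi'_{n_i,i,\ell}}\,\mathbf{Y}^H_{n_i}(k)\mathbf{e}_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell} \\ e^{j\phi'_{n_i,i,\ell}}e^{j((M_1-1)u_{n_i,i,\ell}+(M_2-1)v_{n_i,i,\ell})}\boldsymbol{\Pi}_{N_t}\mathbf{Y}^T_{n_i}(k)\mathbf{e}^*_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\end{bmatrix}.
\end{aligned} \tag{A.43}$$

Utilizing (A.43), the first term in (A.42) can thus be expressed as

$$\begin{aligned}
&\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^H\mathbf{R}^{(fba)T}_{i,1}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{1}{b^2_{n_i,i,\ell}\,4N_t^2\Lambda_{n_i,i}}\Big(\big(\mathbf{e}^H_{t,n_i,i,k}(\ell)\mathbf{Y}_{n_i}(k)\otimes\boldsymbol{\alpha}^H_{v,n_i,i,\ell}\big)\mathbf{R}^T_{i,1}\big(\mathbf{Y}^H_{n_i}(k)\mathbf{e}_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\qquad+ \big(\mathbf{e}^T_{t,n_i,i,k}(\ell)\mathbf{Y}^*_{n_i}(k)\boldsymbol{\Pi}_{N_t}\otimes\boldsymbol{\alpha}^H_{v,n_i,i,\ell}\big)\boldsymbol{\Pi}_{N_rN_t}\mathbf{R}^H_{i,1}\boldsymbol{\Pi}_{N_rN_t}\big(\boldsymbol{\Pi}_{N_t}\mathbf{Y}^T_{n_i}(k)\mathbf{e}^*_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)\Big).
\end{aligned} \tag{A.44}$$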

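Now, we can write the first part in (A.44) as

$$\begin{aligned}
&\big(\mathbf{e}^H_{t,n_i,i,k}(\ell)\mathbf{Y}_{n_i}(k)\otimes\boldsymbol{\alpha}^H_{v,n_i,i,\ell}\big)\mathbf{R}^T_{i,1}\big(\mathbf{Y}^H_{n_i}(k)\mathbf{e}_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{1}{(M_2-1)^2M_1^2}\sum_{\substack{g=0\\ g\neq i}}^{G-1}\Big(\sqrt{\Lambda_{n_g,i}}\Big)^2\big(X_{n_g,i}+\rho_2^2\gamma(1-\gamma)X'_{n_g,i}\big)Y_{n_g,i}\sum_{m=1}^{L}|\alpha_{n_g,i}(m)|^2.
\end{aligned} \tag{A.45}$$

Now, similarly to the derivation of (A.45), we also have

$$\begin{aligned}
&\big(\mathbf{e}^T_{t,n_i,i,k}(\ell)\mathbf{Y}^*_{n_i}(k)\boldsymbol{\Pi}_{N_t}\otimes\boldsymbol{\alpha}^H_{v,n_i,i,\ell}\big)\boldsymbol{\Pi}_{N_rN_t}\mathbf{R}^H_{i,1}\boldsymbol{\Pi}_{N_rN_t}\big(\boldsymbol{\Pi}_{N_t}\mathbf{Y}^T_{n_i}(k)\mathbf{e}^*_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{1}{(M_2-1)^2M_1^2}\sum_{\substack{g=0\\ g\neq i}}^{G-1}\Big(\sqrt{\Lambda_{n_g,i}}\Big)^2\big(X_{n_g,i}+\rho_2^2\gamma(1-\gamma)X'_{n_g,i}\big)Y'_{n_g,i}\sum_{m=1}^{L}|\alpha_{n_g,i}(m)|^2.
\end{aligned} \tag{A.46}$$

Accordingly, utilizing (A.45) and (A.46), (A.44) can be simplified as

$$\begin{aligned}
&\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^H\mathbf{R}^{(fba)T}_{i,1}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{1}{b^2_{n_i,i,\ell}\,4N_t^2\Lambda_{n_i,i}}\,\frac{1}{(M_2-1)^2M_1^2}\sum_{\substack{g=0\\ g\neq i}}^{G-1}\Big(\sqrt{\Lambda_{n_g,i}}\Big)^2\big(X_{n_g,i}+\rho_2^2\gamma(1-\gamma)X'_{n_g,i}\big) \\
&\qquad\times\sum_{m=0}^{L_{n_g,i}-1}|\alpha_{n_g,i}(m)|^2\big(Y_{n_g,i}+Y'_{n_g,i}\big).
\end{aligned} \tag{A.47}$$

We can similarly obtain the expression for the second term in (A.42) as

$$\begin{aligned}
&\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^T\mathbf{C}^{(fba)}_{i,1}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{1}{b^2_{n_i,i,\ell}\,N_t^2\Lambda_{n_i,i}}\,\frac{\rho_2^2\gamma(1-\gamma)}{(M_2-1)^2M_1^2}\,e^{j\Phi}\sum_{\substack{g=0\\ g\neq i}}^{G-1}\Big(\sqrt{\Lambda_{n_g,i}}\Big)^2X'_{n_g,i}Y_{n_g,i}\sum_{m=0}^{L_{n_g,i}-1}|\alpha_{n_g,i}(m)|^2.
\end{aligned} \tag{A.48}$$

Finally, after inserting the expressions from (A.47) and (A.48) into (A.42), we achieve the desired result.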



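MSE $\mathbb{E}\{(\Delta v_{n_i,i,\ell})^2\}_2$ can be written as

$$\mathbb{E}\left\{(\Delta v_{n_i,i,\ell})^2\right\}_2 = \frac{1}{2}\Big(\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^H\mathbf{R}^{(fba)T}_{i,2}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) - \mathrm{Re}\left\{\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^T\mathbf{C}^{(fba)}_{i,2}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)\right\}\Big). \tag{A.49}$$

The second term in (A.49) can be written as

$$\begin{aligned}
&\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^T\mathbf{C}^{(fba)}_{i,2}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{1}{b^2_{n_i,i,\ell}\,4N_t^2\Lambda_{n_i,i}}\,e^{j\Phi}\Big[\big(\mathbf{e}^H_{t,n_i,i,k}(\ell)\mathbf{Y}_{n_i}(k)\boldsymbol{\Pi}_{N_t}\otimes\boldsymbol{\alpha}^T_{v,n_i,i,\ell}\big)\boldsymbol{\Pi}_{N_rN_t}\mathbf{R}^*_{i,2}\big(\mathbf{Y}^H_{n_i}(k)\mathbf{e}_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\qquad+ \big(\mathbf{e}^T_{t,n_i,i,k}(\ell)\mathbf{Y}^*_{n_i}(k)\otimes\boldsymbol{\alpha}^T_{v,n_i,i,\ell}\big)\mathbf{R}_{i,2}\boldsymbol{\Pi}_{N_rN_t}\big(\boldsymbol{\Pi}_{N_t}\mathbf{Y}^T_{n_i}(k)\mathbf{e}^*_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)\Big].
\end{aligned} \tag{A.50}$$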

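Now, after some simplifications, we can write the first term in (A.50) as

$$\begin{aligned}
&\big(\mathbf{e}^H_{t,n_i,i,k}(\ell)\mathbf{Y}_{n_i}(k)\boldsymbol{\Pi}_{N_t}\otimes\boldsymbol{\alpha}^T_{v,n_i,i,\ell}\big)\boldsymbol{\Pi}_{N_rN_t}\mathbf{R}^*_{i,2}\big(\mathbf{Y}^H_{n_i}(k)\mathbf{e}_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{\rho_1^2\gamma^2+\rho_2^2\gamma(1-\gamma)}{(M_2-1)^2M_1^2}\sum_{\substack{j=0\\ j\neq n}}^{J-1}\Big(\sqrt{\Lambda_{j_i,i}}\Big)^2X'_{j_i,i}Y_{j_i,i}\sum_{m=0}^{L_{j_i,i}-1}|\alpha_{j_i,i}(m)|^2.
\end{aligned} \tag{A.51}$$

In a similar manner, the second term in (A.50) can be written as:

$$\begin{aligned}
&\big(\mathbf{e}^T_{t,n_i,i,k}(\ell)\mathbf{Y}^*_{n_i}(k)\otimes\boldsymbol{\alpha}^T_{v,n_i,i,\ell}\big)\mathbf{R}_{i,2}\boldsymbol{\Pi}_{N_rN_t}\big(\boldsymbol{\Pi}_{N_t}\mathbf{Y}^T_{n_i}(k)\mathbf{e}^*_{t,n_i,i,k}(\ell)\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{\rho_1^2\gamma^2+\rho_2^2\gamma(1-\gamma)}{(M_2-1)^2M_1^2}\sum_{\substack{j=0\\ j\neq n}}^{J-1}\Big(\sqrt{\Lambda_{j_i,i}}\Big)^2X'_{j_i,i}Y_{j_i,i}\sum_{m=0}^{L_{j_i,i}-1}|\alpha_{j_i,i}(m)|^2.
\end{aligned} \tag{A.52}$$

Now, after inserting the expressions from (A.51) and (A.52) into (A.50), we obtain

$$\begin{aligned}
&\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^T\mathbf{C}^{(fba)}_{i,2}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{1}{b^2_{n_i,i,\ell}\,4N_t^2\Lambda_{n_i,i}}\,\frac{\rho_1^2\gamma^2+\rho_2^2\gamma(1-\gamma)}{(M_2-1)^2M_1^2}\,e^{j\Phi}\sum_{\substack{j=0\\ j\neq n}}^{J-1}\Big(\sqrt{\Lambda_{j_i,i}}\Big)^2\sum_{m=0}^{L_{j_i,i}-1}|\alpha_{j_i,i}(m)|^2X'_{j_i,i}\big(2Y_{j_i,i}\big).
\end{aligned} \tag{A.53}$$

Similarly, we can also obtain

$$\begin{aligned}
&\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big)^H\mathbf{R}^{(fba)T}_{i,2}\big(\boldsymbol{\beta}_{n_i,i,\ell}\otimes\boldsymbol{\alpha}_{v,n_i,i,\ell}\big) \\
&\quad= \frac{1}{b^2_{n_i,i,\ell}\,4N_t^2\Lambda_{n_i,i}}\,\frac{\rho_1^2\gamma^2+\rho_2^2\gamma(1-\gamma)}{(M_2-1)^2M_1^2}\sum_{\substack{j=0\\ j\neq n}}^{J-1}\Big(\sqrt{\Lambda_{j_i,i}}\Big)^2\sum_{m=0}^{L_{j_i,i}-1}|\alpha_{j_i,i}(m)|^2X'_{j_i,i}\big(Y_{j_i,i}+Y'_{j_i,i}\big).
\end{aligned} \tag{A.54}$$

Now, insert the expressions from (A.53) and (A.54) into (A.49), and the proof is complete.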


For perfect uplink DoA estimation scenario, we have AHni,i(k) = Ani,i(k). Accordingly, from

(4.22), we have pselfni,q(k) = 0, and pintra

ni,q (k) and pinterni,q (k) can respectively be written as

pintrani,q (k) =

J−1∑j=0j 6=n

√Λji,i

Nr

AHni,i(k)Aji,i(k)Dji,i

(xqji(k) + sqji(k)

); (A.55)

pinterni,q (k) =

G−1∑g=0g 6=i

J−1∑j=0

√Λjg,i

Nr

AHni,i(k)Ajg,i(k)Djg,iB

Hjg,i

(BHjg,g


). (A.56)


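Now, using Lemma 2.3, $(1/N_r)\mathbf{A}^H_{n_i,i}(k)\mathbf{A}_{j_i,i}(k) \to \mathbf{0}, \forall (j \neq n)$ and $(1/N_r)\mathbf{A}^H_{n_i,i}(k)\mathbf{A}_{j_g,i}(k) \to \mathbf{0}, \forall (j \neq n, \text{ and } g \neq i)$. Hence, for 3D massive MIMO systems, $\mathbf{p}^{\mathrm{intra}}_{n_i,q}(k) \to \mathbf{0}$ and $\mathbf{p}^{\mathrm{inter}}_{n_i,q}(k) \to \mathbf{0}$. Accordingly, (4.21) simplifies as

$$\mathbf{z}^q_{n_i}(k) = \frac{\sqrt{\Lambda_{n_i,i}}}{N_r}\mathbf{A}^H_{n_i,i}(k)\mathbf{A}_{n_i,i}(k)\mathbf{D}_{n_i,i}\mathbf{x}^q_{n_i}(k) + \mathbf{w}^q_i(k) \overset{(a)}{=} \sqrt{\Lambda_{n_i,i}}\,\mathbf{D}_{n_i,i}\mathbf{x}^q_{n_i}(k) + \mathbf{w}^q_i(k), \tag{A.57}$$

where (a) results since $(1/\sqrt{N_r})\mathbf{A}_{n_i,i}(k)$ is unitary. Hence, from (A.57), the mutual information for the n-th user in the i-th cell at the k-th subcarrier can be written as

$$I^{ul,sd}_{n_i}[k] = \log_2\det\Big(\mathbf{I}_{L_{n_i,i}} + \frac{1}{\sigma^2}\mathbf{D}_{n_i,i}\mathbf{Q}^{ul,sd}_{n_i}[k]\mathbf{D}^H_{n_i,i}\Big). \tag{A.58}$$

Finally, using Hadamard’s Inequality, (A.58) results in

$$I^{ul,sd}_{n_i}[k] = \log_2\prod_{\ell}\Big(1 + \frac{\Lambda_{n_i,i}|\alpha_{n_i,i}(\ell)|^2p^{ul,sd}_{n_i,\ell}[k]}{\sigma^2}\Big) = \sum_{\ell=0}^{L_{n_i,i}-1}\log_2\Big(1 + \gamma_{n_i,\ell}\,p^{ul,sd}_{n_i,\ell}[k]\Big). \tag{A.59}$$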
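The step from the determinant form to the per-path sum in (A.59) can be checked numerically when the matrix inside the determinant is diagonal, since the determinant is then the product of the diagonal entries. The sketch below is illustrative only; the function names are assumptions, and `gains` stands for the per-path values $\gamma_{n_i,\ell}$ with `powers` the corresponding transmit powers.

```python
import numpy as np

def rate_sum_form(gains, powers):
    """Sum of per-path rates, as on the right-hand side of (A.59)."""
    g = np.asarray(gains, dtype=float)
    p = np.asarray(powers, dtype=float)
    return float(np.sum(np.log2(1.0 + g * p)))

def rate_logdet_form(gains, powers):
    """log2 det(I + diag(gains * powers)), the determinant form."""
    g = np.asarray(gains, dtype=float)
    p = np.asarray(powers, dtype=float)
    matrix = np.eye(len(g)) + np.diag(g * p)
    _, logdet = np.linalg.slogdet(matrix)  # natural-log determinant
    return float(logdet / np.log(2.0))

# Made-up example: two paths with unit power each.
r_sum = rate_sum_form([1.0, 3.0], [1.0, 1.0])
r_det = rate_logdet_form([1.0, 3.0], [1.0, 1.0])
```

Writing the rate as a sum over paths is what makes per-path power allocation tractable, because each term depends on a single power variable.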


Using Lemma 3 in [6], the product, AHni,i(k)Ani,i(k), can be written as

1

Nr

AHni,i(k)Ani,i(k) =

1

Nr

diageHr,ni,i,k(0)er,ni,i,k(0), . . . , eHr,ni,i,k(Lni,i − 1)er,ni,i,k(Lni,i − 1).

Similarly, using Lemma 2 and Lemma 3 in [6], it can be shown that 1Nr

AHni,i(k)Ani,i(k)→ 1

NrI,

1Nr

AHni,i(k)Aji,i(k) → 0, and 1

NrAHni,i(k)Ajg,i(k) → 0. Accordingly, from (4.23) and (4.24),

for large antenna system, pintrani,q′ (k)→ 0 and pinter

ni,q′(k)→ 0, and from (4.22), pselfni,q(k) simplifies


as

pselfni,q(k) =

√Λni,i

Nr

(AHni,i(k)Ani,i(k)− I

)Dni,is

qni(k); (A.60)

Now, from (4.21), we have

zqi (k) =

√Λni,i

Nr


qni(k)

+√

Λni,i

(1

Nr

AHni,i(k)Ani,i(k)− I

)Dni,is

qni(k) + wq

i (k). (A.61)

Accordingly, achievable rate for superimposed pilot+data transmission phase under DoA

estimation error can be written as

Iul,sdni [k] = E

log2 det

(ILni,i +

Λni,i

N2r

AHni,i(k)Ani,i(k)Dni,iQ

ul,sdni [k]DH

ni,iAHni,i(k)Ani,i(k)Rul,s−1

ni [k]

),

(A.62)

where Rul,sni = Λni,i

(1Nr


)Dni,iQ

ul,spni [k]DH

ni,i

(1Nr


)H+

σ2I, where Qul,spni [k] is the covariance matrix of superimposed pilot symbols, and the expec-

tation is taken with respect to DoA estimation error.

The termΛni,iN2r

AHni,i(k)Ani,i(k)Dni,iQ

ul,sdni [k]DH

ni,iAHni,i(k)Ani,i(k) in (A.62) is a diagonal matrix

with (`, `)-th diagonal element beingΛni,iN2r|αni,i(`)|2

∣∣eHr,ni,i,k(`)er,ni,i,k(`)∣∣2 pul,sdni,` [k]. Similarly,

Rul,sni in (A.62) also results in a diagonal matrix, where the (`, `)-th diagonal element is given

by (σ2 + Λni,i|αni,i(`)|2∣∣∣ 1Nr

eHr,ni,i,k(`)er,ni,i,k(`)− 1∣∣∣2 pul,sp

ni,` [k]). Hence, using Hadamard’s in-

equality, from (A.62), the expected achievable rate during superimposed (pilot+data) trans-


mission phase under DoA estimation error becomes

Iul,sdni [k] = E

Lni,i−1∑`=0

log

1 +

1N2rγni,`

∣∣eHr,ni,i,k(`)er,ni,i,k(`)∣∣2 pul,sdni,` [k]

(σ2 + γni,`

∣∣∣ 1Nr

eHr,ni,i,k(`)er,ni,i,k(`)− 1∣∣∣2 pul,sp

ni,` [k])

, (A.63)


and pul,sdni,` [k] and pul,sp

ni,` [k] are the transmit powers during the uplink channel estimation phase

allocated on the data and pilot symbols, respectively.


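Following the line of proof of Theorem 4.8 in Appendix A.3, (4.27) can be written as

$$\mathbf{z}^{q'}_i(k) = \sqrt{\Lambda_{n_i,i}}\,\frac{1}{N_r}\mathbf{A}^H_{n_i,i}(k)\mathbf{A}_{n_i,i}(k)\mathbf{D}_{n_i,i}\mathbf{x}^{q'}_{n_i}(k) + \mathbf{w}^{q'}_i(k) = \sqrt{\Lambda_{n_i,i}}\,\mathbf{D}_{n_i,i}\mathbf{x}^{q'}_{n_i}(k) + \mathbf{w}^{q'}_i(k). \tag{A.64}$$

Hence, the mutual information for the n-th user in the i-th cell at the k-th subcarrier can be written as

$$\begin{aligned}
I^{ul,dd}_{n_i}[k] &= \log_2\det\Big(\mathbf{I}_{L_{n_i,i}} + \frac{1}{\sigma^2}\mathbf{D}_{n_i,i}\mathbf{Q}^{ul,dd}_{n_i}[k]\mathbf{D}^H_{n_i,i}\Big) \\
&= \log_2\det\Big(\mathbf{I}_{L_{n_i,i}} + \frac{1}{\sigma^2}\mathrm{diag}\Big\{\Lambda_{n_i,i}|\alpha_{n_i,i}(0)|^2p^{ul,dd}_{n_i,0}[k], \ldots, \Lambda_{n_i,i}|\alpha_{n_i,i}(L_{n_i,i}-1)|^2p^{ul,dd}_{n_i,L_{n_i,i}-1}[k]\Big\}\Big).
\end{aligned}$$

Finally, using Hadamard’s Inequality, we obtain the desired results:

$$I^{ul,dd}_{n_i}[k] = \log_2\prod_{\ell}\Big(1 + \frac{\Lambda_{n_i,i}|\alpha_{n_i,i}(\ell)|^2p^{ul,dd}_{n_i,\ell}[k]}{\sigma^2}\Big) = \sum_{\ell=0}^{L_{n_i,i}-1}\log_2\Big(1 + \gamma_{n_i,\ell}\,p^{ul,dd}_{n_i,\ell}[k]\Big). \tag{A.65}$$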



Following proof of Theorem 4.10, in presence of DoA estimation error, (4.27) can be written

as

zq′

i (k) =

√Λni,i

Nr


q′

ni(k) + wq′

i (k).

Accordingly, achievable rate for data-only transmission phase under DoA estimation error:

Iul,ddni [k] = E

log2 det

(ILni,i +

Λni,i

N2r σ

2AHni,i(k)Ani,i(k)Dni,iQ

ul,ddni [k]DH

ni,iAHni,i(k)Ani,i(k)

).

(A.66)

Now, the termΛni,iN2r σ

2 AHni,i(k)Ani,i(k)Dni,iQ

ul,ddni [k]DH

ni,iAHni,i(k)Ani,i(k) in (A.66) is a diago-

nal matrix with (`, `)-th diagonal element beingΛni,iN2r σ

2 |αni,i(`)|2∣∣eHr,ni,i,k(`)er,ni,i,k(`)∣∣2 pul,dd

ni,` [k].

Finally, using Hadamard’s inequality, the expected achievable rate during data-only trans-

mission phase under DoA estimation error becomes

Iul,ddni [k] = E

Lni,i−1∑`=0

log

(1 +

1N2rγni,` |er,ni,i,k(`)er,ni,i,k(`)|2 pddni,`[k]

σ2

) , (A.67)


and pul,ddni,` [k] are the transmit powers during the uplink data-only transmission phase.

Bibliography

[1] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base sta-

tion antennas,” IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590 – 3600, Novem-

ber 2010.

[2] Y. Kim, H. Ji, J. Lee, Y.-H. Nam, B. L. Ng, I. Tzanidis, Y. Li, and J. Zhang, “Full

dimension MIMO (FD-MIMO): The next evolution of MIMO in LTE systems,” IEEE

Wireless Commun., vol. 21, no. 3, pp. 92–100, June 2014.

[3] L. Liu, R. Chen, S. Geirhofer, K. Sayana, Z. Shi, and Y. Zhou, “Downlink MIMO in

LTE-Advanced: SU-MIMO vs. MU-MIMO,” IEEE Commun. Mag., vol. 50, no. 2, pp.

140–147, February 2012.

[4] A. Paulraj, R. Roy, and T. Kailath, “A subspace rotation approach to signal parameter

estimation,” Proc. IEEE, vol. 74, no. 7, pp. 1044–1046, July 1986.

[5] R. Shafin and L. Liu, “Doa estimation and performance analysis for multi-cell multi-

user 3d mmwave massive-mimo ofdm system,” in 2017 IEEE Wireless Communications

and Networking Conference (WCNC). IEEE, 2017, pp. 1–6.

[6] R. Shafin, L. Liu, J. Zhang, and Y. C. Wu, “DoA Estimation and Capacity Analysis

for 3-D Millimeter Wave Massive-MIMO/FD-MIMO OFDM Systems,” IEEE Trans.

Wireless Commun., vol. 15, no. 10, pp. 6963–6978, Oct 2016.

[7] R. Shafin, L. Liu, Y. Li, A. Wang, and J. Zhang, “Angle and Delay Estimation for 3-D

Massive MIMO/FD-MIMO Systems Based on Parametric Channel Modeling,” IEEE

Trans. Wireless Commun., vol. 16, no. 8, pp. 5370–5383, Aug 2017.

[8] H. Almosa, R. Shafin, S. Mosleh, Z. Zhou, Y. Li, J. Zhang, and L. Liu, “Downlink

channel estimation and precoding for fdd 3d massive mimo/fd-mimo systems,” in 2017

26th Wireless and Optical Communication Conference (WOCC). IEEE, 2017, pp. 1–4.

[9] R. Shafin, L. Liu, and J. C. Zhang, “Doa estimation and capacity analysis for 3d massive-

mimo/fd-mimo ofdm system,” in 2015 IEEE Global Conference on Signal and Informa-

tion Processing (GlobalSIP). IEEE, 2015, pp. 181–184.

[10] D. Vasisht, S. Kumar, H. Rahul, and D. Katabi, “Eliminating channel feedback in next-

generation cellular networks,” in 2016 ACM SIGCOMM Conference, 2016, pp. 398–411.

[11] E. Bjornson, E. G. Larsson, and M. Debbah, “Massive MIMO for Maximal Spectral

Efficiency: How Many Users and Pilots Should Be Allocated?” IEEE Trans. Wireless

Commun., vol. 15, no. 2, pp. 1293–1308, Feb 2016.

[12] T. V. Chien, E. Bjornson, and E. G. Larsson, “Joint Power Allocation and User As-

sociation Optimization for Massive MIMO Systems,” IEEE Trans. Wireless Commun.,

vol. 15, no. 9, pp. 6384–6399, Sept 2016.

[13] R. Shafin, L. Liu, and J. C. Zhang, “On the channel estimation for 3d massive mimo

systems,” E-LETTER, 2014.

[14] R. Shafin, “Performance analysis of parametric channel estimation for 3d massive

mimo/fd-mimo ofdm systems.” Master’s thesis, University of Kansas, 2017.

[15] B. Yang, K. Letaief, R. Cheng, and Z. Cao, “Channel Estimation for OFDM Transmission in Multipath Fading Channels Based on Parametric Channel Modeling,” IEEE Trans. Commun., vol. 49, no. 3, pp. 467–479, Mar 2001.

[16] M. Larsen, A. Swindlehurst, and T. Svantesson, “Performance bounds for MIMO-

OFDM channel estimation,” IEEE Trans. Signal Process., vol. 57, no. 5, pp. 1901–1916,

May 2009.

[17] M. Wax and A. Leshem, “Joint Estimation of Time Delays and Directions of Arrival of

Multiple Reflections of a Known Signal,” IEEE Trans. Signal Process., vol. 45, no. 10,

pp. 2477–2484, October 1997.

[18] R. Roy and T. Kailath, “ESPRIT-Estimation of Signal Parameters via Rotational In-

variance Techniques,” IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 7, pp.

984–995, 1989.

[19] F. Li, H. Liu, and R. Vaccaro, “Performance analysis for DoA estimation algorithms:

unification, simplification, and observations,” IEEE Trans. Aerosp. Electron. Syst.,

vol. 29, no. 4, pp. 1170–1184, Oct 1993.

[20] F. Roemer, M. Haardt, and G. Del Galdo, “Analytical performance assessment of multi-

dimensional matrix- and tensor-based ESPRIT-type algorithms,” IEEE Trans. Signal

Process., vol. 62, no. 10, pp. 2611–2625, May 2014.

[21] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. Soong, and J. C. Zhang,

“What will 5G be?” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1065–1082, 2014.

[22] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, and P. Popovski, “Five disruptive

technology directions for 5G,” IEEE Communications Magazine, vol. 52, no. 2, pp. 74–80, 2014.

[23] C.-X. Wang, F. Haider, X. Gao, X.-H. You, Y. Yang, D. Yuan, H. Aggoune, H. Haas,

S. Fletcher, and E. Hepsaydir, “Cellular architecture and key technologies for 5G wireless

communication networks,” IEEE Communications Magazine, vol. 52, no. 2, pp. 122–

130, 2014.

[24] R. Shafin, L. Liu, V. Chandrasekhar, H. Chen, J. Reed et al., “Artificial

intelligence-enabled cellular networks: A critical path to beyond-5G and 6G,” to appear in

IEEE Wireless Communications, arXiv preprint arXiv:1907.07862, 2019.

[25] T. Cousik, R. Shafin, Z. Zhou, K. Kleine, J. Reed, and L. Liu, “CogRF: A new

frontier for machine learning and artificial intelligence for 6G RF systems,” arXiv preprint

arXiv:1909.06862, 2019.

[26] A. Akhtar, J. Ma, R. Shafin, J. Bai, L. Li, Z. Li, and L. Liu, “Low latency scalable

point cloud communication in VANETs using V2I communication,” in ICC 2019-2019

IEEE International Conference on Communications (ICC). IEEE, 2019, pp. 1–7.

[27] R. Shafin, L. Liu, J. Ashdown, J. Matyjas, M. Medley, B. Wysocki, and Y. Yi, “Realizing

green symbol detection via reservoir computing: An energy-efficiency perspective,” in

2018 IEEE International Conference on Communications (ICC). IEEE, 2018, pp. 1–6.

[28] S. Mosleh, L. Liu, C. Sahin, Y. R. Zheng, and Y. Yi, “Brain-inspired wireless

communications: Where reservoir computing meets MIMO-OFDM,” IEEE Trans. Neural Netw.

Learn. Syst., 2017.

[29] Z. Zhou, L. Liu, and H.-H. Chang, “Learn to demodulate: MIMO-OFDM symbol detection

through downlink pilots,” arXiv preprint arXiv:1907.01516, 2019.

[30] S. Hamalainen, H. Sanneck, and C. Sartori, LTE self-organising networks (SON): net-

work management automation for operational efficiency. John Wiley & Sons, 2012.

[31] M. Peng, D. Liang, Y. Wei, J. Li, and H.-H. Chen, “Self-configuration and

self-optimization in LTE-Advanced heterogeneous networks,” IEEE Communications

Magazine, vol. 51, no. 5, pp. 36–45, 2013.

[32] O. Sallent, J. Perez-Romero, J. Sanchez-Gonzalez, R. Agustí, M. A. Díaz-Guerra,

D. Henche, and D. Paul, “A roadmap from UMTS optimization to LTE self-optimization,”

IEEE Communications Magazine, vol. 49, no. 6, pp. 172–182, 2011.

[33] H. Hu, J. Zhang, X. Zheng, Y. Yang, and P. Wu, “Self-configuration and self-

optimization for LTE networks,” IEEE Commun. Mag., vol. 48, no. 2, 2010.

[34] R. Shafin and L. Liu, “Multi-Cell Multi-User Massive FD-MIMO: Downlink Precoding

and Throughput Analysis,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 487–502,

Jan 2019.

[35] R. Shafin, L. Liu, Y. Li, A. Wang, and J. Zhang, “Angle and delay estimation for 3-D

massive MIMO/FD-MIMO systems based on parametric channel modeling,” IEEE Trans.

Wireless Commun., vol. 16, no. 8, pp. 5370–5383, Aug 2017.

[36] A. Galindo-Serrano and L. Giupponi, “Distributed Q-learning for aggregated interfer-

ence control in cognitive radio networks,” IEEE Trans. Veh. Technol., vol. 59, no. 4,

pp. 1823–1834, 2010.

[37] H. Saad, A. Mohamed, and T. ElBatt, “Distributed cooperative Q-learning for power

allocation in cognitive femtocell networks,” in IEEE Veh. Technol. Conf. (VTC Fall),

2012, pp. 1–5.

[38] M. Bennis, S. M. Perlaza, P. Blasco, Z. Han, and H. V. Poor, “Self-organization in

small cell networks: A reinforcement learning approach,” IEEE Transactions on Wireless

Communications, vol. 12, no. 7, pp. 3202–3212, 2013.

[39] J. Nie and S. Haykin, “A Q-learning-based dynamic channel assignment technique for

mobile communication systems,” IEEE Transactions on Vehicular Technology, vol. 48,

no. 5, pp. 1676–1687, 1999.

[40] Y.-S. Chen, C.-J. Chang, and F.-C. Ren, “Q-learning-based multirate transmission con-

trol scheme for RRM in multimedia WCDMA systems,” IEEE Trans. Veh. Technol.,

vol. 53, no. 1, pp. 38–48, 2004.

[41] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press,

2018.

[42] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and

M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv preprint

arXiv:1312.5602, 2013.

[43] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves,

M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep

reinforcement learning,” Nature, vol. 518, no. 7540, p. 529, 2015.

[44] H. Chang, H. Song, Y. Yi, J. Zhang, H. He, and L. Liu, “Distributive dynamic spectrum

access through deep reinforcement learning: A reservoir computing based approach,”

IEEE Internet Things J., pp. 1–1, 2019.

[45] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double

Q-learning,” in AAAI, vol. 2. Phoenix, AZ, 2016, p. 5.

[46] R. Shafin, H. Chen, Y. H. Nam, S. Hur, J. Park, J. Reed, L. Liu et al., “Self-tuning

sectorization: Deep reinforcement learning meets broadcast beam optimization,” to appear

in IEEE Transactions on Wireless Communications, arXiv preprint arXiv:1906.06021,

2020.

[47] M. R. Akdeniz, Y. Liu, M. K. Samimi, S. Sun, S. Rangan, T. S. Rappaport, and

E. Erkip, “Millimeter wave channel modeling and cellular capacity evaluation,” IEEE

J. Sel. Areas Commun., vol. 32, no. 6, pp. 1164–1179, 2014.

[48] 3GPP TR 36.814 V.9.2.0 Evolved Universal Terrestrial Radio Access (E-UTRA); Fur-

ther advancements for E-UTRA physical layer aspects, March 2017.

[49] B. M. Popovic, “Generalized chirp-like polyphase sequences with optimum correlation

properties,” IEEE Trans. Inf. Theory, vol. 38, no. 4, pp. 1406–1409, 1992.

[50] T. S. Rappaport, F. Gutierrez, E. Ben-Dor, J. N. Murdock, Y. Qiao, and J. I. Tamir,

“Broadband millimeter-wave propagation measurements and models using adaptive-

beam antennas for outdoor urban cellular communications,” IEEE Trans. Antennas

Propag., vol. 61, no. 4, pp. 1850–1859, April 2013.

[51] O. E. Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, “Spatially Sparse Pre-

coding in Millimeter Wave MIMO Systems,” IEEE Trans. Wireless Commun., vol. 13,

no. 3, pp. 1499–1513, March 2014.

[52] A. Alkhateeb, O. E. Ayach, G. Leus, and R. W. Heath, “Channel estimation and hybrid

precoding for millimeter wave cellular systems,” IEEE J. Sel. Topics Signal Process.,

vol. 8, no. 5, pp. 831–846, Oct 2014.

[53] A. Wang, L. Liu, and J. Zhang, “Low complexity direction of arrival (DoA) estimation

for 2D massive MIMO systems,” in IEEE Global Commun. Conf., 2012, pp. 703–707.

[54] M. Haardt and J. A. Nossek, “Unitary ESPRIT: How to obtain increased estimation

accuracy with a reduced computational burden,” IEEE Trans. Signal Process., vol. 43,

no. 5, pp. 1232–1242, 1995.

[55] 3GPP, “Study on channel model for frequency spectrum above 6 GHz,” 3rd Generation

Partnership Project (3GPP), TR 38.900 V14.2.0, Dec 2016.

[56] M. K. Samimi and T. S. Rappaport, “3-D millimeter-wave statistical channel model for

5G wireless system design,” IEEE Transactions on Microwave Theory and Techniques,

vol. 64, no. 7, pp. 2207–2225, 2016.

[57] S. U. Pillai and B. H. Kwon, “Forward/backward spatial smoothing techniques for co-

herent signal identification,” IEEE Trans. Acoust., Speech, and Signal Process., vol. 37,

no. 1, pp. 8–15, 1989.

[58] Y. Zhu, L. Liu, and J. Zhang, “Joint angle and delay estimation for 2D active broadband

MIMO-OFDM systems,” in IEEE Global Commun. Conf., 2013, pp. 3300–3305.

[59] Q. Shi, M. Razaviyayn, Z. Q. Luo, and C. He, “An Iteratively Weighted MMSE Ap-

proach to Distributed Sum-Utility Maximization for a MIMO Interfering Broadcast

Channel,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4331–4340, Sept 2011.

[60] R. Shafin, M. Jiang, S. Ma, L. Piazzi, and L. Liu, “Joint Parametric Channel Estimation

and Performance Characterization for 3D Massive MIMO OFDM Systems,” in IEEE

Intl. Conf. on Commun., 2018, pp. 1–6.

[61] R. Shafin, L. Liu, and J. Zhang, “DoA estimation and RMSE characterization for 3D

massive-MIMO/FD-MIMO OFDM system,” in 2015 IEEE Global Communications

Conference (GLOBECOM), Dec 2015, pp. 1–6.

[62] S. Rangan, T. Rappaport, and E. Erkip, “Millimeter Wave Cellular Wireless Networks:

Potentials and Challenges,” Proc. IEEE, vol. 102, no. 3, pp. 366–385, Mar 2014.

[63] F. Li, H. Liu, and R. J. Vaccaro, “Performance Analysis for DOA Estimation

Algorithms: Unification, Simplification, and Observations,” IEEE Trans. Aerosp. Electron.

Syst., vol. 29, no. 4, pp. 1170–1184, October 1993.

[64] R. Shafin and L. Liu, “Multi-cell multi-user massive FD-MIMO: downlink precoding

and throughput analysis,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 487–502,

Jan. 2019.

[65] R. Shafin, L. Liu, J. Ashdown, J. Matyjas, and J. Zhang, “On the Channel Estimation

of Multi-Cell Massive FD-MIMO Systems,” in 2018 IEEE Intl. Conf. Commun. (ICC),

pp. 1–6.

[66] D. Vasisht, S. Kumar, H. Rahul, and D. Katabi, “Eliminating channel feedback

in next-generation cellular networks,” in Proceedings of the 2016 ACM SIGCOMM

Conference, ser. SIGCOMM ’16. New York, NY, USA: ACM, 2016, pp. 398–411.

[Online]. Available: http://doi.acm.org/10.1145/2934872.2934895

[67] H. Zhang, S. Gao, D. Li, H. Chen, and L. Yang, “On superimposed pilot for channel

estimation in multicell multiuser MIMO uplink: Large system analysis,” IEEE Trans.

on Vehi. Tech., vol. 65, no. 3, pp. 1492–1505, March 2016.

[68] K. Upadhya, S. A. Vorobyov, and M. Vehkapera, “Superimposed pilots are superior

for mitigating pilot contamination in massive MIMO,” IEEE Trans. Signal Process.,

vol. 65, no. 11, pp. 2917–2932, June 2017.

[69] J. Ma, C. Liang, C. Xu, and L. Ping, “On orthogonal and superimposed pilot schemes

in massive MIMO NOMA systems,” IEEE J. on Sel. Areas Commun., vol. 35, no. 12,

pp. 2696–2707, Dec. 2017.

[70] Z. Zhou, L. Liu, and J. Zhang, “FD-MIMO via Pilot-Data Superposition: Tensor-Based

DOA Estimation and System Performance,” IEEE J. Sel. Topics Signal Process., vol. 13,

no. 5, pp. 931–946, Sep. 2019.

[71] R. Shafin and L. Liu, “Superimposed pilot for multi-cell multi-user massive FD-MIMO

systems,” to appear in IEEE Transactions on Wireless Communications, 2020.

[72] K. Upadhya, S. A. Vorobyov, and M. Vehkapera, “Downlink performance of super-

imposed pilots in massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 17,

no. 10, pp. 6630–6644, Oct. 2018.

[73] X. Jing, M. Li, H. Liu, S. Li, and G. Pan, “Superimposed Pilot Optimization Design

and Channel Estimation for Multiuser Massive MIMO Systems,” IEEE Trans. Veh.

Technol., vol. 67, no. 12, pp. 11 818–11 832, Dec. 2018.

[74] D. Verenzuela, E. Bjornson, and L. Sanguinetti, “Spectral and energy efficiency of

superimposed pilots in uplink massive MIMO,” IEEE Trans. Wireless Commun., vol. 17,

no. 11, pp. 7099–7115, Nov. 2018.

[75] H. Chen, Y. H. Nam, R. Shafin, and J. Zhang, “Method and apparatus for machine

learning based wide beam optimization in cellular network,” Dec. 10 2019, US Patent

10,505,616.

[76] S. Joseph, R. Misra, and S. Katti, “Towards self-driving radios: Physical-layer control

using deep reinforcement learning,” in Proceedings of the 20th International Workshop

on Mobile Computing Systems and Applications, ser. HotMobile ’19. New York, NY,

USA: ACM, 2019, pp. 69–74.

[77] 3GPP, “Radio measurement collection for Minimization of Drive Tests (MDT),” 3rd

Generation Partnership Project (3GPP), TS 37.320 V14.0.0, Mar. 2017.

[78] L.-J. Lin, “Reinforcement learning for robots using neural networks,” Carnegie-Mellon

Univ Pittsburgh PA School of Computer Science, Tech. Rep., 1993.

[79] K. P. Sycara, “Multiagent systems,” AI Mag., vol. 19, no. 2, p. 79, 1998.

[80] M. Wooldridge, An introduction to multiagent systems. John Wiley & Sons, 2009.
