ieee transactions on wireless communications, vol....

14
IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 17, NO. 4, APRIL 2018 2557 Massive CSI Acquisition for Dense Cloud-RANs With Spatial-Temporal Dynamics Xuan Liu , Yuanming Shi , Member, IEEE, Jun Zhang , Senior Member, IEEE , and Khaled B. Letaief, Fellow, IEEE Abstract— Dense cloud radio access networks (cloud-RANs) provide a promising way to enable scalable connectivity and han- dle diversified service requirements for massive mobile devices. To fully exploit the performance gains of dense cloud-RANs, channel state information of both the signal link and interference links is required. However, with limited radio resources for training, the channel estimation problem in dense cloud-RANs becomes a high-dimensional estimation problem, i.e., the number of measurements will be typically smaller than the dimension of the channel. In this paper, we shall develop a generic high- dimensional structured channel estimation framework for dense cloud-RANs, which is based on a convex structured regularizing formulation. Observing that the wireless channel possesses ample exploitable statistical characteristics, we propose to convert the available spatial and temporal prior information into appro- priate convex regularizers. Simulation results demonstrate that exploiting the spatial and temporal dynamics can achieve good estimation performance even with limited training resources. The alternating direction method of multipliers algorithm is further adopted to solve the resultant large-scale high-dimensional chan- nel estimation problems. The proposed framework thus enjoys modeling flexibility, low training overhead, and computation cost scalability. Index Terms— Cloud-RANs, CSI, high-dimensional structured estimation, structured regularizers, ADMM, spatial and temporal dynamics, and massive device connectivity. I. I NTRODUCTION C LOUD radio access network (Cloud-RAN) [2] has recently been proposed as a revolutionary architecture Manuscript received March 26, 2017; revised September 3, 2017 and December 27, 2017; accepted January 16, 2018. Date of publication February 13, 2018; date of current version April 8, 2018. This work was supported in part by the National Nature Science Foundation of China under Grant 61601290, in part by the Hong Kong Research Grant Council under Grant 16211815, and in part by the Shanghai Sailing Program under Grant 16YF1407700. This paper was presented at the IEEE International Conference on Communications, Paris, France, May 2017 [1]. The associate editor coordinating the review of this paper and approving it for publication was H.-C. Wu. (Corresponding author: Yuanming Shi.) X. Liu is with the School of Electrical and Information Engineer- ing, The University of Sydney, Sydney, NSW 2006, Australia (e-mail: [email protected]). Y. Shi is with the School of Information Science and Tech- nology, ShanghaiTech University, Shanghai 201210, China (e-mail: [email protected]). J. Zhang is with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong (e-mail: [email protected]). K. B. Letaief is with Hamad Bin Khalifa University, Doha, Qatar, and also with The Hong Kong University of Science and Technology, Hong Kong (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TWC.2018.2797969 for future cellular networks to meet the exponential growth of mobile data traffic. In this new architecture, all the baseband signal processing will be shifted to a central location in a datacenter cloud, called the baseband unit (BBU) pool, while conventional powerful base stations are replaced by low-power low-cost remote radio heads (RRHs). These RRHs are connected to the BBU pool through high-bandwidth and low-latency transport links [3]. Such an approach has significant cost advantages and can reduce both the capital expenditure (CAPEX) (e.g., via low-cost site construction) and operational expenditure (OPEX) (e.g., via centralized cooling). Thanks to the centralized signal processing and resource allocation, it can significantly improve both spectrum efficiency and energy efficiency [4], [5]. The benefits of full cooperation in Cloud-RANs, however, highly depend on the quality of the available channel state information (CSI) [6]. In particular, to provide transmitter- side CSI in frequency-division duplex (FDD) systems, which dominate the current cellular networks, the channel state needs to be first estimated at the mobile users (MUs) via downlink training and then to be fed back to the transmitters [7]. However, in dense Cloud-RANs with massive devices, as the BBU pool can typically support hundreds of RRHs [2], [8], obtaining such massive CSI will deplete the radio resources, which is regarded as the “curse of dimensionality” of Cloud- RAN [9]. Specifically, Hassibi and Hochwald [10] showed that the training length for a MIMO channel should be equal to the number of transmit antennas to make channel estimation feasible with the least-squares estimator [11], which thus cannot be applied in dense Cloud-RAN with a large number of antennas at cooperative RRHs. It was shown in [12] that the training length can be reduced by exploiting the channel spatial correlation with the minimum mean-square error (MMSE) estimator [12]. Recently, temporal correlation has been proven to be necessary to reduce the training overhead for the FDD massive MIMO system [13], [14], with the Kalman filter [11] to track the channel time variations. However, to make it tractable, either the MMSE estimator or the Kalman filter usually makes the Gaussian distribution assumption for the underlying channel model. Furthermore, in the context of dense wireless cooperative networks, such spatial and temporal channel statistics may be difficult to obtain. For instance, as the dimension of the channel covariance matrix grows quadratically with the total number of transmit antennas, the overhead for feeding back such a high dimensional covariance matrix will be prohibitive, even though channel statistics change relatively slow compared 1536-1276 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Upload: others

Post on 28-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. …shiyuanming.github.io/papers/Journal/massivecsi_twc18.pdf · information (CSI) [6]. In particular, to provide transmitter-side

IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 17, NO. 4, APRIL 2018 2557

Massive CSI Acquisition for Dense Cloud-RANsWith Spatial-Temporal Dynamics

Xuan Liu , Yuanming Shi , Member, IEEE, Jun Zhang , Senior Member, IEEE,and Khaled B. Letaief, Fellow, IEEE

Abstract— Dense cloud radio access networks (cloud-RANs)provide a promising way to enable scalable connectivity and han-dle diversified service requirements for massive mobile devices.To fully exploit the performance gains of dense cloud-RANs,channel state information of both the signal link and interferencelinks is required. However, with limited radio resources fortraining, the channel estimation problem in dense cloud-RANsbecomes a high-dimensional estimation problem, i.e., the numberof measurements will be typically smaller than the dimensionof the channel. In this paper, we shall develop a generic high-dimensional structured channel estimation framework for densecloud-RANs, which is based on a convex structured regularizingformulation. Observing that the wireless channel possesses ampleexploitable statistical characteristics, we propose to convert theavailable spatial and temporal prior information into appro-priate convex regularizers. Simulation results demonstrate thatexploiting the spatial and temporal dynamics can achieve goodestimation performance even with limited training resources. Thealternating direction method of multipliers algorithm is furtheradopted to solve the resultant large-scale high-dimensional chan-nel estimation problems. The proposed framework thus enjoysmodeling flexibility, low training overhead, and computation costscalability.

Index Terms— Cloud-RANs, CSI, high-dimensional structuredestimation, structured regularizers, ADMM, spatial and temporaldynamics, and massive device connectivity.

I. INTRODUCTION

CLOUD radio access network (Cloud-RAN) [2] hasrecently been proposed as a revolutionary architecture

Manuscript received March 26, 2017; revised September 3, 2017 andDecember 27, 2017; accepted January 16, 2018. Date of publicationFebruary 13, 2018; date of current version April 8, 2018. This work wassupported in part by the National Nature Science Foundation of Chinaunder Grant 61601290, in part by the Hong Kong Research Grant Councilunder Grant 16211815, and in part by the Shanghai Sailing Program underGrant 16YF1407700. This paper was presented at the IEEE InternationalConference on Communications, Paris, France, May 2017 [1]. The associateeditor coordinating the review of this paper and approving it for publicationwas H.-C. Wu. (Corresponding author: Yuanming Shi.)

X. Liu is with the School of Electrical and Information Engineer-ing, The University of Sydney, Sydney, NSW 2006, Australia (e-mail:[email protected]).

Y. Shi is with the School of Information Science and Tech-nology, ShanghaiTech University, Shanghai 201210, China (e-mail:[email protected]).

J. Zhang is with the Department of Electronic and Computer Engineering,The Hong Kong University of Science and Technology, Hong Kong (e-mail:[email protected]).

K. B. Letaief is with Hamad Bin Khalifa University, Doha, Qatar, and alsowith The Hong Kong University of Science and Technology, Hong Kong(e-mail: [email protected]).

Color versions of one or more of the figures in this paper are availableonline at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TWC.2018.2797969

for future cellular networks to meet the exponential growth ofmobile data traffic. In this new architecture, all the basebandsignal processing will be shifted to a central location ina datacenter cloud, called the baseband unit (BBU) pool,while conventional powerful base stations are replaced bylow-power low-cost remote radio heads (RRHs). These RRHsare connected to the BBU pool through high-bandwidthand low-latency transport links [3]. Such an approach hassignificant cost advantages and can reduce both the capitalexpenditure (CAPEX) (e.g., via low-cost site construction)and operational expenditure (OPEX) (e.g., via centralizedcooling). Thanks to the centralized signal processing andresource allocation, it can significantly improve both spectrumefficiency and energy efficiency [4], [5].

The benefits of full cooperation in Cloud-RANs, however,highly depend on the quality of the available channel stateinformation (CSI) [6]. In particular, to provide transmitter-side CSI in frequency-division duplex (FDD) systems, whichdominate the current cellular networks, the channel state needsto be first estimated at the mobile users (MUs) via downlinktraining and then to be fed back to the transmitters [7].However, in dense Cloud-RANs with massive devices, as theBBU pool can typically support hundreds of RRHs [2], [8],obtaining such massive CSI will deplete the radio resources,which is regarded as the “curse of dimensionality” of Cloud-RAN [9]. Specifically, Hassibi and Hochwald [10] showed thatthe training length for a MIMO channel should be equal tothe number of transmit antennas to make channel estimationfeasible with the least-squares estimator [11], which thuscannot be applied in dense Cloud-RAN with a large numberof antennas at cooperative RRHs. It was shown in [12] that thetraining length can be reduced by exploiting the channel spatialcorrelation with the minimum mean-square error (MMSE)estimator [12]. Recently, temporal correlation has been provento be necessary to reduce the training overhead for the FDDmassive MIMO system [13], [14], with the Kalman filter [11]to track the channel time variations.

However, to make it tractable, either the MMSEestimator or the Kalman filter usually makes the Gaussiandistribution assumption for the underlying channel model.Furthermore, in the context of dense wireless cooperativenetworks, such spatial and temporal channel statistics maybe difficult to obtain. For instance, as the dimension of thechannel covariance matrix grows quadratically with the totalnumber of transmit antennas, the overhead for feeding backsuch a high dimensional covariance matrix will be prohibitive,even though channel statistics change relatively slow compared

1536-1276 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Page 2: IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. …shiyuanming.github.io/papers/Journal/massivecsi_twc18.pdf · information (CSI) [6]. In particular, to provide transmitter-side

2558 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 17, NO. 4, APRIL 2018

with instantaneous channel information [14]. Hence, reducingthe prior information overhead and modeling assumptionswhile guaranteeing a good estimation performance isnecessary to overcome the curse of dimensionality for thechannel estimation problem in dense Cloud-RANs.

In this paper, we are interested in the high-dimensionalchannel estimation problem in dense Cloud-RANs, where thetraining length is smaller than the dimension of the channel.This problem is thus an ill-posed inverse problem [15]. It hasbeen well recognized that exploiting low-complexity structuresof the underlying signals (e.g., sparsity, low-rankness) canmake the underdetermined inverse problems solvable. Convexoptimization provides a powerful tool to achieve this goalwith computational efficiency (i.e., polynomial time solvable)and statistical efficiency (i.e., good estimation performance)by designing appropriate regularizing functions [15], [16] toincorporate the low-complexity signal structures. This moti-vates recent works on the compressed sensing based wirelesschannel estimation technique [17] by exploiting the channelsparsity in the angular and frequency domain. In particular,the required training length is only in the order of O(s log N)for a MIMO channel with the sparsity level s by exploiting itsspatial sparsity via ℓ1-norm minimization [15], which substan-tially reduces the training overhead compared with the least-squares estimator [10]. The training overhead can be furtherreduced if additional structured information is available, e.g.,the partial support information of sparse channels [18].

However, the spatial sparsity modeling assumption isquestionable in dense wireless networks [19]. Therefore,the conventional compressed sensing based approach [17] maylose its effectiveness in terms of training overhead reduction.This motivates us to take both spatial and temporal dynamicsinto consideration to enhance the channel estimation perfor-mance, thereby reducing the training overhead. In particular,we shall exploit the unique property of heterogeneous channelgains in the spatial domain for dense Cloud-RANs to reducethe training overhead with few modeling assumptions [5]. Theheterogeneity of large-scale fading has already been exploitedfor coordinated beamforming [9] and topological interferencealignment [20], [21] to reduce the CSI signaling overhead.We further exploit the temporal correlations [13], [14]and temporal sparsity to reduce the training overhead.The temporal sparsity in dense Cloud-RANs with massivedevices is based on the fact that only a few number ofmobile devices are active for Internet-of-Things (IoT)devices and machine-type mobile devices using randomaccess [22].

Although there have been numerous research efforts onCSI acquisition, there is still a lack of a systematic approachfor CSI training overhead reduction, especially in such densewireless networks as Cloud-RANs. In contrast to the pre-vious works, in this paper, we propose a novel modelingand algorithmic framework for high-dimensional structuredchannel estimation via the following regularized optimizationformulation

H = arg minM∈Cm×n

L(M; Z) + λR(M), (1)

where Z is a collection of observations, λ ≥ 0 is the user spec-ified parameter, L and R are convex functions. Specifically,function L measures the compatibility of the channel estimateM with the observations Z using the least-squares criteria, andfunction R encodes the available spatial and temporal priorinformation into the regularizing functions [15], [16], whichaims at enforcing structures in the solution and improvingthe estimation performance. We mainly focus on the convexstructured regularizers due to the computational and statisticalefficiency. In Section III, we shall propose to adopt theweighted ℓ1-norm [23] to exploit the spatial prior information,which consists of only the dominated large-scale fading coeffi-cients. A quadratic regularizing function [11, Sec. 2.4] will beproposed to exploit the temporal correlation prior informationin Section IV, which measures the closeness of the currentchannel estimate to the previous estimated channel via tempo-ral correlation. We further present the group-structured regular-izer [16] in Section V to exploit the temporal sparsity for com-pressive channel estimation with massive device connectivity .

To improve the computational efficiency and make thealgorithms scale well to large network sizes, the genericlarge-scale conic programming solver SCS [24] is adopted tosolve the resultant convex high-dimensional channel estima-tion problems. It is an alternating direction method of multipli-ers (ADMM) (i.e., operator splitting method) based first-orderalgorithm [25], and is amenable to parallel computation. Thisis achieved by equivalently reformulating the convex regular-ized optimization problem (1) into a standard conic programform using the epigraph form [26] and Smith form [27]. There-fore, the proposed unified massive CSI acquisition frameworkpossesses features including modeling flexibly via exploitingthe spatial and temporal dynamics, as well as computationscalability. That is, it can scale well to large network sizesin terms of the reduction of prior information overhead andthe computation cost. Simulation results shall demonstrate theeffectiveness of the proposed CSI estimation method, and theimportant role of the spatial and temporal prior informationfor channel training.

A. Contributions

The major contributions of the paper are summarized asfollows:

1) We propose a high-dimensional structured channel esti-mation framework to unify the benefits of exploiting thespatial and temporal dynamics for massive CSI acquisi-tion. It enjoys the modeling flexibility and algorithmicefficiency by developing various structured convex reg-ularizers.

2) To encode the heterogenous large-scale fading in thespatial domain, the weight ℓ1-norm regularizer isdeveloped. A composite convex regularizer is furtherproposed to exploit the spatial-temporal channel dynam-ics. We also present a group-structured regularizer fortraining to exploit the temporal sparsity for massivedevice connectivity.

3) The operator splitting method [24] is adopted tosolve the convex high-dimensional channel estimation

Page 3: IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. …shiyuanming.github.io/papers/Journal/massivecsi_twc18.pdf · information (CSI) [6]. In particular, to provide transmitter-side

LIU et al.: MASSIVE CSI ACQUISITION FOR DENSE CLOUD-RANs WITH SPATIAL-TEMPORAL DYNAMICS 2559

Fig. 1. The architecture of Cloud-RAN with massive mobile devices,in which, all the RRHs are connected to a single BBU pool through high-capacity and low-latency optical fronthaul links.

problems in parallel. To achieve this goal, we propose totransform the original convex programs into a standardcone program via epigraph reformulation, followed bythe Smith form reformulation [27].

4) The proposed high-dimensional structured channel esti-mation framework provides a robust way for massiveCSI acquisition with different spatial and temporaldynamics patterns. Typical examples on exploiting spa-tial and temporal prior information are simulated todemonstrate the effectiveness of the proposed methods.

This work may serve the general purpose of developingpractical modeling and algorithmic strategies for massiveCSI acquisition in dense wireless networks. Our previousworks [3], [5], [8], [9], [20], [27] mainly focus on large-scalenetwork resource management problems with different CSIassumptions.

II. SYSTEM MODEL AND PROBLEM STATEMENTS

A. System Model

Consider a frequency division duplexing (FDD) Cloud-RANwith L multi-antenna RRHs and K single-antenna mobileusers (MUs) as shown in Fig. 1, where the l-th RRH isequipped with Nl antennas. Denote N = ∑L

l=1 Nl as the totalnumber of transmit antennas at all the RRHs. In Cloud-RAN,all the baseband units (BBUs) are shifted to a single clouddatacenter, i.e., forming a BBU pool, where the centralizedsignal processing is performed. The RRHs are connected to theBBU pool via high-capacity and low-latency optical fronthaullinks. Cloud-RAN is thus a cost-effective network architecturethat leverages recent advances in cloud computing and net-work function virtualization to improve energy and spectralefficiency of wireless networks. With the shared computationresources at the cloud data center and distributed low-cost low-power remote radio heads (RRHs), Cloud-RAN provides anideal platform to harness the benefits of network cooperationand coordination [4], [5], [28].

In this paper, we first focus on the downlink channelestimation problem in Section III and Section IV, and thendiscuss how to extend to the uplink channel estimation inSection V. We adopt a block fading channel model withcoherence time T0, i.e., the channel coefficients keep constant

for a time period T0 and change to a new realization in thenext block [7]. In particular, at the τ -th block, the channelpropagation from the l-th RRH to the k-th MU is denoted ashkl [τ ] ∈ CNl . Let hk[τ ] = [hk1[τ ]T , . . . , hkL [τ ]T ]T ∈ CN bethe channel vector from all the RRHs to MU k at block τ . Thechannel matrix from all the RRHs to all the MUs at block τis denoted as H[τ ] = [h1[τ ], . . . , hK [τ ]] ∈ CN×K .

B. Linear Measurements Model

We consider the downlink training problem. Let T be thetraining length for each block, and 0 < T < T0. At the τ -th block, RRH l sends a (T × Nl )-dimensional measurementmatrix Ql [τ ], and MU k receives a T -dimensional complexobservation vector yk[τ ] ∈ CT

yk[τ ] = Q[τ ]hk[τ ] + nk[τ ], ∀τ ∈ J , (2)

where J = {0, 1 . . . , J − 1}, Q[τ ] = [ Q1[τ ], . . . , QL [τ ]] ∈CT ×N , and nk[τ ] ∈ CT is the additive noise vector. LetN[τ ] = [n1[τ ], . . . , nK [τ ]] ∈ CT ×K . The linear observationsY [τ ] = [ y1[τ ], . . . , yK [τ ]] ∈ CT ×K for channel estimationcan be rewritten as

Y [τ ] = Q[τ ]H[τ ] + N[τ ], ∀τ ∈ J . (3)

The main purpose of downlink CSI acquisition is to enablethe centralized precoding/beamforming design at the BBUpool. For FDD systems, channel estimation can be eitherperformed at MUs and then the MUs feed back the quantizedchannel estimate to the RRHs, or the MUs directly feed backthe observations (2) to the RRHs and then perform channelestimation at the BBU pool, which may lead to non-linearobservations [29] for channel estimation due to quantization.Note that either the received quantized channel estimate or theobservations at RRHs need to be further delivered to the BBUpool through the fronthaul links, where the signal process-ing (e.g., precoding, estimation) is performed. As our mainfocus in this paper is the channel estimation algorithm design,we ignore the impact of limited feedback and limited fronthaullink capacity, and assume that the BBU pool can perfectlyaccess the linear observations Y [τ ]’s. In this paper, we areinterested in the scenario with T < N . We thus propose todirectly feed back all the observations Y [τ ] to RRHs andthen recover the channel H[τ ] jointly at the BBU pool. Thishelps reduce the feedback overhead, and can further reducethe estimation overhead by exploiting the low-dimensionalstructure in H[τ ] jointly. The effect of quantization of theobservations will be evaluated via simulations. As the pro-posed methodologies can be also applied to the time divisionduplex (TDD) mode, where the uplink training is performed.Considering the amount of feedback, in the scenario with alarge number of MUs, uplink training is more attractive.

C. Channel Estimation and Modeling Assumptions

1) High-Dimensional Structured Channel Estimation: Inthis paper, we are interested in the high-dimensional channelestimation problem in dense Cloud-RANs, where the train-ing length T is smaller than the dimension of the channel.Therefore, it is an ill-posed inverse problem [15]. Fortunately,

Page 4: IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. …shiyuanming.github.io/papers/Journal/massivecsi_twc18.pdf · information (CSI) [6]. In particular, to provide transmitter-side

2560 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 17, NO. 4, APRIL 2018

the wireless channel propagation often possesses spatial andtemporal dynamics, thereby providing prior information toimprove estimation performance. We thus propose a generalframework to convert the available prior information into con-vex structured regularizing functions [16], yielding a convexoptimization solution to the underdetermined inverse problemfor channel estimation. We, therefore, mainly focus on theestimation algorithm design based on the available spatial andtemporal prior information.

2) Channel Modeling With Spatial-Temporal Dynamics:Channel modeling assumptions often lead to the use ofdifferent strategies to estimate channel efficiently. Channelspatial and temporal dynamics play a critical role for high-dimensional structured channel estimation [17]. To simplifythe presentation, we assume that the channel spatial andtemporal statistics keep the same during J blocks. Specifically,the instantaneous channel matrix H[τ ] of block τ can bemodeled as the following Hadamard product form

H[τ ] = D ◦ G[τ ], ∀τ ∈ J , (4)

where A ◦ B denotes the element-wise product of A and B,G[τ ] ∈ CN×K and D = [Dij ] ∈ CN×K represent the small-scale and large-scale fading coefficient matrix, respectively.Here, entry Dij models the path-loss from transmit antenna ito mobile user j . As RRHs in a Cloud-RAN are distributed atdifferent locations, we do not assume any spatial correlations.Furthermore, we assume that the time variation of the channelmatrices H[τ ]’s across different blocks follow the followingfirst-order stationary Gauss-Markov process [13], [14]

H[τ ] = C H[τ − 1] + V [τ ], ∀τ ∈ J , (5)

where C ∈ CN×N is the temporal correlation coefficientmatrix and is assumed to be the same during all the J blocks,and V [τ ] ∈ CN×K is an innovation process. Here, channelmatrices H[τ ]’s are written as (4), while (5) representsthe temporal relationship between current channel H[τ ] andprevious channel H[τ − 1].

To make the estimation approach robust, we only make fewchannel modeling assumptions on the spatial and temporalprior information for channel estimation, while keeping theestimation performance competitive. Specifically, for channelmodeling in the temporal domain, we only assume that thetemporal correlation matrix C is known. We also furtherexploit the channel temporal sparsity for massive device con-nectivity. In this scenario, out of large number of devices, onlya few number of machine-type mobile devices are active withsporadic traffic [22].

Although sparsity plays a key role for high-dimensionalstructured estimation, spatial sparsity assumption for wirelesschannel is still questionable in dense wireless networks [19].Furthermore, the channel distribution information is difficultto measure and model. Most of the available distributionmodels are basically empirical and may lead to mismatch withthe real channel distributions. In this paper, we only assumethat the large-scale fading channel coefficients are available,which only depend on the location information. The successof this proposal is based on the heterogeneity of large-scalefading [9], [21]. The recent work on network connectivity

information based localization approach [30] can be appliedin dense wireless networks for location estimation. As onlyconnectivity information is required, this approach thus scaleswell to large network sizes.

III. HIGH-DIMENSIONAL STRUCTURED CHANNEL

ESTIMATION WITH SPATIAL DYNAMICS

In this section, we propose the weight ℓ1-norm minimizationapproach for high-dimensional structured channel estimationby exploiting the spatial prior information. To simplify thenotation, we omit the block index τ in this section.

A. Channel Estimation With Structured Regularizers

Given the observation Y and the measurement matrix Q atone channel block, a popular way for estimating the channelmatrix H is based on the following least-squares criterion,

H = arg minM

∥Y − QM∥2F , (6)

which is, however, meaningful only when T ≥ N [11]. Forthe high-dimensional estimation problem considered in thispaper, we investigate how the prior information can helpimprove the performance of channel estimation. Specifically,we propose the following generic high-dimensional channelestimation framework via the regularizing formulation

H = arg minM

∥Y − QM∥2F +

i

νi fi (M), (7)

where the first term serves to measure the fitness of theestimate M to the observation Y based on the least-squarescriterion as in (6), and the second term serves to encodedifferent types of available prior information into the solu-tion using different structured regularizing functions fi (M)’s,while νi ’s (νi > 0) are the corresponding weights to be chosen.In this paper, we mainly focus on the following problems

• Design algorithmically and statistically efficient regular-izing functions fi ’s to encode the available spatial priorinformation.

• Demonstrate that high-dimensional channel estimationcan be done with a reasonable amount of prior infor-mation.

The main observation that motivates the proposed channelestimation framework is that, in wireless networks, the channelis not arbitrary and actually possesses additional exploitablestatistical characteristics. To make the estimation algorithmcomputationally tractable, we restrict the regularizing func-tions to convex functions, which have been proven powerfulin high-dimensional problems [15], [16] to achieve both sta-tistical and computational efficiency.

Furthermore, there is an alternative estimation frameworkbased on the following constrained formulation

P : H = arg minM

i

λi fi (M)

subject to ∥Y − QM∥F ≤ ϵ, (8)

where ϵ > 0 is an upper bound on the noise ∥N∥F andis assumed to be known as a prior. Lagrange multipliers

Page 5: IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. …shiyuanming.github.io/papers/Journal/massivecsi_twc18.pdf · information (CSI) [6]. In particular, to provide transmitter-side

LIU et al.: MASSIVE CSI ACQUISITION FOR DENSE CLOUD-RANs WITH SPATIAL-TEMPORAL DYNAMICS 2561

indicate that solving the constrained program P is equivalentto solving the regularized problem (7) under convexity andmild regularity conditions [31]. We thus focus on channelestimator design based on the constrained formulation P .Note that the denoising error ϵ in problem P is boundedby the noise variance with high probability via the denoisingprocedure [32]. Basically, the noise variance can be estimatedeither using the pilot sequences embedded in the signal orusing the transmitted data in a non-data aided manner [33].

B. Prior Information With Spatial Dynamics

We exploit the spatial domain prior information for train-ing, including the statistical distribution information and theheterogeneity of large-scale fading coefficients.

1) Channel Distribution Information: A unique propertyof distributed networks such as Cloud-RAN is that signalscoming from different RRHs are with different pathlosses,i.e., some links will be strong while some will be weak. This isdifferent from a point-to-point multi-antenna system, and willbe critical for the channel estimation design for Cloud-RANs.As vec(ABC) = (CT ⊗ A)vec(B), the signal model (3) canbe vectorized as

vec(Y)︸ ︷︷ ︸y

= (I K ⊗ Q)︸ ︷︷ ︸Q

vec(H)︸ ︷︷ ︸h

+ vec(N)︸ ︷︷ ︸n

, (9)

where y ∈ CT K , Q ∈ CT K×N K , h ∈ CN K and n ∈ CT K .Assume that only the spatial prior information (i.e., large-scale fading coefficients) is available. We first consider theoptimal performance with full prior information. Assume thath ∈ CN (µ,!), where µ ∈ CN K and ! ∈ CN K×N K

with ! ≽ 0; and n ∈ CN (θ,#), where θ ∈ CT K and# ∈ CT K×T K with # ≽ 0. The MMSE estimate for thechannel vector h is given by [11]

hmmse = E[h| y] =∫

h f (h| y)dh

= µ + ! QH( Q! Q

H + #)−1(y − Qµ − θ). (10)

The MMSE estimator can provide the optimal performancebut it requires the knowledge of the channel covariancematrix ! ∈ CN K×N K , which is difficult to obtain in denseCloud-RANs. Furthermore, most of the distribution modelsare empirical and may yield mismatch with the real channelpropagations especially in the ultra-dense wireless networks,where the exact channel model should be derived from theMaxwell’s equations [34]. In the following, we will showthat with the help of the new estimator (8), we can achievethe performance of the MMSE estimator with much lessprior information. For the conventional MMSE estimator, thenumber of orthogonal pilot symbols, i.e., the training length T ,must scale linearly with the total number of transmit antennasN at all the RRHs, i.e., O(N). However, the number ofavailable orthogonal pilot symbols is limited by the channelcoherence bandwidth and coherence time [35]. As N becomesvery large in ultra-dense Cloud-RANs, we do not have suffi-cient pilots to support channel estimation even without theconsideration of the channel signaling overhead issue. In this

paper, we are thus particularly interested in the scenario withT < N .

2) Heterogenous Large-Scale Fading: Due to the pathlossand shadowing, in dense Cloud-RANs, the wireless channelpropagation gains are heterogeneous. For example, the resultsin [9] showed that only obtaining the dominated channellinks is enough to achieve performance close to the one withfull CSI. Thus, significant CSI overhead reduction can beachieved. Furthermore, topological interference alignment isa promising proposal to manage the interference in partiallyconnected dense wireless networks [20]. However, the spatialsparsity modeling assumption for the purpose of channel esti-mation is still questionable [19]. Instead of using ∥vec(H)∥0as the sparsity measurement, the soft sparsity measurement∥vec(H)∥1/∥vec(H)∥2 [36] has recently been proposed asan effective way to address the issue of sparsity modeling.However, estimating the soft sparsity level is also challenging.

To address the above modeling issues, as well as reducethe overhead of spatial prior information, we propose to onlyacquire the dominated large-scale fading coefficients Dij ’sindexed by the following set

D = {(i, j)|Dij ≥ D0}, (11)

where D0 ≥ 0 is a predefined threshold. This is based onthe heterogeneity of large-scale fading coefficients due to thepathloss and shadowing. Furthermore, the large-scale fadingcoefficients can be obtained by the location information ofall the nodes in the networks. Localization can normallybe achieved by measuring the pairwise Euclidean distancematrix [37], which, however, yields significant overhead indense Cloud-RANs. To scale well to large network sizes,we can adopt the 1-bit matrix completion approach for local-ization, which is only based on the partial network connectivityinformation [30].

Based on the large-scale fading prior information D (11),we propose to use the following weighted ℓ1-norm [23] toencode the available spatial prior information,

f1(M) = ∥W ◦ M∥1, (12)

where ∥A∥1 = ∑mi=1

∑nj=1 |Aij | is the entry-wise ℓ1-norm

for the m × n matrix, and the weight matrix W = [Wij ] isdefined as follows

Wij ={

1/Dij , if (i, j) ∈ D;1/(2D0), otherwise.

(13)

Intuitively, smaller large-scale fading will yield smallerchannel estimate Mij , due to the large penalty weight Wij .That is, the channel coefficient with small large-scale fadingcoefficient has a high probability to yield a small instantaneouschannel coefficient. The large-scale fading coefficients providethe prior information in the form of probabilities that eachchannel coefficient is non-zero. This spatial prior informationthus provides the prior distribution of the channel coefficients,which can be leveraged by using the non-uniform weightsin (12). Specifically, we assume that partially spatial priorinformation (i.e., large-scale fading coefficients indexed by D)is available. The threshold D0 is determined by the amount ofavailable large-scale fading coefficients.

Page 6: IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. …shiyuanming.github.io/papers/Journal/massivecsi_twc18.pdf · information (CSI) [6]. In particular, to provide transmitter-side

2562 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 17, NO. 4, APRIL 2018

C. High-Dimensional Channel Estimation via Weightedℓ1-Norm Minimization

Based on the convex structured regularizer (12), we arriveat the following high-dimensional channel estimation approachvia weighted ℓ1-norm minimization:

P1 : H = arg minM

∥W ◦ M∥1

subject to ∥Y − QM∥F ≤ ϵ, (14)

where ϵ > 0 is an upper bound on the noise and is assumedto be known as a prior. Although problem P1 can be refor-mulated as the regularized optimization formulation (1), it isdifficult to determine the optimal regularizer parameter. Fromthe Hadamard product channel representation (4), the weightedℓ1-norm regularizing function serves to enforce the structure inthe solution of algorithm P1 such that the instantaneous chan-nel coefficients are encouraged to be small if the correspondinglarge-scale coefficients are small. Note that without weights,it will become the compressed sensing (CS) based estimator,i.e., it only tries to exploit the sparsity structure in the channel.As will be shown later, the CS approach will not work fordense Cloud-RANs, as the channel is not truly sparse [19].This demonstrates a typical channel modeling issue in denseCloud-RANs.

IV. MASSIVE CSI ACQUISITION WITH

SPATIAL-TEMPORAL DYNAMICS

In this section, we exploit both the spatial and temporaldynamics to further improve the channel estimation. This isachieved by the proposal of the composite structured regular-izer to encode the spatial and temporal prior information.

A. Prior Information With Temporal Correlations

In this subsection, we shall demonstrate that simultaneouslyexploiting the spatial and temporal dynamics can improvethe channel estimation performance if second-order statisticalinformation of the channel is available. Exploiting channelcorrelation both in time and space is known to be effectivefor training [13], [14]. Specifically, the Gauss-Markov channelmodel (5) can be vectorized as

h[τ ] = Ch[τ − 1] + v[τ ], ∀τ ∈ J , (15)

where C = (I K ⊗ C) ∈ CN K×N K and v[τ ] = vec(V [τ ]) ∈CN K . Combining (9) and (15), we arrive at the following state-space system [11]

S :{

h[τ ] = Ch[τ − 1] + v[τ ]y[τ ] = Qi [τ ]h[τ ] + n[τ ], (16)

where τ ∈ J . An optimal channel estimator for (16) is givenby the Kalman filter [11]. To make it computationally tractable,we further make the following assumptions:

• All the variables, i.e., h, n, v, in the state-space systemS are jointly Gaussian distributed.

• The channel, observation noise and innovation processcovariance matrices ! = E[h[τ ]h[τ ]H], # =E[n[τ ]n[τ ]H], and $ = E[v[τ ]v[τ ]H],∀τ ∈ J , areavailable.

The Kalman filter [11] for the channel estimate h[τ ] basedon the observations y[0], . . . , y[J − 1], is presented in Algo-rithm 1 with the following notations:

• hτ |i with i = τ or τ − 1, denotes the MMSE estimate ofh[τ ] based on y[0] through y[i ];

• The estimation error covariance is denoted as %τ |i =E[δτ |iδH

τ |i ] with δτ |i = h[τ ]− hτ |i and i = τ or i = τ −1.

Algorithm 1 Kalman Filter Based Channel Estimation

Initialization: h0|−1 = 0; %0|−1 = !.while τ = 0, 1, . . . , J − 1, doMeasurement update:

• Calculate Kalman gain matrix:

K τ = %τ |τ−1 Q[τ ]H( Q[τ ]%τ |τ−1 Q[τ ]H + #)−1.

• Update estimate with measurement yi :

hτ |τ = hτ |τ−1 + K τ (y[τ ] − Q[τ ]hτ |τ−1).

• Compute MSE matrix

%τ |τ = (I − K τ Q[τ ])%τ |τ−1.

Time update:• Prediction: hτ |τ−1 = Chτ−1.• Prediction MSE matrix:

%τ |τ−1 = C%τ−1|τ−1CH + $.

end while.

However, as we have seen that the Kalman filter requiresthe prior information on the second-order statistics of thechannel, the observation noise, and the innovation process.In last Section III, we have presented a way to reduce therequirements of the spatial prior information compared to thestatistical distribution based MMSE estimator. In this section,we propose to further reduce the temporal prior informationoverhead compared to the Kalman filter. This is achieved byonly exploiting the large-scale fading coefficients D (11) andtemporal correlation coefficient C . Furthermore, Kalman filterbased channel estimation has the cubic complexity to performmatrix inversion at each iteration. We shall propose a convexregularized optimization approach to reduce both the priorinformation overhead and the computational complexity. Thecomputational scalability issue will be addressed in Section VIvia the parallel first-order method.

B. Spatial-Temporal Structured Channel Estimation

In this subsection, we shall further demonstrate that thechannel estimation performance can be improved with infor-mation of only the large-scale fading and temporal correlationcoefficients, instead of the second-order statistical prior infor-mation used in the Kalman Filter approach in Algorithm 1.From the Gauss-Markov model (5), we can use the previouslyestimated channel H[τ − 1] to predict the current channelH[τ ], i.e.,

H[τ ] = C H[τ − 1], (17)

Page 7: IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. …shiyuanming.github.io/papers/Journal/massivecsi_twc18.pdf · information (CSI) [6]. In particular, to provide transmitter-side

LIU et al.: MASSIVE CSI ACQUISITION FOR DENSE CLOUD-RANs WITH SPATIAL-TEMPORAL DYNAMICS 2563

which provides a guess for the current channel at the τ -thblock. Therefore, we propose to use the following squared ℓ2-norm with a quadratic form as a convex regularizing functionto incorporate the available temporal prior information (17),

f2(M; τ ) = ∥M − H[τ ]∥2F , (18)

for the τ -th block. Note that the previous estimated channelH[τ − 1] always available as a prior, as we can store it at theBBU pool.

We thus arrive at the following channel estimation problemby encoding the spatial and temporal prior information intothe following composite convex regularizer function,

P2(τ ) : H[τ ] = arg minM

∥W ◦ M∥1 + λ∥M − H[τ ]∥2F

subject to ∥Y [τ ] − Q[τ ]M∥F ≤ ϵ, (19)

where the quadratic regularizing function (i.e., the squaredℓ2-norm) serves to enforce the fidelity of the prior informationH[τ ] and weight λ indicates how confident we are aboutthe fidelity of the optimal solution H[τ ] to the given matrixH[τ ] [11, Section 2.4]. Here, ϵ > 0 is an upper bound onthe noise and is assumed to be known as a prior. Althoughproblem P2 can be reformulated as the regularized optimiza-tion formulation (1), it is difficult to determine the optimalregularizer parameter.

We now present the proposed spatial-temporal structuredchannel estimation algorithm in Algorithm 2.

Algorithm 2 Spatial-Temporal Structured Channel Esti-mation Algorithm P2

Initialization: H[−1] = 0, H[0] = 0;while τ = 0, 1, . . . , J − 1, do

• Measurement update: Solve problem P2(τ ) andobtain H[τ ].

• Time update: Let τ = τ + 1, compute the matrix H[τ ](17) that serves as the temporal prior information.

end while.

Note that, to update the measurement, both Kalman filterand the proposed Algorithm 2 have cubic complexity toperform matrix inversion and to solve convex optimizationproblem P2 using interior-point method [26], respectively.To improve the computational efficiency, in Section VI, wepropose to use a first-order method to solve the large-scaleoptimization problem P2 to scale well to large problem sizes.

We summarize the required prior information to computedifferent channel estimation algorithms in Table I.

V. HIGH-DIMENSIONAL UPLINK CHANNEL ESTIMATION

WITH MASSIVE DEVICES

In this section, we consider the uplink channel estima-tion for dense Cloud-RANs with massive mobile devices.It is straightforward to extend the results in Section III andSection IV for downlink channel estimation to the scenarioof uplink channel estimation when all the devices are active.However, for ultra-dense Cloud-RAN with massive sporadic

TABLE I

PRIOR INFORMATION REQUIRED FOR DIFFERENCE ALGORITHMS

traffic type devices (i.e., in each time slot only a few devicesare active out of all the massive devices), it is critical toprovide radio access for massive mobile devices by exploitingthe sparsity of the mobile devices activity. In this scenario,the results in Section III and Section IV may not be effective.More structured sparsity thus needs to be exploited. In thissection, we propose to exploit the temporal sparsity for uplinktraining, thereby supporting massive device connectivity.

A. Prior Information With Temporal Sparsity

Massive device connectivity is a key requirement for 5Gcellular networks, e.g., dense Cloud-RANs. As the communi-cation for machine-type mobile devices using random accessand Internet-of-Things (IoT) devices is sporadic, only a fewdevices are active out of all the massive devices [22]. Foruplink transmission with channel coherence time T0, we con-sider the joint active user identification and channel estimationproblem. Specifically, for any given channel coherence block,the received signal at all the RRHs is given by

y(ℓ) =K∑

i=1

hi qi (ℓ) + n(ℓ), ℓ = 1, . . . , T, (20)

where T ≤ T0 is the training length, hi ∈ CN is the channelvector from mobile device i to all the RRHs with N antennasin total, qi (ℓ) ∈ C is the pilot symbol transmitted from mobiledevice i at time slot ℓ, y(ℓ) ∈ CN is the received signal at allRRHs, n(ℓ) ∈ CN is the additive noise.

Let Y = [ y(1), . . . , y(T )]H ∈ CT ×N be the received signalacross N antennas, H = [h1, . . . , hK ]H ∈ CK×N be thechannel matrix from all the mobile devices to all the RRHs,and Q = [q(1), . . . , q(T )]H ∈ CT ×K be the known pilotmatrix with q(ℓ) = [q1(ℓ), . . . , qK (ℓ)]H ∈ CK . We thenrewrite (20) as

Y = Q'H + N, (21)

where ' = diag(π1, . . . ,πK ) ∈ RK×K is the diagonal activitymatrix with πi = 1 indicating device i active and πi = 0representing device i inactive, and N = [n(1), . . . , n(T )] ∈CT ×N is the additive noise matrix. Our goal is to estimate Hand ' simultaneously, provided that active diagonal matrix 'is sparse. This is based on the fact that the number of activemobile users has temporal sparsity.

The model in (21) provides a unified framework for high-dimensional structured estimation for the unknown matrices

Page 8: IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. …shiyuanming.github.io/papers/Journal/massivecsi_twc18.pdf · information (CSI) [6]. In particular, to provide transmitter-side

2564 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 17, NO. 4, APRIL 2018

' and H . Specifically, when ' is an unknown diagonalmatrix, problem (21) represents the problems including phaseretrieval [38] and blind deconvolution [39]. If ' is an unknownpermutation matrix, problem (21) is the linear regressionproblem with an unknown permutation [40], which is criticalto reduce the control signaling overhead to support ultra-lowlatency short packet communications [41].

B. Group-Structured Sparsity Estimation

Let M = 'H ∈ CK×N with ' as an unknown sparsediagonal matrix. The effective matrix M thus has a group-structured sparsity in rows of matrix M [16]. Then the linearmeasurement model (21) can be rewritten as

Y = QM + N . (22)

To induce the group row sparsity in M, we introduce thefollowing group norm, i.e., mixed ℓ1/ℓ2-norm [16]

R(M) :=K∑

i=1

∥mi∥2 , (23)

where M = [m1, . . . , mK ]H ∈ CK×N with mi ∈ CN . This isalso known as the group Lasso norm in statistics.

The associated group sparse matrix estimation problem forjoint active user identification and channel estimation can beformulated as following group norm regularized optimizationproblem:

P3 : M = arg minM∈CK×N

∥Y − QM∥2F + λR(M), (24)

where λ ≥ 0 is a user defined regularization parameter tocontrol the tradeoffs between quality of fit and sparsity. Giventhe estimate matrix M, the activity matrix can be recovered as' = diag(π1, . . . , πn) with πi = 1 if ∥mi∥2 ≥ γ0 for a smallenough γ0 (γ0 ≥ 0). The estimated |A|×N channel matrix forthe active users is thus given by M(A) with A = {i |πi = 1}.

Temporal sparsity for massive device connectivity and chan-nel estimation has been investigated in the frameworks ofBayesian compressive sensing [42], approximate messagepassing [43] and orthogonal matching pursuit [22]. However,most of the results rely on either the prior distribution infor-mation or sparsity level prior information of signal M, whichis impractical. For the group Lasso based formulation P3,we may either use the approach in [36] to estimate the sparsitylevel, thereby designing the optimal regularizer, or adopt thesparsity level independent approach to determine the regu-larizer [24]. Although problem P3 is convex and thus canbe solved in polynomial time, it is still critical to design thescalable convex algorithms in the scenario of massive devices.

Remark 1: If we assume that all the mobile users areactive, we can apply the proposed downlink channel estimationapproaches P1 and P2 to solve the uplink channel esti-mation problem. Specifically, for uplink channel estimation,the training matrices are sent by the mobile users, whilethe observations are received by the RRHs. However, inthis section, we further consider a more critical scenarioswith massive machine-type mobile devices with sporadic datatraffic, i.e., only a few mobile devices are active out of all the

devices. We thus propose a group sparse estimation approachP3 to simultaneously detect the user activity and estimate thechannel coefficients for the active mobile users.

VI. LARGE-SCALE CONVEX OPTIMIZATION VIA

OPERATOR SPLITTING THEORY

In this section, we present the generic first-order optimiza-tion algorithm based on the operator splitting method (i.e.,the ADMM algorithm) to solve the large-scale convex opti-mization problems P1, P2 and P3. This is achieved byreformulating the original problems into the convex programsand then the standard conic programming form, followed bythe operator splitting method to solve the transformed standardconic program. The presented large-scale first-order convexalgorithm enjoys both the capability of infeasibility detectionand the ability to scale to large problem sizes with parallelcomputing.

A. Conic Reformulation via Epigraph Form and Smith Form

In this subsection, we propose to transform the originalconvex optimization problems P1, P2, and P3 into stan-dard conic optimization form Pcone (29). For the convexproblems P1, P2 and P3, one may use the interior-pointmethod (a second-order method), which is implemented inmost advanced off-the-shelf solvers like SeDuMi [44]. Never-theless, the computational burden of the second-order methodmakes it inapplicable in dense Cloud-RAN. For instance, fora network with L = 100 single antenna RRHs and K = 100single-antenna MUs, the dimension of the channel matrix isd = 104 and the corresponding computational complexityfor computing problem P1, P2 or P3, grows cubicallywith dimension d using the interior-point method [26]. Tomake the estimation algorithm scalable to large problem sizes,we adopt the first-order method to solve it with modestaccuracy within a reasonable time. The general idea is tofirst reformulate the original problems into the second-ordercone programs (SOCPs) based on the principle of epigraphreformulation. Then we reformulate them as the standard coneprograms, which are then solved by the ADMM algorithmsolver SCS [24].

1) SOCP Formulation via Epigraph Form: We present thebasic ideas of transforming the original problems P1, P2and P3 into a standard conic optimization form. Specifically,we can rewrite the weighted matrix ℓ1-norm ∥W ◦ M∥1 inproblem P1 as

∥W ◦ M∥1 =N∑

n=1

K∑

k=1

Wnk(|Re(Mnk )| + |Im(Mnk )|), (25)

where W = [Wnk ] ∈ RN×K and M = [Mnk ] ∈ CN×K .Let xnk,1 = Re(Mnk ) and xnk,2 = Im(Mnk ), and basedon the principle of the epigraph reformulation for convexoptimization problem, we introduce slack variables tnk,1 andtnk,2 for the associated optimization variables xnk,1 and xnk,2,

Page 9: IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. …shiyuanming.github.io/papers/Journal/massivecsi_twc18.pdf · information (CSI) [6]. In particular, to provide transmitter-side

LIU et al.: MASSIVE CSI ACQUISITION FOR DENSE CLOUD-RANs WITH SPATIAL-TEMPORAL DYNAMICS 2565

respectively. We thus reformulate problem P1 as

minimizeM,t

N∑

n=1

K∑

k=1

Wnk(tnk,1 + tnk,2)

subject to ∥Y [τ ] − Q[τ ]M∥F ≤ ϵ,

|xnk,1| ≤ tnk,1, |xnk,2| ≤ tnk,2, (26)

which is then rewritten as the following form:

minimizeM,t

N∑

n=1

K∑

k=1

Wnk(tnk,1 + tnk,2)

subject to ∥Y [τ ] − Q[τ ]M∥F ≤ ϵ,

− t ≼ x ≼ t, (27)

where t ∈ R2N K , x = [Re(Mnk), Im(Mnk)] ∈ R2N K , and ≼represents componentwise inequality. Similarly, problem P2is recast as the following SOCP problem:

minimizeM,t,u,s

N∑

n=1

K∑

k=1

Wnk(tnk,1 + tnk,2) + λs

subject to ∥Y [τ ] − Q[τ ]M∥F ≤ ϵ,

∥M − H[τ ]∥F ≤ u,

∥(1 − s, 2u)∥2 ≤ 1 + s,

− t ≼ x ≼ t, (28)

where u ∈ R and s ∈ R. Following the same idea in [45],problem P3 can also be equivalently reformulated as theSOCP. Please refer to [24, Sec. 6.2] for the SOCP refor-mulation with the special case of individual sparsity Lasso,i.e., R(m) = ∥m∥1 with m ∈ CK×1 for the single receiverantenna case.

2) Conic Formulation via Smith Form: Based on the Smithform reformulation [27] supported by the software CVX fordisciplined convex programming [46], all the SOCP formula-tions, e.g., (27) and (28), can be easily and automatically trans-form in turn into the following equivalent conic optimizationform:

Pcone : minimizeν,µ

cT ν

subject to Aν + µ = b

(ν,µ) ∈ Rn × C, (29)

where ν ∈ Rn and µ ∈ Rm are the optimization variables,C = {0}r × Sm1 × · · · × Smq with S p as the standard second-order cone of dimension p,

S p = {(y, z) ∈ R × Rp−1|∥z∥ ≤ y}, (30)

and S1 is defined as the cone of nonnegative reals, i.e., R+.Here, each S i has dimension mi such that (r +∑q

i=1 mi ) = m,A ∈ Rm×n , b ∈ Rm , c ∈ Rn . The equivalence means thatthe optimal solutions or the certificates of infeasibility of theoriginal problem can be extracted from the solutions of theequivalent cone program Pcone.

TABLE II

SIMULATION PARAMETERS

B. Large-Scale Conic Optimization via Operator Splitting

In this subsection, we adopt the operator splitting techniqueto solve the large-scale transformed conic optimization prob-lem Pcon (29) in parallel. To unify the capability of detectinginfeasibility and computing the optimal solutions into a singlesystem, the homogeneous self-dual embedding was proposedby adding extra variables into the Karush-Kuhn-Tucker (KKT)conditions [24] for the primal-dual programs of Pcone. Thesecond-order algorithms based on the interior-point methodwere then implemented in the well developed software pack-ages, e.g., SDPT3 [47], to solve the homogeneous self-dual embedding automatically. However, the second-orderalgorithms are still computationally expensive and thus arenot computationally feasible in dense Cloud-RANs. We thuspresent a novel first-order algorithm software package SCSbased on the principle of operator splitting method, i.e., theADMM algorithm [24], to solve the large-scale homogeneousself-dual embedding. The final iterative algorithm is presentedas follows [24], [27]:

x[i+1] = (I + J)−1(x[i] + y[i]) (31)

x[i+1] = (V (x[i+1] − y[i]) (32)

y[i+1] = y[i] − x[i+1] + x[i+1], (33)

where (V (x) denotes the Euclidean projection of x onto theconvex set V with V = Rn × C∗ × R+ and C∗ being thedual cone of C, x = [ν, η, τ ] ∈ Rm+n+1, y = [λ,µ, κ] ∈Rm+n+1 with τ ≥ 0 and κ ≥ 0 being variables to encodedifferent outcomes including the certificate of infeasibility andoptimal solutions, x is the replicating variables of x. Here,the (m + n + 1) × (m + n + 1) data matrix J is given by

J =⎡

⎣0 AT c

−A 0 b−cT −bT 0

⎦. (34)

The subspace projection (31) can be efficiently computedusing cached factorization approach [24], while the coneprojection (32) can be computed in parallel with closed-forms using the proximal algorithms [48, Sec. 6]. In thispaper, we shall use the numeric-based modeling frameworkCVX [46] to automatically transform the SOCP problems,e.g., (27) or (28), to the standard conic program form Pcone,and then call the first-order optimization solver SCS to solveit efficiently.

Page 10: IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. …shiyuanming.github.io/papers/Journal/massivecsi_twc18.pdf · information (CSI) [6]. In particular, to provide transmitter-side

2566 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 17, NO. 4, APRIL 2018

VII. SIMULATION RESULTS

In this section, we simulate the proposed high-dimensionalstructured channel estimation algorithms in dense Cloud-RANs. We consider the following channel model [3]

hkl [τ ] = 10−L(dkl )/20√ϕkl skl︸ ︷︷ ︸Dkl

gkl [τ ], ∀τ ∈ J , (35)

where L(dkl ) is the pathloss at distance dkl in kilometer, skl isthe shadowing coefficient, ϕkl is the antenna gain, gkl is thesmall-scale fading coefficient and Dkl is the associated large-scale fading coefficient. We adopt the standard cellular channelparameters as shown in Table II. To model the channel timevariations (5) and (15), we adopt the model in [13] and [14]such that the corresponding innovation process is given byvkl [τ ] =

√1 − η2 Dklνkl [τ ] with νkl [τ ] ∼ CN (0Nl , I Nl ), and

C = η I N with η = J0(2π fDκ) as the temporal correlationcoefficient based on the Jakes’ model, where J0 is the 0-thorder Bessel function of the first kind, fD denotes the maxi-mum Doppler frequency and κ represents the channel instan-tiation interval. We set η = 0.9881 based on the parametersprovided in [14]. In this section, we assume that all the randomvariables, including small-scale fading coefficients, large-scalefading coefficients, innovation process and observation noiseare mutually independently, due to the large distance betweendifferent antennas in Cloud-RANs. We use the unitary trainingmatrix with equal power p ≥ 0 per training symbol, i.e.,

Q[τ ] ∈{

U ∈ CT ×N , UUH = p · I T

}, ∀τ ∈ J , (36)

which is optimal in i.i.d. Rayleigh fading channels [14].We generate the unitary matrix from the orthogonal groupusing the method described in [49]. All the optimizationproblems P1, P2 and P3 are computed using the modelingframework CVX [46] with the large-scale convex optimizationsolver SCS [24]. The performance metric is given by the meansquared error [14], i.e., MSE = E[∥H − H∥2

F ]/(N K ).

A. Channel Estimation With Spatial Prior Information

Consider a network with L = 50 single-antenna RRHs andK = 30 single-antenna MUs uniformly and independently dis-tributed in the square region [−5000, 5000] × [−5000, 5000]meters. Each point of the simulation results is averaged over104 randomly generated network realizations (i.e., one smallscaling fading and observation noise realization for each large-scale fading realization). The performance of different channelestimation algorithms with different available prior informa-tion assumptions is illustrated in Fig. 2 with the training lengthas T = 10. Transmit SNR is defined as the transmit power atper transmit antenna of per training symbol over the noisepower, and “10% D” means that there are only 10% largestlarge-scale fading coefficients available.

From this figure, we can see that the gap between theproposed estimator P1 and the MMSE estimator (whichrequires the Gaussian distribution assumption) is very small.In particular, estimator P1 with only 10% dominated large-scale fading coefficients can achieve almost the same perfor-mance as the one with full large-scale fading coefficients. This

Fig. 2. Mean-squared error versus transmit SNR with T = 10.

Fig. 3. Mean-squared error versus transmit SNR with T = 30.

substantially reduces the acquisition overhead of spatial priorinformation. We emphasize that the proposed estimator P1does not require the Gaussian distribution assumption for allthe variables in the system, but it requires the bound of theobservation noise as a prior.

Furthermore, compared with the compressed sensing (CS)based channel estimation algorithm [17] (i.e., estimator P1with equal weights, so it only exploits the channel sparsitywithout any prior information on the large-scale fading coef-ficients), it is clearly demonstrated that such approach doesnot work well in dense Cloud-RANs, while the estimationperformance can be improved at the price of exploiting areasonable amount of prior information with the proposedmethod. The main reason for the poor performance of the CSbased estimator is that the channel spatial sparsity assumptionin Cloud-RANs is still questionable [19]. Similar conclusioncan be obtained with the training length T = 30 as shownin Fig. 3. In this case, the performance loss of the CS-basedestimator is smaller. Besides, Fig. 4 demonstrates that settingthe noise bound in the estimator P1 as the noise variance canachieve good performance.

Page 11: IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. …shiyuanming.github.io/papers/Journal/massivecsi_twc18.pdf · information (CSI) [6]. In particular, to provide transmitter-side

LIU et al.: MASSIVE CSI ACQUISITION FOR DENSE CLOUD-RANs WITH SPATIAL-TEMPORAL DYNAMICS 2567

Fig. 4. Mean-squared error versus noise bound for each entry (the underlyingnoise variance is 1 for each entry).

B. Channel Estimation With Spatial-Temporal PriorInformation

Consider a network with L = 25 2-antennas RRHs and K =5 single-antenna MUs uniformly and independently distributedin the square region [−10000, 10000] × [−10000, 10000]meters. The training time is set to be T = 5 and J is setto be 10 such that the spatial and temporal statistics keep thesame during 10 blocks. The transmit SNR is fixed and set to be0 dB. Each point of the simulation results in Fig. 5 is averagedover 104 randomly generated network realizations (i.e., onesmall scaling fading, observation noise and innovation processrealization for each large-scale fading realization).

For both the proposed estimators P1 and P2, from Fig. 5,we can see that knowing only the 10% dominated large-scale fading coefficients is enough to achieve the perfor-mance with the one with full large-scale fading coefficients.In particular, we set λ = 1 for estimator P2 to balance thespatial and temporal prior information. Note that, in principle,the performance of estimator P2 can be further improved bychoosing an optimal parameter λ. However, this is not trivialand we leave it as our future work. From our experiments,in this simulated setting, we observe that λ = 1 yields goodperformance via cross validation.

Fig. 5 shows that the estimation performance can be sig-nificantly improved by further exploiting the temporal priorinformation for both the Kalman filter and the proposedestimator P2. In particular, as the block index increases,the performance improves, i.e., the estimation error decreases.The main reason is that both algorithms can exploit all theprevious observations to make a good guess for the currentchannel. For the temporal prior information, the main differ-ence between the proposed estimator P2 and Kalman filter isthat the former only needs to know the temporal correlationcoefficient η, while the Kalman filter further requires theinnovation process covariance and the Gaussian distributionassumption for the state-space system. It is observed that, com-pared with the Kalman filter, estimator P2 can achieve goodestimation performance with reduced spatial and temporal

Fig. 5. Mean-squared error versus fading block index.

Fig. 6. Normalized mean-squared error versus the number of mobile devices.

prior information, i.e., it only requires 10% dominated large-scale fading coefficients, the temporal correlation coefficientη and a bound for the observation noise ϵ.

C. High-Dimensional Uplink Channel Estimation forMassive Device Connectivity

We consider a simple scenario with one single-antenna RRHwith K single-antenna mobile devices, in which the numberof active users is s = 0.05K and active users are generatedat random. The pilot length T is set to be 2s log(K/s) (theoptimal pilot length as obtained in [15] via convex geometry)for uplink training. For estimator P3, the channel matrix H ,pilot matrix Q and additive noise matrix N are generatedindependently from the distributions H = [hi j ] with hi j ∼CN (0, 1), Q = [qi j ] with qi j ∼ CN (0, 1), and N = [ni j ] withni j ∼ CN (0, σ 2) and σ = 0.1. We choose λ = 0.1λmax for allproblem instances with λmax = ∥ QHY∥∞ being the smallestvalue of λ such that the solution to problem P3 is zero [24].The normalized mean squared error (NMSE) is defined asNMSE = ∥M−M0∥2

F/∥M0∥2F with M0 = 'H as the ground

truth. Each point of the simulation results is averaged over

Page 12: IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. …shiyuanming.github.io/papers/Journal/massivecsi_twc18.pdf · information (CSI) [6]. In particular, to provide transmitter-side

2568 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 17, NO. 4, APRIL 2018

Fig. 7. Computation time versus the number of mobile devices.

Fig. 8. The estimation error versus number of quantization bits.

50 generated network realizations, i.e., H , Q and N . We com-pute the estimator P3 using CVX+SDPT3 and CVX+SCS,where the solver SDPT3 is based on the second-order algo-rithm, i.e., interior-point method, while SCS is the ADMMbased first-order algorithm solver presented in Section VI.

Fig. 6 shows that the proposed estimator P3 significantlyoutperforms the conventional least-square estimator [10] (com-puted via CVX+SDPT3), i.e., λ = 0. This indicates thatit is critical to exploit the temporal sparsity of the activemobile devices to improve the performance for uplink channelestimation. Furthermore, both the second-order algorithm andthe first-order algorithm achieve the same performance, whilethe first-order algorithm significantly reduces the computationtime as demonstrated in Fig. 7, and thus it scales well to largenetwork sizes.

Fig. 8 further investigates the estimation performance ofthe estimator P3 with quantization errors in the observations.Specifically, let K = 200 and other settings being the sameas in Fig. 6 and Fig. 7. Each entry yi in the observationvector is quantized into the interval [−10, 10] for the realpart and imaginary part accordingly, which is divided into 2q

sub-intervals with equal length with q as the number of quan-tization bits. Therefore, with the quantizer Qi , the quantizednoise corrupted measurements are zi = Qi (yi ). Based on thequantized observations zi , we apply estimator P3 to performuplink channel estimation. From Fig. 8, we see that, as thenumber of quantization bits increase, the estimator approachesthe performance of the unquantized case, and 4 quantizationbits are sufficient for this case.

VIII. CONCLUSIONS

In this paper, we proposed an efficient and flexible high-dimensional channel estimation framework for dense Cloud-RANs by exploiting spatial and temporal prior information.Novel structured regularizers and scalable first-order algo-rithms were presented to demonstrated the modeling flexibilityand algorithmic efficiency of the proposed framework. Thesimulation results showed that the proposed high-dimensionalchannel estimation algorithms with substantially reduced priorinformation can achieve almost the same performance asthe ones with full spatial and temporal prior information.In particular, it was demonstrated that the temporal sparsityplays a key role to provide radio access for massive mobiledevices.

Several future research directions are listed as follows:• While the convex formulations achieve good performance

in simulations, it is interesting to characterize the fun-damental estimation performance with the given priorinformation, thereby demonstrating the feasibility androbustness of high-dimensional channel estimation.

• To further speed up the present ADMM algorithms forreal-time implementation of the convex high-dimensionalestimation framework, it is interesting to adopt the deepneural networks to learn and approximate the algorithms,thereby improving the convergence rates [50].

• It is also interesting to develop the nonconvex regularizersto further encode more structured prior information andestablish statistical and algorithmic guarantees [51].

REFERENCES

[1] X. Liu, Y. Shi, J. Zhang, and K. B. Letaief, “Massive CSI acquisition indense cloud-RAN with spatial and temporal prior information,” in Proc.IEEE Int. Conf. Commun. (ICC), May 2017, pp. 1–6.

[2] K. Chen and R. Duan, “C-RAN the road towards green RAN,” ChinaMobile Res. Inst., Beijing, China, White Paper, 2011, vol. 2.

[3] Y. Shi, J. Zhang, and K. B. Letaief, “Group sparse beamforming forgreen cloud-RAN,” IEEE Trans. Wireless Commun., vol. 13, no. 5,pp. 2809–2823, May 2014.

[4] M. Peng, Y. Sun, X. Li, Z. Mao, and C. Wang, “Recent advancesin cloud radio access networks: system architectures, key techniques,and open issues,” IEEE Commun. Surveys Tuts., vol. 18, no. 3,pp. 2282–2308, 3rd Quart., 2016.

[5] Y. Shi, J. Zhang, K. B. Letaief, B. Bai, and W. Chen, “Large-scale convexoptimization for ultra-dense cloud-RAN,” IEEE Wireless Commun.,vol. 22, no. 3, pp. 84–91, Jun. 2015.

[6] H. Huh, A. M. Tulino, and G. Caire, “Network MIMO with linear zero-forcing beamforming: Large system analysis, impact of channel esti-mation, and reduced-complexity scheduling,” IEEE Trans. Inf. Theory,vol. 58, no. 5, pp. 2911–2934, May 2012.

[7] G. Caire, N. Jindal, M. Kobayashi, and N. Ravindran, “MultiuserMIMO achievable rates with downlink training and channel statefeedback,” IEEE Trans. Inf. Theory, vol. 56, no. 6, pp. 2845–2866,Jun. 2010.

Page 13: IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. …shiyuanming.github.io/papers/Journal/massivecsi_twc18.pdf · information (CSI) [6]. In particular, to provide transmitter-side

LIU et al.: MASSIVE CSI ACQUISITION FOR DENSE CLOUD-RANs WITH SPATIAL-TEMPORAL DYNAMICS 2569

[8] Y. Shi, J. Zhang, and K. B. Letaief, “Scalable coordinated beam-forming for dense wireless cooperative networks,” in Proc. IEEEGlobal Commun. Conf. (GLOBECOM), Austin, TX, USA, Dec. 2014,pp. 3603–3608.

[9] Y. Shi, J. Zhang, and K. B. Letaief, “Optimal stochastic coordinatedbeamforming for wireless cooperative networks with CSI uncertainty,”IEEE Trans. Signal Process., vol. 63, no. 4, pp. 960–973, Feb. 2015.

[10] B. Hassibi and B. M. Hochwald, “How much training is needed inmultiple-antenna wireless links?” IEEE Trans. Inf. Theory, vol. 49, no. 4,pp. 951–963, Apr. 2003.

[11] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation.Upper Saddle River, NJ, USA: Prentice-Hall, 2000.

[12] E. Björnson and B. Ottersten, “A framework for training-basedestimation in arbitrarily correlated Rician MIMO channels withRician disturbance,” IEEE Trans. Signal Process., vol. 58, no. 3,pp. 1807–1820, Mar. 2010.

[13] S. Noh, M. D. Zoltowski, Y. Sung, and D. J. Love, “Pilot beam patterndesign for channel estimation in massive MIMO systems,” IEEE J. Sel.Topics Signal Process., vol. 8, no. 5, pp. 787–801, Oct. 2014.

[14] J. Choi, D. J. Love, and P. Bidigare, “Downlink training techniques forFDD massive MIMO systems: Open-loop and closed-loop training withmemory,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 802–814,Oct. 2014.

[15] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, “Theconvex geometry of linear inverse problems,” Found. Comput. Math.,vol. 12, pp. 805–849, Dec. 2012.

[16] M. J. Wainwright, “Structured regularizers for high-dimensional prob-lems: Statistical and computational issues,” Annu. Rev. Statist. Appl.,vol. 1, pp. 233–253, Jan. 2014.

[17] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, “Compressedchannel sensing: A new approach to estimating sparse multipath chan-nels,” Proc. IEEE, vol. 98, no. 6, pp. 1058–1076, Jun. 2010.

[18] J.-C. Shen, J. Zhang, E. Alsusa, and K. B. Letaief, “Compressed CSIacquisition in FDD massive MIMO: How much training is needed?”IEEE Trans. Wireless Commun., vol. 15, no. 6, pp. 4145–4156,Jun. 2016.

[19] E. Björnson, E. G. Larsson, and T. L. Marzetta, “Massive MIMO: Tenmyths and one critical question,” IEEE Commun. Mag., vol. 54, no. 2,pp. 114–123, Feb. 2016.

[20] Y. Shi, J. Zhang, and K. B. Letaief, “Low-rank matrix completionfor topological interference management by Riemannian pursuit,” IEEETrans. Wireless Commun., vol. 15, no. 7, pp. 4703–4717, Jul. 2016.

[21] S. A. Jafar, “Topological interference management through index cod-ing,” IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 529–568, Jan. 2014.

[22] G. Wunder, H. Boche, T. Strohmer, and P. Jung, “Sparse signal process-ing concepts for efficient 5G system design,” IEEE Access, vol. 3,pp. 195–208, 2015.

[23] S. N. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu, “A unifiedframework for high-dimensional analysis of M-estimators with decom-posable regularizers,” Statist. Sci., vol. 27, no. 4, pp. 538–557, 2012.

[24] B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd, “Conic optimizationvia operator splitting and homogeneous self-dual embedding,” J. Optim.Theory Appl., vol. 169, no. 3, pp. 1042–1068, 2016.

[25] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributedoptimization and statistical learning via the alternating direction methodof multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122,Jan. 2011.

[26] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.:Cambridge Univ. Press, 2004.

[27] Y. Shi, J. Zhang, B. O’Donoghue, and K. B. Letaief, “Large-scaleconvex optimization for dense wireless cooperative networks,” IEEETrans. Signal Process., vol. 63, no. 18, pp. 4729–4743, Sep. 2015.

[28] T. Q. S. Quek, M. Peng, O. Simeone, and W. Yu, Cloud Radio AccessNetworks: Principles, Technologies, and Applications. Cambridge, U.K.:Cambridge Univ. Press, 2017.

[29] Y. Plan and R. Vershynin, “The generalized Lasso with non-linearobservations,” IEEE Trans. Inf. Theory, vol. 62, no. 3, pp. 1528–1537,Mar. 2016.

[30] S. A. Bhaskar, “Localization from connectivity: A 1-bit maximum likeli-hood approach,” IEEE/ACM Trans. Netw., vol. 24, no. 5, pp. 2939–2953,Oct. 2016.

[31] R. T. Rockafellar, Convex Analysis, vol. 28. Princeton, NJ, USA:Princeton univ. Press, 1997.

[32] V. Chandrasekaran and M. I. Jordan, “Computational and statisticaltradeoffs via convex relaxation,” Proc. Nat. Acad. Sci. USA, vol. 110,no. 13, pp. E1181–E1190, 2013.

[33] A. Das and B. D. Rao, “SNR and noise variance estimation for MIMOsystems,” IEEE Trans. Signal Process., vol. 60, no. 8, pp. 3929–3941,Aug. 2012.

[34] S.-H. Lee and S.-Y. Chung, “Capacity scaling of wireless ad hocnetworks: Shannon meets Maxwell,” IEEE Trans. Inf. Theory, vol. 58,no. 3, pp. 1702–1715, Mar. 2012.

[35] X. Rao and V. K. N. Lau, “Distributed compressive CSIT estimationand feedback for FDD multi-user massive MIMO systems,” IEEE Trans.Signal Process., vol. 62, no. 12, pp. 3261–3271, Jun. 2014.

[36] M. E. Lopes, “Unknown sparsity in compressed sensing: Denoising andinference,” IEEE Trans. Inf. Theory, vol. 62, no. 9, pp. 5145–5166,Sep. 2016.

[37] I. Dokmanic, R. Parhizkar, J. Ranieri, and M. Vetterli, “Euclideandistance matrices: Essential theory, algorithms, and applications,” IEEESignal Process. Mag., vol. 32, no. 6, pp. 12–30, Nov. 2015.

[38] P. Netrapalli, P. Jain, and S. Sanghavi, “Phase retrieval using alter-nating minimization,” IEEE Trans. Signal Process., vol. 63, no. 18,pp. 4814–4826, Sep. 2015.

[39] S. Ling and T. Strohmer, “Self-calibration and biconvex compressivesensing,” Inverse Problems, vol. 31, no. 11, p. 115002, 2015.

[40] A. Pananjady, M. J. Wainwright, and T. A. Courtade. (2016). “Linearregression with an unknown permutation: Statistical and computationallimits.” [Online]. Available: https://arxiv.org/abs/1608.02902

[41] G. Durisi, T. Koch, and P. Popovski, “Toward massive, ultrareliable, andlow-latency wireless communication with short packets,” Proc. IEEE,vol. 104, no. 9, pp. 1711–1726, Aug. 2016.

[42] X. Xu, X. Rao, and V. K. N. Lau, “Active user detection and channelestimation in uplink CRAN systems,” in Proc. IEEE Int. Conf. Com-mun. (ICC), London, U.K., Jun. 2015, pp. 2727–2732.

[43] G. Hannak, M. Mayer, A. Jung, G. Matz, and N. Goertz, “Joint channelestimation and activity detection for multiuser communication systems,”in Proc. IEEE Int. Conf. Commun. Workshop (ICCW), London, U.K.,Jun. 2015, pp. 2086–2091.

[44] J. F. Sturm, “Using SeDuMi 1.02, a MATLAB toolbox for optimizationover symmetric cones,” Optim. Methods Softw., vol. 11, nos. 1–4,pp. 625–653, 1999.

[45] M. S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret, “Applicationsof second-order cone programming,” Linear Algebra Appl., vol. 284,nos. 1–3, pp. 193–228, Nov. 1998.

[46] CVX: MATLAB Software for Disciplined Convex Programming, Ver-sion 2.0 (Beta), CVX Research, Inc., Austin, TX, USA, 2013.

[47] K.-C. Toh, M. J. Todd, and R. H. Tütüncü, “SDPT3—A MATLABsoftware package for semidefinite programming, version 1.3,” Optim.Methods Softw., vol. 11, nos. 1–4, pp. 545–581, 1999.

[48] N. Parikh and S. Boyd, “Proximal algorithms,” Found. Trends Optim.,vol. 1, no. 3, pp. 127–239, 2014.

[49] F. Mezzadri, “How to generate random matrices from the classicalcompact groups,” Notices AMS, vol. 54, pp. 592–604, Sep. 2007.

[50] Y. Yang, J. Sun, H. Li, and Z. Xu, “Deep ADMM-net for compressivesensing MRI,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), Barcelona,Spain, 2016, pp. 10–18.

[51] P.-L. Loh and M. J. Wainwright, “Regularized M-estimators withnonconvexity: Statistical and algorithmic theory for local optima,”J. Mach. Learn. Res., vol. 16, pp. 559–616, Mar. 2015.

Xuan Liu received the B.S. degree in telecommu-nication from Jilin University, Changchun, China,in 2000, the M.S. degree in electronic and computerengineering from The Hong Kong University ofScience and Technology, Hong Kong, in 2012, andthe Ph.D. degree in electronic and information engi-neering from The University of Sydney, Australia,in 2017. She is currently a Post-Doctoral Fellow withthe Department of Electronic and Computer Engi-neering, The Hong Kong University of Science andTechnology. Her research interests include network

information theory, large-scale optimization, big data analysis, and Internet ofThings.

Page 14: IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. …shiyuanming.github.io/papers/Journal/massivecsi_twc18.pdf · information (CSI) [6]. In particular, to provide transmitter-side

2570 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 17, NO. 4, APRIL 2018

Yuanming Shi (S’13–M’15) received the B.S.degree in electronic engineering from Tsinghua Uni-versity, Beijing, China, in 2011, and the Ph.D. degreein electronic and computer engineering from TheHong Kong University of Science and Technology,in 2015. Since 2015, he has been with the School ofInformation Science and Technology, ShanghaiTechUniversity, as a tenure-track Assistant Professor.He visited the University of California at Berkeley,Berkeley, CA, USA, from 2016 to 2017. He wasa recipient of the 2016 IEEE Marconi Prize Paper

Award in wireless communications and the 2016 Young Author Best PaperAward by the IEEE Signal Processing Society. He was involved in opti-mization theory and algorithms, deep and machine learning, high-dimensionalprobability and statistics, with applications in intelligent IoT, large-scale dataanalysis, and distributed edge computing.

Jun Zhang (S’06–M’10–SM’15) received theB.Eng. degree in electronic engineering from theUniversity of Science and Technology of Chinain 2004, the M.Phil. degree in information engi-neering from The Chinese University of Hong Kongin 2006, and the Ph.D. degree in electrical and com-puter engineering from The University of Texas atAustin in 2009. He is currently a Research AssistantProfessor with the Department of Electronic andComputer Engineering, The Hong Kong Universityof Science and Technology. He has co-authored the

book Fundamentals of LTE (Prentice-Hall, 2010). His research interestsinclude dense wireless cooperative networks, mobile edge computing, cloudcomputing, and big data analytics systems.

Dr. Zhang was a co-recipient of the 2016 Marconi Prize Paper Awardin wireless communications, the 2014 Best Paper Award for the EURASIPJournal on Advances in Signal Processing, the IEEE GLOBECOM BestPaper Award in 2017, the IEEE ICC Best Paper Award in 2016, and theIEEE PIMRC Best Paper Award in 2014. A paper he co-authored receivedthe 2016 Young Author Best Paper Award of the IEEE Signal ProcessingSociety. He was also a recipient of the 2016 IEEE ComSoc Asia-PacificBest Young Researcher Award. He frequently serves on the technical programcommittees of major IEEE conferences in wireless communications, such asICC, Globecom, WCNC, and VTC. He served as a MAC Track Co-Chairfor the IEEE WCNC 2011. He is an Editor of the IEEE TRANSACTIONS ONWIRELESS COMMUNICATIONS, and was a Guest Editor of the special sectionon Mobile Edge Computing for Wireless Networks in the IEEE ACCESS.

Khaled B. Letaief (S’85–M’86–SM’97–F’03)received the B.S. (Hons.), M.S., and Ph.D. degreesin electrical engineering from Purdue University,West Lafayette, IN, USA, in 1984, 1986, and 1990,respectively. From 1990 to 1993, he was a Fac-ulty Member with The University of Melbourne,Melbourne, VIC, Australia. Since 1993, he has beenwith The Hong Kong University of Science andTechnology (HKUST), where he is known as oneof HKUST’s most distinguished professors. Whileat HKUST, he has held numerous administrative

positions, including the Head of the Electronic and Computer EngineeringDepartment, the Director of the Huawei Innovation Laboratory, and theDirector of the Hong Kong Telecom Institute of Information Technology.While at HKUST, he has also served as the Chair Professor and the Dean ofEngineering. Under his leadership, the School has also dazzled in internationalrankings (rising from No. 26 in 2009 to 14 in the world in 2015 accordingto QS World University Rankings). In 2015, he joined Hamad Bin KhalifaUniversity as a Provost to help establish a research-intensive university inQatar in partnership with esteemed universities that include Northwestern Uni-versity, Cornell University, and Texas A&M. He has authored over 580 journaland conference papers and given keynote talks and courses all over theworld. He has 15 patents, including 11 U.S. patents. He is an internationallyrecognized leader in wireless communications with research interest in greencommunications, Internet of Things, Cloud-RANs, and 5G systems. He servedas a consultant for different organizations, including Huawei, ASTRI, ZTE,Nortel, PricewaterhouseCoopers, and Motorola. In addition to his activeresearch and professional activities, he has been a dedicated teacher committedto excellence in teaching and scholarship. He is a fellow of HKIE. He receivedthe Michael G. Gale Medal for Distinguished Teaching (highest university-wide teaching award and only one recipient/year is honored for his/hercontributions). He was also a recipient of many other distinguished awards andhonors, including the 2007 IEEE Joseph LoCicero Publications ExemplaryAward, the 2009 IEEE Marconi Prize Award in wireless communications,the 2010 Purdue University Outstanding Electrical and Computer EngineerAward, the 2011 IEEE Harold Sobol Award, the 2016 IEEE Marconi PrizePaper Award in wireless communications, and over 11 IEEE best paperawards. He is recognized as a long-time volunteer with dedicated serviceto professional societies and in particular IEEE, where he has served in manyleadership positions including as Treasurer, Vice President for Conferences,and Vice President for Technical Activities of the IEEE CommunicationsSociety. He is also recognized by Thomson Reuters as an ISI Highly CitedResearcher (which places him among the top 250 preeminent individualresearchers in the field of computer science and engineering) and is currentlythe President of the IEEE Communications Society in 2018. He has servedon the editorial board of other prestigious journals. He is the founding Editor-in-Chief of the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS.