arXiv:1712.03987v1 [cs.NI] 11 Dec 2017

End-to-end Learning from Spectrum Data: A Deep Learning approach for Wireless Signal Identification in Spectrum Monitoring applications

Merima Kulin∗, Tarik Kazaz†, Student Member, IEEE, Ingrid Moerman∗ and Eli de Poorter∗, Member, IEEE

Abstract—This paper presents end-to-end learning from spectrum data - an umbrella term for new sophisticated wireless signal identification approaches in spectrum monitoring applications based on deep neural networks. End-to-end learning makes it possible to (i) automatically learn features directly from simple wireless signal representations, without requiring the design of hand-crafted expert features like higher order cyclic moments, and (ii) train wireless signal classifiers in one end-to-end step, which eliminates the need for complex multi-stage machine learning processing pipelines. The purpose of this article is to present the conceptual framework of end-to-end learning for spectrum monitoring and systematically introduce a generic methodology to easily design and implement wireless signal classifiers. Furthermore, we investigate the importance of the choice of wireless data representation to various spectrum monitoring tasks. In particular, two case studies are elaborated: (i) modulation recognition and (ii) wireless technology interference detection. For each case study three convolutional neural networks are evaluated for the following wireless signal representations: temporal IQ data, the amplitude/phase representation and the frequency domain representation. From our analysis we prove that the wireless data representation impacts the accuracy depending on the specifics and similarities of the wireless signals that need to be differentiated, with different data representations resulting in accuracy variations of up to 29%. Experimental results show that using the amplitude/phase representation for recognizing modulation formats can lead to performance improvements of up to 2% and 12% for medium to high SNR compared to IQ and frequency domain data, respectively. For the task of detecting interference, the frequency domain representation outperformed the amplitude/phase and IQ data representations by up to 20%.

Index Terms—Big Spectrum data, Spectrum monitoring, End-to-End learning, Deep learning, Convolutional Neural Networks, Wireless Signal Identification, IoT.

I. INTRODUCTION

WIRELESS networks are currently experiencing a dramatic evolution. Some trends observed are the increasing number and diversity of wireless devices, with an increasing spectrum demand. Unfortunately, the radio frequency spectrum is a scarce resource. As a result, particular parts of the spectrum are used heavily whereas other parts are vastly underutilized [1]. For example, the unlicensed bands are extremely overutilized and suffer from cross-technology interference [2].

It is indisputable that monitoring and understanding the spectrum resource usage will become a critical asset for 5G in order to improve and regulate the radio spectrum utilization. However, monitoring the spectrum use in such a complex wireless system requires distributed sensing over a wide frequency range, resulting in a radio spectrum data deluge [3]. Extracting meaningful information about the spectrum usage from massive and complex spectrum datasets requires sophisticated and advanced algorithms. This paves the way for new innovative spectrum access schemes and the development of novel identification mechanisms that will provide awareness about the radio environment. For instance, technology identification, modulation type recognition and interference source detection are essential for interference mitigation strategies to continue effective use of the scarce spectral resources and enable the coexistence of heterogeneous wireless networks.

Corresponding author: Merima Kulin (e-mail: [email protected])

In this paper, we investigate end-to-end learning from spectrum data as a unified approach to tackle various challenges related to the problems of inefficient spectrum management, utilization and regulation that the next generation of wireless networks is facing. Whether the goal is to recognize a technology or a particular modulation type, identify the interference source or an interference-free frequency channel, we argue that the various problems may be treated as a generic problem type that we refer to as wireless signal identification, which is a natural target for machine learning classification techniques. In particular, end-to-end learning refers to processing architectures where the entire pipeline, connecting the input (i.e. the data representation of a sensed wireless signal) to the desired output (i.e. the predicted type of signal), is learned purely from data [4]. This setting simplifies the overall system design and completely eliminates the need for designing expert features such as higher order cyclic moments, and results in accurate wireless signal classifiers.

A. Scope and Contributions

This paper provides a comprehensive introduction to end-to-end learning from spectrum data. The main contributions of this paper are as follows:

• Potential end-to-end learning use cases for spectrum monitoring are identified. In particular, two categories are presented. The first category comprises use cases where detecting spectral opportunities and spectrum sharing is necessary, such as in cognitive radio and emerging cognitive IoT networks. The second comprises scenarios where detecting radio emitters is needed, such as in spectrum regulation.

• To set a preliminary background on this interdisciplinary topic, a brief introduction to machine learning/deep learning is provided and their role for spectrum monitoring is discussed. Then, a reference model for deep learning for spectrum monitoring applications is defined.

• A conceptual framework for end-to-end learning is proposed, followed by a comprehensive overview of the methodology for collecting spectrum data, designing wireless signal representations, forming training data and training deep neural networks for wireless signal classification tasks.

• To demonstrate the approach, experiments are carried out for two case studies: (i) modulation recognition and (ii) wireless technology interference detection, which demonstrate the impact of the choice of wireless data representation on the presented results. For modulation recognition, the following modulation techniques are considered: BPSK (binary phase shift keying), QPSK (quadrature phase shift keying), m-PSK (phase shift keying, for m = 8), m-QAM (quadrature amplitude modulation, for m = 16 and 64), CPFSK (continuous phase frequency shift keying), GFSK (Gaussian frequency shift keying) and m-PAM (pulse amplitude modulation, for m = 4). For wireless technology identification, three representative technologies operating in the unlicensed bands are analysed: IEEE 802.11b/g, IEEE 802.15.4 and IEEE 802.15.1.

The rest of the paper is organized as follows. The remainder of Section I presents related work. Section II presents motivating scenarios for the proposed approach. Section III introduces basic concepts related to machine learning/deep learning concluded with a high-level processing pipeline for their application to spectrum monitoring scenarios. Section IV presents the end-to-end learning methodology for wireless signal classification. In Section V the methodology is applied to two scenarios and experimental results are discussed. Section VI discusses open challenges related to the implementation and deployment of future end-to-end spectrum management systems. Section VII concludes the paper.

B. Related work

Traditional signal identification. Previous research efforts in wireless communication related to signal identification are dominantly based on signal processing tools for communication [5] such as cyclostationary feature detection [6], sometimes in combination with traditional machine learning techniques [7] (e.g. support vector machines (SVM), decision trees, k-nearest neighbors (k-NN), neural networks, etc.). The design of these specialized solutions has proven to be time-demanding, as they typically rely on manual extraction of expert features for which a significant amount of domain knowledge and engineering is required.

Deep learning for signal classification. Motivated by recent advances and the remarkable success of deep learning, especially convolutional neural networks (CNN), in a broad range of problems such as image recognition, speech recognition and machine translation [8], wireless communication engineers recently used similar approaches to improve on the state of the art in signal identification tasks in wireless networks. Among the pioneers in the domain were the authors of [9], who demonstrated that CNNs trained on time domain IQ data significantly outperform traditional approaches for automatic modulation recognition based on expert features such as cyclic-moment based features, and conventional classifiers such as decision trees, k-NNs, SVMs, NN and Naive Bayes. Selim et al. [10] propose to use amplitude and phase difference data to train CNN classifiers able to detect the presence of radar signals with high accuracy. Akeret et al. [11] propose a novel technique to accurately detect radio frequency interference in radio astronomy by training a CNN on 2D time domain data acquired from a radio telescope. The authors of [12] propose a novel method for interference identification in unlicensed bands using CNNs trained on frequency domain data. Several wireless technologies (e.g. DVB, GSM, LTE...) have been classified with high accuracy in [13] using deep learning on averaged magnitude FFT data.

These individual works focus on specific deep learning applications pertaining to wireless signal classification using particular data representations. They do not provide a detailed methodology necessary to understand how to apply the same approach to other potential use cases, nor do they provide sufficient information as a guide for selecting a wireless data representation. This information is necessary for someone aiming to reproduce existing attempts, build upon them or generate new application ideas.

Deep learning for wireless networks. Recently, the authors of [14] provided an overview of the state of the art and potential future deep learning applications in wireless communication. The authors of [15] propose a unified deep learning framework for mobile sensing data. However, none of these studies focuses on spectrum monitoring scenarios and the underlying data models for training wireless signal classifiers.

To remedy these shortcomings, this paper presents end-to-end learning from spectrum data: a deep learning framework for solving various wireless signal classification problems for spectrum monitoring applications in a unified manner. To the best of our knowledge, this article is the first comprehensive work that elaborates in detail the methodology for (i) collecting, transforming and representing spectrum data, (ii) designing and implementing data-driven deep learning classifiers for wireless signal identification problems, and that (iii) looks at several data representations for different classification problems at once. The technical approach depicted in this paper is deeply interdisciplinary and systematic, calling for the synergy of expertise of computer scientists, wireless communication engineers, signal processing and machine learning experts with the ultimate aim of breaking new ground and raising awareness of this emerging interdisciplinary research area. Finally, this paper comes at an opportune time, when (i) recent advances in the field of machine learning, (ii) computational advances and parallelization used to speed up training and (iii) efforts in making large amounts of spectrum data available, have paved the way for novel spectrum monitoring solutions.

Notation and terminology. We indicate a scalar-valued variable with normal font letters (e.g. $x$ or $X$). Matrices will be denoted using bold capitals such as $\mathbf{X}$. Vectors will be denoted with a bold lower case letter (e.g. $\mathbf{x}$), which may sometimes appear as row or column vectors of a matrix (e.g. $\mathbf{x}_k$ is the $k$-th column vector). With $x_i$ and $x_{ij}$ we will indicate the entries of $\mathbf{x}$ and $\mathbf{X}$, respectively. The notation $(\cdot)^T$ denotes the transpose of a matrix or vector, while $(\cdot)^*$ denotes complex conjugation. We indicate by $\|\mathbf{x}\|_p = \left(\sum_{n=0}^{N-1} |x_n|^p\right)^{1/p}$ the $l_p$-norm of vector $\mathbf{x}$.
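As a quick numerical illustration of the $l_p$-norm defined above, the following minimal NumPy sketch evaluates the formula directly. The function name `lp_norm` is hypothetical and chosen only for this example; it is not part of the paper.

```python
import numpy as np

def lp_norm(x, p):
    """l_p-norm of a real or complex vector x: (sum_n |x_n|^p)^(1/p), for p >= 1."""
    x = np.asarray(x)
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

# Example vectors (arbitrary values, for illustration only).
l2 = lp_norm([3.0, 4.0], 2)   # Euclidean norm
l1 = lp_norm([3.0, -4.0], 1)  # sum of absolute values
```

For vectors, the same quantity is available as `np.linalg.norm(x, ord=p)`.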

II. CHARACTERISTIC USE CASES FOR END-TO-END LEARNING FROM SPECTRUM DATA

End-to-end learning from spectrum data is a new approach that can automatically learn features directly from simple wireless signal representations, without requiring design of hand-crafted expert features like higher order cyclic moments. The term end-to-end refers to the fact that the learning procedure can train wireless signal classifiers in one end-to-end step which eliminates the need for complex multi-stage expert machine learning processing pipelines.

Before diving deep into the concept of end-to-end learning from spectrum data, we first consider the architecture presented in Figure 1 with two motivating scenarios that illustrate characteristic use cases for the presented approach.

Detecting spectral opportunities & Spectrum Sharing

1) Cognitive radio

The ever-increasing radio spectrum demand combined with the currently dominant fixed spectrum policy assignment [16] have inspired the concepts of cognitive radio (CR) and dynamic spectrum access (DSA) aiming to improve radio spectrum utilization. A CR network (CRN) is an intelligent wireless communication system that is aware of its radio environment, i.e. spectral opportunities, and can intelligently adapt its operating parameters by interacting and learning from the environment [17]. In this way, the CRN can infer the spectrum occupancy to identify unoccupied frequency bands (white spaces/spectrum holes) and share them with licensed users (primary users (PU)) in an opportunistic manner [18].

Figure 1 a) shows the basic operational process of a data-driven CRN. First, CR users intermittently sense their surrounding radio environment and report their sensing results via a control channel to a nearby base station (BS). Then, the BS forwards the request to a back-end data center (DC), which combines the crowdsourced sensing information from several CR users into a spectrum map. The DC infers the spectrum use in order to determine the presence of PUs (a characteristic wireless signal) and diffuses the spectrum availability information back to the cognitive users. For this purpose, the DC first learns a CNN model offline based on the sensing reports, and then employs the model to discriminate between a spectrum hole and an occupied frequency channel.

2) Cognitive IoT

The Internet of Things (IoT) paradigm envisioned a world of devices/objects/things that are "always connected" to the Internet [19]. In this world, heterogeneous wireless technologies and standards emerge operating in the unlicensed frequency bands, which puts enormous pressure on the available spectrum. The increasing wireless spectrum demand raises several communication challenges such as co-existence, cross-technology interference and scarcity of interference-free spectrum bands [2], [20]. To address these challenges, recent research work proposed a CR-based IoT [21], [22] to enable dynamic spectrum sharing among heterogeneous wireless networks.

[Figure 1: panels a) and b). Labelled elements: wireless spectrum map, I&Q samples, signal processing, spectrum learning, signal identification, spectrum decision, cloud spectrum monitoring, Internet, sensors S1, S2, S3, CR, PU1 to PU4, CR-IoT, CNN, BS1, BS2, DC.]

Fig. 1. Data-driven CNN-based flexible spectrum management framework

Figure 1 a) depicts this situation. It can be seen that CR-IoT devices are equipped with cognitive functionalities allowing them to search for interference-free spectrum bands and accordingly reconfigure their transmission parameters. First, CR-IoT devices send spectrum sensing reports to a CNN-based DC. Then, the DC learns and estimates the presence of other emitters and uses that information to detect interference sources and interference-free channels. This enables smart and effective interference mitigation and spectrum management strategies for co-existence with CR and legacy technologies and modulation types.

Spectrum management policy and regulation

Spectrum regulatory bodies continuously monitor the radio frequency spectrum use to protect users from harmful interference and allow optimum use thereof [23]. Interference may be a result of unauthorized emissions, electromagnetic interference (EMI) and devices that operate beyond technical specifications. In order to resolve problems associated with wireless interference, spectrum managers traditionally use a combination of engineering analysis and data obtained from spectrum measurements. However, in the era of today's "wireless abundance", where various services and wireless technologies share the same frequency bands, the identification of unauthorized transmitters can be very difficult to achieve. More intelligent algorithms are needed that can automatically mine the spectrum data and identify interference sources.

Figure 1 b) presents a CNN-based spectrum management framework for spectrum regulation. Deployed sensor devices, e.g. {S1, S2, S3}, collect spectrum measurements and contribute their observations to a DC to create interference maps. The DC uses signal processing techniques together with a CNN model to mine the obtained spectrum data and identify existing interferers. The mined patterns are key for ensuring compliance with national and international spectrum management regulations.


III. THE ROLE OF DEEP LEARNING IN SPECTRUM MONITORING

There are two goals of this section. The first is to introduce the key ideas underlying machine learning/deep learning. The second is to derive a reference model for machine learning/deep learning applications for spectrum monitoring, management and spectrum regulation.

A. Machine Learning

Machine learning (ML) refers to a set of algorithms that learn a statistical model from historical data. The obtained model is data-driven rather than explicitly derived using domain knowledge.

1) Preliminaries: The goal of ML is to find a mathematical function, $f$, that defines the relation between a set of inputs $X$, and a set of outputs $Y$, i.e.

$$f : X \to Y \tag{1}$$

The inputs, $\mathbf{X} \in \mathbb{R}^{m \times n}$, present a number of distinct data points, samples or observations denoted as

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \vdots \\ \mathbf{x}_m^T \end{bmatrix} \tag{2}$$

where $m$ is the sample size, while $\mathbf{x}_i \in \mathbb{R}^n$ is a vector of $n$ measurements or features for the $i$-th observation called a feature vector,

$$\mathbf{x}_i = [x_{i1}, x_{i2}, ..., x_{in}]^T, \quad i = 1, ..., m \tag{3}$$

The outputs, $\mathbf{y} \in \mathbb{R}^m$, are all the outcomes, labels or target values corresponding to the $m$ inputs $\mathbf{x}_i$, denoted by

$$\mathbf{y} = [y_1, y_2, ..., y_m]^T \tag{4}$$

Then the observed data consists of $m$ input-output pairs, called the training data or training set, $S$,

$$S = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), ..., (\mathbf{x}_m, y_m)\} \tag{5}$$

Each pair $(\mathbf{x}_i, y_i)$ is called a training example because it is used to train or teach the learning algorithm how to obtain $f$.

In machine learning, $f$ is called the predictor whose task is to predict the outcome $y_i$ based on the input values of $\mathbf{x}_i$. There are two classical data models depending on the prediction type, described by:

$$f(\mathbf{x}) = \begin{cases} \text{regressor,} & \text{if } y \in \mathbb{R} \\ \text{classifier,} & \text{if } y \in \{0, 1\} \end{cases}$$

In short, when the output variable $y$ is continuous or quantitative, the learning problem is a regression problem. If instead $y$ takes a discrete or categorical value, it is a classification problem.
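The notation of Equations 2 to 5 maps directly onto array shapes. The following minimal NumPy sketch uses made-up toy values (not data from the paper) to show the data matrix, the label vector and the training set; the helper `problem_type` is hypothetical and only illustrates the regression/classification distinction.

```python
import numpy as np

# Hypothetical toy data: m = 4 observations with n = 3 features each.
X = np.array([
    [0.1, 0.5, 1.2],
    [0.9, 0.3, 0.4],
    [0.2, 0.8, 1.0],
    [1.1, 0.2, 0.3],
])                            # shape (m, n); row i is the feature vector x_i^T
y = np.array([0, 1, 0, 1])    # shape (m,); discrete labels, i.e. a classification task

m, n = X.shape

# Training set S as a list of (x_i, y_i) pairs, as in Equation 5.
S = [(X[i], y[i]) for i in range(m)]

def problem_type(targets):
    """Return 'classification' for integer-valued targets, 'regression' otherwise."""
    if np.issubdtype(np.asarray(targets).dtype, np.integer):
        return "classification"
    return "regression"
```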

2) Learning the model: Given a training set, $S$, the goal of a machine learning algorithm is to learn the mathematical model for $f$. To make sense of this task, we assume there exists a fixed but unknown distribution, $p(\mathbf{x}, y) = p_X(\mathbf{x})\,p(y|\mathbf{x})$, according to which the data sample is identically and independently distributed (i.i.d.). Here, $p_X(\mathbf{x})$ is the marginal distribution that models the uncertainty in the sampling of the input points, while $p(y|\mathbf{x})$ is the conditional distribution that describes the statistical relation between the input and output.

Thus, $f$ is some fixed but unknown function that defines the relation between $X$ and $Y$. The selected ML algorithm determines its functional form or shape. The unknown function $f$ is estimated by applying the selected learning method to the training data, $S$, so that the estimate $\hat{f}$ is a good estimator for new unseen data, i.e.

$$y \approx \hat{y} = \hat{f}(\mathbf{x}_{new}) \tag{6}$$

The predictor $f$ is parametrized by a vector $\boldsymbol{\theta} \in \mathbb{R}^n$, and describes a parametric model. In this setup, the problem of estimating $f$ reduces down to one of estimating the parameters $\boldsymbol{\theta} = [\theta_1, \theta_2, ..., \theta_n]^T$. In most practical applications, the observed data are corrupted versions of the expected values that would be obtained under ideal circumstances. These unavoidable corruptions, typically termed noise, prevent the extraction of true parameters from the observations. With this in regard, the generic data model may be expressed as

$$y = f(\mathbf{x}) + \epsilon \tag{7}$$

where $f(\mathbf{x})$ is the model and $\epsilon$ are additive measurement errors and other discrepancies. The goal of ML is to find the input-output relation that will "best" match the noisy observations. Hence, the vector $\boldsymbol{\theta}$ may be estimated by solving a (convex) optimization problem. First, a loss or cost function $l(\mathbf{x}, y, \boldsymbol{\theta})$ is set, which is a (point-wise) measure of the error between the observed data point $y_i$ and the model prediction $f(\mathbf{x}_i)$ for each value of $\boldsymbol{\theta}$. However, $\boldsymbol{\theta}$ is estimated on the whole training data, $S$, not just one example. For this task, the average loss over all training examples, called the training loss, $J$, is calculated:

$$J(\boldsymbol{\theta}) \equiv J(S, \boldsymbol{\theta}) = \frac{1}{m} \sum_{(\mathbf{x}_i, y_i) \in S} l(\mathbf{x}_i, y_i, \boldsymbol{\theta}) \tag{8}$$

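where $S$ indicates that the error is calculated on the instances from the training set and $i = 1, ..., m$. The vector $\boldsymbol{\theta}$ that minimizes the training loss $J(\boldsymbol{\theta})$, that is

$$\arg\min_{\boldsymbol{\theta} \in \mathbb{R}^n} J(\boldsymbol{\theta}) \tag{9}$$

will give the desired model. Once the model is estimated, for any given input $\mathbf{x}$, the prediction for $y$ can be made; for a linear model, for example, $\hat{y} = \boldsymbol{\theta}^T \mathbf{x}$.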
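To make Equations 7 to 9 concrete, the sketch below fits a linear predictor $\hat{y} = \boldsymbol{\theta}^T \mathbf{x}$ by minimizing the training loss with plain gradient descent. The squared-error loss, the step size, the number of iterations and the synthetic data are all assumptions made for illustration; the text above does not prescribe a particular loss or optimizer.

```python
import numpy as np

def training_loss(theta, X, y):
    """Average squared-error loss J(theta) over the training set (Equation 8)."""
    residuals = X @ theta - y
    return np.mean(residuals ** 2)

def fit_linear(X, y, lr=0.1, steps=2000):
    """Approximate argmin_theta J(theta) (Equation 9) by gradient descent."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(steps):
        grad = (2.0 / m) * X.T @ (X @ theta - y)  # gradient of the mean squared error
        theta -= lr * grad
    return theta

# Synthetic data following y = f(x) + eps (Equation 7) with a linear f.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
theta_true = np.array([1.5, -2.0, 0.5])           # assumed ground truth for the demo
y = X @ theta_true + 0.01 * rng.normal(size=200)

theta_hat = fit_linear(X, y)
x_new = np.array([1.0, 0.0, -1.0])
y_new = theta_hat @ x_new                         # prediction for an unseen input
```

Because this loss is convex in $\boldsymbol{\theta}$, gradient descent with a sufficiently small step size approaches the global minimizer; for a CNN the loss is generally not convex and only a local solution is found.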

In engineering parlance, the process of estimating the parameters of a model that is a mapping between input and output observations is called system identification. System identification or ML classification techniques are well suited for wireless signal identification problems.


B. Deep Learning

The prediction accuracy of ML models heavily depends on the choice of the data representation or features used for training. For that reason, much effort in designing ML models goes into the composition of pre-processing and data transformation chains that result in a representation of the data that can support effective ML predictions. Informally, this is referred to as feature engineering. Feature engineering is the process of extracting, combining and manipulating features by taking advantage of human ingenuity and prior expert knowledge to arrive at more representative ones, that is

$$\phi(\mathbf{d}) : \mathbf{d} \to \mathbf{x} \tag{10}$$

i.e. the feature extractor $\phi$ transforms the data vector $\mathbf{d} \in \mathbb{R}^d$ into a new form, $\mathbf{x} \in \mathbb{R}^n$, more suitable for making predictions. The importance of feature engineering highlights the bottleneck of machine learning algorithms: their inability to automatically extract the discriminative information from data.
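As a purely illustrative instance of the mapping in Equation 10, the sketch below defines a hand-crafted feature extractor that reduces a raw data vector to three summary statistics. The chosen statistics are hypothetical and much simpler than the expert features used in the literature (such as higher order cyclic moments); the point is only that $\phi$ is designed by hand rather than learned.

```python
import numpy as np

def phi(d):
    """Hypothetical hand-crafted feature extractor phi: R^d -> R^3.

    Maps a raw data vector to [mean, standard deviation, peak absolute value].
    """
    d = np.asarray(d, dtype=float)
    return np.array([d.mean(), d.std(), np.max(np.abs(d))])

# Example: a short raw vector is reduced to a 3-dimensional feature vector.
x = phi([1.0, -1.0, 1.0, -1.0])
```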

Feature learning is a branch of machine learning that moves the concept of learning from "learning the model" to "learning the features". One popular feature learning method is deep learning. In particular, this paper focuses on convolutional neural networks (CNN).

Convolutional neural networks perform feature learning via non-linear transformations implemented as a series of nested layers. The input data is a multidimensional data array, called a tensor, that is presented at the visible layer. This is typically a grid-like topological structure, e.g. time-series data, which can be seen as a 1D grid taking samples at regular time intervals, pixels in images with a 2D layout, a 3D structure of videos, etc. Then a series of hidden layers extract several abstract features. Those layers are "hidden" because their values are not given. Instead, the deep learning model must determine which data representations are useful for explaining the relationships in the observed data. Each layer consists of several kernels that perform a convolution over the input; therefore, they are also referred to as convolutional layers. Kernels are feature detectors that convolve over the input and produce a transformed version of the data at the output. They are banks of finite impulse response filters as seen in signal processing, just learned on a hierarchy of layers. The filters are usually multidimensional arrays of parameters that are learnt by the learning algorithm [24] through a training process called backpropagation.

For instance, given a two-dimensional input $x$, a two-dimensional kernel $h$ computes the 2D convolution by

$$(x * h)_{i,j} = x[i, j] * h[i, j] = \sum_{n} \sum_{m} x[n, m] \cdot h[i - n, j - m] \tag{11}$$

i.e. the dot product between the kernel weights and the small region of the input they are connected to.
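Equation 11 can be evaluated directly with two nested loops. The sketch below is a minimal NumPy illustration, assuming that terms whose kernel index falls outside $h$ are zero, which gives the so-called full output; the function name `conv2d_full` is hypothetical.

```python
import numpy as np

def conv2d_full(x, h):
    """Direct evaluation of Equation 11: out[i, j] = sum_n sum_m x[n, m] * h[i - n, j - m].

    Kernel entries outside the support of h are treated as zero, so the
    output has shape (Nx + Nh - 1, Mx + Mh - 1).
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    nx, mx = x.shape
    nh, mh = h.shape
    out = np.zeros((nx + nh - 1, mx + mh - 1))
    for n in range(nx):
        for m in range(mx):
            # x[n, m] contributes x[n, m] * h[a, b] to out[n + a, m + b].
            out[n:n + nh, m:m + mh] += x[n, m] * h
    return out

# Small example with arbitrary values.
x = np.array([[1.0, 2.0], [3.0, 4.0]])
h = np.array([[1.0, 0.0], [0.0, -1.0]])
out = conv2d_full(x, h)
```

Note that many deep learning libraries implement the convolutional layer as a cross-correlation, i.e. without flipping the kernel. Since the kernel weights are learned, this difference does not affect what the layer can represent.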

After the convolution, a bias term is added and a point-wise nonlinearity $g$ is applied, forming a feature map at the filter output. If we denote the $l$-th feature map at a given convolutional layer as $h^l$, whose filters are determined by the coefficients or weights $W^l$, the input $x$ and the bias $b_l$, then the feature map $h^l$ is obtained as follows

$$h^l_{i,j} = g\left((W^l * x)_{i,j} + b_l\right) \tag{12}$$

where $*$ is the 2D convolution defined by Equation 11, while $g(\cdot)$ is the activation function. Typically, the rectifier activation function is used for CNNs, which is defined by $g(x) = \max(0, x)$. Kernels using the rectifier are called ReLU (Rectified Linear Unit) and have been shown to greatly accelerate the convergence during the training process compared to other activation functions. Other common activation functions are the hyperbolic tangent function (tanh), $g(x) = \frac{2}{1+e^{-2x}} - 1$, and the sigmoid activation, $g(x) = \frac{1}{1+e^{-x}}$.

In order to form a richer representation of the input signal, commonly, multiple filters are stacked so that each hidden layer consists of multiple feature maps, $\{h^{(l)}, l = 0, ..., L\}$ (e.g., $L = 64, 128, ...$, etc). The number of filters per layer is a tunable parameter or hyper-parameter. Other tunable parameters are the filter size, the number of layers, etc. The selection of values for hyper-parameters may be quite difficult, and finding them is commonly as much an art as it is a science. An optimal choice may only be feasible by trial and error. The filter sizes are selected according to the input data size so as to have the right level of granularity that can create abstractions at the proper scale. For instance, for a 2D square matrix input, such as spectrograms, common choices are 3x3, 5x5, 9x9, etc. For a wide matrix, such as a real-valued representation of the complex I and Q samples of the wireless signal in $\mathbb{R}^{2 \times N}$, suitable filter sizes may be 1x3, 2x3, 2x5, etc.
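The three activation functions named above can be written in a few lines of NumPy. This is only an illustrative sketch; the function names and the test vector `z` are assumptions made for the example.

```python
import numpy as np

def relu(x):
    """Rectifier: g(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def tanh_act(x):
    """Hyperbolic tangent written as g(x) = 2 / (1 + exp(-2x)) - 1."""
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def sigmoid(x):
    """Logistic sigmoid: g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical pre-activation values
z = np.array([-2.0, 0.0, 2.0])
a_relu, a_tanh, a_sig = relu(z), tanh_act(z), sigmoid(z)
```

The tanh expression used in the text is algebraically identical to the usual definition, so `tanh_act` should agree with `np.tanh` up to floating-point error.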

The penultimate layer in a CNN consists of neurons that are fully-connected with all feature maps in the preceding layer. Therefore, these layers are called fully-connected or dense layers. The very last layer is a softmax classifier, which computes the posterior probability of each class label over $K$ classes as

$$\hat{y}_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, ..., K \tag{13}$$

That is, the scores $z_i$ computed at the output layer, also called logits, are translated into probabilities. A loss function, $l$, is calculated on the last fully-connected layer that measures the difference between the estimated probabilities, $\hat{y}_i$, and the one-hot encoding of the true class labels, $y_i$. The CNN parameters, $\theta$, are obtained by minimizing the loss function on the training set $\{x_i, y_i\}_{i \in S}$ of size $m$,

$$\min_{\theta} \sum_{i \in S} l(\hat{y}_i, y_i) \tag{14}$$

where $l(\cdot)$ is typically the mean squared error $l(y, \hat{y}) = \|y - \hat{y}\|_2^2$ or the categorical cross-entropy $l(y, \hat{y}) = \sum_{i=1}^{m} y_i \log(\hat{y}_i)$, for which a minus sign is often added in front to get the negative log-likelihood.
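To make Equation 13 and the cross-entropy loss concrete, the following sketch computes both for a single example. The logit values, the three-class setup and the small constant `eps` are assumptions chosen for illustration; the max-subtraction inside `softmax` is a common numerical-stability step that does not change the result.

```python
import numpy as np

def softmax(z):
    """Softmax over K logits; subtracting max(z) avoids overflow in exp."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(y_true, y_hat, eps=1e-12):
    """Categorical cross-entropy with the minus sign (negative log-likelihood)."""
    return -np.sum(y_true * np.log(y_hat + eps))

# Hypothetical logits for K = 3 classes and a one-hot label for class 0
logits = np.array([2.0, 1.0, 0.1])
y_true = np.array([1.0, 0.0, 0.0])

y_hat = softmax(logits)
loss = cross_entropy(y_true, y_hat)
```

Because the label is one-hot, the loss reduces to the negative logarithm of the probability assigned to the true class.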

To control over-fitting, typically regularization is used in combination with dropout, which is a new, extremely effective technique that "drops out" a random set of activations in a layer. Each unit is retained with a fixed probability $p$, typically chosen using a validation set, or set to 0.5, which has been shown to be close to optimal for a wide range of applications [25].


Fig. 2. Processing pipeline for end-to-end learning from spectrum data (blocks: Data acquisition, Pre-processing, Classification, Decision)

C. Deep Learning from spectrum data

Intelligence capabilities will be of paramount importance in the development of future wireless communication systems to allow them to observe, learn and respond to their complex and dynamic operating environment. Figure 2 shows a processing pipeline for realizing intelligent behaviour using deep learning in an end-to-end learning from spectrum data setup. The pipeline consists of:

Data acquisition. Data is a key asset in the design of future intelligent wireless networks [26]. In order to obtain spectrum data, the radio first senses its environment by collecting raw data from various spectrum bands. The raw data consist of $n$ samples, stacked into data vectors $r_k$ which represent the complex envelope of the received wireless signal. These data vectors are the input for end-to-end learning to obtain models that can reason about the presence of wireless signals.

Data pre-processing. Data pre-processing is concerned with the analysis and manipulation of the collected spectrum data with the aim to arrive at potentially good wireless data representations. The raw samples organized into data vectors $r_k$ in the previous block are pipelined as input for signal processing (SP) tools that analyze, process and transform the data to arrive at simple data representations such as frequency, amplitude, phase and spectrum, or more complex features $x_k$ such as cyclostationary features. In addition, feature learning such as deep learning may be utilized to automatically extract more low-level and high-level features. In many ML applications the choice of features is just as important as, if not more important than, the choice of the ML algorithm.

Classification. The "Classification" processing block enables intelligence capabilities to assess the environmental radio context by detecting the presence of wireless signals. This may be the type of the emitters that are utilizing the spectrum (spectrum access scheme, modulation format, wireless technology, etc.), type of interference, detecting an available spectrum band, etc. We refer to this process as spectrum learning [27]. In future wireless networks ML algorithms may play a key role in automatically classifying wireless signals as a step towards intelligent spectrum access and management schemes.

Decision. The predictions calculated by the ML model are used as input for the decision module. In a CR application, a decision may be related to the best transmission strategy (e.g. frequency band or transmission power) that will maximize the data rate without causing interference to other users. This process is called spectrum decision [18]. In the context of CR-IoTs, the decision may relate to an interference mitigation strategy such as back-off for a certain time period. In other communication scenarios such as spectrum regulation, the decision may relate to a spectrum policy or spectrum compliance enforcement applied to a detected source of harmful interference (e.g. fake GSM tower, rogue access point, etc.).

IV. DATA-DRIVEN END-TO-END LEARNING FOR WIRELESS SIGNAL CLASSIFICATION

The next generation (5G) wireless networks are expected to learn the diverse characteristics of the dynamically changing wireless environment and fluctuating nature of the available spectrum, so as to autonomously determine the optimal system configuration or to support spectrum regulation.

This section introduces a data-driven end-to-end learning framework for spectrum monitoring applications in future 5G networks. First, the representation of wireless signals used in digital communication and a data model for wireless signal acquisition is introduced. Then, a data model for extracting features, creating training data and designing wireless signal classifiers is presented. In particular, deep learning is used for extracting low-level and higher level wireless signal features and for wireless signal classification.

A. Wireless signal model

A wireless communication system transmits information from one point to another through a wireless medium which is called a channel. At the system level, a wireless communication model consists of the following parts:

Transmitter. The transmitter transforms the message, i.e. a stream of bits, produced by the source of information into an appropriate form for transmission over the wireless channel. Figure 3 shows the processing chain at the transmitter side. First, the bits $b_k \in \{0, 1\}$ are mapped into a new binary sequence by a coding technique. The resulting sequence is mapped to symbols $s_k$ from an alphabet or constellation which might be real or complex. This process is called modulation.

In the modulation step, the created symbols are mapped to a discrete waveform or signal via a pulse shaping filter and sent to the digital to analog converter module (D/A) where the waveform is transformed into an analog continuous time signal, $s_b(t)$. The resulting signal is a baseband signal that is frequency shifted by the carrier frequency $f_c$ to produce the wireless signal $s(t)$ that is defined by

$$s(t) = \Re\{s_b(t) e^{j2\pi f_c t}\} = \Re\{s_b(t)\}\cos(2\pi f_c t) - \Im\{s_b(t)\}\sin(2\pi f_c t) \tag{15}$$

where $s(t)$ is a real-valued bandpass signal with center frequency $f_c$, while $s_b(t) = \Re\{s_b(t)\} + j\Im\{s_b(t)\}$ is the baseband complex envelope of $s(t)$.

Fig. 3. End-to-end learning processing chain to obtain radio spectrum feature vectors
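The equality of the two forms in Equation 15 can be checked numerically. In the sketch below the sample rate `fs`, the carrier `fc` and the random complex envelope `sb` are placeholder assumptions; they are not parameters taken from the paper.

```python
import numpy as np

fs = 1.0e6    # assumed sample rate in Hz
fc = 100.0e3  # assumed carrier frequency in Hz
t = np.arange(256) / fs

# Placeholder complex baseband envelope s_b(t)
rng = np.random.default_rng(0)
sb = rng.standard_normal(256) + 1j * rng.standard_normal(256)

# Left-hand form: real part of the envelope multiplied by the complex carrier
s_complex = np.real(sb * np.exp(1j * 2 * np.pi * fc * t))

# Right-hand form: in-phase part on the cosine, quadrature part on the sine
s_quadrature = (sb.real * np.cos(2 * np.pi * fc * t)
                - sb.imag * np.sin(2 * np.pi * fc * t))
```

Both arrays should agree to within floating-point precision, which mirrors the identity Re{(a + jb)(cos θ + j sin θ)} = a cos θ − b sin θ.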

Wireless channel. The wireless channel is characterised by the variations of the channel strength over time and over frequency. The variations are modeled as (i) large-scale fading, which characterizes the path loss of the channel as a function of distance and shadowing by large objects such as buildings and hills, and (ii) small-scale fading, which models constructive and destructive interference of the multiple propagation paths between the transmitter and receiver. The channel effects can be modeled as a linear time-varying system described by a complex finite impulse response (FIR) filter $h(t, \tau)$. If $r(t)$ is the signal at the channel output, the input/output relation is given by:

$$r(t) = s(t) * h(t, \tau) \tag{16}$$

where $h(t, \tau)$ is the band-limited bandpass channel impulse response, while $*$ denotes the convolution operation.

Receiver. The wireless signal at the receiver output will be a corrupted version of the transmitted signal due to channel impairments and hardware imperfections of the transmitter and receiver. Typical hardware related impairments are:

• Noise caused by the resistive components such as the receiver antenna. This thermal noise may be modelled as additive white Gaussian noise (AWGN), $n \sim \mathcal{N}(0, \sigma^2)$.

• Frequency offset caused by the slightly different local oscillator (LO) signal frequencies at the transmitter, $f_c$, and receiver, $f_c'$.

• Phase noise, $\varphi(t)$, caused by the frequency drift in the LOs used to demodulate the received wireless signal. It causes the angle of the LO signals to drift around its intended instantaneous phase $2\pi f_c t$.

• Timing drift caused by the difference in sample rates at the receiver and transmitter.

The received wireless signal model can be given by $r(t) = \Re\{r_b(t) e^{j2\pi f_c t}\}$, where $r_b(t)$ is the baseband complex envelope defined by

$$r_b(t) = \big(s_b(t) * h_b(t, \tau)\big)\,\frac{1}{2}\, e^{j2\pi(f_c - f_c')t} + n(t) \tag{17}$$

where $h_b(t, \tau)$ is the baseband channel equivalent given by

$$h_b(t, \tau) = \sum_{i=0}^{l} \alpha_i(t, \tau)\, e^{j2\pi f_c \tau_i(t) + \varphi_i(t, \tau)}\, \delta(\tau - \tau_i(t)) \tag{18}$$
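A simplified simulation of the received baseband model in Equation 17 may help to build intuition. The sketch below makes several assumptions that go beyond the text: the time-varying channel is replaced by fixed taps, the noise is drawn as circularly symmetric complex Gaussian samples scaled to a target SNR, and phase noise and timing drift are omitted. The function name `apply_impairments` and all parameter values are hypothetical.

```python
import numpy as np

def apply_impairments(sb, fs, freq_offset_hz, snr_db, taps=(1.0,), seed=0):
    """Static-channel version of r_b(t): (s_b * h_b) * 1/2 * exp(j 2 pi df t) + n(t)."""
    rng = np.random.default_rng(seed)
    # Channel: convolution with fixed baseband taps, truncated to the input length
    faded = np.convolve(sb, np.asarray(taps, dtype=complex))[:len(sb)]
    # Frequency offset df = f_c - f_c' between transmitter and receiver oscillators
    t = np.arange(len(sb)) / fs
    shifted = 0.5 * faded * np.exp(1j * 2 * np.pi * freq_offset_hz * t)
    # Complex AWGN scaled so that signal power / noise power matches snr_db
    sig_power = np.mean(np.abs(shifted) ** 2)
    noise_power = sig_power / 10 ** (snr_db / 10)
    noise = np.sqrt(noise_power / 2) * (rng.standard_normal(len(sb))
                                        + 1j * rng.standard_normal(len(sb)))
    return shifted + noise

# Hypothetical QPSK symbols as the transmitted complex envelope
rng = np.random.default_rng(1)
sb = np.exp(1j * np.pi / 4 * (2 * rng.integers(0, 4, 64) + 1))
rb = apply_impairments(sb, fs=1.0e6, freq_offset_hz=500.0, snr_db=10.0)
```

Setting the frequency offset to zero and the SNR very high reduces the output to a scaled copy of the input, which is a quick way to validate the signal path.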

B. Data acquisition

To derive a machine learning model for wireless signal identification, adequate training data needs to be collected. Figure 3 summarizes the data acquisition process for collecting wireless signal features. The received signal, $r(t)$, is first amplified, mixed, low-pass filtered and then sent to the analog to digital (A/D) converter, which samples the continuous-time signal at a rate $f_s = 1/T_s$ samples per second and generates the discrete version $r_n$. The discrete signal $r_n = r[nT_s]$ consists of two components, the in-phase, $r_I$, and quadrature component, $r_Q$, i.e.

$$r_n := r[n] = r_I[n] + j r_Q[n] \tag{19}$$

Suppose we sample for a period $T$ and collect a batch of $N$ samples. The signal samples $r[n] \in \mathbb{C}$, $n = 0, ..., N-1$, are a time-series of complex raw samples which may be represented as a data vector. The $k$-th data vector can be denoted as

$$r_k = [r[0], ..., r[N-1]]^T \tag{20}$$

These data vectors $r_k$ are windowed or segmented representations of the received continuous sample stream, similar to what is seen in audio signal processing. They carry information for assessing which type of wireless signal is sensed. This may be the type of modulation, the type of wireless technology, interferer, etc.
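The windowing step can be sketched as a reshape of the sample stream into rows of length N. Using non-overlapping windows and discarding an incomplete tail are assumptions made here for simplicity; the function name `segment_stream` and the synthetic stream are illustrative.

```python
import numpy as np

def segment_stream(samples, N):
    """Split a 1D complex sample stream into non-overlapping data vectors of length N."""
    num_vectors = len(samples) // N  # an incomplete final window is dropped
    return samples[:num_vectors * N].reshape(num_vectors, N)

# Synthetic stream of 300 complex samples, segmented into vectors of N = 128
stream = np.arange(300) + 1j * np.arange(300)
R = segment_stream(stream, 128)
```

Each row of `R` then plays the role of one data vector $r_k$.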

C. Wireless signal representation

After collecting the $k$-th data vector the ML receiver baseband processing chain transforms it into a new representation suitable for training. That is, the $k$-th data vector $r_k \in \mathbb{C}^N$ is translated into the $k$-th feature vector $x_k \in \mathbb{R}^N$

$$r_k \mapsto x_k \tag{21}$$

This paper considers three simple data representations. The first is a real-valued equivalent of the raw complex temporal wireless signal inspired by the results in [9]. The second is based on the amplitude and phase of the raw wireless signal, similar to the one used in the work of Selim et al. [10] for identifying radar signals. The last is a frequency domain representation inspired by the work of Danev et al. [28], which showed that frequency-based features outperform their time-based equivalents for wireless device identification. Each data representation snapshot has a fixed length of $N$ data points.

For each transformation, data is visualized to form some intuition about which data representation may provide the most discriminative features for machine learning. The following data/signal transformations are used:

Transformation 1 (IQ vector): The IQ vector is a mapping of the raw complex samples, i.e. data vector $r_k \in \mathbb{C}^N$, into two sets of real-valued data vectors, one that carries the in-phase samples $x_i$ and one that holds the quadrature component values $x_q$. That is

$$x^{IQ}_k = \begin{bmatrix} x_i^T \\ x_q^T \end{bmatrix} \tag{22}$$

so that $x^{IQ}_k \in \mathbb{R}^{2 \times N}$. Mathematically, this may be written as

$$f : \mathbb{C}^N \to \mathbb{R}^{2 \times N}, \quad r_k \mapsto x^{IQ}_k$$
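Transformation 1 amounts to stacking the real and imaginary parts as two rows. A minimal sketch, with a hypothetical function name `to_iq` and a toy three-sample vector:

```python
import numpy as np

def to_iq(r):
    """Map a complex vector of length N to a real 2 x N array: row 0 = I, row 1 = Q."""
    return np.stack([r.real, r.imag])

# Toy complex data vector (N = 3)
r = np.array([1 + 2j, 3 - 4j, -5 + 0j])
x_iq = to_iq(r)
```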

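Transformation 2 (A/φ vector): The A/φ vector is a mapping from the raw complex data vector $r_k \in \mathbb{C}^N$ into two real-valued vectors, one that represents its phase, $\phi$, and one that represents its magnitude $A$, i.e.

$$x^{A/\phi}_k = \begin{bmatrix} x_A^T \\ x_\phi^T \end{bmatrix} \tag{23}$$

where $x^{A/\phi}_k \in \mathbb{R}^{2 \times N}$, and the phase, $x_\phi \in \mathbb{R}^N$, and magnitude vectors, $x_A \in \mathbb{R}^N$, have the elements

$$x_{\phi_n} = \arctan\left(\frac{r_{q_n}}{r_{i_n}}\right), \quad x_{A_n} = \left(r_{q_n}^2 + r_{i_n}^2\right)^{1/2}, \quad n = 0, ..., N-1 \tag{24}$$

In short, this may be written as

$$f : \mathbb{C}^N \to \mathbb{R}^{2 \times N}, \quad r_k \mapsto x^{A/\phi}_k$$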
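Transformation 2 can be sketched with `np.abs` and `np.angle`. One assumption should be stated clearly: `np.angle` computes the four-quadrant arctangent with values in (−π, π], whereas Equation 24 is written with a plain arctan of the ratio. The function name `to_amp_phase` and the toy samples are illustrative.

```python
import numpy as np

def to_amp_phase(r):
    """Map a complex vector of length N to a real 2 x N array: row 0 = A, row 1 = phase."""
    amplitude = np.abs(r)   # (r_i^2 + r_q^2)^(1/2)
    phase = np.angle(r)     # four-quadrant arctan of r_q / r_i, in radians
    return np.stack([amplitude, phase])

# Toy complex data vector (N = 3)
r = np.array([1 + 1j, -1 + 0j, 0 - 2j])
x_aphi = to_amp_phase(r)
```

The four-quadrant form keeps samples in the left half-plane distinguishable from those in the right half-plane, which a plain arctan of the ratio would fold together.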

Fig. 4. I and Q signals time plot for various modulation schemes. Panels: (a) BPSK, (b) QPSK, (c) 8PSK, (d) QAM16, (e) QAM64, (f) CPFSK, (g) GFSK, (h) PAM4. Each panel plots the real and imaginary parts of the I/Q signals against time.

Transformation 3 (FFT vector): The FFT vector is a mapping from the raw time-domain complex data vector $r_k \in \mathbb{C}^N$ into its frequency-domain representation vector consisting of two sets of real-valued data vectors, one that carries the real component of its complex FFT, $x_{F_{re}}$, and one that holds the imaginary component of its FFT, $x_{F_{im}}$. That is

$$x^{F}_k = \begin{bmatrix} x_{F_{re}}^T \\ x_{F_{im}}^T \end{bmatrix} \tag{25}$$

The translation to frequency-domain is performed by a Fast Fourier Transform (FFT) denoted by $\mathcal{F}$ so that

$$\mathcal{F} : r_k \mapsto w, \quad x_{F_{re}} = \Re\{w\}, \quad x_{F_{im}} = \Im\{w\}$$

Here, $w \in \mathbb{C}^N$, $x_{F_{re}}, x_{F_{im}} \in \mathbb{R}^N$, while $\Re\{\cdot\}$ and $\Im\{\cdot\}$ can be conceived as operators giving the real and imaginary parts of a complex vector, respectively. Thus, the resulting FFT vector is $x^{F}_k \in \mathbb{R}^{2 \times N}$. In short, this may be denoted as

$$f : \mathbb{C}^N \to \mathbb{R}^{2 \times N}, \quad r_k \mapsto x^{F}_k$$
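Transformation 3 can be sketched with `np.fft.fft`. The text does not say whether the spectrum is shifted or normalized, so this example applies neither; that choice, the function name `to_fft` and the synthetic test tone are assumptions.

```python
import numpy as np

def to_fft(r):
    """Map a complex vector of length N to a real 2 x N array: row 0 = Re{FFT}, row 1 = Im{FFT}."""
    w = np.fft.fft(r)
    return np.stack([w.real, w.imag])

# Synthetic complex tone that falls exactly on FFT bin 5 for N = 128
N = 128
n = np.arange(N)
r = np.exp(1j * 2 * np.pi * 5 * n / N)
x_f = to_fft(r)
peak_bin = int(np.argmax(np.hypot(x_f[0], x_f[1])))
```

For this tone all the energy lands in a single bin, so the magnitude peak identifies the tone frequency.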

Figures 4, 5 and 6 visualize examples of IQ, A/φ and FFT feature vectors, respectively. The visualizations show representations for different modulation formats passed through a channel model with impairments as described in IV-A. These are examples of 128 samples for modulation formats depicted from the "RadioML Modulation" dataset introduced in Section V-A. Figure 4 shows $x^{IQ}_k$ time plots of the raw sampled complex signal at the receiver for different modulation types. Figure 5 shows the amplitude and phase time plots for modulation format examples. Figure 6 shows their frequency magnitude spectrum. It can be seen that the signals are corrupted due to the wireless channel effects and transmitter-receiver synchronization imperfections, but there are still distinctive patterns that can be used for deep learning to extract high level features for wireless signal identification.

Fig. 5. Constellation diagram, Amplitude and Phase signal time plot for various modulation schemes. Panels: (a) BPSK, (b) QPSK, (c) 8PSK, (d) QAM16, (e) QAM64, (f) CPFSK, (g) GFSK, (h) PAM4. Each panel shows the constellation (In-Phase versus Quadrature), the amplitude over time and the phase (rad) over time.

The motivation behind using these three transformations is to train three deep learning models where: one will explore the raw data to discover the patterns and temporal features solely from raw samples, one will see the amplitude and phase information in the time domain, while the third will see the frequency domain representation to perform feature extraction in the frequency space. We investigate how the choice of data representation influences the classification accuracy. The data representations have been carefully designed so that all of them create a vector of the same dimension and type in $\mathbb{R}^{2 \times N}$. The reason for that is to obtain a unified vector shape which allows the same CNN architecture to be used for training on all three data representations and for different use cases.

Fig. 6. Frequency magnitude spectrum for various modulation schemes. Panels: (a) BPSK, (b) QPSK, (c) 8PSK, (d) QAM16, (e) QAM64, (f) CPFSK, (g) GFSK, (h) PAM4. Each panel plots |X(f)| (dB) against frequency (MHz).

D. Wireless signal classification

The problem of identifying the wireless signals from spectrum data can be treated as a data-driven machine learning classification problem. In order to apply ML techniques to this setup, as described in Section III-A, the wireless communication problem has to be formulated as a parametric estimation problem where certain parameters are unknown and need to be estimated.

Given a set of $K$ wireless signals to be detected, the problem of identifying a signal from this set turns into a $K$-class classification problem. Suppose a data measurement point knows the transmitted signal type (e.g. modulation type, interfering emitter type, etc.) for a time period $t = [0, T)$ (i.e. a "training period") and collects several complex baseband time series of $n$ measurements for each signal type into a data vector $r_k$, as described in Section IV-B. In total, $m$ snapshots for the data vectors $r_k$ are collected. These data vectors capture the emitted signals, which carry distinctive features. In order to extract these features, each data vector is transformed into a feature vector, $x_k$, according to the data transformations introduced in Section IV-C and the results are stacked into an observation matrix $X \in \mathbb{R}^{m \times n}$. Each data vector is further annotated with the corresponding wireless signal type in the form of a discrete one-hot encoded vector $y_k \in \mathbb{R}^K$, $k = 1, ..., m$.

The obtained data pairs, $\{(x_k, y_k), k = 1, ..., m\}$, form a dataset suitable to estimate the parameters, $\theta$, that characterize the wireless signal classifier, $f$.

It is instructive to note that the training phase presumes prior information about the type of wireless signal that was used at the transmitter. However, once the classifier is trained this information will no longer be necessary and the signals may be automatically identified by the model. That is, for the $i$-th spectrum data vector input, $x_i$, the predictor's last layer can automatically output an estimate of the probability $P(y_i = k | x_i; \theta)$, where $k$ ranges from $0$ to $K-1$. That is, a score per class. Finally, the predicted class is the one with the highest score, i.e. $\hat{y}_i = \arg\max_k P(y_i = k | x_i; \theta)$.
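The labelling and decision steps described in this subsection can be sketched as follows. The number of classes, the integer labels and the probability rows are made-up values for illustration; in practice the probabilities would come from the softmax output of the trained classifier.

```python
import numpy as np

K = 4                          # hypothetical number of signal classes
labels = np.array([2, 0, 3])   # hypothetical integer class labels for m = 3 snapshots

# One-hot encoding: row k of the K x K identity matrix represents class k
Y = np.eye(K)[labels]

# Hypothetical per-class probabilities P(y = k | x; theta) for the same snapshots
probs = np.array([[0.10, 0.20, 0.60, 0.10],
                  [0.70, 0.10, 0.10, 0.10],
                  [0.25, 0.25, 0.20, 0.30]])

# Predicted class = index of the highest score in each row
y_pred = np.argmax(probs, axis=1)
```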

V. EVALUATION SETUP

To evaluate end-to-end learning from spectrum data, we train CNN wireless signal classifiers for two use cases: (i) Radio signal modulation recognition and (ii) Wireless interference identification, for different wireless data representations.

Radio signal modulation recognition relates to the problem of identifying the modulation structure of the received wireless signal in spectrum monitoring tasks, as a step towards understanding what type of communication scheme and emitter is present. Modulation recognition is vital for radio spectrum regulation and in dynamic spectrum access applications.

Wireless interference identification is the task of identifying the type of coexisting wireless emitter that is operating in the same frequency band. This is essential for effective interference mitigation and coexistence management in unlicensed frequency bands such as, for example, the 2.4GHz ISM band shared by heterogeneous wireless communication systems.

For each task the CNNs were trained on three characteristic data representations: IQ vectors, Amplitude/Phase vectors and FFT vectors, as introduced in Section IV-C. As a result, for each task three datasets, $S$, one per data transformation, are created. That is,

$$S^{IQ} = \{(x^{IQ}_k, y_k), k = 1, ..., m\} \tag{26}$$

$$S^{A/\phi} = \{(x^{A/\phi}_k, y_k), k = 1, ..., m\} \tag{27}$$

$$S^{F} = \{(x^{F}_k, y_k), k = 1, ..., m\} \tag{28}$$

where $m$ is in the order of tens of thousands of instances.

A. Datasets description

1) Radio Modulation recognition: To evaluate end-to-end learning for radio modulation type identification, we consider measurements of the received wireless signal for various modulation formats from the "RadioML 2016.10a Modulation" dataset [9]. Specifically, for all experiments performed in this paper we used labelled data vectors for the following modulation formats (eight digital and three analog): BPSK, QPSK, 8-PSK, 16-QAM, 64-QAM, CPFSK, GFSK, 4-PAM, WBFM, AM-DSB, AM-SSB. The data vectors, $x_k$, were collected at a sampling rate of 1MS/s in $N = 128$ sample batches, each containing between 8 and 16 symbols corrupted by random noise, time offset, phase, and wireless channel distortions as described by the channel model in IV-A. One-hot encoding is used to create a discrete set of 11 class labels corresponding to the 11 considered modulations, so that the response variable forms a binary 11-vector $y_k \in \mathbb{R}^{11}$. The task of modulation recognition is then an 11-class classification problem. In total, 220,000 data vectors $x_k \in \mathbb{R}^{2 \times 128}$ consisting of I and Q samples are used.

TABLE I. CNN STRUCTURE

| Layer type            | Input size | Parameters                                   | Activation function |
|-----------------------|------------|----------------------------------------------|---------------------|
| Convolutional layer   | 2x128      | 256 filters, filter size 1x3, dropout=0.6    | ReLU                |
| Convolutional layer   | 256x2x128  | 80 filters, filter size 2x3, dropout=0.6     | ReLU                |
| Fully connected layer | 10240x1    | 256 neurons, dropout=0.6                     | ReLU                |
| Fully connected layer | 256x1      | 11 neurons or 15 neurons                     | Softmax             |

2) Wireless Interference identification in ISM bands: The rise of heterogeneous wireless technologies operating in the unlicensed ISM bands has caused severe communication challenges due to cross-technology interference, which adversely affects the performance of wireless networks. To tackle these challenges, novel agile methods that can assess the channel conditions are needed. We showcase end-to-end learning as a promising approach that can determine whether communication is feasible over the wireless link by accurately identifying cross-technology interference. Specifically, the "Wireless interference" dataset [12] is used, which consists of measurements gathered from standardized wireless communication systems based on IEEE 802.11b/g (WiFi), IEEE 802.15.4 (Zigbee) and IEEE 802.15.1 (Bluetooth) standards, operating in the 2.4GHz frequency band. The dataset is labelled according to the allocated frequency channel and the corresponding wireless technology, resulting in 15 different classes. Compared to the modulation recognition dataset, this dataset consists of measurements gathered assuming a communication channel model with fewer channel impairments. In particular, a flat fading channel with additive white Gaussian noise was assumed. I and Q samples were collected at a sampling rate of 10MS/s in batches of 128 each, thereby capturing 1 to 12 symbols for each utilized wireless technology depending on the symbol duration. In total, 225,225 snapshots were collected.

B. CNN network structure

The convolutional neural network structure utilized for end-to-end learning from spectrum data is derived from O'Shea et al. [9], i.e. the CNN2 network, as it has been shown to significantly outperform traditional signal identification approaches.

Table I provides a summary of the utilized CNN network. The visible layer of the network has a unified size of 2x128, receiving either IQ, FFT or Amplitude/Phase captured data vectors, $x_k \in \mathbb{R}^{2 \times 128}$, that contain sample values of the complex wireless signal. Two hidden convolutional layers further extract high-level features from the input wireless signal representation using kernels and ReLU activation functions. The first convolutional layer consists of 256 stacked filters of size 1x3 that perform a 2D convolution on the input complex signal representation, padded such that the output has the same length as the original input. These filters generate 256 (2x128) feature maps that are fed as input to the second layer, which has 80 filters of size 2x3. To reduce overfitting, in each layer regularization is used with a Dropout $p = 0.6$. Finally, a fully connected layer with 256 neurons and ReLU units is added. The output of this layer is fed to a softmax classifier that estimates the likelihood of the input signal, $x$, belonging to a particular class, $y$. That is $P(y = k | x; \theta)$, where $k$ is a one-hot encoded vector so that $k \in \mathbb{R}^{15}$ for the wireless interference identification case, and $k \in \mathbb{R}^{11}$ for modulation recognition.
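The layer sizes listed in Table I can be cross-checked with a little shape arithmetic. The sketch below rests on assumptions that the text does not state explicitly: stride 1 everywhere, length-preserving padding along the 128-sample axis in both convolutional layers, and no padding along the 2-row axis in the second layer (so the 2x3 filter collapses the two rows into one). The helper names are hypothetical, and the resulting parameter count is derived from these assumptions rather than reported in the paper.

```python
def conv_out(size, kernel, padding):
    """Output length along one axis of a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1

def conv_params(filters, kh, kw, in_channels):
    """Trainable weights of a conv layer plus one bias per filter."""
    return filters * (kh * kw * in_channels + 1)

def dense_params(inputs, units):
    """Trainable weights of a dense layer plus one bias per unit."""
    return units * (inputs + 1)

h, w, c = 2, 128, 1  # visible layer: 2 x 128, one channel

# Conv layer 1: 256 filters of size 1x3, output keeps the 2 x 128 shape
h1, w1, c1 = conv_out(h, 1, "same"), conv_out(w, 3, "same"), 256

# Conv layer 2: 80 filters of size 2x3 (assumed unpadded along the 2-row axis)
h2, w2, c2 = conv_out(h1, 2, "valid"), conv_out(w1, 3, "same"), 80

flat = h2 * w2 * c2  # input size of the first fully connected layer

num_classes = 11     # modulation recognition; 15 for interference identification
total_params = (conv_params(c1, 1, 3, c) + conv_params(c2, 2, 3, c1)
                + dense_params(flat, 256) + dense_params(256, num_classes))
```

Under these assumptions the flattened size agrees with the 10240x1 input listed for the first fully connected layer, which suggests the padding interpretation is at least consistent with Table I.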

C. Implementation details

The CNNs were trained and validated using the Keras [29] library on a high computation platform on Amazon Elastic Compute (EC) Cloud with the central processing unit (CPU) Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, with 60GB RAM and the Cuda enabled graphics processing unit (GPU) Nvidia Tesla K80. For both use cases, 67% randomly selected examples are used for training in batch sizes of 1024, and 33% for testing and validation. Hence, for modulation recognition 147,400 examples are used for training, while 72,600 examples are used for testing and validation. For the task of interference identification, 151,200 examples are training examples, while 74,025 examples are used to test the model. Both sets of examples are uniformly distributed in Signal-to-Noise Ratio (SNR) from -20dB to +20dB and tagged so that performance can be evaluated on specific subsets. To estimate the model parameters the Adaptive moment estimation (Adam) optimizer [30] was used with a learning rate $\alpha = 0.001$, and the input data was normalized to speed up convergence of the learning algorithm. The CNNs were trained for 70 epochs and the model with the lowest validation loss is selected for evaluation. In total, 6 CNNs were trained, i.e. one for each use case and signal representation. Three for modulation recognition: CNNM_IQ, CNNM_A/φ and CNNM_F, and three for technology identification: CNNIF_IQ, CNNIF_A/φ and CNNIF_F. The training time on the GPU was approximately 60s per epoch for the CNNs performing interference identification, and 42s per epoch for the modulation recognition CNNs.

D. Performance metrics

In order to characterize and compare the prediction accuracy of the end-to-end wireless signal classification models that recognize modulation type or identify interference, we need to measure how well their predictions match the true response value of the observed spectrum data. Therefore, the performance of the end-to-end signal classification methods can be quantified by means of the prediction accuracy on a test data sample. If the true value and the estimate of the signal classifiers for any instance $i$ are given by $y_i$ and $\hat{y}_i$, respectively, then the overall classification test error over $m_{test}$ testing snapshots can be defined in the following way:

$$E_{test} = \frac{1}{m_{test}} \sum_{i=1}^{m_{test}} l(\hat{y}_i, y_i) \tag{29}$$

The classification accuracy is then obtained with $1 - E_{test}$. Furthermore, for each signal snapshot in the test set, intermediate statistics, i.e. the number of true positives (TP), false positives (FP) and false negatives (FN), are calculated as follows:

• If a signal is detected as being from a particular class and it is also annotated as such in the labelled test data, that instance is regarded as TP.

• If a signal is predicted as being from a particular class but does not belong to that class according to the labelled test data, that instance is regarded as FP.

• If a signal is not detected in a particular instance but it is present in that instance in the labelled test data, that instance is regarded as FN.
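The test error of Eq. (29) and the three counting rules above can be sketched as follows. The loss $l$ is assumed here to be the zero-one loss (0 for a correct prediction, 1 otherwise), since its definition is not repeated in this section. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def zero_one_loss(y_pred, y_true):
    """Assumed loss l: 0 for a correct prediction, 1 otherwise."""
    return 0 if y_pred == y_true else 1


def classification_test_error(y_pred, y_true):
    """Average loss over the m_test test snapshots, as in Eq. (29)."""
    return sum(zero_one_loss(p, t) for p, t in zip(y_pred, y_true)) / len(y_true)


def count_tp_fp_fn(y_pred, y_true):
    """Accumulate per-class TP, FP and FN counts over all test instances."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for p, t in zip(y_pred, y_true):
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the label says otherwise
            fn[t] += 1  # class t was present but not detected
    return tp, fp, fn


# Hypothetical toy labels, not taken from the datasets used in the paper.
y_true = ["BPSK", "BPSK", "QPSK", "QAM16", "QAM16", "QAM64"]
y_pred = ["BPSK", "QPSK", "QPSK", "QAM64", "QAM16", "QAM64"]

e_test = classification_test_error(y_pred, y_true)
tp, fp, fn = count_tp_fp_fn(y_pred, y_true)
print(f"E_test = {e_test:.3f}, accuracy = {1 - e_test:.3f}")
```

In a multi-class setting every misclassified snapshot contributes one FP to the predicted class and one FN to the true class, which is what the `else` branch records.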

The intermediate statistics are accumulated over all instances in the test set and used to derive three further performance metrics, precision (P), recall (R) and F1 score:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \tag{30}$$

$$F1_{score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{31}$$

Precision, recall and F1 score are per-class performance metrics. In order to obtain one measure that quantifies the overall performance of the classifier, multiple per-class performance measures are combined using a prevalence-weighted macro-average across the class metrics, $P_{avg}$, $R_{avg}$ and $F1_{avg}$. For a detailed overview of the per-class performance the confusion matrix is used.
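A minimal sketch of the per-class metrics and their prevalence-weighted average is given below. It uses the standard definitions P = TP/(TP+FP) and R = TP/(TP+FN), and weights each class by its number of true instances (TP + FN). The counts and helper names are hypothetical and serve only to illustrate the arithmetic.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard per-class precision, recall and F1 score from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def prevalence_weighted_average(per_class_values, support):
    """Average per-class values, weighting each class by its true-instance count."""
    total = sum(support.values())
    return sum(per_class_values[c] * support[c] for c in support) / total


# Hypothetical per-class counts (TP, FP, FN); support = TP + FN.
counts = {"WiFi": (90, 10, 10), "ZigBee": (40, 5, 10), "Bluetooth": (45, 10, 5)}
support = {c: tp + fn for c, (tp, fp, fn) in counts.items()}
metrics = {c: precision_recall_f1(*v) for c, v in counts.items()}

p_avg = prevalence_weighted_average({c: m[0] for c, m in metrics.items()}, support)
r_avg = prevalence_weighted_average({c: m[1] for c, m in metrics.items()}, support)
f1_avg = prevalence_weighted_average({c: m[2] for c, m in metrics.items()}, support)
print(f"P_avg={p_avg:.3f} R_avg={r_avg:.3f} F1_avg={f1_avg:.3f}")
```

With this weighting the averaged recall equals the fraction of all true instances that were detected correctly, so frequent classes influence the summary more than rare ones.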

E. Numerical results

1) Classification performance: The CNN described in Table I is trained on three data representations for two wireless signal identification problems. Table II provides the averaged performance for the six classifiers, that is, the prevalence-weighted macro-average of precision, recall and F1 score under three SNR scenarios: high (SNR=18dB), medium (SNR=0dB) and low (SNR=-8dB).

TABLE II. PERFORMANCE COMPARISON FOR THE TRAINED CNN SIGNAL CLASSIFIER MODELS FOR THREE SNR SCENARIOS

| Case | Model | SNR | $P_{avg}$ | $R_{avg}$ | $F1_{avg}$ |
|---|---|---|---|---|---|
| Mod. recognition | $CNN_M^{IQ}$ | High | 0.83 | 0.82 | 0.79 |
| | | Medium | 0.75 | 0.75 | 0.72 |
| | | Low | 0.36 | 0.32 | 0.30 |
| | $CNN_M^{A/\phi}$ | High | 0.86 | 0.84 | 0.82 |
| | | Medium | 0.70 | 0.70 | 0.69 |
| | | Low | 0.33 | 0.29 | 0.26 |
| | $CNN_M^{F}$ | High | 0.71 | 0.68 | 0.67 |
| | | Medium | 0.63 | 0.60 | 0.59 |
| | | Low | 0.28 | 0.25 | 0.22 |
| IF identification | $CNN_{IF}^{IQ}$ | High | 0.98 | 0.98 | 0.98 |
| | | Medium | 0.95 | 0.94 | 0.94 |
| | | Low | 0.84 | 0.82 | 0.81 |
| | $CNN_{IF}^{A/\phi}$ | High | 0.99 | 0.99 | 0.99 |
| | | Medium | 0.98 | 0.98 | 0.98 |
| | | Low | 0.87 | 0.86 | 0.86 |
| | $CNN_{IF}^{F}$ | High | 0.99 | 0.99 | 0.99 |
| | | Medium | 0.99 | 0.99 | 0.99 |
| | | Low | 0.90 | 0.89 | 0.89 |

We observe that the models for interference classification show better performance compared to the modulation recognition case. For high SNR conditions, the $CNN_{IF}$ models achieve a $P_{avg}$, $R_{avg}$ and $F1_{avg}$ between 0.98 and 0.99. For medium SNR the metrics are in the range of 0.94 to 0.99, while under low SNR conditions the performance slightly degrades to 0.81-0.90. The $CNN_M$ models show less robustness to varying SNR conditions, and in general achieve lower classification performance for all scenarios. In particular, under high SNR conditions, depending on the data representation used, the achieved $P_{avg}$, $R_{avg}$ and $F1_{avg}$ are in the range of 0.67-0.86. For medium SNR, the performance degrades more than for the $CNN_{IF}$ models, with a $P_{avg}$, $R_{avg}$ and $F1_{avg}$ in the range of 0.59-0.75. Under low SNR, the $CNN_M$ models show poor performance, with metric values in the range of 0.22-0.36.

This may be explained by the different channel models used for generating the datasets for the two case studies, and the type of signals that need to be discriminated in each problem. For instance, for the IF case a simple channel model with flat fading was considered, while for modulation recognition the channel model was a time-varying multipath fading channel and other transceiver impairments were also taken into account. Hence, the modulation recognition dataset used a more realistic channel model in the data collection process. However, this impacts the classification performance because it is more challenging to design a robust signal classifier for this case compared to the channel condition considered in the IF classification problem. Furthermore, the signals that are classified for IF detection have different characteristics by design. In particular, they use different medium access schemes, channel bandwidth and modulation techniques, which makes it easier for the classifier to differentiate them. In contrast, the selected modulation recognition signals are more similar to each other, because subsets of modulations are based on similar design principles (e.g. all are single carrier modulations).

To understand the results better, confusion matrices for the $CNN_M^{IQ}$, $CNN_M^{A/\phi}$ and $CNN_M^{F}$ models are presented in Figure 7 for the case of SNR=6dB. It can be seen that the classifiers show good performance, discriminating AM-DSB, AM-SSB, BPSK, CPFSK, GFSK and PAM4 with high accuracy for all three data representations. The main discrepancy is QAM16 being misclassified as QAM64, which can be explained by the underlying dataset: QAM16 is a subset of QAM64, making it difficult for the classifier to differentiate them. It can be further noticed that the amplitude/phase information helped the model better discriminate QAM16/QAM64, leading to a clearer diagonal for the $CNN_M^{A/\phi}$ compared to $CNN_M^{IQ}$. There are further difficulties in separating AM-DSB and WBFM signals. This confusion may be caused by periods of absence of the signal, as the modulated signals were created from real audio streams. When using the frequency spectrum data, it can be noticed that the $CNN_M^{F}$ classifier mostly confuses QPSK, 8PSK, QAM16 and QAM64, which is due to their similarities in the frequency domain after channel distortions, making the received symbols indiscernible from each other.

Fig. 7. Confusion matrices for the modulation recognition data for SNR 6dB: (a) $CNN_M^{IQ}$, (b) $CNN_M^{A/\phi}$, (c) $CNN_M^{F}$. Each panel plots true label against predicted label for the classes 8PSK, AM-DSB, AM-SSB, BPSK, CPFSK, GFSK, PAM4, QAM16, QAM64, QPSK and WBFM, with a colour scale from 0.0 to 1.0.
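The kind of confusion matrix discussed here can be computed along the following lines. This is a sketch that assumes each row is normalised by the number of true instances of its class, which the 0.0 to 1.0 colour scale of Fig. 7 suggests but which is not stated explicitly. The labels are hypothetical and only mimic the QAM16/QAM64 confusion.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Row-normalised confusion matrix: rows are true labels, columns predicted labels."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    normalised = []
    for row in counts:
        total = sum(row)
        normalised.append([n / total if total else 0.0 for n in row])
    return normalised


# Hypothetical labels illustrating QAM16 being mistaken for QAM64.
classes = ["QAM16", "QAM64", "QPSK"]
y_true = ["QAM16", "QAM16", "QAM16", "QAM16", "QAM64", "QAM64", "QPSK", "QPSK"]
y_pred = ["QAM16", "QAM64", "QAM64", "QAM64", "QAM64", "QAM64", "QPSK", "QPSK"]

cm = confusion_matrix(y_true, y_pred, classes)
for name, row in zip(classes, cm):
    print(name, [round(v, 2) for v in row])
```

A strong diagonal indicates a well-separated class, while off-diagonal mass in a row shows which classes the true class is mistaken for.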

2) Noise Sensitivity: In this section, we evaluate the detection performance of the CNN signal classifiers under different noise levels. This allows us to investigate the communication range over which the classifiers can be effectively used. To estimate the sensitivity to noise, the same testing sets, labelled with SNR values from -20dB to +20dB, were fed into the signal classifiers to obtain the estimated values for each SNR.

Fig. 8. Performance results for modulation recognition classifiers vs. SNR. The plot shows classification accuracy [%] against signal-to-noise ratio (dB) for $CNN_M^{IQ}$, $CNN_M^{A/\phi}$ and $CNN_M^{F}$.
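The per-SNR evaluation described above amounts to grouping the tagged test examples by SNR and computing the accuracy of each group. A minimal sketch, with hypothetical labels, predictions and SNR tags, might look like this:

```python
from collections import defaultdict


def accuracy_per_snr(y_true, y_pred, snr_db):
    """Group test examples by their SNR tag and compute the accuracy of each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, snr in zip(y_true, y_pred, snr_db):
        total[snr] += 1
        correct[snr] += int(t == p)
    return {snr: correct[snr] / total[snr] for snr in sorted(total)}


# Hypothetical tagged test examples (label, prediction, SNR in dB).
y_true = ["BPSK", "QPSK", "BPSK", "QPSK", "BPSK", "QPSK"]
y_pred = ["QPSK", "QPSK", "BPSK", "QPSK", "BPSK", "QPSK"]
snr_db = [-20, -20, 0, 0, 18, 18]

for snr, acc in accuracy_per_snr(y_true, y_pred, snr_db).items():
    print(f"SNR {snr:+d} dB: accuracy {100 * acc:.0f}%")
```

Plotting the returned values against the SNR tags gives an accuracy-versus-SNR curve of the kind shown for each classifier.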

Figures 8 and 9 show the obtained results for the modulation recognition and IF identification models, respectively.

Modulation recognition case. Figure 8 shows that all three modulation recognition CNN models have similar performance for very low SNRs (< −10dB). For medium SNRs the $CNN_M^{IQ}$ outperforms the $CNN_M^{A/\phi}$ and $CNN_M^{F}$ models by 2-5dB, while for high SNR conditions (> 5dB) the $CNN_M^{A/\phi}$ model outperforms the $CNN_M^{IQ}$ and $CNN_M^{F}$ models with up to 2% and 12% accuracy improvements, respectively. The authors in [9] used IQ data and reported higher accuracy than the results we obtained. We were not able to reproduce their results after various attempts on the IQ data, which may be due to differences in the dataset (e.g. number of training examples), train/test split and hyper-parameter tuning. However, we noticed that the amplitude/phase representation helped the model discriminate the modulation formats better compared to raw IQ time-series data for high SNR scenarios. Results for amplitude/phase representations were unfortunately not reported in [9], although this representation might have helped to improve performance. Using the frequency spectrum data did not improve the classification accuracy compared to the IQ data. This is expected, as the underlying dataset has many modulation classes which exhibit common characteristics in the frequency domain after the channel distortion and receiver imperfection effects, particularly QPSK, 8PSK, QAM16 and QAM64. This makes the frequency spectrum a sub-optimal representation for this classification problem.

Interference detection case. The IF identification models in Figure 9 show in general better performance compared to the modulation recognition classifiers, with the $CNN_{IF}^{F}$ showing the best performance across all SNR scenarios. In particular, for low SNR scenarios significant improvements can be noticed compared to the $CNN_{IF}^{A/\phi}$ and $CNN_{IF}^{IQ}$ models, with a performance gain of at least ∼4dB and a classification accuracy improvement of at least ∼9%. The authors of [12] used IQ and FFT data representations and reported similar results to our $CNN_{IF}^{IQ}$ and $CNN_{IF}^{F}$ models. However, again we noticed that the amplitude/phase representation is beneficial for discriminating signals compared to raw IQ data, although the IF identification classifier performed best on FFT data representations. This may be explained by the fact that the wireless signals from the ISM band standards (ZigBee, WiFi and Bluetooth) have more expressive features in the frequency domain, as they have different frequency spectrum characteristics in terms of bandwidth and modulation/spreading method.

Fig. 9. Performance results for interference identification classifiers vs. SNR. The plot shows classification accuracy [%] against signal-to-noise ratio (dB) for $CNN_{IF}^{IQ}$, $CNN_{IF}^{A/\phi}$ and $CNN_{IF}^{F}$.

3) Takeaways: End-to-end learning is a powerful tool for data-driven spectrum monitoring applications. It can be applied to various wireless signals to effectively detect the presence of radio emitters in a unified way without requiring design of expert features. Experiments have shown that the performance of wireless signal classifiers depends on the data representation used. This suggests that investigating several data representations is important to arrive at accurate wireless signal classifiers for a particular task. Furthermore, the choice of data representation depends on the specifics of the problem, i.e. the considered wireless signal types for classification. Signals within a dataset that exhibit similar characteristics in one data representation are more difficult to discriminate, which puts a higher burden on the model learning procedure. Choosing the right wireless data representation can notably increase the classification performance, and domain knowledge about the specifics of the underlying signals targeted in the spectrum monitoring application can assist in this choice. Additionally, the performance of the classifier can be improved by increasing the quality of the wireless signal dataset, by adding more training examples and more variation among the examples (e.g. varying channel conditions), and by tuning the model hyper-parameters.
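To make the three data representations concrete, the sketch below derives an amplitude/phase view and a frequency-domain magnitude view from the same complex IQ samples. It is only an illustration using the Python standard library and a naive DFT. The exact preprocessing used for the experiments (scaling, FFT length, any windowing) is not reproduced here, and the test tone is hypothetical.

```python
import cmath
import math


def amplitude_phase(iq):
    """Map complex IQ samples to (amplitude, phase) pairs."""
    return [(abs(s), cmath.phase(s)) for s in iq]


def spectrum_magnitude(iq):
    """Magnitude of the discrete Fourier transform (naive O(N^2) DFT)."""
    n = len(iq)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * m / n) for m, s in enumerate(iq)))
        for k in range(n)
    ]


# Hypothetical test signal: a complex tone on DFT bin 2 of an 8-sample window.
n = 8
iq = [cmath.exp(2j * math.pi * 2 * m / n) for m in range(n)]

amp_phase = amplitude_phase(iq)
spectrum = spectrum_magnitude(iq)
print([round(a, 3) for a, _ in amp_phase])
print([round(v, 3) for v in spectrum])
```

For this tone the amplitude is constant in time while the spectrum concentrates in a single bin, which illustrates how the same signal can look uninformative in one representation and highly distinctive in another.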

VI. OPEN CHALLENGES

Despite the encouraging research results, a deep learning-based end-to-end learning framework for spectrum utilization optimization is still in its infancy. In the following we discuss some of the most important challenges posed by this exciting interdisciplinary field.


A. Scalable spectrum monitoring

The first requirement for a cognitive spectrum monitoring framework is to have an infrastructure that will support scalable spectrum data collection, transfer and storage. In order to obtain a detailed overview of the spectrum use, the end devices will be required to perform distributed spectrum sensing [31] over a wide frequency range and cover the area of interest. In order to limit the data overhead caused by the huge amounts of I and Q samples that are generated by monitoring devices, the predictive models can be pushed to the end devices themselves. Recently, [32] proposed Electrosense, an initiative for large-scale spectrum monitoring in different regions of the world using low-cost sensors and providing the processed spectrum data as open spectrum data. Access to large datasets is crucial for evaluating research advances and for providing a playground for wireless communication researchers interested in acquiring a deeper knowledge of spectrum usage and in extracting meaningful knowledge that can be used to design better wireless communication systems.

B. Scalable spectrum learning

The heterogeneity of technologies operating in different radio bands requires continuous monitoring of multiple frequency bands, making the volume and velocity of radio spectrum data several orders of magnitude higher compared to the typical data seen in other wireless communication systems such as wireless sensor networks (e.g. temperature, humidity reports, etc.). In order to handle this large volume of data and extract meaningful information over the entire spectrum, a scalable platform for processing, analysing and learning from big spectrum data has to be designed and implemented [33], [3]. Efficient data processing and storage systems and algorithms for massive spectrum data analytics [34] are needed to extract valuable information from such data and incorporate it into the spectrum decision/policy process in real-time.

C. Flexible spectrum management

One of the main communication challenges for 5G will be inter-cell and cross-technology interference. To support spectrum decisions and policies in such a complex system, 5G networks need to support an architecture for flexible spectrum management.

Softwarization at the radio level will be a key enabler for flexible spectrum management, as it allows automation of the collection of spectrum data and flexible control and reconfiguration of cognitive radio elements and parameters. There are several individual works that focused on this issue. Some initiatives for embedded devices are WiSCoP [35], Atomix [36] and [37]. Recently, there is also a growing interest in academia and industry in applying Software Defined Networking (SDN) and Network Function Virtualization (NFV) to wireless networks [38]. Initiatives such as SoftAir [39], Cloud RAN [40], OpenRadio [41] and several others are still at the conceptual or prototype level. Realizing flexible spectrum management strategies and bringing them to commercial deployment still requires a great deal of standardization effort.

D. Spectrum privacy

The introduction of intelligent wireless systems raises several privacy issues. The spectrum will be monitored via heterogeneous radios including WSNs, RFIDs, cellular phones and others, which may lead to misuse of the applications and cause severe privacy-related threats. Therefore, privacy is required at the spectrum data collection level. As spectrum data may be shared along the way, privacy also has to be maintained at the data sharing level. Thus, data anonymization, restricted data access, proper authentication and strict control of intelligent radio users are required.

VII. CONCLUSION

This paper presents a comprehensive and systematic introduction to end-to-end learning from spectrum data - a deep learning based unified approach for realizing various wireless signal identification tasks, which are the main building blocks of spectrum monitoring systems. The approach is built around the systematic application of deep learning techniques to obtain accurate wireless signal classifiers in an end-to-end learning pipeline. In particular, convolutional neural networks (CNNs) lend themselves well to this setting, because they consist of many layers of processing units that make it possible to (i) automatically extract non-linear and more abstract wireless signal features that are invariant to local spectral and temporal variations, and (ii) train wireless signal classifiers that can outperform traditional approaches.

With the aim of raising awareness of the potential of this emerging interdisciplinary research area, machine learning, deep learning and CNNs were first briefly introduced and a reference model for their application to spectrum monitoring scenarios was proposed. Then, a framework for end-to-end learning from spectrum data was presented. In particular, wireless data collection and the design of wireless signal features and classifiers suitable for several wireless signal identification tasks were elaborated. Three common wireless signal representations were defined: the raw IQ temporal wireless signal, the time domain amplitude and phase information data, and the spectral magnitude representation. The presented methodology was validated on two active wireless signal identification research problems: (i) modulation recognition, crucial for dynamic spectrum access applications, and (ii) wireless interference identification, essential for effective interference mitigation strategies in unlicensed bands. Experiments have shown that CNNs are promising feature learning and function approximation techniques, well-suited for different wireless signal classification problems. Furthermore, the presented results indicated that for the wireless communication domain investigating different wireless data representations is important to determine the right representation that exhibits discriminative characteristics for the signals that need to be classified. Specifically, in the modulation recognition case study for medium-high SNR the CNN model trained on amplitude/phase representations outperformed the other two models with a 2% and 12% performance improvement, while for low SNR conditions the model trained on IQ data representations showed best performance. For the task of detecting interference, the model trained on FFT data outperformed the amplitude/phase and IQ data representation models by up to 20% for low SNR conditions, and by up to 5% in classification accuracy for medium-high SNR.

These results demonstrate the importance of choosing both the correct data representation and the correct machine learning approach, both of which are systematically introduced in this paper. By following the proposed methodology, deeper insights can be obtained regarding the optimality of data representations for different research domains. As such, we envisage that this paper will empower and guide machine learning/signal processing practitioners and wireless engineers to design new innovative research applications of end-to-end learning from spectrum data that address issues related to cross-technology coexistence, inefficient spectrum utilization and regulation.

VIII. ACKNOWLEDGEMENTS

We are grateful to Prof. Gerard J.M. Janssen for his insightful comments, and Malte Schmidt et al. [12] for sharing the "Wireless interference" dataset. This work was partly supported by the EU H2020 eWINE project under grant agreement number 688116, SBO SAMURAI project and the AWS Educate/GitHub Student Developer Pack.

REFERENCES

[1] M. Hoyhtya, A. Mammela, M. Eskola, M. Matinmikko, J. Kalliovaara, J. Ojaniemi, J. Suutala, R. Ekman, R. Bacchus, and D. Roberson, “Spectrum occupancy measurements: A survey and use of interference maps,” IEEE Communications Surveys & Tutorials, vol. 18, no. 4, pp. 2386–2414, 2016.

[2] A. Hithnawi, H. Shafagh, and S. Duquennoy, “Understanding the impact of cross technology interference on IEEE 802.15.4,” in Proceedings of the 9th ACM international workshop on Wireless network testbeds, experimental evaluation and characterization. ACM, 2014, pp. 49–56.

[3] G. Ding, Q. Wu, J. Wang, and Y.-D. Yao, “Big spectrum data: The new resource for cognitive wireless networking,” arXiv preprint arXiv:1404.6508, 2014.

[4] U. Muller, J. Ben, E. Cosatto, B. Flepp, and Y. L. Cun, “Off-road obstacle avoidance through end-to-end learning,” in Advances in neural information processing systems, 2006, pp. 739–746.

[5] E. Axell, G. Leus, E. G. Larsson, and H. V. Poor, “Spectrum sensing for cognitive radio: State-of-the-art and recent advances,” IEEE Signal Processing Magazine, vol. 29, no. 3, pp. 101–116, 2012.

[6] K. Kim, I. A. Akbar, K. K. Bae, J.-S. Um, C. M. Spooner, and J. H. Reed, “Cyclostationary approaches to signal detection and classification in cognitive radio,” in New frontiers in dynamic spectrum access networks, 2007. DySPAN 2007. 2nd IEEE international symposium on. IEEE, 2007, pp. 212–215.

[7] A. Fehske, J. Gaeddert, and J. H. Reed, “A new approach to signal classification using spectral correlation and neural networks,” in New Frontiers in Dynamic Spectrum Access Networks, 2005. DySPAN 2005. 2005 First IEEE International Symposium on. IEEE, 2005, pp. 144–150.

[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.

[9] T. J. O'Shea, J. Corgan, and T. C. Clancy, “Convolutional radio modulation recognition networks,” in International Conference on Engineering Applications of Neural Networks. Springer, 2016, pp. 213–226.

[10] A. Selim, F. Paisana, J. A. Arokkiam, Y. Zhang, L. Doyle, and L. A. DaSilva, “Spectrum monitoring for radar bands using deep convolutional neural networks,” arXiv preprint arXiv:1705.00462, 2017.

[11] J. Akeret, C. Chang, A. Lucchi, and A. Refregier, “Radio frequency interference mitigation using deep convolutional neural networks,” Astronomy and Computing, vol. 18, pp. 35–39, 2017.

[12] M. Schmidt, D. Block, and U. Meier, “Wireless interference identification with convolutional neural networks,” arXiv preprint arXiv:1703.00737, 2017.

[13] S. Rajendran, W. Meert, D. Giustiniano, V. Lenders, and S. Pollin, “Distributed deep learning models for wireless signal classification with low-cost spectrum sensors,” arXiv preprint arXiv:1707.08908, 2017.

[14] T. O'Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Transactions on Cognitive Communications and Networking, no. 99, pp. 1–1, 2017.

[15] S. Yao, S. Hu, Y. Zhao, A. Zhang, and T. Abdelzaher, “Deepsense: A unified deep learning framework for time-series mobile sensing data processing,” in Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017, pp. 351–360.

[16] D. Chen, S. Yin, Q. Zhang, M. Liu, and S. Li, “Mining spectrum usage data: a large-scale spectrum measurement study,” in Proceedings of the 15th annual international conference on Mobile computing and networking. ACM, 2009, pp. 13–24.

[17] S. Haykin, “Cognitive radio: brain-empowered wireless communications,” IEEE journal on selected areas in communications, vol. 23, no. 2, pp. 201–220, 2005.

[18] I. F. Akyildiz, W.-Y. Lee, M. C. Vuran, and S. Mohanty, “A survey on spectrum management in cognitive radio networks,” IEEE Communications magazine, vol. 46, no. 4, 2008.

[19] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, “Internet of things (IoT): A vision, architectural elements, and future directions,” Future generation computer systems, vol. 29, no. 7, pp. 1645–1660, 2013.

[20] S. Gollakota, F. Adib, D. Katabi, and S. Seshan, “Clearing the RF smog: making 802.11n robust to cross-technology interference,” in ACM SIGCOMM Computer Communication Review, vol. 41, no. 4. ACM, 2011, pp. 170–181.

[21] A. A. Khan, M. H. Rehmani, and A. Rachedi, “Cognitive-radio-based internet of things: applications, architectures, spectrum related functionalities, and future research directions,” IEEE Wireless Communications, p. 3, 2017.

[22] Q. Wu, G. Ding, Y. Xu, S. Feng, Z. Du, J. Wang, and K. Long, “Cognitive internet of things: a new paradigm beyond connection,” IEEE Internet of Things Journal, vol. 1, no. 2, pp. 129–143, 2014.

[23] G. Staple and K. Werbach, “The end of spectrum scarcity [spectrum allocation and utilization],” IEEE spectrum, vol. 41, no. 3, pp. 48–52, 2004.

[24] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.

[25] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting.” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[26] M. Kulin, C. Fortuna, E. De Poorter, D. Deschrijver, and I. Moerman, “Data-driven design of intelligent wireless networks: An overview and tutorial,” Sensors, vol. 16, no. 6, p. 790, 2016.

[27] C. Jiang, H. Zhang, Y. Ren, Z. Han, K.-C. Chen, and L. Hanzo, “Machine learning paradigms for next-generation wireless networks,” IEEE Wireless Communications, vol. 24, no. 2, pp. 98–105, 2017.

[28] B. Danev and S. Capkun, “Transient-based identification of wireless sensor nodes,” in Proceedings of the 2009 International Conference on Information Processing in Sensor Networks. IEEE Computer Society, 2009, pp. 25–36.

[29] F. Chollet et al., “Keras,” 2015. [Online]. Available: https://github.com/fchollet/keras

[30] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.

[31] W. Liu, M. Chwalisz, C. Fortuna, E. De Poorter, J. Hauer, D. Pareit, L. Hollevoet, and I. Moerman, “Heterogeneous spectrum sensing: challenges and methodologies,” EURASIP Journal on Wireless Communications and Networking, vol. 2015, no. 1, p. 70, 2015.

[32] S. Rajendran, R. Calvo-Palomino, M. Fuchs, B. V. d. Bergh, H. Cordobes, D. Giustiniano, S. Pollin, and V. Lenders, “Electrosense: Open and big spectrum data,” arXiv preprint arXiv:1703.09989, 2017.

[33] A. Zaslavsky, C. Perera, and D. Georgakopoulos, “Sensing as a service and big data,” arXiv preprint arXiv:1301.0159, 2013.

[34] A. Sandryhaila and J. M. Moura, “Big data analysis with signal processing on graphs: Representation and processing of massive data sets with irregular structure,” IEEE Signal Processing Magazine, vol. 31, no. 5, pp. 80–90, 2014.

[35] T. Kazaz, X. Jiao, M. Kulin, and I. Moerman, “Demo: WiSCoP - wireless sensor communication prototyping platform,” in Proceedings of the 2017 International Conference on Embedded Wireless Systems and Networks. USA: Junction Publishing, 2017, pp. 246–247.


[36] M. Bansal, A. Schulman, and S. Katti, “Atomix: A framework for deploying signal processing applications on wireless infrastructure.” in NSDI, 2015, pp. 173–188.

[37] T. Kazaz, C. Van Praet, M. Kulin, P. Willemen, and I. Moerman, “Hardware accelerated SDR platform for adaptive air interfaces,” ETSI Workshop on Future Radio Technologies: Air Interfaces, 2016.

[38] Z. Zaidi, V. Friderikos, Z. Yousaf, S. Fletcher, M. Dohler, and H. Aghvami, “Will SDN be part of 5G?” arXiv preprint arXiv:1708.05096, 2017.

[39] I. F. Akyildiz, P. Wang, and S.-C. Lin, “SoftAir: A software defined networking architecture for 5G wireless systems,” Computer Networks, vol. 85, pp. 1–18, 2015.

[40] A. Checko, H. L. Christiansen, Y. Yan, L. Scolari, G. Kardaras, M. S. Berger, and L. Dittmann, “Cloud RAN for mobile networks - a technology overview,” IEEE Communications surveys & tutorials, vol. 17, no. 1, pp. 405–426, 2015.

[41] M. Bansal, J. Mehlman, S. Katti, and P. Levis, “OpenRadio: a programmable wireless dataplane,” in Proceedings of the first workshop on Hot topics in software defined networks. ACM, 2012, pp. 109–114.