
  • Radio Fingerprinting Using Convolutional Neural Networks

    A Thesis Presented

    by

    Shamnaz Mohammed Riyaz

    to

    The Department of Electrical and Computer Engineering

    in partial fulfillment of the requirements

    for the degree of

    Master of Science

    in

    Electrical and Computer Engineering

    Northeastern University

    Boston, Massachusetts

    July 2018

  • To my family


  • Contents

    List of Figures v
    List of Tables vi
    Acknowledgments vii
    Abstract of the Thesis viii

    1 Introduction 1

    2 Related work 3
      2.0.1 Supervised learning 3
      2.0.2 Unsupervised learning 7

    3 Causes of hardware impairments 9
      3.1 Hardware impairments 9
        3.1.1 I/Q imbalance 11
        3.1.2 Phase noise 11
        3.1.3 Carrier frequency and phase offset 12
        3.1.4 Harmonic distortions 13
        3.1.5 Power amplifier distortions 13
      3.2 Data Collection 14
        3.2.1 Protocols of operation 15
        3.2.2 Storage and processing 16

    4 Deep learning for RF fingerprinting 18
      4.1 Initial studies on ML techniques 18
        4.1.1 Support vector machines 19
        4.1.2 Logistic regression 19
      4.2 Convolutional neural networks 21
        4.2.1 CNN architecture 22

    5 Results and performance evaluation 29
      5.1 Network setup 29
      5.2 Evaluation 30
        5.2.1 CNN vs. conventional algorithms 32
        5.2.2 Receiver operating characteristics for radio fingerprinting 33
        5.2.3 Impact of distance on radio fingerprinting 35

    6 Conclusion 38
      6.1 Research challenges 38

    Bibliography 40

  • List of Figures

    2.1 RF fingerprinting classification. 4

    3.1 Typical transceiver chain with various sources of RF impairments. 10
    3.2 Amplitude imbalance. 11
    3.3 Phase imbalance. 11
    3.4 Phase noise. 12
    3.5 Phase offset. 12
    3.6 AM/AM distortion. 13
    3.7 AM/PM distortion. 13
    3.8 Data collection using SDR. 14
    3.9 Experimental setup demonstrating data capture. 15
    3.10 Discovery cluster partitioning. 16

    4.1 Device classification using Logistic Regression and Linear SVM for WiFi and LTE. 20
    4.2 CNN architecture for RF fingerprinting. 22
    4.3 Convolution operation: filters strided over input sequences. 23
    4.4 Rectified Linear Unit (ReLU) operation performed on feature maps. 24
    4.5 An illustration of max pooling operation. 25
    4.6 An illustration of sliding operation using a window of length 128. 28

    5.1 Software stack. 30
    5.2 The accuracy comparison of SVM, logistic regression and CNN for 2-5 devices. 32
    5.3 ROC curve fold 1. 33
    5.4 ROC curve fold 2. 34
    5.5 ROC curve fold 3. 34
    5.6 ROC curve fold 4. 35
    5.7 ROC curve fold 5. 35
    5.8 Computational load. 36
    5.9 The plot of accuracy obtained using CNN for 4 devices over different distances between transmitter and receiver. 36

  • List of Tables

    4.1 CNN architecture. 26

  • Acknowledgments

    Foremost, I would like to thank my advisor Prof. Kaushik Chowdhury for his constant guidance and encouragement in all my endeavors. His vision and ideas have always been a source of inspiration for me. I thoroughly enjoyed my learning experience in his course 'Mobile and Wireless Networking' and also in the research associated with his Genesys lab. He has been extremely supportive and patient throughout this research. I would also like to thank Prof. Stratis Ioannidis and Prof. Jennifer Dy for their positive feedback and continuous association since the inception of the project.

    I thank my husband Rameez for his support, inspiration, and confidence in me. I am grateful to my parents, Mohammed and Sajida, and my in-laws, Rasheed and Rabia, for being supportive and always motivating me to excel in everything I do. I would also like to thank my labmates in the Genesys lab, specifically Kunal and Mauro, for helping me with various experiments. Their company provided a positive energy in the workplace.

  • Abstract of the Thesis

    Radio Fingerprinting Using Convolutional Neural Networks

    by

    Shamnaz Mohammed Riyaz

    Master of Science in Electrical and Computer Engineering

    Northeastern University, July 2018

    Dr. Kaushik Chowdhury, Advisor

    In this thesis, we describe a method for uniquely identifying a specific radio among nominally similar devices using a combination of software defined radio (SDR) sensing capability and machine learning (ML) techniques. Our approach to radio fingerprinting applies ML over raw I/Q samples without specifically selecting features of interest. It distinguishes devices using only the transmitter hardware-induced signal modifications that serve as a unique signature for a particular device. No higher-level decoding, feature engineering, or protocol knowledge is needed, further mitigating the challenges of ID spoofing and coexistence of multiple protocols in a shared spectrum. Advances in SDR technology allow unprecedented control over the entire processing chain, permitting modification of each functional block as well as sampling of the changes in the input waveform. We first demonstrate RF impairments by modifying the operational blocks of a typical wireless communications processing chain in a simulation study. We then generate an over-the-air dataset compiled from an experimental testbed of SDRs such as the B210 and X310, and train an optimized deep convolutional neural network (CNN) architecture on this data, achieving good classification accuracy. We describe the parallel processing needs and the choice of several hyperparameters that enable efficient training of the CNN model. We then compare the performance quantitatively with alternate techniques such as support vector machines and logistic regression. Overall, our results show that we can achieve up to 90-99% experimental accuracy at transmitter-receiver distances varying between 2-50 feet over a noisy, multipath wireless channel.

  • Chapter 1

    Introduction

    Emerging applications in the context of smart cities, autonomous vehicles, the Internet of Things (IoT), and complex military missions, among others, require reconfigurability at both the system and the protocol level within their communications architectures. These advances rely on a critical enabling component, namely, the software defined radio (SDR), which allows cross-layer programmability of the transceiver hardware using high-level directives [1]. The promise of intelligent or so-called cognitive radios builds on the SDR concept: the radio is capable of gathering contextual information and adapting its own operation by changing the settings on the SDR based on what it perceives in its surroundings.

    In the last few decades, there has been incredible growth in the use of the internet and connected devices. However, the privacy and security of these billions of devices is a paramount concern in IoT networks. Any device that has network connectivity is vulnerable, and data gathered by IoT devices are susceptible to attacks such as ID spoofing by an intruder. Most IoT devices have limited computing power and memory capacity, which makes it difficult to use complex cryptographic algorithms that require more resources than the devices can provide; authentication and authorization are therefore often insufficient. Additionally, in many mission-critical scenarios, problems in authenticating devices, ID spoofing, and unauthorized transmissions are major concerns. Moreover, high-bandwidth applications are causing a spectrum crunch, leading network providers to explore innovative spectrum sharing regimes in the TV whitespace and the sub-6 GHz bands. In all of the above, identifying (i) the type of protocol in use and (ii) the specific radio transmitter (among many other nominally similar radios) becomes important. Our work on SDR-enabled radio fingerprinting tackles these two scenarios by learning characteristic features of the transmitters in a pre-deployment training phase, which are then exploited during actual network operation. We recognize that SDRs come in diverse form factors with varying on-board computational resources. Thus, for general purpose use, any device fingerprinting approach must be computationally simple once deployed in the field. For this reason, we propose machine learning (ML) techniques, specifically Deep Convolutional Neural Networks (CNNs), and experimentally demonstrate near-perfect radio identification performance in many practical scenarios.

    ML techniques have been remarkably successful in image and speech recognition; however, their utility for device-level fingerprinting by feature learning has yet to be conclusively demonstrated. True autonomous behavior of SDRs, not only in terms of detecting spectrum usage but also in terms of self-tuning a multitude of parameters and reacting to environmental stimuli, is now a distinct possibility. We collect over 20 × 10^6 RF I/Q samples over multiple transmission rounds for each transmitter-receiver pair composed of off-the-shelf Universal Software Radio Peripheral (USRP) SDRs. The approach of providing the raw time-series radio signal to the CNN, treating each complex sample as a pair of real-valued I/Q inputs, is motivated by work on modulation classification [2], where it has been found to be a promising technique for feature learning on large time-series data. Our technique of RF fingerprinting using the I/Q samples that carry embedded signatures characteristic of different active transmitter hardware is, to the best of our knowledge, a first in this field. My contributions in this project are:

    • Generation of large real time-series data composed of 802.11ac signals using SDRs
    • Simulation study of the causes of hardware impairments of the transmitters
    • Development of a CNN architecture composed of multiple convolutional and max-pooling layers, optimized for the task of radio fingerprinting
    • Partitioning of the collected samples into separate instances for data pre-processing
    • Implementation of CNN training in Keras running on top of TensorFlow on the Northeastern Discovery cluster environment
    • Evaluation of the performance of the CNN alongside support vector machines and logistic regression

    The thesis is organized as follows. We briefly survey and classify existing approaches in Chapter 2. In Chapter 3, we design a simulation model of a typical wireless communications processing chain in MATLAB, and then modify the ideal operational blocks to demonstrate the RF impairments that we wish to learn. This is followed by the generation of real data and its preprocessing for training the classifier. In Chapter 4, we architect and experimentally validate an optimized deep convolutional neural network for radio fingerprinting. Experimental results and a quantitative comparison of our approach with support vector machines and logistic regression are provided in Chapter 5. Finally, research challenges associated with our approach and conclusions are summarized in Chapter 6.
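The raw-I/Q representation described above — each complex sample split into two real-valued channels — can be sketched in a few lines. This is a hedged illustration, not the thesis code: the window length of 128 mirrors the sliding window of Fig. 4.6, and the function name is ours.

```python
import numpy as np

def to_two_channel(iq, window=128):
    """Slice a complex I/Q stream into (num_windows, 2, window) real arrays.

    Row 0 of each window holds the in-phase (real) part, row 1 the
    quadrature (imaginary) part -- the raw-sample format fed to the CNN.
    """
    iq = np.asarray(iq)
    n = len(iq) // window
    iq = iq[: n * window].reshape(n, window)
    return np.stack([iq.real, iq.imag], axis=1)

# 4096 synthetic complex samples -> 32 windows of shape (2, 128).
samples = np.exp(1j * 2 * np.pi * 0.01 * np.arange(4096))
X = to_two_channel(samples)
```

Each (2, 128) window becomes one training example; stacking windows from many capture rounds yields the kind of large labeled dataset described above.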


  • Chapter 2

    Related work

    There has been a significant amount of research on the application of deep neural networks to cognitive radio tasks in the wireless communications field, with the focus mainly on modulation classification, which has shown impressive results [3]. Our interest is in radio fingerprinting using deep learning architectures. The key idea behind radio fingerprinting is to extract unique patterns (or features) and use them as signatures to identify devices. A variety of features at the physical (PHY) layer, medium access control (MAC) layer, and upper layers have been utilized for radio fingerprinting in the literature [4]. Simple unique identifiers such as IP addresses, MAC addresses, mobile identification numbers (MIN), and international mobile station equipment identity (IMEI) numbers can easily be spoofed. Location-based features such as received signal strength (RSS) and channel state information (CSI) are susceptible to mobility and environmental changes. We are interested in studying those features that are inherent to a device's hardware, which are also unchanging and not easily replicated by malicious agents. We classify existing approaches in Fig. 2.1.

    2.0.1 Supervised learning

    This type of learning requires a large collection of labeled samples prior to network deployment for training the ML algorithm. It takes thousands of input samples from the devices, with labels corresponding to each device. The algorithm then learns the relationship between the samples and their associated labels, and applies that learned relationship to classify completely new, unlabeled samples that the machine hasn't seen before. We study three types of mechanisms, namely similarity based, classification based, and deep learning based.


    Figure 2.1: RF fingerprinting classification.


    2.0.1.1 Similarity-based

    Similarity measurements involve comparing the observed signature of a given device with the references present in a master database. In [5], a passive fingerprinting technique is proposed that identifies the wireless device driver running on an IEEE 802.11 compliant node by collecting traces of probe request frames from the devices. The authors used a binning approach on the time differences between probes as features. The bins are iterated over to compute similarity by summing the differences of the percentages and the mean differences scaled by percentage. They obtained an identification accuracy varying from 77% to 97% depending on the bin size. [6] describes a passive blackbox-based technique that uses transmission control protocol (TCP) or user datagram protocol (UDP) packet inter-arrival times (ITAs) from access points (APs) as signatures to identify AP types. APs exhibit different characteristics due to manufacturing effects, because of which each AP acts upon the packet ITA differently. In this case, an AP is treated as a blackbox, since there is no a priori information about its architecture. The authors collected multiple packet traces for each AP to compute the ITAs, and a unique pattern is then extracted using wavelet analysis on these ITAs. The time intervals are sampled using bin sizes between 1-10 µs; the optimal bin size is the one that maximizes the difference in the ITAs among different APs. Cross-correlation is used to compute the similarity between unknown signals and the signatures extracted from the wavelet analysis for pattern matching.
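The bin-and-compare idea behind [5] and [6] can be illustrated with a minimal sketch (the bin width, units, and normalized-correlation similarity below are illustrative stand-ins, not the exact parameters of either paper):

```python
import numpy as np

def iat_signature(timestamps, bin_width=1.0, num_bins=10):
    """Histogram of packet inter-arrival times (units arbitrary),
    normalized so the bins sum to 1 -- a per-device signature."""
    iats = np.diff(np.sort(np.asarray(timestamps, dtype=float)))
    edges = np.arange(num_bins + 1) * bin_width
    hist, _ = np.histogram(iats, bins=edges)
    total = hist.sum()
    return hist / total if total else hist.astype(float)

def similarity(sig_a, sig_b):
    """Normalized cross-correlation between signatures (1.0 = identical)."""
    den = float(np.linalg.norm(sig_a) * np.linalg.norm(sig_b))
    return float(np.dot(sig_a, sig_b)) / den if den else 0.0

# Two captures with the same 5-unit spacing yield identical signatures.
trace_a = np.arange(100) * 5.0
trace_b = trace_a + 0.5          # same spacing, shifted start time
sig_a, sig_b = iat_signature(trace_a), iat_signature(trace_b)
```

Two devices whose traffic exhibits the same inter-arrival spacing produce a similarity of 1.0; a device with different timing behavior scores lower.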

    2.0.1.2 Classification-based

    There are several studies on supervised learning that exploit RF features such as I/Q imbalance, phase imbalance, frequency error, and received signal strength, to name a few. These imperfections are transmitter-specific and manifest themselves as artifacts of the emitted signals. There are two types of algorithms:

    • Conventional
      This form of classification examines a match with pre-selected features using domain knowledge of the system, i.e., the dominant feature(s) must be known a priori. This requires expertise in the RF domain for feature engineering. [7] proposes classification by extracting the known preamble within a packet. The preamble signals are subjected to spectral analysis using the fast Fourier transform (FFT) to obtain the spectral components from the time-domain steady-state part of the signal. These log spectral energy features are fed as input to a k-nearest neighbors (k-NN) discriminatory classifier, which uses Euclidean distance to compute the


    distance. The training preambles are mapped into a multidimensional feature space, which is divided into sections depending on the class labels. A given preamble is categorized based on the most frequently occurring label among its k nearest training preambles. This approach provides promising results, with 97% accuracy in distinguishing between eight identical transmitters at 30 dB signal-to-noise ratio (SNR). PARADIS [8] fingerprints 802.11 devices based on the modulation-specific errors that the network interface card (NIC) introduces into a wireless frame. PARADIS demonstrated its effectiveness with an accuracy of 99% in distinguishing between more than 130 similar 802.11 NICs, and is also shown to be robust against alterations and noise in the wireless channel. In [9], a technique for physical device and device-type classification called GTID is proposed. This method exploits variations in clock skews as well as hardware compositions (such as processor, DMA controller, and memory) of the devices and applies artificial neural networks (ANNs) for classification. Unique device-specific signatures are created from the time-variant behavior of the traffic using statistical techniques. GTID performs classification across various device classes, such as iPhones and Google phones, supporting a variety of traffic types such as internet control message protocol (ICMP) and Skype, and achieves high accuracy and recall on identification. In general, as multiple different features are used, selecting the right set of features is a major challenge. Additionally, RF domain knowledge plays a significant role in extracting features, which by itself is a time-consuming task. This also causes scalability problems when a large number of devices are present, leading to increased computational complexity in training.
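A minimal sketch of the preamble-spectrum k-NN pipeline of [7] — log spectral energy features fed to a Euclidean-distance k-NN — on synthetic data (the FFT length, the two artificial "devices", and the spur-tone impairment are our illustrative assumptions, not the paper's setup):

```python
import numpy as np

def log_spectral_energy(preamble, nfft=64):
    """Log of per-bin spectral energy of a time-domain preamble
    (the 1e-3 floor keeps empty bins numerically stable)."""
    spectrum = np.fft.fft(np.asarray(preamble), nfft)
    return np.log(np.abs(spectrum) ** 2 + 1e-3)

def knn_predict(train_feats, train_labels, feat, k=3):
    """k-NN with Euclidean distance: majority label among the k nearest."""
    dists = np.linalg.norm(train_feats - feat, axis=1)
    nearest = np.asarray(train_labels)[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return int(values[np.argmax(counts)])

# Two synthetic "transmitters": same preamble tone plus a device-specific
# spurious tone (a stand-in for a hardware artifact).
rng = np.random.default_rng(0)
t = np.arange(64)

def emit(device):
    tone = np.sin(2 * np.pi * 8 / 64 * t)
    spur_freq = 16 if device == 0 else 22
    spur = 0.5 * np.sin(2 * np.pi * spur_freq / 64 * t)
    return tone + spur + 0.001 * rng.standard_normal(64)

train_X = np.array([log_spectral_energy(emit(d)) for d in [0, 1] * 10])
train_y = np.array([0, 1] * 10)
pred = knn_predict(train_X, train_y, log_spectral_energy(emit(0)))
```

The device-specific spur shows up in different FFT bins, so nearest neighbors in feature space share the emitting device's label.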

    • Deep learning
      Deep learning offers a powerful framework for supervised learning. It can learn functions of increasing complexity, leverages large datasets, and greatly increases the number of layers, in addition to the number of neurons within a layer. [2] and [10] apply deep learning at the physical layer, specifically focusing on modulation recognition using convolutional neural networks. This task involves identifying and differentiating broadcast radio, local and wide area data and voice radios, radar users, and other sources of radio interference in the surroundings, each of which has different behaviors and requirements. Modulation recognition is the task of classifying the modulation type of a received radio signal with the aim of determining the communication scheme. They classify 8 digital and 3 analog, in total 11 different modulation schemes used in wireless systems. These consist of BPSK, QPSK, 8PSK, 16QAM, 64QAM, BFSK, CPFSK, and PAM4 for digital modulations, and WB-FM, AM-SSB, and


    AM-DSB for analog modulations. Overall, 87.4% classification accuracy is obtained on the

    test dataset. However, this approach does not identify a device, as we do here, but only the

    modulation type used by the transmitter.

    2.0.2 Unsupervised learning

    Unsupervised learning is effective when there is no prior label information about devices. In [11], an infinite hidden Markov random field (iHMRF)-based online classification algorithm is proposed for wireless fingerprinting using unsupervised clustering techniques and batch updates. This approach can model both time-dependent features, such as received signal strength (RSS), time-of-arrival (TOA), and angle-of-arrival (AOA), using the Markov property, and time-independent features, such as I/Q offset, carrier frequency offset (CFO), and phase shift difference (PSD), using an embedded Gaussian mixture model (GMM). A combination of these features is used to identify the number of devices in a simulation testbed; however, this approach is yet to be demonstrated on a real set of devices. Transmitter characteristics are used in [12], where a non-parametric Bayesian approach (namely, an infinite Gaussian mixture model) classifies multiple devices in an unsupervised, passive manner. A multivariate Gaussian distribution with unknown parameters is used to model the feature space of every single device; similarly, an infinite Gaussian mixture model is used for multiple devices. The features chosen by this approach are invariant to the channel, resistant to mobility, unaffected by transmitter/receiver antenna gain, and independent of distance. Unlike supervised approaches, it does not need a database of legitimate devices. This approach specifically aims to detect identity spoofing by comparing the cluster labels with the device IDs: it identifies masquerading attacks when it encounters multiple devices that share the same device ID.
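The mixture-model idea of [12] can be sketched with a finite two-component Gaussian mixture fitted by EM (the infinite, non-parametric version additionally infers the number of components; the 2-D features below are synthetic stand-ins for quantities like CFO and I/Q offset, and the minimal spherical-covariance EM here is our simplification):

```python
import numpy as np

def fit_gmm2(X, iters=60):
    """Minimal EM for a 2-component spherical Gaussian mixture.
    Returns the fitted means and hard cluster labels."""
    n, d = X.shape
    mu = np.array([X.min(axis=0), X.max(axis=0)], dtype=float)  # spread init
    var = np.ones(2)
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibilities under each spherical Gaussian.
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(axis=2)      # (n, 2)
        logp = np.log(pi) - 0.5 * d2 / var - 0.5 * d * np.log(var)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and per-component variance.
        nk = r.sum(axis=0) + 1e-12
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r * d2).sum(axis=0) / (nk * d) + 1e-9
    return mu, r.argmax(axis=1)

# Synthetic features per transmission: [CFO in kHz, I/Q amplitude offset]
# from two hypothetical devices (values illustrative).
rng = np.random.default_rng(0)
dev_a = rng.normal([+2.0, 0.1], 0.2, size=(50, 2))
dev_b = rng.normal([-2.0, -0.1], 0.2, size=(50, 2))
means, labels = fit_gmm2(np.vstack([dev_a, dev_b]))
```

Spoofing detection then amounts to comparing cluster labels against claimed device IDs: two clusters sharing one ID suggests masquerading.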

    Our choice of algorithm is deep learning based, built on deep neural networks in which several hidden layers are present between the input and output nodes. These hidden layers extract features from the input data and perform much more complicated classification tasks over the learned features. Unlike conventional algorithms, this approach does not require feature engineering, thus reducing human intervention in identifying features. In recent years, deep learning has been found to be successful in object recognition, image classification, and powering vision in robots. Most voice-activated personal assistants, such as Alexa, Cortana, Google Assistant, and Siri, and other high-bandwidth applications such as YouTube and Netflix, are powered by artificial-intelligence (AI) engines that provide information and recommendations according to the user's interests. However, building such intelligent systems is not an easy task. Training these deep learning algorithms requires copious amounts of data, on the order of terabytes, for them to perform well. It also involves careful selection of hyperparameters and efficient tuning of these parameters to solve complex functions. The number of such parameters can reach millions, and hence careful consideration of the training platform and resources is necessary. Generally, multi-core high-performance GPUs are preferred to ensure efficient data processing. Even with hundreds of GPU machines, training complex functions on large amounts of data can take weeks, so it is necessary to balance the trade-off between training time and classification accuracy. Transmitter identification using deep learning architectures is still at a nascent stage. Our work focuses on the generation and processing of a large number of RF I/Q samples to train the classifiers and eventually identify the devices uniquely. The data collection procedure, data pre-processing, choice of parameters, and implementation details are explained in the successive chapters.


  • Chapter 3

    Causes of hardware impairments

    Radio fingerprinting is a mechanism through which wireless devices can be identified based on unique characteristics of their analog components. Even though there has been immense progress in electronic design, RF transmitters are inherently imperfect devices due to tolerances in the manufacturing of the analog electronics. These tolerances lead to differences in device-specific parameters such as channel doping and oxide thickness. Importantly, these imperfections are too small to compromise the specifications of communication standards [13]. Such imperfections are found specifically in the transmitter front end, in components such as frequency mixers, digital-to-analog converters, band-pass filters, and power amplifiers. The RF fingerprint of a transmitter cannot be easily cloned, and hence it provides an extra layer of security on top of cryptographic mechanisms. These fingerprints are unique to each device and cannot be replicated by any other device, since each device adds its own impairments to the transmitted signal.

    3.1 Hardware impairments

    The MATLAB Communications System Toolbox provides applications for the design and analysis of communication systems. Using this toolbox, we design a simulation model of a typical wireless communications processing chain and then modify the ideal operational blocks to introduce RF impairments typically seen in actual hardware implementations. This allows us to study the I/Q imbalance, phase noise, carrier frequency and phase offset, power amplifier nonlinearity, and harmonic distortions in isolation from each other.

    A block diagram of a transceiver pair is shown in Fig. 3.1, with the various sources of RF impairments highlighted. We first study the effect of the hardware-induced causes of I/Q deviation


    Figure 3.1: Typical transceiver chain with various sources of RF impairments. [Block diagram: (a) transmitter — digital baseband (DSP), DACs, anti-aliasing filters, LO with π/2 split, PA; impairments highlighted: I/Q imbalance, phase noise, nonlinear distortion. (b) receiver — LNA, LO, anti-aliasing filters, ADCs, digital baseband (DSP); impairments highlighted: carrier frequency offset, sampling frequency offset, harmonic distortion.]


    Figure 3.2: Amplitude imbalance. [Constellation plot: in-phase amplitude vs. quadrature amplitude; input symbols and reference points.]

    Figure 3.3: Phase imbalance. [Same axes and legend as Fig. 3.2.]

    from the ideal values.

    3.1.1 I/Q imbalance

    Quadrature mixers that convert baseband to RF and vice versa are often impaired by gain and phase mismatches between the parallel sections of the RF chain dealing with the in-phase (I) and quadrature (Q) signal paths. The analog gain is never exactly the same for each signal path, and the difference between their amplitudes causes amplitude imbalance, i.e., one modulator produces a larger signal than the other. In addition, the phase shift between the two local oscillator (LO) signals is never exactly 90°, which causes phase imbalance: the cosine and sine LO signals are not perfectly orthogonal. Figs. 3.2 and 3.3 illustrate the effects of amplitude imbalance and phase imbalance on a 16-QAM constellation. In practice, I/Q amplitude imbalance is expressed in the range [-5, 5] dB, whereas phase imbalance is in the range [-30, 30] degrees.
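A commonly used baseband model of these two effects — a gain error g on the Q branch and an LO phase error φ, giving I' = I − gQ sin φ and Q' = gQ cos φ — can be sketched as follows (the parameter values are illustrative, not the thesis's simulation settings):

```python
import numpy as np

def iq_imbalance(x, amp_db=0.0, phase_deg=0.0):
    """Apply a transmitter I/Q imbalance to complex baseband samples.

    Models a Q-branch gain error g (from amp_db) and an LO phase error phi:
    the RF signal I*cos(wt) - g*Q*sin(wt + phi) has baseband equivalent
    I' = I - g*Q*sin(phi),  Q' = g*Q*cos(phi).
    """
    g = 10.0 ** (amp_db / 20.0)          # dB -> linear gain
    phi = np.deg2rad(phase_deg)
    i, q = x.real, x.imag
    return (i - g * q * np.sin(phi)) + 1j * (g * q * np.cos(phi))

sym = np.array([1 + 1j])                 # corner symbol of a QAM grid
ideal = iq_imbalance(sym)                # no impairment: symbol unchanged
warped = iq_imbalance(sym, amp_db=3.0, phase_deg=10.0)
```

With zero imbalance the constellation is untouched; nonzero values stretch and skew it, as in Figs. 3.2 and 3.3.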

    3.1.2 Phase noise

    The up-conversion of a baseband signal to a carrier frequency f_c is performed at the transmitter by mixing the baseband signal with the carrier signal. Instead of generating a pure tone at frequency f_c, i.e., e^{j2π f_c t}, the generated tone is actually e^{j(2π f_c t + φ(t))}, where φ(t) is a random phase noise. The phase noise introduces a rotational jitter, as shown in Fig. 3.4. Phase noise is expressed in units of dBc/Hz, which represents the noise power relative to the carrier contained in a 1 Hz


    Figure 3.4: Phase noise. [Constellation plot: in-phase amplitude vs. quadrature amplitude; input symbols and reference points.]

    Figure 3.5: Phase offset. [Same axes and legend as Fig. 3.4.]

    bandwidth centered at a certain offset from the carrier. Typical values of the phase noise level are in the range [−100, −48] dBc/Hz, with frequency offsets in the range [20, 200] Hz.
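The rotational jitter can be reproduced with a simple sketch: multiply the samples by e^{jφ(t)}, with φ(t) modeled here as a Wiener (random-walk) phase — one common phase-noise model; the step size is an illustrative assumption:

```python
import numpy as np

def add_phase_noise(x, std_per_sample=0.01, seed=0):
    """Multiply samples by e^{j*phi[n]}, where phi[n] is a random walk
    (cumulative sum of small Gaussian phase increments)."""
    rng = np.random.default_rng(seed)
    phi = np.cumsum(rng.normal(0.0, std_per_sample, size=len(x)))
    return x * np.exp(1j * phi)

tone = np.exp(1j * 2 * np.pi * 0.1 * np.arange(1000))   # clean carrier
noisy = add_phase_noise(tone)
# Magnitude is preserved; only the phase jitters, rotating the constellation.
```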

    3.1.3 Carrier frequency and phase offset

    The accuracy of the crystal oscillators used to generate the carrier frequency is specified in parts per million (ppm). The difference between the transmitter and receiver carrier frequencies is referred to as the carrier frequency offset (CFO). Due to CFO, the received signal spectrum is shifted by a frequency offset:

    y(t) = x(t) e^{j2π(f_Tx − f_Rx)t} = x(t) e^{j2π Δ_CFO t}     (3.1)

    where Δ_CFO is the frequency shift introduced between the transmitter and receiver.
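Equation (3.1) translates directly into code; this minimal sketch applies a frequency offset to sampled baseband data (the sample rate and offset values are illustrative):

```python
import numpy as np

def apply_cfo(x, delta_cfo_hz, fs_hz):
    """Apply Eq. (3.1) to sampled data: y[n] = x[n] * e^{j*2*pi*dCFO*n/fs}."""
    n = np.arange(len(x))
    return x * np.exp(1j * 2 * np.pi * delta_cfo_hz * n / fs_hz)

fs = 1e6                                  # 1 MS/s sample rate
x = np.ones(1000, dtype=complex)          # constant baseband signal
y = apply_cfo(x, delta_cfo_hz=1000.0, fs_hz=fs)
# A 1 kHz offset appears as a phase ramp of 2*pi*1000/1e6 rad per sample.
```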

The phase shift difference is defined as the phase shift from one constellation point to a neighboring one. The uniqueness of the CFO and phase offset in each transceiver pair makes them excellent features for the classification of devices. Although orthogonal frequency division multiplexing (OFDM) uses

different modulation techniques and each technique produces a specific constellation, most of the constellations share some commonalities. For example, the phase shifts from one symbol to the next are created in a similar way in hardware and are transmitter dependent. Thus, for the sake of simplicity, we use quadrature phase shift keying (QPSK) as an example and consider features extracted from the QPSK constellation shown in Fig. 3.5. In QPSK, four symbols with different

Figure 3.6: AM/AM distortion.

Figure 3.7: AM/PM distortion.

phases are transmitted and each symbol is encoded with two bits. The phase difference between two consecutive symbols is ideally 90°. However, the transmitter amplifiers for the I and Q paths might differ, so the phase shift can vary. The constellation may deviate from its original position due to hardware variability, and different devices may have different constellations. Therefore, the phase shift can be considered a key feature.

    3.1.4 Harmonic distortions

    The harmonics in a transmitted signal are caused by nonlinearities in the transmitter-side

    amplifiers. These harmonics are unique to the transmitting device. Harmonic distortion is measured

    in terms of total harmonic distortion, which is a ratio of the sum of the powers of all harmonic

    components to the power of the fundamental frequency of the signal. This distortion is usually

    expressed in either percent or in dB relative to the fundamental component of the signal.
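Under the definition above, THD can be estimated from an FFT. The sketch below uses a synthetic signal with a hypothetical 10% second harmonic, purely for illustration:

```python
import numpy as np

def thd_percent(signal, fs, f0):
    """Total harmonic distortion: ratio of summed harmonic power
    (2*f0, 3*f0, ...) to the power at the fundamental f0, in percent."""
    spec = np.abs(np.fft.rfft(signal))
    bin_of = lambda f: int(round(f * len(signal) / fs))
    fund_power = spec[bin_of(f0)] ** 2
    harm_power = sum(spec[bin_of(k * f0)] ** 2
                     for k in range(2, 5) if k * f0 < fs / 2)
    return 100 * np.sqrt(harm_power / fund_power)

fs, f0, n = 1000.0, 50.0, 1000
t = np.arange(n) / fs
# Fundamental plus a second harmonic at 10% amplitude -> THD of about 10%
x = np.sin(2 * np.pi * f0 * t) + 0.1 * np.sin(2 * np.pi * 2 * f0 * t)
```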

    3.1.5 Power amplifier distortions

    Power amplifier (PA) non-linearities mainly appear when the amplifier is operated in its

    non-linear region, i.e., close to its maximum output power, where significant compression of the

    output signal occurs. The distortions of the power amplifiers (PA) are generally modeled using

    AM/AM (amplitude to amplitude) and AM/PM (amplitude to phase) curves. If we consider a complex


    baseband signal x(t) = a(t)ejφ(t) , the output of the PA can be written

    yPA(t) = AM(a(t))ej[φ(t)+PM(a(t))] (3.2)

    where AM(a(t)) is the AM/AM function describing the PA output amplitude as a function of the

    input signal amplitude, and PM(a(t)) is the AM/PM function describing the PA output phase as a

    function of the input signal amplitude.
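Equation (3.2) can be sketched with simple illustrative AM/AM and AM/PM curves. The tanh compression and quadratic phase term below are hypothetical stand-ins, not the cubic-polynomial model discussed next:

```python
import numpy as np

def pa_output(x, sat=1.0, pm_coeff=0.2):
    """y_PA(t) = AM(a(t)) * exp(j*(phi(t) + PM(a(t)))), per Eq. (3.2).
    AM: soft-limiting tanh compression; PM: amplitude-dependent rotation."""
    a = np.abs(x)                  # input amplitude a(t)
    phi = np.angle(x)              # input phase phi(t)
    am = sat * np.tanh(a / sat)    # AM/AM: gain compression near saturation
    pm = pm_coeff * a ** 2         # AM/PM: phase shift grows with a(t)
    return am * np.exp(1j * (phi + pm))

x = 0.9 * np.exp(1j * np.pi / 4)   # a corner-like symbol near saturation
y = pa_output(x)
# |y| < |x| (gain compression) and angle(y) > angle(x) (AM/PM rotation)
```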

The AM/AM conversion causes amplitude distortion, whereas the AM/PM conversion introduces a phase shift. As shown in Fig. 3.6, the corner points of the constellation have moved toward the origin due to amplifier gain compression. In Fig. 3.7, the constellation has rotated due to the AM/PM conversion. The nonlinearity of the amplifier is modeled using cubic polynomial and hyperbolic tangent methods, parameterized by the third-order input intercept point (IIP3). IIP3, expressed in dBm, is a scalar specifying the third-order intercept point.

    3.2 Data Collection

    Figure 3.8: Data collection using SDR.

Data collection is the first and foremost step in machine learning. The performance of our predictive model depends on the quality and quantity of the data gathered, which makes data collection a critical step. Our deep learning approach to RF fingerprinting requires ample data for the training to be effective. In our case, we generate raw I/Q samples, transmit them over the air, and collect them at the receiver. We collect millions of samples


from each of the devices and generate the corresponding class labels. For data collection at the receiver end, we use a fixed USRP B210. For the transmitters, we use four different devices, i.e., USRP B210s and X310s. Fig. 3.8 shows raw I/Q data collection using the SDRs.

    3.2.1 Protocols of operation

    We transmit different physical layer frames defined by the IEEE 802.11ac and LTE

    standards (as parameters defined in technical specification 36.141) on each transmitter SDR. These

frames are generated using the MATLAB WLAN System Toolbox and LTE System Toolbox, which provide standard-compliant functions for waveform generation. The generated data frames carry random payloads, since we do not intend to transmit any particular data stream. Once the waveforms are

    generated, these protocol frames are streamed to the selected SDR for transmission, considering

    separately the cases of over-the-air wireless propagation and through RF cable. The latter approach

    eliminates wireless channel effects and captures the signals as they are modified by the transmitter.

The receiving SDR samples the incoming signals at a 1.92 MS/s sampling rate, with a center frequency of 2.45 GHz for WiFi and 900 MHz for LTE. Ultimately, we study the performance of different

    learning algorithms, including linear support vector machine (SVM), logistic regression, and CNNs,

    using I/Q samples collected from an experimental setup of USRP SDRs.

    Figure 3.9: Experimental setup demonstrating data capture.


As shown in Fig. 3.9, the host computer, equipped with the MATLAB WLAN and LTE toolboxes, generates waveforms and transmits them through an X310 USRP. These waveforms are received by another USRP, a B210, which is connected over a high-speed link to a second computer that has all the required MATLAB packages to receive and store the raw I/Q samples. The workstations have typical configurations: a Core i7 processor, 8 GB RAM, and flash-based 512 GB storage. Data is collected using different B210/X310 USRPs at the transmitter end, while the receiver is kept fixed. The experiments are repeated over distances from 2 ft to 50 ft in 4 ft increments. Overall, we collect approximately 20 million samples for each of the five SDRs at each distance.

    3.2.2 Storage and processing

The samples are further analyzed offline on Northeastern's Discovery cluster, located at the Massachusetts Green High Performance Computing Center (MGHPCC). It provides high-end research computing resources such as centralized high performance computing (HPC) clusters, storage, visualization, and software. There are 30,352 compute cores shared across all users, and the cluster is accessed via ssh (secure shell). The partitioning of the Discovery cluster

into dedicated CPU and GPU nodes is shown in Fig. 3.10.

Figure 3.10: Discovery cluster partitioning.

The configuration details of the nodes we use are as follows. Each CPU node has 2x Intel Xeon E5-2680 v4 CPUs @ 2.40 GHz (28 physical / 56 logical cores) and 500 GB RAM, whereas each GPU node has 4x NVIDIA Tesla K80 boards, each with 4992 CUDA cores @ 560 MHz and 24 GB of GDDR5 memory. These GPU servers are on a 10 Gb/s TCP/IP backplane.

    The collected complex I/Q samples are partitioned into subsequences in the cluster environment


before being passed to the classifiers. For our experimental study, we set a fixed subsequence length of 128; additional details of data preprocessing are provided in Chapter 4.

    3.2.2.1 Signal metadata format

The Signal Metadata Format (SigMF) is a standard way to store signal data [14]. Deep learning works best when large amounts of data are available. Since deep learning in the RF domain is in a nascent stage, sharing these datasets is important in order to reproduce experimental results and to provide access to users who do not have the tools/equipment required to generate the datasets. SigMF is a method of sharing metadata descriptions of captured signal data, written in JSON. It stores signal data using two files:

• A JSON-format text file, which is made up of:

  • Core data namespaces: give general file information

    • global: includes information applicable to the whole recording, such as a description of the SigMF recording, the hardware used to make the recording, the sample rate, and the data file format

    • capture: provides parameters of the signal capture, such as the center frequency of the signal and the sample index at which the segment takes effect

    • annotations: includes signal data that is not part of captures and global, such as the number of samples that each segment applies to and the frequency of the lower/upper edge of the feature

  • Extension namespace: used to define fields that are not in the core namespace, i.e., capture details such as:

    • signal reference number: the sequential label for signals in a data file
    • the type of RF transmitter
    • the manufacturer of the transmitter
    • the source of the RF signal

• A binary file, where I/Q samples are stored as defined in the 'datatype' field in the metadata file, for example ci16: complex 16-bit integer data

We encourage storing all signal datasets in widely accepted formats such as SigMF as a standard practice.
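As an illustration, a minimal SigMF-style metadata record, following the global/captures/annotations layout described above, can be serialized as JSON. The field values below are hypothetical, patterned on our capture setup:

```python
import json

# Hypothetical metadata for one WiFi capture (illustrative values only)
meta = {
    "global": {
        "core:datatype": "ci16",          # complex 16-bit integer I/Q
        "core:sample_rate": 1920000,      # 1.92 MS/s
        "core:hw": "Ettus USRP B210 (receiver)",
        "core:description": "RF fingerprinting capture, WiFi frames",
    },
    "captures": [
        {"core:sample_start": 0, "core:frequency": 2450000000}
    ],
    "annotations": [
        {"core:sample_start": 0, "core:sample_count": 20000000}
    ],
}

text = json.dumps(meta, indent=2)   # contents of the metadata text file
restored = json.loads(text)         # round-trips losslessly
```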


  • Chapter 4

    Deep learning for RF fingerprinting

Assume a set of wireless devices placed in a room, where the task for one of the devices is to uniquely identify the rest. The identification is based purely on the inherent hardware characteristics of the devices, which can be used as their unique signatures. To enable the task of RF fingerprinting, we collect raw data from all the devices and build a model that can effectively perform the classification. Different learning algorithms, such as SVM, logistic regression, and CNNs, are used to fit the data. Based on the preliminary results, we find the CNN to be the best-performing model compared to the other, conventional ML algorithms. CNNs shine on complex problems such as image classification, natural language processing, and speech recognition, and from our analysis we can say that they are the most suitable choice for RF fingerprinting as well. RF fingerprinting using a CNN removes one of the major hurdles of conventional approaches: feature engineering. Deep learning offers algorithms that learn features, so we do not have to hand-select features of interest. The major challenge we faced with this approach was finding a model that fits our data nearly perfectly. The components of the CNN architecture, parameter selection, and hyperparameter tuning are presented in this chapter.

    4.1 Initial studies on ML techniques

As part of our preliminary experiments, we started with shallow (single-layer) supervised learning classifiers such as the linear support vector machine (SVM) and logistic regression [15]. Several features, such as amplitude, phase, and FFT values, along with the mean, standard deviation, normalized phase, and absolute normalized frequency components, are extracted from the I/Q samples to build a rich


  • CHAPTER 4. DEEP LEARNING FOR RF FINGERPRINTING

    set of features to train the classifiers. The frequency components of the samples are computed using

    the FFT function in MATLAB.

    4.1.1 Support vector machines

The SVM classifier is a supervised ML approach used for classification problems. It is based on finding the hyperplane that separates two classes. The best hyperplane is the one with the largest margin between the closest data points and the hyperplane. Selecting the best hyperplane is necessary to ensure robustness in the classification. For a dataset of points xj ∈ R^d

    and corresponding labels yj ∈ {−1, 1}, j = 1, . . . , N , the hyperplane is given by

    f(x) = x′β + b = 0 (4.1)

The optimal hyperplane is obtained by finding the β ∈ R^d and b ∈ R that minimize

‖β‖₂² + C ∑_{j=1}^{N} ζ_j (4.2)

    subject to the constraints that

yj f(xj) ≥ 1 − ζj (4.3)

for all data points (xj , yj), where ζj ≥ 0 are the slack variables; in our experiments C is set to 1. We use existing libraries/packages to implement support vector machines. To fit a separating hyperplane to the data, we use a linear SVC (support vector classifier). Python offers a huge set of libraries; for our training we use the scikit-learn library, which provides functions like LinearSVC (linear kernel) to perform classification. The precomputed features are fed into this classifier along with the label information to evaluate the performance of the SVM. We chose LinearSVC since it offers flexibility in choosing parameters. We also used the squared hinge loss function and an ℓ2 regularizer to prevent overfitting. These choices help the model scale to larger datasets and converge faster.
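A minimal scikit-learn sketch of this training step looks as follows. The features here are synthetic stand-ins, not our actual extracted features:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the extracted features: two device "classes"
# separated by a mean offset in a 10-dimensional feature space.
X = np.vstack([rng.normal(0.0, 1.0, (200, 10)),
               rng.normal(1.5, 1.0, (200, 10))])
y = np.array([0] * 200 + [1] * 200)

# Linear kernel, squared hinge loss, l2 penalty, C = 1 (as in our setup)
clf = LinearSVC(C=1.0, loss="squared_hinge", penalty="l2")
clf.fit(X, y)
train_acc = clf.score(X, y)   # fraction of correctly classified points
```

In practice, the feature matrix X would hold the amplitude, phase, and FFT-derived features described above, one row per I/Q subsequence.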

    4.1.2 Logistic regression

Logistic regression is another supervised learning algorithm, which transforms its output using the logistic sigmoid function. This function, the core of logistic regression, squashes any real value into a value between 0 and 1. Each returned probability value can then be mapped to two or more discrete classes. Logistic regression can be thought of as a single-neuron dense neural


network. In logistic regression, yj is again binary in {−1, +1} and

P(yj = +1) = σ(β′xj + b) = 1 / (1 + e^{−(β′xj + b)}) (4.4)

We use the scikit-learn library to train the model in Python. The algorithm learns the regression variables β and b by minimizing the cross-entropy loss between each label yj and the prediction ŷj. Overfitting is handled using an ℓ2 regularizer. New data points x are classified based on σ(β′x + b). Classification is performed on three different datasets: the first task is to classify devices that operate on WiFi, the second is to identify devices that use LTE, and the last combines data from devices that use both WiFi and LTE. For each of these cases, the data is divided into three parts: training, validation, and testing sets. Ultimately, the performance of each classifier is measured on the testing data, which is not seen by the trained model.
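The corresponding logistic-regression step, including the held-out test split, can be sketched in scikit-learn (again on synthetic stand-in features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (300, 10)),
               rng.normal(1.5, 1.0, (300, 10))])
y = np.array([0] * 300 + [1] * 300)

# Hold out unseen test data, as described above.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)

# Cross-entropy loss with an l2 penalty (scikit-learn's default)
clf = LogisticRegression(penalty="l2").fit(X_tr, y_tr)
test_acc = clf.score(X_te, y_te)     # accuracy on unseen data
proba = clf.predict_proba(X_te[:1])  # sigmoid-derived class probabilities
```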

Fig. 4.1 shows the accuracy obtained through cross-validation on the validation data using both SVM and logistic regression. Results were obtained for various combinations of devices over the air

for both WiFi and LTE, respectively. In Fig. 4.1, we also report the accuracy of identifying different protocols.

Figure 4.1: Device classification using Logistic Regression and Linear SVM for WiFi and LTE.

Being able to detect a protocol considerably reduces the number of

    feasible constellations supported by the protocol, which in turn influences the constellation type and


structure. One important thing to note is that SVM and logistic regression are both able to achieve high accuracies (≈ 90%) for the simpler task of protocol detection, compared to a device recognition accuracy of less than ≈ 60%.

    4.2 Convolutional neural networks

Convolutional neural networks (ConvNets or CNNs) are a category of neural networks that have been found to be very effective in areas such as image recognition and classification [16]. The success of CNNs in recognizing faces, objects, and speech, as well as empowering vision in robots, motivates our investigation of these networks for radio fingerprinting. Our first challenge was to understand what these neural networks are made of and how they can be used to achieve our task. An artificial neural network (ANN) is a model inspired by the neurons in the human brain.

The computation of an ANN is similar to that of the brain, with the neuron as the basic unit of computation. Each neuron receives input either from an external source or from other neurons. Each input to a neuron is associated with a weight, assigned based on that input's importance relative to the other inputs. The neuron applies a nonlinear function, called an activation function, to the weighted sum of its inputs and computes an output. Using an activation function is important because most data is nonlinear, and the activation function introduces nonlinearity into the neuron's output [17]. The three most important activation functions are:

• Sigmoid: takes the input and maps it to a value between 0 and 1

• ReLU: the rectified linear unit takes the input and replaces negative values with zero, i.e., it outputs the maximum of the input and zero

• tanh: takes the input and maps it to a value in the range [−1, 1]

A neural network is made up of an input layer, multiple interconnected neurons in the middle layers (called hidden layers), and an output layer. CNNs are similar to ordinary neural networks but are made up of multiple hidden layers and fully connected layers. Additionally, a CNN slides a filter across the input dimensions, with the filter's weights shared across all positions in that particular layer. This results in far fewer parameters than in regular fully connected neural networks.
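The three activation functions listed above are one-liners in NumPy:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # maps into (0, 1)

def relu(x):
    return np.maximum(x, 0.0)         # zeroes out negative values

def tanh(x):
    return np.tanh(x)                 # maps into (-1, 1)

z = np.array([-2.0, 0.0, 3.0])        # example pre-activation values
```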


    4.2.1 CNN architecture

    The proposed method consists of two stages, i.e., a training stage and an identification

    stage. In the former, the CNN is trained using raw IQ samples collected from each SDR transmitter

    to solve a multi-class classification problem. In the identification stage, raw I/Q samples of the

    unknown transmitter are fed to the trained neural network and the transmitter is identified based

    on observed value at the output layer. In this section, we first describe the CNN architecture and

    then present preprocessing of input data necessary to improve the performance. There exists several

    CNN architectures namely LeNet, ResNets, AlexNet, GoogleNet, VGGNet, ZFNet, DenseNet. Our

    CNN architecture is inspired in part by AlexNet [18], which shows remarkable performance in

    image recognition. As shown in the Fig. 4.2, our network has four layers, which consists of two

    convolutional layers and two fully connected or dense layers. Our goal is to first understand how the

    Figure 4.2: CNN architecture for RF fingerprinting.

layers are stacked and the functional operation of the layer components. The most difficult challenge in building a CNN is deciding how many layers to use, how many filters/kernels to use in each layer, and what the filter sizes and the values of padding and stride should be. None of these are standard, and


the complexity of the network depends on the type of data and its processing. A lot of effort was spent experimenting with different parameters and ultimately finding the combination of these hyperparameters that generalizes well on our data.

We describe the various CNN components and hyperparameters in detail in this chapter. The input to the CNN is a windowed sequence of raw I/Q samples of length 128. Each complex value is represented as two real values, so the dimension of our input data grows to 2 × 128. This is then fed to the first convolution layer.

    4.2.1.1 Convolution layer

    The convolution layer is the core building block of the CNN, whose primary purpose is

    to extract features from the input data. It consists of a set of spatial filters (also called kernels, or

    simply filters) that perform a convolution operation over input data. The operation of the convolution

    filter is shown with an example in Fig. 4.3 for intuitive understanding. A filter of size 2 × 2 is

    Figure 4.3: Convolution operation: filters strided over input sequences.

convolved with input data of size 4 × 4 by sliding across its dimensions. The convolution computes the element-wise multiplication between the input matrix and the filter matrix and then sums all the products to produce a single value in the output matrix. Such a convolution is performed over the entire input to produce a two-dimensional feature map (activation map). The next hyperparameter is the stride, which controls how the filter moves across the input data. In Fig. 4.3, we set the stride to 1, i.e., the filter convolves around the entire input matrix by shifting one value at a time. In general, the stride is the sliding interval of the filter and determines the dimension of the feature map. Our example produces a feature map of dimension 3 × 3 at the end of the


    convolution. In our architecture, each convolution layer consists of a set of such filters, which in turn

    operates independently to produce a set of two-dimensional feature maps.
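The 4 × 4 input / 2 × 2 filter / stride-1 example of Fig. 4.3 can be reproduced in a few lines. The input and filter values below are arbitrary examples; the sketch implements the "valid" convolution described above:

```python
import numpy as np

def conv2d(x, w, stride=1):
    """Valid convolution of filter w over input x, as in a CNN layer:
    element-wise multiply each patch by the filter, then sum."""
    fh, fw = w.shape
    oh = (x.shape[0] - fh) // stride + 1
    ow = (x.shape[1] - fw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + fh,
                      j * stride:j * stride + fw]
            out[i, j] = np.sum(patch * w)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)   # 4x4 input
w = np.array([[1.0, 0.0], [0.0, 1.0]])         # 2x2 filter
fmap = conv2d(x, w)                            # 3x3 feature map
```

A 4 × 4 input with a 2 × 2 filter and stride 1 indeed yields a 3 × 3 feature map, matching the example in the text.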

    4.2.1.2 ReLU activation

Convolution is a linear operation involving element-wise multiplications and additions. Therefore, to introduce nonlinearity into the system, ReLU (rectified linear unit) layers are used after each convolution layer. Their main function is to apply a fixed nonlinear transformation to each element of the feature map. There are many possible activation functions, such as sigmoid and tanh; we use the ReLU function, as CNNs with ReLU train faster than the alternatives and with greater computational efficiency. ReLU also reduces the vanishing gradient problem, in which network training slows down because the gradients shrink exponentially toward zero. Mathematically, it is expressed as:

    f(x) = max(0, x) (4.5)

    Figure 4.4: Rectified Linear Unit (ReLU) operation performed on feature maps.

    As shown in Fig. 4.4, ReLU outputs max(x, 0) for an input x, replacing all negative

    activations in the feature map by zero.

    4.2.1.3 Pooling layers

The convolution layer is generally followed by a pooling layer, whose functionality is to (a) introduce shift invariance and (b) reduce the dimensionality of the rectified feature maps of the preceding convolution layer, while retaining the most important information. We choose a pooling


layer with filters of size 2 × 2 and stride 2, which downsamples the feature maps by 2 along both dimensions. Among the different filter operations (such as average and sum), max pooling gives the best performance. As shown in Fig. 4.5, max pooling of size 2 × 2 with stride 2 selects the maximum element in each non-overlapping region (shown with different colors). We apply the pooling operation separately to each of the feature maps. Thus, it reduces the dimensionality of the feature maps,

    Figure 4.5: An illustration of max pooling operation.

which in turn reduces the number of parameters and computations in the network and controls overfitting. Additionally, it makes the network invariant to small transformations in the input data.
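Max pooling with a 2 × 2 window and stride 2 can be sketched as follows (the feature-map values are arbitrary example numbers):

```python
import numpy as np

def max_pool_2x2(fmap):
    """Select the maximum element in each non-overlapping 2x2 region,
    halving the feature map along both dimensions."""
    h, w = fmap.shape
    return fmap[:h - h % 2, :w - w % 2] \
        .reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1.0, 3.0, 2.0, 4.0],
                 [5.0, 6.0, 7.0, 8.0],
                 [3.0, 2.0, 1.0, 0.0],
                 [1.0, 2.0, 3.0, 4.0]])
pooled = max_pool_2x2(fmap)   # 2x2 output: maximum of each 2x2 region
```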

    4.2.1.4 Fully connected layers

A fully connected or dense layer is a traditional multilayer perceptron (MLP), where the neurons have full connections to all activations in the previous layer, as in regular neural networks. The output of the second pooling layer is provided as input to the fully connected layer. Its primary purpose is to perform the classification task on the high-level features extracted by the preceding convolution layers. At the output layer, a softmax activation function is used. The classifier with the softmax activation function outputs probabilities (e.g., [0.9, 0.09, 0.01] for three class labels), i.e., it ensures that the probabilities from the fully connected layer sum to 1. To sum up, the convolution and pooling layers act as feature extractors on the input data, while the fully connected (dense) layers perform the classification based on these features. The network architecture for our RF fingerprinting is shown in Table 4.1.
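The softmax step can be sketched as follows; by construction its outputs always sum to 1 (the logit values are an arbitrary three-class example):

```python
import numpy as np

def softmax(logits):
    """Convert the final dense layer's raw scores into class
    probabilities that sum to 1 (shifted for numerical stability)."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

probs = softmax(np.array([2.0, -1.0, 0.5]))   # three-class example
predicted = int(np.argmax(probs))             # class with highest score
```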

    Next, we discuss the selection of hyperparameters of CNN to optimize the performance,

    followed by preprocessing of input data necessary for proper operation of CNN and finally shift-


Table 4.1: CNN architecture.

Layer       Output dimensions
Input       2 × 128
Conv1       50 × 128
Conv2       50 × 128
FC/ReLU     256
FC/ReLU     80
FC/Softmax  4

    invariance property of our classifier.

    4.2.1.5 Model selection

We start with a baseline architecture consisting of two convolution layers and two dense layers, then progressively vary the hyperparameters to analyze their effect on performance. The first parameter is the number of filters in the convolution layers. We observed that filter counts in the range 30–256 provide reasonably similar performance. However, since the number of computations increases with the number of filters, we use 50 filters in both convolution layers to balance performance and computational cost. Similarly, we set 1 × 7 and 2 × 7 as the filter sizes in the first and second convolution layers, respectively, since larger filter sizes do not offer significant performance improvement. Furthermore, increasing the number of convolution

layers from 2 to 4 shows no improvement in performance, which justifies continuing with two convolution layers. We then analyze the effect of the number of neurons in the first dense layer by varying it between 64 and 1024. Interestingly, we find that increasing the number of neurons beyond 256 does not improve performance. Therefore, we set 256 neurons in the first dense layer. Throughout this parameter selection, we observe that using a single fully connected layer, or increasing the number of neurons to as many as 1024, increases the model complexity and slows down training. Overfitting is one of the major problems during network training, in which the network weights become so well tuned to the training examples that the network fails to perform well on unseen data. We therefore take measures to alleviate overfitting. We use a dropout layer, whose main function is to drop a random set of activations in a specific layer by setting them to zero. This makes the network more robust and ensures that it does not fit the training data too closely. After finalizing the architecture and parameters of the CNN, we carefully


select the regularization parameters as follows: we use a dropout rate of 50% at the dense layers. In addition, we use an ℓ2 regularization parameter λ = 0.0001 to avoid overfitting.

    4.2.1.6 Preprocessing data

Our experimental studies on different representative classes of ML algorithms demonstrate significant performance improvement from choosing a deep CNN. However, to ensure scalable performance over a large number of devices, our CNN architecture needs to be modified. In addition, our input I/Q sequences, which represent a time trace of collected samples, need to be suitably partitioned and augmented beyond a stream of raw I/Q samples. Our classifiers operate on sequences of I/Q samples of a fixed length. In general, given a sequence of length L, we can create N = L/ℓ subsequences of length ℓ by partitioning the input stream, or L − ℓ subsequences by sliding a window of length ℓ over the larger sequence (or stream) of I/Q samples. Training classifiers over small subsequences leads to more training data points, which in turn yields a low variance but potentially high bias in the classification result. Conversely, large sequences may lead to high variance and low bias. We set the sequence length to 128. From a wireless communications viewpoint, the channel remains invariant over small durations of time. Hence, the ability to operate on smaller subsequences carved out of in-order received samples allows us to estimate the complex coefficients representing the wireless channel. We train our classifiers over the input I/Q sequences by treating the real and imaginary parts of each sample as two inputs, leading to a training vector of 2 × ℓ samples for a sequence of length ℓ.
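The two windowing strategies, and the 2 × ℓ real-valued training vectors, can be sketched as follows (with a unit sliding step, the exact count of windows is L − ℓ + 1; the random stream below is a synthetic stand-in for captured I/Q samples):

```python
import numpy as np

ELL = 128                       # subsequence length used in our study

def partitioned_windows(iq, ell=ELL):
    """Non-overlapping partition: L // ell subsequences of length ell."""
    n = len(iq) // ell
    return iq[:n * ell].reshape(n, ell)

def sliding_windows(iq, ell=ELL):
    """Sliding window with step 1: L - ell + 1 subsequences."""
    return np.lib.stride_tricks.sliding_window_view(iq, ell)

def to_training_vectors(windows):
    """Stack real and imaginary parts, giving a 2 x ell input per window."""
    return np.stack([windows.real, windows.imag], axis=1)

iq = (np.random.default_rng(0).normal(size=1000)
      + 1j * np.random.default_rng(1).normal(size=1000))
parts = partitioned_windows(iq)   # shape (7, 128)
slides = sliding_windows(iq)      # shape (873, 128)
X = to_training_vectors(slides)   # shape (873, 2, 128)
```

The sliding variant yields over a hundred times more training points here, which is the data-augmentation effect discussed in the next subsection.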

    4.2.1.7 Shift invariance

Another prominent characteristic of our CNN classifier, both with respect to our final goal of identifying the transmitting device and in terms of feature extraction, is shift invariance. In short, all effects like I/Q imbalance, phase noise, carrier frequency and phase offset, power amplifier nonlinearity, and harmonic distortion can occur at an arbitrary position in a given I/Q sequence. A classifier should be able to detect a device-specific impairment irrespective of whether it occurs at, e.g., the 1st or 15th position of an I/Q sequence. The convolved weights in each layer detect signals at arbitrary positions in the sequence, and a max-pooling layer passes the presence of a signal to a higher layer irrespective of where it occurs.

To enhance the shift-invariance property of our classifier during training, we train it over sliding windows of length ℓ as shown in Fig. 4.6, rather than partitioned windows: this further biases the


    trained classifiers to shift-invariant configurations. In our initial experiments, we verified the efficacy

Figure 4.6: An illustration of the sliding operation using a window of length 128.

of using a sliding window by comparing the performance of our CNN against data preprocessed with partitioned windows, and we observed improved performance with the sliding window. Finally, since deep learning performs well with large amounts of data, it was evident from our analysis that the sliding window is an efficient means of data augmentation.


  • Chapter 5

    Results and performance evaluation

    5.1 Network setup

The performance of the CNN architecture for RF fingerprinting is analyzed on the raw I/Q samples collected from the USRPs. We use MATLAB as the host-based software to interact with the USRP radios. Once the data is collected at the receiver end, the samples are first partitioned into subsequences on Northeastern's Discovery cluster. The software packages and their structure are shown in Fig. 5.1. The core software implementation is in Python, which is easier to read and write than many other programming languages and offers a wide variety of standard libraries and built-in functions. In addition, many third-party open source libraries offer high-end modules for a wide range of applications. The compute nodes in the Discovery cluster are equipped with CUDA, a parallel computing platform and programming model by NVIDIA for computing on graphics processing units (GPUs). We implement our CNN training and classifier in Keras, a model-level library that provides building blocks for deep learning [19]. In the backend, we use the TensorFlow library, a specialized, well-optimized tensor manipulation library for high-dimensional matrix operations. We install these packages in Anaconda, an open source Python distribution widely used for machine learning applications, which eases package and environment management and deployment. All of these packages run on an NVIDIA CUDA-enabled Tesla K80m GPU, which is our platform for training and evaluation.


    Figure 5.1: Software stack.

    5.2 Evaluation

Our CNN implementation has a network depth of 5 layers, with 50 filters in layers 1

and 2, 256 neurons in layer 3, 80 neurons in layer 4, and a final classifier with 4 neurons. Each

convolution layer is followed by a max-pooling layer with pool size 2. We calculate the total error at

the output neurons and propagate this error back through the network using backpropagation to

calculate gradients. Thus the essential task during training is finding the right set of weights that

fits the data and classifies devices correctly by reducing the error at the output layer. This is

done using optimizers, whose basic purpose is to update the weights using their gradients. In our network

we use Adam, an optimization method well suited for problems that are large in terms of

data and parameters. Here we must also consider another parameter, the learning rate, which

decides by how much the network weights are updated by their gradients. The learning rate must be

chosen carefully: if it is too high, the network learns faster but risks diverging and never

reaching a minimum. On the other hand, if the learning rate is too low,


then the network learns too slowly and may take days to converge. The optimizer works hand in

hand with the learning rate: it decides how to use the current weight gradients, along with previous

weight gradients, to determine each update. The Adam optimizer uses the gradients to find an adaptive

learning rate for each individual weight (parameter), unlike stochastic gradient descent, where a single

learning rate is set for all weight updates, i.e., the learning rate does not change during training. The

next parameter is the batch size, which defines the number of examples taken from the dataset

at once to perform an optimization step. These parameters are chosen by progressively varying

their values and analyzing the effect on performance. The following steps summarize the network

training process:

• The first step is the initialization of filters and weights. It is done using the Glorot uniform initializer, also called the Xavier uniform initializer, which draws samples from a uniform distribution within [-limit, limit], where

limit = sqrt(6 / (fan_in + fan_out)) (5.1)

and fan_in is the number of input units in the weight tensor and fan_out is the number of

output units in the weight tensor.

• Training data is passed as input to the network and goes through the forward-propagation step (convolution, ReLU, and pooling operations, along with the fully connected layers) to find the

output probabilities for each of the classes.

• The total error is calculated at the output layer using the categorical cross-entropy loss function, which internally uses the softmax function.

• Gradients of the error are calculated with respect to all the weights in the network, and the Adam optimizer updates all filter weights and parameters to minimize the output error.
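The training setup above can be sketched in Keras roughly as follows (the input shape of 128 I/Q samples with two channels, the use of 1-D convolutions, and the kernel size of 7 are illustrative assumptions; the text fixes only the filter and neuron counts, the pool size, the initializer, the loss, and the optimizer):

```python
from tensorflow.keras import layers, models, initializers

init = initializers.GlorotUniform()  # Xavier/Glorot uniform, as in Eq. (5.1)

model = models.Sequential([
    layers.Input(shape=(128, 2)),    # 128 I/Q samples, 2 channels (I and Q)
    layers.Conv1D(50, 7, activation="relu", kernel_initializer=init),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(50, 7, activation="relu", kernel_initializer=init),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(256, activation="relu", kernel_initializer=init),
    layers.Dense(80, activation="relu", kernel_initializer=init),
    layers.Dense(4, activation="softmax"),  # one output per device
])

# Adam adapts a per-parameter learning rate; categorical cross-entropy
# with the softmax output gives the total error at the output layer.
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```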

Finally, we evaluate the performance of our CNN using the k-fold cross-validation

technique, with k set to 5. The training dataset is split into 5

folds, and models take turns training on all folds except one, which is held out. This is followed

by evaluating model performance on the held-out fold, and the process is repeated until each

fold has served as the hold-out set. Thus we can measure the trained model's

performance on unseen data and avoid overfitting, obtaining a less biased estimate

of the model's performance. We used the StratifiedKFold class from the scikit-learn Python


machine-learning library to split the training dataset into 5 folds. Our training set consists of

≈ 720K training examples, with ≈ 80K examples for validation; we use another 200K examples for testing the performance of the trained model. We also represent the class labels associated with the

devices as binary vectors, since classification works better when the categorical variables are mapped

into binary values; this ensures equal importance is given to all devices. Training our model took ≈ 23 min, and performance evaluation on the hold-out dataset of 200K examples took only ≈ 2 min. Several metrics exist to evaluate model performance. Accuracy, the

proportion of correct classifications among all classifications, is not a good measure on its own: if

the data is imbalanced, a model may predict that every instance belongs to the

majority class and still score highly (e.g., 99%). Hence we do not rely solely on accuracy but also use a better metric, the Area

Under the Curve (AUC), which is evaluated on the Receiver Operating Characteristic (ROC) curve,

comprising the true positive rate on the Y-axis and the false positive rate on the X-axis.
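The cross-validation and AUC evaluation described above can be sketched with scikit-learn as follows (the toy feature matrix and the logistic-regression stand-in for the CNN are illustrative assumptions; note the binarized labels, as in our setup):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 16))        # toy stand-in for I/Q-derived examples
y = rng.integers(0, 4, size=400)      # 4 device classes
X[np.arange(400), y] += 2.0           # make the classes separable

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = []
for train_idx, test_idx in skf.split(X, y):
    clf = LogisticRegression(max_iter=1000)   # stand-in for the CNN
    clf.fit(X[train_idx], y[train_idx])
    scores = clf.predict_proba(X[test_idx])
    # One-vs-rest AUC on the held-out fold, with labels as binary vectors
    y_bin = label_binarize(y[test_idx], classes=[0, 1, 2, 3])
    aucs.append(roc_auc_score(y_bin, scores, average="macro"))

print(np.mean(aucs))  # mean AUC across the 5 folds
```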

    5.2.1 CNN vs. conventional algorithms

We first measure the performance on our WiFi dataset using SVM and logistic regression

for the classification of nominally similar devices.

Figure 5.2: Accuracy comparison of SVM, logistic regression, and CNN for 2–5 devices.

We extract several features, such as amplitude, phase, and FFT values, along with the mean, standard deviation, normalized phase, and absolute normalized

frequency components of the raw I/Q samples, building a rich feature set to train the classifiers.

We obtain the classification accuracy for identification among 2, 3, 4, and 5 devices. As seen in


Fig. 5.2, the accuracy with the SVM and logistic regression algorithms for 2 devices is ≈ 55%, and it decreases further as the number of devices increases; the performance deterioration

is clearly visible in the figure.

We then train our CNN classifier on the raw data to classify the same set of devices. With

our deep CNN, we achieve 98% accuracy for five devices, as opposed to less

than ≈ 33% for the shallow-learning SVM and logistic regression algorithms.

    5.2.2 Receiver operating characteristics for radio fingerprinting

We obtained the false positive rate and true positive rate to measure the AUC. Figs. 5.3, 5.4, 5.5, 5.6,

and 5.7 show the ROC curves for the classification of four similar WiFi devices, one figure per fold of

cross-validation. The CNN model works extremely well, with AUC ranging between

0.93 and 1; the AUC attained for each device is 0.964, 0.936, 1, and 0.994, respectively, as shown in

Fig. 5.3. This demonstrates that the CNN is an effective model for radio fingerprinting. Additionally,

training our CNN over a large dataset with Keras takes significantly less time than

any of the aforementioned algorithms. To demonstrate this, Fig. 5.8 shows the computational load for

training, scaled as a function of the number of training examples and the estimated time per epoch

on average. Clearly, performance with the GPU is faster than with the CPU.

Figure 5.3: ROC curves (true positive rate vs. false positive rate), fold 1: B210 #1 (area = 0.96402), B210 #2 (area = 0.93601), B210 #3 (area = 1.00000), X310 #1 (area = 0.99461).


Figure 5.4: ROC curves, fold 2: B210 #1 (area = 0.96194), B210 #2 (area = 0.93165), B210 #3 (area = 1.00000), X310 #1 (area = 0.99391).

Figure 5.5: ROC curves, fold 3: B210 #1 (area = 0.96378), B210 #2 (area = 0.93489), B210 #3 (area = 1.00000), X310 #1 (area = 0.99431).


Figure 5.6: ROC curves, fold 4: B210 #1 (area = 0.96502), B210 #2 (area = 0.93441), B210 #3 (area = 1.00000), X310 #1 (area = 0.99504).

Figure 5.7: ROC curves, fold 5: B210 #1 (area = 0.96418), B210 #2 (area = 0.93030), B210 #3 (area = 1.00000), X310 #1 (area = 0.99475).

    5.2.3 Impact of distance on radio fingerprinting

We run experiments collecting data over distances ranging from 2 to 50 ft, in steps

of 4 ft, to evaluate the impact of distance (and the possible multipath effects owing to reflections) on


    Figure 5.8: Computational load.

Figure 5.9: Accuracy (%) obtained using the CNN for 4 devices over different distances between transmitter and receiver, plotted together with the observed and analytical SNR (dB).

classification accuracy. Fig. 5.9 shows the accuracy for the classification of 4 devices

using the CNN, which achieves classification accuracy greater than 95% up to a distance of 34 ft. In addition,

the observed SNR and the analytical SNR (calculated using the free-space path loss model) are shown in the

same plot to elucidate the effect of received SNR on classification accuracy. It is evident that the

classification is robust against the SNR fluctuations caused by path loss and multipath fading


up to a distance of 34 ft.
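The analytical SNR curve can be sketched from the free-space path loss model; the transmit power, carrier frequency, and noise floor below are hypothetical values chosen only for illustration, not our experimental settings:

```python
import math

def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB: 20*log10(d) + 20*log10(f) + 20*log10(4*pi/c)."""
    c = 3e8  # speed of light, m/s
    return (20 * math.log10(distance_m) + 20 * math.log10(freq_hz)
            + 20 * math.log10(4 * math.pi / c))

FT_TO_M = 0.3048
tx_power_dbm = 10.0       # hypothetical transmit power
noise_floor_dbm = -90.0   # hypothetical receiver noise floor
freq_hz = 2.45e9          # a WiFi-band carrier (assumption)

# Analytical SNR falls off with log-distance, as in the Fig. 5.9 overlay.
for d_ft in range(2, 51, 4):
    snr_db = tx_power_dbm - fspl_db(d_ft * FT_TO_M, freq_hz) - noise_floor_dbm
    print(f"{d_ft:3d} ft: analytical SNR = {snr_db:5.1f} dB")
```

Note the characteristic 6 dB loss per doubling of distance in this model, which explains the gentle SNR decline over the 2–50 ft range.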

    37

  • Chapter 6

    Conclusion

With the increase in demand for high-data-rate applications and the advances in the

IoT space enabling millions of devices to stay interconnected, wireless security has become a

crucial functionality. In addition, the available spectrum is limited relative to the enormous

number of mobile devices it must support. Novel techniques for identifying devices, and

thereby detecting malicious activity and gaining spectrum awareness, are therefore of great importance. Existing device-

fingerprinting approaches require feature engineering and do not scale efficiently to large

datasets. We propose a radio-fingerprinting approach based on a deep-learning CNN architecture

trained on I/Q sequence examples. Our design learns features embedded in the signal

transformations of wireless transmitters and identifies specific devices. Furthermore, we have shown

that our CNN-based device identification outperforms alternative ML techniques, such

as SVM and logistic regression, for the identification of four nominally similar devices. Finally, we

experimentally validate the performance of our design on a dataset collected over a range of distances,

2 ft to 50 ft, and observe that detection accuracy decreases as the distance between transmitter and

receiver increases. We also show how computational resources such as Keras running with GPU

support speed up the training. Our future work involves increasing the robustness of the CNN

architecture so that it scales to the correct identification of thousands of similar radios.

    6.1 Research challenges

We now summarize the challenges associated with implementing CNNs for radio

fingerprinting. In our experiments, we set the partition length to 128 through a rectangular windowing

process. However, identifying the optimal length is a critical research objective and should depend

on the channel coherence time. Different CNN architectures may lead to significantly different

results, and finding an optimal architecture that enhances device classification is an open research

issue. A related challenge is striking the right balance between training time and classification

accuracy: increasing the depth of the CNN beyond a point may not help classification, and in fact

risks overfitting the training set, as we found in some of our early experiments. Our

work focuses on training the model with actual experimental data, while a large body of earlier

work attempts to solve a similar problem using synthetic data. There exists no standard dataset to

benchmark the performance of our classifier, and releasing all datasets in widely accepted formats

such as SigMF is essential for correct replication of experiments. Our classifier performs very well

on a limited set of devices; however, identifying a large number of devices (thousands), and at wider

distances of 100–200 ft, may require major changes to the architecture and new

optimal parameters. Additionally, the effects of wireless channel conditions on classification

accuracy are yet to be studied. It is important to note that our technique relies on the fact that devices

can be identified uniquely based on their hardware imperfections, which leaves wide scope for

determining the kinds of features that can be learned in the wireless domain.


  • Bibliography

    [1] J. Mitola, “Software radio architecture: a mathematical perspective,” IEEE Journal on Selected

    Areas in Communications, vol. 17, no. 4, pp. 514–538, Apr 1999.

    [2] T. J. O’Shea and J. Corgan, “Convolutional radio modulation recognition networks,” CoRR, vol.

    abs/1602.04105, 2016. [Online]. Available: http://arxiv.org/abs/1602.04105

    [3] N. E. West and T. O’Shea, “Deep architectures for modulation recognition,” in 2017 IEEE

    International Symposium on Dynamic Spectrum Access Networks (DySPAN), March 2017, pp.

    1–6.

    [4] Q. Xu, R. Zheng, W. Saad, and Z. Han, “Device fingerprinting in wireless networks: Challenges

    and opportunities,” IEEE Communications Surveys Tutorials, vol. 18, no. 1, pp. 94–104,

    Firstquarter 2016.

    [5] J. Franklin, D. McCoy, P. Tabriz, V. Neagoe, J. Van Randwyk, and D. Sicker, “Passive data link

    layer 802.11 wireless device driver fingerprinting,” in Proceedings of the 15th Conference on

    USENIX Security Symposium - Volume 15, ser. USENIX-SS’06. Berkeley, CA, USA: USENIX

    Association, 2006. [Online]. Available: http://dl.acm.org/citation.cfm?id=1267336.1267348

    [6] K. Gao, C. Corbett, and R. Beyah, “A passive approach to wireless device fingerprinting,” in

    2010 IEEE/IFIP International Conference on Dependable Systems Networks (DSN), June 2010,

    pp. 383–392.

    [7] I. O. Kennedy, P. Scanlon, F. J. Mullany, M. M. Buddhikot, K. E. Nolan, and T. W. Rondeau,

    “Radio transmitter fingerprinting: A steady state frequency domain approach,” in 2008 IEEE

    68th Vehicular Technology Conference, Sept 2008, pp. 1–5.

    [8] V. Brik, S. Banerjee, M. Gruteser, and S. Oh, “Wireless device identification with radiometric

    signatures,” in Proceedings of the 14th ACM International Conference on Mobile Computing


    and Networking, ser. MobiCom ’08. New York, NY, USA: ACM, 2008, pp. 116–127.

    [Online]. Available: http://doi.acm.org/10.1145/1409944.1409959

    [9] S. V. Radhakrishnan, A. S. Uluagac, and R. Beyah, “Gtid: A technique for physical device and

    device type fingerprinting,” IEEE Transactions on Dependable and Secure Computing, vol. 12,

    no. 5, pp. 519–532, Sept 2015.

    [10] T. J. O’Shea and J. Hoydis, “An introduction to machine learning communications systems,”

    CoRR, vol. abs/1702.00832, 2017. [Online]. Available: http://arxiv.org/abs/1702.00832

    [11] F. Chen, Q. Yan, C. Shahriar, C. Lu, W. Lou, and T. C. Clancy, “On passive wireless device

    fingerprinting using infinite hidden markov random field,” submitted for publication.

    [12] N. T. Nguyen, G. Zheng, Z. Han, and R. Zheng, “Device fingerprinting to enhance wireless

    security using nonparametric bayesian method,” in 2011 Proceedings IEEE INFOCOM, April

    2011, pp. 1404–1412.

    [13] S. U. Rehman, K. Sowerby, and C. Coghill, “Analysis of receiver front end on the performance

    of rf fingerprinting,” in 2012 IEEE 23rd International Symposium on Personal, Indoor and

    Mobile Radio Communications - (PIMRC), Sept 2012, pp. 2494–2499.

[14] The signal metadata format specification. [Online]. Available: https://github.com/gnuradio/SigMF

    [15] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics).

    Springer, 2006.

    [16] Cs231n convolutional neural networks for visual recognition. [Online]. Available:

    http://cs231n.github.io/convolutional-networks/

    [17] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, ser. Springer

    Series in Statistics. New York, NY, USA: Springer New York Inc., 2001.

    [18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional

    neural networks,” in Proceedings of the 25th International Conference on Neural Information

    Processing Systems - Volume 1, ser. NIPS’12. USA: Curran Associates Inc., 2012, pp.

    1097–1105. [Online]. Available: http://dl.acm.org/citation.cfm?id=2999134.2999257

[19] Keras: The Python deep learning library. [Online]. Available: https://keras.io/

