

Continuous Multimodal Mobile Authentication: An Exploratory Study

Matthew Sumpter, University of South Florida

Valesia Davis, Savannah State University

in association with the University of South Florida

Dr. Shaun Canavan, University of South Florida

Dr. Tempestt Neal, University of South Florida

Abstract—In recent years, an increasing number of people are using mobile devices for secure and private applications, leading researchers and developers to search for advanced authentication methodologies. Continuous authentication using biometrics, while promising, has yet to provide concrete and reliable implementations for consistently identifying a user in a non-obtrusive and efficient way. This study began developing strategies for preprocessing an extensive open-source dataset to provide opportunities for testing different modalities, networks, and fusion techniques. Initial authentication experiments using facial recognition proved promising despite flawed data, while the combined modalities of motion and touch events proved to require more nuanced treatment. The methods and findings of these experiments serve as a foundation and provide suggestions for future research into the most efficient process for authenticating the identity of mobile device users.

Index Terms—authentication, biometrics, identification

I. INTRODUCTION

In the advancing age of technology, an increasing number of people are using mobile devices for secure and private applications. Online banking and brokering, along with rapid and efficient communication, are great conveniences; however, they come with a strong risk to security. To ensure secure transmission and storage of user information on mobile devices, there have been three main schemas for authenticating a user's identity. These schemas are based on traits an individual alone may possess, namely (1) knowing something (e.g., a password or PIN), (2) having something (e.g., a hardware key), or (3) being someone (e.g., fingerprint uniqueness) [1]. Many of the contemporary paradigms for authentication suffer from antiquity, as they were developed for desktop computers, which are not easily shared or lost [2]. Mobile devices raise new concerns regarding both their form factor and the user's behavior. They are more easily lost and shared with multiple people, and tend to be used for shorter periods of time. Due to this high turnover of use, consistent authentication is inconvenient. When a user feels inconvenienced, it leads to a significant decrease in mobile security: 34% of users do not use passwords, citing them as time-consuming, cumbersome, or unnecessary. Many who do use passwords use weak ones, which are easier to remember but also account for 76% of all attacks on smartphones [2], [3], [4].

Ultimately, relying on the user's knowledge or possession for authentication, although effective, places the burden of proof on the user. One possible solution to these challenges is classifying the unique traits an individual possesses, known as biometrics. Using biometrics, researchers have developed a new paradigm, continuous authentication, which involves continuously probing the physical traits and behavior of the user to identify them. This also works as a passive method: the burden of identification lies on the developer rather than the user, all while improving both security and usability. The problem now becomes discovering which data can most reliably identify an individual, and how best to gather and process said data.

In this paper, we discuss an experimental approach analyzing biometric recognition of 48 subjects using face and smartphone sensor data (Section III). We utilize face and sensor data from a publicly available database collected on a continuous basis. This dataset presents several challenges, including variations in pose and lighting in the images and missing sensor data at different instances. We trained two deep neural networks (convolutional for images and long short-term memory for sensor data) to learn the features most representative of each individual (Section IV). We then tested our networks using n-fold cross-validation. We achieved good identification accuracy using the face, though results show poorer performance on images with extreme intra-class variations. However, results were poor for sensor data, indicating more work is necessary to optimize our network.

II. PRIOR RESEARCH

Biometric recognition is a field of research currently experiencing rapid change and advancement. Modern mobile phones have an increasing number of sensors, including high-resolution cameras, fingerprint scanners, accelerometers, ambient light sensors, gyroscopes, and GPS, to name a few. Between the growing Internet of Things and the increasing number of sensors packed into mobile devices, there is an expanding wealth of potential data to collect on an individual's appearance, actions, and behavior.

A. Biometric Modalities

Using mobile sensors and machine learning methods, a number of biometric modalities have been explored in research in an attempt to identify individuals; however, they all suffer


from their unique pitfalls and complications. The most widely available and researched modalities include the following [5], [6]:

1) Application and Phone Usage: The way an individual uses their phone daily is a strong behavioral indicator. Metrics such as battery life, application preference, and typical active-session times are easily captured and available. This method has proven to have a relative measure of accuracy in identifying a user; however, it requires a large amount of data collection to determine user behavior, and a shared phone could result in a flawed dataset [7]. Location data, such as GPS location, cell phone towers pinged, and Wi-Fi/Bluetooth devices accessed, can reveal insight into a user's patterns of daily behavior. However, it is not a totally unique identifier, as many people (especially those most likely to access the user's device, such as coworkers and family) will share similar locations and access points with the intended individual.

2) Image/Facial Recognition: Facial recognition is one of the most popular and widely used biometric identifiers. A myriad of data can be collected from an image of a user, including age, gender, background, location, and, most importantly, physical attributes. Although facial recognition is very successful at identification, collecting and processing images passively presents many problems. The front-facing camera on mobile devices often has a lower resolution than the rear-facing camera. While pictures are being taken passively, there is no control over capture conditions, such as bad angles, poor lighting, or physical obstructions such as new facial hair or a scarf worn on a cold day. Additionally, image capture and processing is one of the more resource-consuming operations in biometric recognition [8].

3) Touch/Swipe Data: Most modern users interact with mobile devices through touch interfaces. The sensors that allow these interactions to occur can be harnessed to gather information on how the user handles the phone. A large variety of raw data can be collected from user touch, including screen coordinates, touch pressure, and event start and end times. Biometrics extracted from taps, keystrokes, handwriting, and swipes have been explored in research as unique identifiers. Swipes, for instance, have been shown to carry a high level of uniqueness among users, with Equal Error Rates (EER) as low as 10% [9]. Despite the relative success of this method, issues arise from the volume and variety of data that must be collected on a user to successfully identify their swipe behavior in different circumstances, as well as the risk that a shared phone will result in flawed data.
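As a concrete illustration of the EER metric cited above, the following sketch estimates the EER by sweeping a decision threshold over synthetic genuine and impostor match scores; the score distributions are illustrative, not drawn from [9]:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep a decision threshold and return the point where the
    false rejection rate (genuine scores below the threshold) and the
    false acceptance rate (impostor scores at or above it) are closest."""
    best_gap, eer = 1.0, 0.0
    for t in np.sort(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine < t)
        far = np.mean(impostor >= t)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Synthetic match scores: genuine comparisons score higher on average.
rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 1000)
impostor = rng.normal(0.4, 0.1, 1000)
print(f"EER ~ {equal_error_rate(genuine, impostor):.2%}")
```

A lower EER means less overlap between the genuine and impostor score distributions; the 10% figure reported for swipes corresponds to substantially more overlap than these toy distributions produce.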

4) Gyroscope/Accelerometer Data: Modern phones come with sensitive gyroscopes and accelerometers that can reveal a lot about a user's motion behavior. Identifiers such as handedness, orientation, and gait have shown relative success in biometric authentication experiments [10]. Unfortunately, this method fails to provide much, if any, data on the user if they are not in motion, or if an injury alters their motion. Furthermore, the more secure the application, the less likely the user will be moving while they complete their task.

Ultimately, individual biometrics suffer from the effects of non-universal availability, noisy data, and intra-class variations (users interacting inconsistently with the sensor) [6].

B. Multimodal Authentication

For effective consumer use, continuous biometric authentication must be very accurate and convenient. Ideally, the true positive rate (TPR) would approach 100%, while the false positive rate (FPR) would approach 0%. To accomplish this, researchers have begun exploring the effectiveness of combining single biometrics into a larger identification scheme, referred to as multimodal authentication. By combining multiple sensor identifiers, researchers aim to develop a holistic fingerprint of an individual. Besides the practical concerns of processing power, there are more specific theoretical considerations regarding how to effectively accomplish this: Which data should be gathered? Is the data acquired in parallel or in sequence, and is it then processed simultaneously or sequentially? At what level of machine learning does the fusion of the separate modalities occur? And what scheme of fusion should be used [11]?

These complex questions currently lead the field of research, and there is not necessarily one correct answer. It is possible that under different circumstances, different data will be more useful, and the processing of that data may have to be done differently. Current research in the field has provided a series of intriguing results to build on. The University of Maryland has shown relative success with the feature-level fusion of touch gestures and face capture [12]. Stylometry, application usage, web browsing, and GPS location have all been successful in identification under decision-level fusion [13]. These are just two examples; there is a multitude of research detailing the results of multiple data and fusion methods for identifying individual users based on their inherent traits and behaviors [6], [14]. In reviewing this work, no solution has yet reached the level of accuracy required to provide this functionality to real users in a secure manner.

III. DISCUSSION AND ORGANIZATION OF EXPERIMENTAL DATA

The following work utilized the open-source University of Maryland (UMD) Active Authentication-02 (UMDAA-02) dataset [5]. From the documentation provided by UMD, the dataset comprises approximately 242 GB of smartphone sensor signals from 48 volunteers on Nexus 5 phones over a two-month period in late 2015. The dataset includes data from the front-facing camera, touchscreen, gyroscope, accelerometer, magnetometer, light sensor, GPS, Bluetooth, WiFi, proximity sensor, temperature sensor, and pressure sensor. Non-sensor data captured includes the timing of screen lock and unlock events, start and end time stamps of calls, and currently running foreground applications.

Although the data collection was extensive, it was not organized in a manner useful for running classification and identification experiments (Fig. 1). To make the data useful, a relatively large amount of preprocessing had to occur. Two separate modalities needed to be aggregated in order to conduct useful experiments: (1) captured images and (2) sensor data.

Fig. 1: Raw dataset directory structure.

Fig. 2: Sorted dataset directory structure.

A. Image Processing

Images were processed in two steps. First, all images captured of a unique user were collected into a respective directory. The images captured from a front-facing camera during use are inherently inconsistent. As can be seen in Figure 3, images often contained only partial user faces (usually from a poor angle) with varied lighting, if the user was captured at all.

Fig. 3: Unprocessed user images. (a) Successful subject capture. (b) Unexpected second subject in frame. (c) No subject in frame. (d) Subject face partially occluded.

The second step of processing involved using a Haar feature-based cascade classifier to iterate through all the captured images of a user, recognize those that contained faces, and crop them to contain only the captured face. The cropped faces were placed in a sub-directory of the user, with the failed images logged (removing 70% of captured photos for some subjects). From a cursory analysis of the cropped images of a single user, there are several inconsistencies that need to be considered when doing experiments (Fig. 4). Although some images were accurately cropped, many exhibit inconsistencies in lighting, angle, or face detection. It was also discovered that many subjects handed their phone off to another subject (Fig. 4f). Some proposed solutions are discussed in the Future Work section.

B. Sensor Processing

Processing the sensor data required some consideration of which modalities to include. A manual analysis of the captured sensor data, paired with the intended experiments, led to some conclusions about how to best aggregate all the sensor data. The accelerometer, gyroscope, and touch events were the most consistently present modalities throughout data collection; considering that these have also proven successful in previous user authentication experiments, they were determined to be the most realistic for experimental purposes. Each user directory was scanned for all relevant sensor events, and their time stamps were recorded. All sensor data was then averaged over a predetermined time window (2 seconds and 10 seconds) and recorded in a new data file, with user ID, session ID, and time stamps kept intact. The filenames of images taken during the respective time windows were also appended to the sensor data.

The result was a batch data file for each user, where each row represented an average of sensor values recorded over the designated time window, as well as a list of all the images taken during that time.
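The time-window averaging described above can be sketched with pandas; the column names and millisecond timestamps here are illustrative assumptions, not the UMDAA-02 schema:

```python
import pandas as pd

def window_average(df, window="2s"):
    """Average raw sensor readings over fixed time windows."""
    idx = pd.to_datetime(df["timestamp"], unit="ms")
    df = df.drop(columns="timestamp").set_index(idx)
    return df.resample(window).mean().dropna(how="all")

# Toy readings: timestamps in milliseconds since the session start.
raw = pd.DataFrame({
    "timestamp": [0, 500, 1500, 2100, 3900],
    "accel_x":   [0.1, 0.3, 0.2, 0.5, 0.4],
    "gyro_z":    [0.01, 0.02, 0.00, 0.03, 0.05],
})
print(window_average(raw))  # one row per 2-second window
```

In the experiments described above, the filenames of images falling inside each window would be appended as an extra column alongside the averaged sensor values.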


Fig. 4: Haar cascade cropped photos. (a) Successful crop of face. (b) Blurry subject. (c) Poor lighting. (d) Poor angle. (e) Inaccurate face detection. (f) Undesired subject.

Fig. 5: Accelerometer Data

IV. EXPERIMENTAL CONDITIONS AND RESULTS

A. Image Classification

For image classification, three folds were created from the entire volume of photos with detected faces (without any further data cleaning). Photos were resized to a resolution of 80x80 for efficient processing and memory management. Training was performed with a convolutional neural network (CNN) consisting of three convolutional layers with ReLU activation functions and a softmax output layer, optimized with the Adam algorithm (see Figure 6).

Fig. 6: CNN architecture for classifying face images.

Results from image classification proved promising. The average precision of the generated models across all three folds was 81.73%, and the average F1 score was 79.69%. Figure 7 plots the output from each layer in the CNN for a single test image. From a comparison of correctly and incorrectly classified photos (Fig. 8 and Fig. 9), it can be seen that changes in capture angle, poor lighting, and misidentified faces create difficulties in identification. The Future Work section proposes solutions for properly adjusting to these inconsistencies.
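A network matching the description above (three convolutional layers with ReLU activations, a softmax output over the 48 subjects, Adam optimization, 80x80 inputs) can be sketched as follows. Filter counts and kernel sizes are assumptions, since the paper does not report them; this PyTorch sketch applies the softmax inside the cross-entropy loss, as is idiomatic:

```python
import torch
import torch.nn as nn

class FaceCNN(nn.Module):
    def __init__(self, n_classes=48):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 80 -> 40
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 40 -> 20
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 20 -> 10
        )
        self.classifier = nn.Linear(64 * 10 * 10, n_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)  # raw logits; softmax is folded into the loss

model = FaceCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # combines log-softmax with NLL

batch = torch.randn(4, 3, 80, 80)  # four 80x80 RGB face crops
logits = model(batch)
print(logits.shape)  # torch.Size([4, 48])
```

Each training step would compute `loss_fn(logits, labels)`, call `loss.backward()`, and step the optimizer, repeated per fold for the three-fold evaluation described above.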

B. Sensor Classification

Multiple configurations were tested for sensor data classification using a long short-term memory (LSTM) network. ReLU activations with a softmax output reached a validation accuracy of 3.81% for 10-second intervals and 7.28% for 2-second intervals. Tanh activations with RMSProp optimization reached a validation accuracy of 4.41% for 10-second intervals and 5.25% for 2-second intervals (see Figure 10). The most pressing concern is designing a network that can accurately identify a user based on their sensor data. This is relatively complex to model, as a moment-by-moment analysis of touch and motion data does not give the full picture: the correct interpretation of each data point is reliant upon the data points before it. Although a different deep learning network could be constructed to achieve some level of accuracy, the more vital focus should be detailed preprocessing; sensor data contains many nuances that identify unique traits, and a simple time-based average (as was performed in these experiments) is too imprecise an interpretation. After careful processing, a fine-tuned RNN could be expected to model the data properly, as it builds its model from the temporal sequence. Once these considerations are accounted for, the questions can begin to focus on which specific combinations of sensor modalities are most successful in identifying a user.

Fig. 7: CNN output at each layer for a test image. (a) Test image. (b) Convolution layer 1. (c) Pooling layer 1. (d) Convolution layer 2. (e) Pooling layer 2. (f) Convolution layer 3.

Fig. 8: Correct predictions.

Fig. 9: Incorrect predictions.

Fig. 10: LSTM architecture for classifying sensor data.
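For reference, an LSTM classifier over sequences of the windowed sensor features from Section III can be sketched as below. The feature count (three accelerometer axes, three gyroscope axes, one touch value) and the sequence length are illustrative assumptions, not the configuration reported above:

```python
import torch
import torch.nn as nn

class SensorLSTM(nn.Module):
    def __init__(self, n_features=7, hidden=64, n_classes=48):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):          # x: (batch, time, features)
        _, (h, _) = self.lstm(x)   # h: (1, batch, hidden)
        return self.head(h[-1])    # classify from the final hidden state

model = SensorLSTM()
windows = torch.randn(8, 20, 7)    # 8 sequences of 20 averaged time windows
logits = model(windows)
print(logits.shape)                # torch.Size([8, 48])
```

Because the LSTM carries state across the sequence, each prediction depends on the preceding windows, which addresses the point above that a single averaged window does not give the full picture.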

V. FUTURE WORK

The work detailed in this report intends to lay the foundation for continued work into continuous authentication. In looking towards the future, there are several recommended considerations.

To improve face detection, more advanced data grooming must occur. Simply cropping faces from the captured images is insufficient: conditions such as lighting and capture angle create inconsistencies that a deep learning network cannot properly resolve. There are a few options that may improve the quality of the images the network trains on, depending on which part of the process is adjusted. One proposed solution is a more powerful face detection algorithm: by establishing a threshold for image quality, the facial detection algorithm could effectively remove any photos that do not meet the desired standard (too blurry, poorly lit). Another option involves adjusting the weight of image classification in the fusion process. A poor-quality photo can still be used to return an identification decision; it may simply receive less weight when fused with a sensor identification decision.
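The weighting idea above can be sketched as a quality-weighted score fusion. The quality measure and the 0.6 ceiling on the face weight are illustrative assumptions, not values proposed in this study:

```python
import numpy as np

def fuse_scores(face_score, sensor_score, image_quality):
    """Fuse per-modality match scores in [0, 1]; image_quality in [0, 1]
    shrinks the face weight so a poor photo contributes less."""
    w_face = 0.6 * image_quality
    w_sensor = 1.0 - w_face
    return w_face * face_score + w_sensor * sensor_score

# A sharp photo dominates the fused decision...
print(fuse_scores(0.9, 0.5, image_quality=1.0))   # 0.74
# ...while a blurry one defers to the sensor channel.
print(fuse_scores(0.9, 0.5, image_quality=0.2))   # 0.548
```

The fused score would then be thresholded to accept or reject the user, with the threshold chosen from the TPR/FPR trade-off discussed in Section II-B.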

As was noted in the results, some subjects handed off their phones to another user. While this presents a problem in training a deep learning network, it provides an exciting opportunity for testing continuous authentication in a lab environment. Once the photos containing the unidentified user have been flagged, they can be correlated with the sensor data for that event. This provides a very realistic test set for continuous authentication experiments: can the trained model identify the moment the phone has been passed to a new user?

While the UMDAA-02 dataset is quite comprehensive, it is insufficient in the greater realm of this research. The UMDAA-02 dataset was generated in 2015, and mobile technology advances at such a rapid rate that data captured from mobile devices today could be expected to have higher-quality information available. Producing a dataset at the scale of UMDAA-02 is not unreasonable; it just requires significant directed resources, and in this field of research there is a great need for an open dataset to allow for more consistent and ubiquitous advances in continuous biometric authentication. However, the UMDAA-02 dataset provides the opportunity to make these exploratory probes into successful biometric modalities and classification/fusion methods.

ACKNOWLEDGMENT

This research has been partially supported by the National Science Foundation under grants No. 1458928 and 1645025, an REU Site on Ubiquitous Sensing.

We would also like to show our gratitude to Saurabh Hinduja (University of South Florida) for sharing his expertise with us during the course of this research, and to Dr. Miguel Labrador (University of South Florida) for allowing us the opportunity to conduct this research.

REFERENCES

[1] N. Sivasankari and A. Muthukumar, "A review on recent techniques in multimodal biometrics," 2016 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, 2016, pp. 1-11.

[2] S. Rasnayaka and T. Sim, "Who wants Continuous Authentication on Mobile Devices?," 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS), Redondo Beach, CA, USA, 2018, pp. 1-9.

[3] I. T. Fischer, C. Kuo, L. Huang, and M. Frank, "Short paper: Smartphones: Not smart enough?," Proceedings of the ACM Conference on Computer and Communications Security, 2012, pp. 27-32.

[4] Verizon, Data Breach Investigations Report, 2014.

[5] U. Mahbub, S. Sarkar, V. M. Patel and R. Chellappa, "Active user authentication for smartphones: A challenge data set and benchmark results," 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), Niagara Falls, NY, 2016, pp. 1-8.


[6] V. M. Patel, R. Chellappa, D. Chandra and B. Barbello, "Continuous User Authentication on Mobile Devices: Recent progress and remaining challenges," IEEE Signal Processing Magazine, vol. 33, no. 4, pp. 49-61, July 2016.

[7] R. Yampolskiy, "Behavioral Modeling: an Overview," American Journal of Applied Sciences, vol. 5, May 2008.

[8] U. Mahbub, V. M. Patel, D. Chandra, B. Barbello and R. Chellappa, "Partial face detection for continuous authentication," 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, 2016, pp. 2991-2995.

[9] R. Masood, B. Z. H. Zhao, H. J. Asghar, and M. A. Kaafar, "Touch and you're trapp(ck)ed: quantifying the uniqueness of touch gestures for tracking," Proceedings on Privacy Enhancing Technologies, vol. 2018, no. 2, pp. 122-142.

[10] C. Shen, Y. Li, Y. Chen, X. Guan and R. A. Maxion, "Performance Analysis of Multi-Motion Sensor Behavior for Active Smartphone Authentication," IEEE Transactions on Information Forensics and Security, vol. 13, no. 1, pp. 48-62, Jan. 2018.

[11] A. K. Jain, A. A. Ross, and K. Nandakumar, Introduction to Biometrics. Springer Publishing Company, Incorporated, 2011.

[12] H. Zhang, V. M. Patel and R. Chellappa, "Robust multimodal recognition via multitask multivariate low-rank representations," 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Ljubljana, 2015, pp. 1-8.

[13] L. Fridman, S. Weber, R. Greenstadt and M. Kam, "Active Authentication on Mobile Devices via Stylometry, Application Usage, Web Browsing, and GPS Location," IEEE Systems Journal, vol. 11, no. 2, pp. 513-521, June 2017.

[14] Z. Sitová et al., "HMOG: New Behavioral Biometric Features for Continuous Authentication of Smartphone Users," IEEE Transactions on Information Forensics and Security, vol. 11, no. 5, pp. 877-892, May 2016.