
arXiv:1602.04693v2 [cs.CR] 21 Feb 2016 · pages.cs.wisc.edu/~vrastogi/static/papers/aqrcr17.pdf

DroidNative: Semantic-Based Detection of Android

Native Code Malware

Shahid Alam

Qatar University

[email protected]

Zhengyang Qu

Northwestern University

[email protected]

Ryan Riley

Qatar University

[email protected]

Yan Chen

Northwestern University

[email protected]

Vaibhav Rastogi

University of Wisconsin-Madison

[email protected]

February 23, 2016

Abstract

According to the Symantec and F-Secure threat reports, mobile malware development in 2013 and 2014 has continued to focus almost exclusively (∼99%) on the Android platform. Malware writers are applying stealthy mutations (obfuscations) to create malware variants, thwarting detection by signature based detectors. In addition, the plethora of more sophisticated detectors making use of static analysis techniques to detect such variants operate only at the bytecode level, meaning that malware embedded in native code goes undetected. A recent study shows that 86% of the most popular Android applications contain native code, making this a plausible threat. This paper proposes DroidNative, an Android malware detector that uses specific control flow patterns to reduce the effect of obfuscations, provides automation and platform independence, and as far as we know is the first system that operates at the Android native code level, allowing it to detect malware embedded in both native code and bytecode. When tested with traditional malware variants it achieves a detection rate (DR) of 99.48%, compared to academic and commercial tools' DRs that range from 8.33% – 93.22%. When tested with a dataset of 2240 samples DroidNative achieves a DR of 99.16%, a false positive rate of 1.3% and an average detection time of 26.87 sec/sample.

Keywords: End point security, Android native code, Malware analysis, Malware variants detection, Control flow analysis, Data mining

1 Introduction

Worldwide losses due to malware attacks and phishing between July 2011 and July 2012 were $110 billion [5]. In 2013 there was a 42% increase in malware attacks over 2011. Web-based attacks increased by 30 percent in 2012 [5]. Mobile malware development in 2013 [6] and in 2014 [7] continues to focus almost exclusively (∼99%) on the Android platform, due to its popularity and open nature.

In recent years there has been a significant amount of research focusing on detection of Android malware through both static [20, 27, 30, 35, 41, 45, 47, 49, 50] and dynamic [15, 23, 35, 48, 52, 54] analysis techniques. Among the static analysis techniques, all the systems operate at the Java bytecode level. This is logical, as most Android applications are primarily written in Java. Many Android applications, however, contain functionality not implemented in Java but instead in native code.

In this paper, we refer to native code as machine code that is directly executed by a processor, as opposed to bytecode, which is executed in a virtual machine such as the JVM or Dalvik-VM. While the standard programming model for Android applications is to write the application in Java and distribute the bytecode, the model also allows developers to include native code binaries as part of their applications. This is frequently done in order to make use of a legacy library or to write code that is hardware dependent, such as for a game graphics engine or video player.

A recent study [46] shows that 86% of the most popular Android applications contain native code. Given that existing, static analysis based malware detection systems operate on a bytecode level, this means these applications cannot be completely analyzed. If the malicious payload of the app was included in the native code, existing systems would be unable to detect it.

In addition to native code, another issue plaguing malware detection on Android is obfuscation. To protect an Android application from reverse engineering attacks, even legitimate developers are encouraged to obfuscate [19] their code. Similar techniques are used by Android malware writers to prevent analysis and detection. Obfuscation can be used to make the code more difficult to analyze, or to create variants of the same malware in order to evade detection. Previous studies [26, 37, 51] have evaluated the resilience of commercial anti-malware products when tested against variants of known malware. In [37], the authors found that none of the products were able to detect the variants. In more recent work [51] the detection rate of the products ranged from 50.95% – 76.67%.

In order to detect malware variants, several works have focused on Android malware detection using static analysis. Some of these, such as [50] and [20], use API call graphs (ACGs); [47] uses component-based ACGs; [27] uses byte sequence features. Other previous works have focused on graphs to represent program semantics, such as control-flow graphs [18] and data dependency graphs [28, 31]. These techniques make an exact match against manually-crafted specifications for malware detection, and therefore can potentially be evaded by malware variants. Other semantic-based techniques are compute intensive; when their processing time is reduced [33], they produce poor detection rates (DRs) and are not suitable for real-time malware detection.

This paper introduces DroidNative, a malware detection system for Android that operates at the native code level and is able to detect malware in either bytecode or native code. DroidNative performs static analysis of the native code and focuses on patterns in the control flow that are not significantly impacted by obfuscations. This allows DroidNative to effectively detect Android malware, even with obfuscations applied and even if the malicious payload is embedded in native code. DroidNative is not limited to analyzing native code; it is also able to analyze bytecode by making use of the Android runtime (ART) [1] to compile bytecode into native code suitable for analysis. The use of control flow with patterns enables DroidNative to detect smaller size malware, which allows DroidNative to reduce the size of a signature, optimizing the detection time without reducing the DR. Our experimental analysis shows that DroidNative is able to achieve an overall DR of 99.16% and a false positive rate (FPR) of 1.3%. DroidNative achieves an average detection time of 26.87 sec/sample when tested with 2240 samples, and is ∼6.5 times faster than the other compared work [50].

The contributions of this work are as follows:

• DroidNative, as far as we know, is the first system that builds and designs cross-platform signatures for Android and operates at the native code level, allowing it to detect malware embedded in either bytecode or native code.

• When applied to traditional, bytecode based Android malware, the detection rate is on par with that of many research systems that only operate at the bytecode level.

• When tested with previously unseen variants of existing malware, the detection rate is 99.48%, significantly surpassing that of commercial tools.

• Based on our analysis of related work, DroidNative is noticeably faster than existing systems, making it suitable for real-time analysis.

• We provide n-fold cross validation results for the system, providing a more systematic approach to demonstrating accuracy than is employed by most similar works.

2 Background

DroidNative detects malware by building control flow (behavioral) signatures of an application and comparing them to signatures derived from known malware. In order to build these signatures, the Android program executable (provided in either bytecode or native code) is converted to an intermediate representation. This intermediate representation provides automation and platform independence, and reduces the effect of obfuscations.

In this section we provide background information regarding the intermediate representation used, as well as the Android Runtime (ART), both of which are used in the design and implementation of DroidNative.

2.1 Malware Analysis Intermediate Language (MAIL)

DroidNative performs its analysis on an intermediate representation of an Android application's binary code. In order to accomplish this, DroidNative makes use of the Malware Analysis Intermediate Language (MAIL) [10]. Traditionally, intermediate languages are used in compilers to translate the source code into a form that is easy to optimize and to provide portability. We can apply these same concepts to malware analysis and detection. MAIL provides a high level representation of the disassembled binary program, including specific information such as control flow information, function/API calls, patterns, etc. Having this information available in an IL allows for easier and optimized analysis and detection of malware.

DroidNative uses MAIL to provide an abstract representation of an assembly program, and that representation is used for malware analysis and detection. By translating native binaries compiled for different platforms (Android supports ARM, x86 and MIPS architectures) to MAIL, DroidNative achieves platform independence, and uses MAIL patterns to optimize malware analysis and detection.

2.1.1 Patterns for Annotation

MAIL is used to annotate a CFG (control flow graph) of a program using different patterns available in the language. The purpose of these annotations is to assign patterns to MAIL statements that can be used later for pattern matching during malware detection; this helps us detect smaller malware, and enables us to reduce the size of a signature and the detection time.

There are a total of eight basic statements (e.g., assignment, control and conditional) in MAIL that can be used to represent the structural and behavioral information of an assembly program. There are 21 patterns in the MAIL language, and every MAIL statement can be tagged by one of these patterns. A pattern represents the type of a MAIL statement and can be used for easy comparison and matching of MAIL programs. For example, an assignment statement with a constant value and an assignment statement without a constant value are two different patterns. The details of the patterns can be found in Table 1.
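As an illustration of how statements map to patterns, the sketch below tags a few MAIL-style statements. The classifier and the underscore spellings of the pattern names are our own simplification; in DroidNative the patterns are assigned during assembly-to-MAIL translation, not by string matching.

```python
import re

# Hypothetical, simplified tagger: assigns a Table 1 pattern to a
# MAIL-style statement based on its surface shape. This is only a
# sketch, not DroidNative's actual tagging logic.
def tag_statement(stmt: str) -> str:
    has_const = re.search(r"0x[0-9a-fA-F]+", stmt) is not None
    s = stmt.strip().lower()
    if s.startswith("if"):
        return "CONTROL_CONSTANT" if has_const else "CONTROL"
    if s.startswith("call"):
        return "CALL_CONSTANT" if has_const else "CALL"
    if s.startswith("jmp"):
        return "JUMP_CONSTANT" if has_const else "JUMP"
    if "=" in s:
        return "ASSIGN_CONSTANT" if has_const else "ASSIGN"
    return "NOTDEFINED"  # the default pattern for anything unrecognized

print(tag_statement("r0 = r0 + 0x01;"))  # ASSIGN_CONSTANT
print(tag_statement("CALL r5;"))         # CALL
```

The two-way split of each pattern (with or without a constant) mirrors the constant/non-constant distinction made in the text.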

2.1.2 Signature Generation Techniques

To counter simple signature-based malware detectors, malware will often use obfuscation techniques [26, 37, 51] to obscure its host application and make it difficult to understand, analyze and detect the malware embedded in the code. In order to detect malware protected by obfuscations, we apply two signature generation techniques designed to counter such approaches. MAIL patterns are used to build these signatures.

Annotated Control Flow Graph  The first technique, Annotated Control Flow Graph (ACFG), analyzes a control flow graph derived from an application and then annotates it using MAIL patterns in order to capture the control flow semantics of the program. A complete formal definition of an ACFG and its matching for malware detection is given in [12].

Annotating the CFG with MAIL statements allows much smaller CFGs to be reliably analyzed and used for malware detection than are traditionally used in CFG based techniques. The advantage of this is that an application can be broken up into a collection of smaller ACFGs instead of one large CFG. This collection can then be analyzed and pattern matched with malware much more quickly than the larger CFG.

Pattern              Description
ASSIGN               An assignment statement, e.g. r0 = r0 + r1;
ASSIGN CONSTANT      An assignment statement including a constant, e.g. r0 = r0 + 0x01;
CONTROL              A control statement where the target of the jump is unknown, e.g. if (zf == 1) JMP [r0 + r1 + 0x10];
CONTROL CONSTANT     A control statement where the target of the jump is known, e.g. if (zf == 1) JMP 0x400567;
CALL                 A call statement where the target of the call is unknown, e.g. CALL r5;
CALL CONSTANT        A call statement where the target of the call is known, e.g. CALL 0x603248;
FLAG                 A statement including a flag, e.g. cf = 1;
FLAG STACK           A statement including a flag register with stack, e.g. eflags = [sp = sp - 0x1];
HALT                 A halt statement, e.g. HALT;
JUMP                 A jump statement where the target of the jump is unknown, e.g. JMP [r0 + r5 + 0x10];
JUMP CONSTANT        A jump statement where the target of the jump is known, e.g. JMP 0x680376;
JUMP STACK           A return jump, e.g. JMP [sp = sp - 0x8];
LIBCALL              A library call, e.g. compare(r0,r5);
LIBCALL CONSTANT     A library call including a constant, e.g. compare(r0,0x10);
LOCK                 A lock statement, e.g. lock;
STACK                A stack statement, e.g. r0 = [sp = sp - 0x1];
STACK CONSTANT       A stack statement including a constant, e.g. [sp = sp + 0x1] = 0x432516;
TEST                 A test statement, e.g. r0 and r5;
TEST CONSTANT        A test statement including a constant, e.g. r0 and 0x10;
UNKNOWN              Any unknown assembly instruction that cannot be translated.
NOTDEFINED           The default pattern; all new statements are assigned this value when created.

Table 1: Patterns used in MAIL. r0, r1, r5 are general purpose registers, zf and cf are the zero and carry flags respectively, and sp is the stack pointer.

For example, an application being analyzed is broken up into a set of small ACFGs which become that application's signature. At detection time, the ACFGs within that signature are compared to the ACFGs of known malware, and if a high percentage of ACFGs in the application match those of known malware, then the application can be flagged as malware. This matching is also very fast, because the individual ACFGs being compared against are fairly small.
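The percentage-based flagging described above can be sketched as follows. The `match` predicate stands in for the paper's actual ACFG graph/pattern comparison, and the 70% default mirrors the threshold the paper reports computing empirically; this is an illustrative sketch, not DroidNative's code.

```python
def is_malware(app_acfgs, malware_acfgs, match, threshold=0.70):
    """Flag the app when the fraction of its ACFGs that match some
    known-malware ACFG reaches the threshold. `match(a, m)` is an
    assumed predicate implementing the graph/pattern comparison."""
    if not app_acfgs:
        return False
    hits = sum(1 for a in app_acfgs
               if any(match(a, m) for m in malware_acfgs))
    return hits / len(app_acfgs) >= threshold

# Toy usage: ACFGs reduced to hashable stand-ins, matched by equality.
same = lambda a, b: a == b
print(is_malware(["g1", "g2", "g3", "g4"], ["g1", "g2", "g3"], same))  # True (3/4)
print(is_malware(["g1", "g2", "g3", "g4"], ["g1"], same))              # False (1/4)
```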

Sliding Window of Difference  The second technique for signature building, Sliding Window of Difference (SWOD) [11], is an opcode-based technique for malware detection. One of the main advantages of using opcodes for detecting malware is that detection can be performed in real-time. However, the current techniques [14, 36, 39, 42] using opcodes for malware detection have several disadvantages: (1) The patterns of opcodes can be changed by using a different compiler, the same compiler with a different level of optimizations, or if the code is compiled for a different platform. (2) Obfuscations introduced by malware variants can change the opcode distributions. (3) The execution time depends on the number of features selected for mining in a program. Selecting too many features results in a high detection rate but also increases the execution time. Selecting too few features has the opposite effect.

A SWOD is a window that represents differences in MAIL patterns' distributions (MAIL patterns provide a high-level representation of a program's opcodes) and hence makes the analysis independent of different compilers, compiler optimizations, instruction set architectures and operating systems. In order to mitigate the effect of obfuscations introduced by malware variants, SWOD uses a control flow weight scheme based on heuristics [11] to extract and analyze the control flow semantics of the program. Furthermore, it uses statistical analysis of MAIL patterns' distributions to develop a set of heuristics that help in selecting an appropriate number of features and reduce the runtime cost.

2.2 The ART Runtime Environment

Android applications are available in the form of APKs (Android application packages). An APK file contains the application code as Android bytecode and precompiled native binaries (if present). Traditionally, those applications are executed on the phone using a customized Java virtual machine called Dalvik [2]. Dalvik employs just-in-time (JIT) compilation, similar to the standard JVM.

Starting from Android 5.0, the Dalvik-VM was replaced by ART [1]. ART uses ahead-of-time (AOT) compilation to transform the Android bytecode into native binaries when the application is first installed on the device. ART's AOT compiler can perform complex and advanced code optimizations that are not possible in a virtual machine using a JIT compiler. ART improves the overall execution efficiency and reduces the power consumption of an Android application, but increases the installation time and may increase the code size of an application. In this work, we use ART in order to translate bytecode into native code that can be analyzed using our techniques.

3 System Design

In this section we present the design of DroidNative. DroidNative provides a fully automated approach to malware detection based on analysis of the native code of an Android application. It analyzes control-flow patterns found in the code in order to compare them to patterns seen in known malware.

Figure 1 provides an overview of the DroidNative architecture. As mentioned previously, the system operates by analyzing native code. The vast majority of Android applications, however, are distributed as bytecode. (For example, a standard application might include the main application as Java bytecode, but also include native code libraries for performance reasons.) When an app is presented to DroidNative for analysis, the application is separated into bytecode and native code components. Any bytecode components are passed to the ART compiler and a native binary is produced. Any native code components are passed directly to the next stage. At this point, all parts of the application exist as native code.

Next, the binary code is disassembled and translated into MAIL code, which is processed using ACFG and SWOD (more details can be found below) in order to produce a behavioral signature. Next, the similarity detector uses this signature and detects the presence of malware in the application using the malware templates previously discovered during a training phase. A simple binary classifier (decision tree) is used for this purpose. If the application being analyzed matches a malware template (within a given threshold), then the app is tagged as malware.

We will now discuss the various components of DroidNative in more detail.

3.1 Disassembler

[Figure 1: Overview of DroidNative. Components: ART, Disassembler, Optimizer, MAIL Generator, Signature Generator, Data Miner, Similarity Detector and Malware Templates.]

When provided with the native code for an application, the first step in the DroidNative process is to disassemble it. A challenge related to this stage is ensuring that all code is found and disassembled. There are two standard techniques [40] for disassembly. (1) The Linear Sweep technique starts from the first byte of the code and disassembles one instruction at a time until the end. This assumes that the instructions are stored adjacently, and hence does not distinguish between embedded data and instructions. When data is mixed with the code, either by the compiler or by malware writers, this technique may produce incorrect results. The advantage of this technique is that it provides complete coverage of the code. (2) The Recursive Traversal technique decodes instructions by following the control flow of the program. This technique only disassembles an instruction if it is referenced by another instruction. The advantage of this technique is that it distinguishes code from data. But in the case of a branch whose destination address cannot be determined statically, this technique may fail to find and disassemble valid instructions. To overcome the deficiencies of linear sweep and recursive traversal, we combine these two techniques while disassembling. For example, in the case of an unknown branch, we use MAIL to tag such an address, and use a linear sweep to move to the next instruction.
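The combined strategy can be sketched as a worklist algorithm over a toy instruction format. The `decode` helper, which would wrap a real ARM/x86 decoder, is an assumption of ours; this is not DroidNative's implementation.

```python
def disassemble(code, decode, entry=0):
    """Hybrid disassembly sketch: recursive traversal follows statically
    known successors; on an indirect branch (successors unknown) we tag
    the instruction and fall back to a linear sweep, i.e. continue at
    the next address."""
    seen, listing, work = set(), {}, [entry]
    while work:
        addr = work.pop()
        if addr in seen or not (0 <= addr < len(code)):
            continue
        name, length, succs = decode(code, addr)
        seen.add(addr)
        if succs is None:                  # branch target unresolvable
            listing[addr] = name + "  ; target unknown"
            work.append(addr + length)     # linear-sweep fallback
        else:
            listing[addr] = name
            work.extend(succs)             # recursive traversal
    return listing

# Toy ISA: each "instruction" is (mnemonic, successor addresses or None).
code = [("mov", [1]), ("jmp r0", None), ("add", [3]), ("halt", [])]
decode = lambda c, a: (c[a][0], 1, c[a][1])
print(sorted(disassemble(code, decode)))  # [0, 1, 2, 3]
```

Note how address 2 is only reached via the linear-sweep fallback after the indirect jump at address 1: pure recursive traversal would have missed it.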

Another challenge related to this is that most binaries used in Android are stripped, meaning they do not include debugging or symbolic information. This makes it extremely difficult to statically build a call (API) graph for such binaries. It can also become difficult to find function boundaries. IDA Pro [3], the most popular disassembler, uses pattern matching to find such functions, but these patterns change from one compiler to another and also from one version to another of the same compiler. Therefore, this technique is difficult to maintain and is not general. We handle this problem by building control flow patterns and using them for malware detection. Our focus is mostly on the detection of malware variants; therefore, we may only need to find where the control is flowing (i.e., just the behavior and not the function boundaries), and then compare this behavior with the previous samples of malware available to detect such malware.

3.2 Optimizer

The Optimizer performs normalization of the assembly code. The normalizations performed are the removal of NOPs, junk and some of the prefixes. The Optimizer also prepares the assembly code to be translated to MAIL by removing other instructions that are not required for malware analysis.

A key challenge for static analysis of native code is the large number of instructions supported by the Intel x86 and ARM microprocessors (the two most popular architectures supported by Android). There are hundreds of different instructions in the instruction sets for these architectures. In order to speed up static analysis, we need to reduce the number of these instructions that we fully process. Many instructions are not relevant to malware analysis; examples include the instruction PREFETCH in Intel x86 and PRFM in ARM, which move data from memory to the cache. The optimizer removes these and other similar instructions.

An Android application is divided into components; each component serves a different purpose and is called when an event occurs. In addition, exceptions permeate the control flow structure of an Android app. Exceptions are resolved by dynamically locating the code specified by the programmer for handling the exception. This produces several unknown branch addresses in a disassembled Android binary program. The existence of multiple entry points makes it difficult to build a correct control flow graph [8] for the program using static analysis.


To handle this problem, DroidNative builds multiple, smaller, interwoven CFGs for a program instead of a single, large CFG. These smaller CFGs are then used for matching and malware detection. Matching smaller CFGs, instead of a single large CFG, also helps in reducing the runtime.

3.3 MAIL Generation

The MAIL Generator translates an assembly program to a MAIL program. Some of the major tasks performed by this component are: (1) Translating each assembly instruction to the corresponding MAIL statement(s). Some of the assembly instructions, such as PUSHA (x86) and STMDB (ARM), are translated to more than one MAIL statement. (2) Assigning a pattern to each MAIL statement. (3) Reducing the number of different instructions by grouping together functionally equivalent assembly instructions so that the MAIL Generator can combine them into one MAIL statement using the MAIL patterns listed in Table 1.

Further details regarding the transformation of Intel x86 and ARM assembly programs into MAIL can be found in our technical report [9].
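The translation step can be illustrated with a small lookup table. The table entries and MAIL spellings here are simplified examples of ours, not the actual translation rules from the technical report:

```python
# Sketch of the MAIL Generator's per-instruction translation:
# functionally equivalent instructions from different ISAs share one
# MAIL form, and some instructions expand to several MAIL statements.
MAIL_FORMS = {
    # (isa, mnemonic) -> list of (statement template, pattern)
    ("x86", "mov"):   [("{dst} = {src};", "ASSIGN")],
    ("arm", "mov"):   [("{dst} = {src};", "ASSIGN")],
    ("x86", "push"):  [("[sp = sp + 0x1] = {src};", "STACK")],
    # A PUSHA-style instruction expands to multiple MAIL statements.
    ("x86", "pusha"): [("[sp = sp + 0x1] = r0;", "STACK"),
                       ("[sp = sp + 0x1] = r1;", "STACK")],
}

def to_mail(isa, mnemonic, dst="", src=""):
    forms = MAIL_FORMS.get((isa, mnemonic))
    if forms is None:
        # Untranslatable instructions get the UNKNOWN pattern.
        return [("UNKNOWN;", "UNKNOWN")]
    return [(tpl.format(dst=dst, src=src), pat) for tpl, pat in forms]

print(to_mail("x86", "mov", dst="r0", src="r1"))  # [('r0 = r1;', 'ASSIGN')]
print(len(to_mail("x86", "pusha")))               # 2
```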

3.4 Malware Detection

In this section we explain how the components Data Miner, Signature Generator and Similarity Detector, shown in Figure 1, use the two techniques, ACFG and SWOD, for malware detection. The Data Miner searches for the control and structural information in a MAIL program to help the Signature Generator build a behavioral signature (currently one of two types, ACFG or SWOD) of the MAIL program. The Similarity Detector matches the signature of the program against the signatures of the malware templates extracted during the training phase, and determines whether the application is malware based on thresholds that are computed empirically, as explained in Section 3.5.

Figure 2: An example of ACFG matching. The graph on the left (M, a malware sample) is matched as a subgraph of the graph on the right (Psg, the malware embedded inside a benign program), i.e., M = (a, b, c, d, e) ≅ Psg = (b, c, e, h, j), where M ≅ Psg denotes that M is isomorphic [29] to Psg.

3.4.1 DroidNative-ACFG

For detecting Android malware variants using ACFGs, a CFG is built for each function in the annotated MAIL program, yielding the ACFGs. These ACFGs, i.e., the signature of the Android program, are matched against the malware templates to see if the program contains malware or not. If a high number of ACFGs within a program are flagged as malware, then the application itself is flagged.

If a part of the control flow of a program contains malware, we classify that program as malware; i.e., if a percentage (compared to some predefined threshold) of the ACFGs in the signature of a malware program match the ACFGs in the signature of a program, then the program is classified as malware. An example of ACFG matching can be found in Figure 2.

If an ACFG of a malware program matches an ACFG of a program, we further use the patterns to match each statement in the matching nodes of the two ACFGs. A successful match requires all the statements in the matching nodes to have the same patterns, although there could be differences in the corresponding statement blocks. An example of a failed pattern matching of two isomorphic ACFGs is shown in Figure 3. This allows us to detect malware with smaller ACFGs, and also allows us to reduce the size (number of blocks) of an ACFG for optimizing the detection time. We reduce the number of blocks in an ACFG by merging them together. Two blocks are combined only if their merger does not change the control flow of the program.
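The node-level check described above (same pattern sequence in matched nodes, statement details free to differ) can be sketched as follows; node representations and pairing are our own simplification:

```python
def nodes_match(patterns_a, patterns_b):
    """Two matched ACFG nodes agree when their statements' pattern
    sequences are identical, even if the statements themselves differ
    (sketch of the check described in the text, not the real matcher)."""
    return patterns_a == patterns_b

def acfgs_match(node_pairs, acfg_a, acfg_b):
    """Given node pairs from an isomorphism between two ACFGs, the
    ACFGs match only if every paired node passes the pattern check."""
    return all(nodes_match(acfg_a[u], acfg_b[v]) for u, v in node_pairs)

a = {1: ["ASSIGN", "JUMP"], 2: ["CALL"]}
b = {"x": ["ASSIGN", "JUMP"], "y": ["CALL"]}
print(acfgs_match([(1, "x"), (2, "y")], a, b))  # True
b["y"] = ["CALL CONSTANT"]   # same graph shape, different pattern
print(acfgs_match([(1, "x"), (2, "y")], a, b))  # False
```

The second call shows the "failed pattern matching of two isomorphic ACFGs" case: the graphs still pair up node for node, but one node's pattern sequence differs.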

3.4.2 DroidNative-SWOD

Every MAIL statement is assigned a pattern during translation from assembly language to MAIL. Each MAIL pattern is assigned a weight based on the SWOD that represents the differences between malware and benign samples' MAIL pattern distributions. The CFWeight (control flow weight) of a MAIL statement is computed by adding all the weights assigned to the elementary statements involved in it, using some heuristics. The final weight of a MAIL statement is the sum of its CFWeight and the weight of the pattern assigned to the statement. The final weights of the statements of a MAIL program are stored in a weight vector that represents the program signature. This signature is then sorted in ascending order for easy and efficient comparison using an index-based array.

After building the signature of a new program as explained above, it is compared with the signatures of all the training malware samples. In case of a match with any of the signatures we tag the new program as malware. An example of successful SWOD signature matching can be found in Figure 4, where 11 out of 16 (a threshold computed empirically) index-based array values of the program match those of the malware program M15.
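A sketch of building and comparing the sorted weight-vector signature; the weights and the 11-of-16 threshold are placeholders for the empirically derived values, not the paper's actual numbers:

```python
def swod_signature(final_weights):
    """A SWOD-style signature: the MAIL statements' final weights,
    sorted ascending for index-wise comparison (sketch only)."""
    return sorted(final_weights)

def signatures_match(sig_a, sig_b, min_hits=11):
    """Count index-wise equal entries of two sorted signatures and
    declare a match when enough indices agree (the paper's example
    matches 11 of 16 entries against malware M15)."""
    hits = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return hits >= min_hits

sig = swod_signature([5, 3, 1, 2])
print(sig)  # [1, 2, 3, 5]
print(signatures_match([1, 2, 3, 5], [1, 2, 3, 5], min_hits=4))  # True
```

Sorting makes the comparison a single index-aligned pass, which is what keeps SWOD matching fast enough for real-time use.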

3.5 Malware Templates

The malware templates used for comparison are gen-erated during a training phase that analyzes knownmalware and extracts their signatures as explained inSections 2.1.2, 3.4.1, and 3.4.2. In this phase we findthe threshold to be used for ACFG matching, andthe differences between malware and benign samples’MAIL patterns to find the SWOD values as described

[Figure 4 content: a Signature Generator turns a MAIL program into its original signature, sorts it, and stores the sorted signature for comparison against the stored signatures of malware samples M1, M2, ..., M15, ..., Mz.]

Figure 4: A successful SWOD signature matching.

in [11]. The threshold is the ACFG similarity score that achieves the highest DR for the chosen dataset. The ACFG matching threshold computed (using 25% of the dataset) for the dataset used in this paper is 70%. The control flow weight of each MAIL pattern is computed, which is then used with the SWOD to compute the total weight of each MAIL pattern. The computed threshold and the weight of each MAIL pattern are then used to build the Malware Templates (malware signatures) shown in Figure 1.
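A toy version of the threshold-selection step might look like the following (a Python sketch under stated assumptions: the similarity score here is a simple block-overlap fraction rather than DroidNative's actual graph matching, and skipping thresholds that would flag benign training samples is one plausible reading of "best similarity score that achieves the highest DR"):

```python
def acfg_similarity(patterns_a, patterns_b):
    """Toy similarity score between two ACFGs, each given as a list of
    per-block pattern strings: the fraction of the smaller ACFG's blocks
    with an exact counterpart in the other (a stand-in for DroidNative's
    graph matching, which also checks control flow edges)."""
    small, big = sorted((patterns_a, patterns_b), key=len)
    pool = list(big)
    hits = 0
    for block in small:
        if block in pool:
            pool.remove(block)
            hits += 1
    return hits / len(small)

def pick_threshold(malware_scores, benign_scores, candidates):
    """Pick the candidate threshold with the highest detection rate on
    the training scores; candidates that would also flag a benign
    training sample are skipped (an assumption of this sketch)."""
    best_t, best_dr = None, -1.0
    for t in sorted(candidates):
        if any(s >= t for s in benign_scores):
            continue
        dr = sum(s >= t for s in malware_scores) / len(malware_scores)
        if dr > best_dr:
            best_t, best_dr = t, dr
    return best_t
```

With training scores like these, the sketch would settle on 0.7, which happens to match the 70% threshold reported for the paper's dataset.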

4 Evaluation

In order to analyze the correctness and efficiency of DroidNative we performed a series of experiments, comparing the system to existing commercial systems as well as research systems.

4.1 Experimental Setup

4.1.1 Dataset

Our dataset for the experiments consists of a total of 2240 Android applications. Of these, 1240 are Android malware programs collected from two different resources [34, 53], and the other 1000 are benign programs containing Android 5.0 system programs, libraries, and standard applications.

Table 2 lists the class/family distribution of the 1240 malware samples. As can be seen, some of the malware in our set is native code based while


(a) One of the ACFGs of a malware sample:

  Block 0: 908 RSP = 0x8 - RSP; (ASSIGN)
           90c [sp=sp+1] = 0x911; call (0xaec); (CALL_C)
  Block 1: 911 [sp=sp+1] = 0x916; jmp (0xb80); (JMP_C)
  Block 2: 916 jmp (0x180); (JMP_C)
  Block 3: 91b RSP = RSP + 0x8; (ASSIGN)
           91f jmp [sp=sp-0x1]; (JUMP_S)

(b) One of the ACFGs of a benign program (blocks 0 - 3 of 0 - 4 shown):

  Block 0: 129 RAX = RAX + 0xf; (ASSIGN)
           12d [sp=sp+1] = 0x132; call (0x4b8); (CALL_C)
  Block 1: 132 jmp (0xed6); (JMP_C)
  Block 2: 136 jmp (0x068); (JMP_C)
  Block 3: 13e EDI = EDI; (ASSIGN)
           140 [sp=sp+0x1] = EBP; (STACK)
           141 EBP = ESP; (ASSIGN)
           143 EAX = [EBP+0x8]; (ASSIGN)
           146 EAX = [EAX]; (ASSIGN)
           148 compare([EAX], 0xe06d7363); (LIBCALL)
           14e if (ZF == 0) jmp (0x17a); (CONTROL_C)

Figure 3: Example of a failed pattern matching of two isomorphic ACFGs. The ACFG on the left (a malware sample) is isomorphic to the subgraph (blocks 0 - 3) of the ACFG on the right (a benign sample).

others are bytecode based. CarrierIQ is adware that performs keystroke logging and location tracking and has the ability to intercept text messages. DroidPak is an APK loader; it is a malware specifically designed to infect Android devices. ChathookPtrace performs malicious code injection. The 4 DroidKungFu variants make use of obfuscations to evade detection, such as different ways of storing the command and control server addresses, changed class names, etc. ADRD, DroidDream, DroidDreamLight, and BaseBridge also contain a number of variants created through obfuscations. AnserverBot employs anti-analysis techniques, such as detecting tampering of the repackaged application; aggressively obfuscating its internal classes, methods, and fields to make them humanly unreadable; and detecting some anti-malware programs. The samples collected from Contagiominidump [34] consist of different recent malware applications, such as RansomCollection, FakeMart, FakeJobOffer, FakeAntiVirus, etc.

In order to facilitate other authors comparing their systems to DroidNative, we have made the source code (60,000+ lines of C/C++ code) of DroidNative and our testing dataset publicly available [4].

4.1.2 Test Platform

All experiments were run on an Intel® Xeon® CPU E5-2670 @ 2.60 GHz with 128 GB of RAM, running Ubuntu 14.04.1. Although such a powerful machine is not required for running DroidNative, it was used in order to speed up the n-fold testing, which requires a significant number of repetitions of experiments. The ART compiler, cross-built on the above machine, was used to compile Android applications (malware and benign) to native code.

4.2 N-Fold Cross Validation

We use n-fold cross validation to estimate the performance of our technique. In n-fold cross validation the dataset is divided randomly into n equal-size subsets. n − 1 sets are used for training and the remaining set is used for testing. The cross validation process is then repeated n times, with each of the n subsets used exactly once for validation. This results in very systematic, accurate testing results for a given dataset and removes the potential for skewed results based on the choice of which data is used for training and testing.

Before evaluating the proposed techniques, we first


Class/Family       Number of samples   Code Type
CarrierIQ                          8   ARM
DroidPak                           5   Intel x86
ChathookPtrace                     2   ARM
DroidKungFu1                      35   Bytecode
DroidKungFu2                      30   Bytecode
DroidKungFu3                     310   Bytecode
DroidKungFu4                      95   Bytecode
JSMSHider                         15   Bytecode
ADRD                              21   Bytecode
YZHC                              22   Bytecode
DroidDream                        16   Bytecode
GoldDream                         45   Bytecode
DroidDreamLight                   46   Bytecode
KMin                              50   Bytecode
Geinimi                           68   Bytecode
Pjapps                            75   Bytecode
BaseBridge                       122   Bytecode
AnserverBot                      186   Bytecode
Contagiominidump                  89   Bytecode

Table 2: Class distribution of the 1240 malware samples, including whether they are native code or bytecode.

define the following evaluation metrics:

• DR. The Detection Rate, also called the true positive rate, corresponds to the percentage of samples correctly recognized as malware out of the total malware dataset.

• FPR. The False Positive Rate corresponds to the percentage of samples incorrectly recognized as malware out of the total benign dataset.

• ROC. The Relative Operating Characteristic curve is a graphical plot used to depict the performance of a binary classifier. It is a two-dimensional plot, where DR is plotted on the Y-axis and FPR is plotted on the X-axis, and hence it depicts the trade-offs between benefits (DR) and costs (FPR). We want a higher DR and a lower FPR, so a point toward the top-left corner of ROC space is desirable.

• AUC. The Area Under the ROC Curve is equal to the probability that a detector/classifier will detect a randomly chosen malware sample as malware and a randomly chosen benign sample as benign. For example, an AUC = 1.0 means a perfect detector, i.e., the detector will always detect correctly, and an AUC = 0.5 means the detector will detect correctly half of the time.
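The metrics above can be computed directly from labels and detector scores; the following is a minimal Python sketch (function names are ours, not DroidNative's), with AUC implemented as the pairwise probability described in the bullet:

```python
def dr_fpr(labels, flagged):
    """labels/flagged: parallel lists, 1 = malware / flagged as malware.
    DR  = detected malware / total malware samples.
    FPR = benign samples flagged / total benign samples."""
    tp = sum(1 for l, f in zip(labels, flagged) if l == 1 and f == 1)
    fp = sum(1 for l, f in zip(labels, flagged) if l == 0 and f == 1)
    n_mal = labels.count(1)
    n_ben = labels.count(0)
    return tp / n_mal, fp / n_ben

def auc(malware_scores, benign_scores):
    """AUC as the probability that a randomly chosen malware sample
    scores higher than a randomly chosen benign one (ties count half)."""
    wins = sum(1.0 if m > b else 0.5 if m == b else 0.0
               for m in malware_scores for b in benign_scores)
    return wins / (len(malware_scores) * len(benign_scores))
```

A detector whose every malware score exceeds every benign score gets AUC = 1.0; a detector whose scores are indistinguishable hovers at 0.5, matching the definitions above.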

Figure 5: ROC graph plots of DroidNative’s ACFG and SWOD

Through n-fold cross validation, four experiments were conducted using two different subsets of the dataset. The first set consisted of 40 benign and 40 malware samples, and the second set consisted of 350 benign and 330 malware samples. DroidNative-ACFG and DroidNative-SWOD were each tested with these two subsets of the dataset.
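The random fold construction used in this kind of evaluation can be sketched as follows (a Python illustration, not DroidNative's actual test harness):

```python
import random

def n_fold_splits(n_samples, n_folds, seed=0):
    """Randomly partition sample indices into n equal-size folds;
    each fold is used exactly once for testing while the remaining
    n - 1 folds are used for training."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    size = n_samples // n_folds
    folds = [idx[i * size:(i + 1) * size] for i in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        yield train, folds[k]
```

Running this on the first subset (80 samples, 10 folds, say) yields ten train/test splits in which every sample is tested exactly once.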

The ROC plots of both techniques using the two different cross-validations can be seen in Fig. 5. DroidNative-ACFG produces better results than DroidNative-SWOD. This difference is highlighted by the AUC of the two techniques: the AUC of DroidNative-ACFG ranges from 94.12 – 94.75, while the AUC of DroidNative-SWOD ranges from 70.23 – 80.19.

4.3 Large-Scale Test

To make our technique lightweight and general, we perform normalization of the binary code, as explained in Section 3, which helps improve our detection results, i.e., the DR and the runtime. We could have used other binary analysis and optimization techniques such as method (API) calls, number of methods, loops, and symbolic analysis. In general, these techniques require control and data flow analyses that are compute intensive and not suitable for a real-time detector. Moreover, some of these analyses may not be accurate, or possible in general to perform statically on machine code, as explained in Section 3.

To further reduce the runtime, the current implementation of DroidNative uses a simple binary decision tree classifier, which has proven exceptionally good for detecting malware variants. Currently the classifier used in DroidNative uses only commonalities among the malware samples. In the future we will carry out a study to also find commonalities among the benign samples, and between the malware and benign samples. These commonalities will be used to train the classifier, which will help further reduce DroidNative's FPR.

In order to demonstrate the scalability of our approach, we also performed an experiment with a larger dataset of 2240 samples.

We trained DroidNative on 1000 malware samples and then performed detection on a set of 240 malware and 1000 benign samples. The results of this test showed that DroidNative achieved a DR of 99.16% and an FPR of 1.3%, with an average detection time of 26.87 sec/sample. These results show that DroidNative is applicable to larger datasets.

4.4 Comparison with Commercial Tools

In order to compare DroidNative to a variety of commercial tools, we generated variants of known malware in order to test their effectiveness. We selected 30 different families (class/type) of Android malware from [34], and used DroidChameleon [37] to generate 210 malware variants based on them. DroidChameleon applies the following 7 obfuscations: NOP = no-operation insertion; CND = call indirection; FND = function indirection; RDI = removing debug information; RVD = reversing order; RNF = renaming fields; RNM = renaming methods.

In total, we generated 210 variants, but 18 of them were not able to run on the new ART runtime. DroidChameleon was implemented and tested using the Dalvik VM, and apparently some of the variants result in bytecode that is considered invalid by ART. As such, we excluded these variants, leaving 192 samples.

We tested 16 commercial anti-malware programs and DroidNative with these 192 Android malware variants. For testing DroidNative we used only the original 30 malware samples as the training dataset, with the other 192 variants and 20 benign samples as the testing dataset. DroidNative was able to properly classify all benign samples and 191 out of 192 malware samples, for an overall DR = 99.48% and FPR = 0%. The DR of DroidNative reported here is the combination of both DroidNative-ACFG and DroidNative-SWOD, i.e., the malware samples detected by both.

Overall, only one malware variant sample was not detected by DroidNative, and it belonged to class RVD. This class/obfuscation inserts goto instructions that change the CFG of a method/program. After further analysis of the original and the variant malware samples, we noticed that the difference (the number of blocks and the control flow change per ACFG) between the ACFGs of the two samples was more than 50%. As we discuss later in Section 4.6, DroidNative may not be able to detect malware employing excessive control flow obfuscations. Therefore, DroidNative was not able to detect this variant as a copy of the original malware sample.

We then tested a variety of commercial anti-malware systems with the same 192 variants. The results can be found in Table 3. The majority (11/16) of the commercial anti-malware tools performed very poorly, with overall DRs ranging from 8.33% – 72.91%. Five of the tools performed significantly better, but were still not able to match the results of DroidNative. Overall, the commercial tools did better in our test than in previous reports [26, 37, 51], which shows that their detection techniques are continuously being improved.

4.5 Comparison with Existing Research

In this section we compare against three state-of-the-art research efforts [20, 27, 50] carried out in 2014 that use machine learning for Android malware variant detection. Very few works report the runtime of their detector; to compare the runtime of our detector we selected DroidSift [50]. We selected DroidLegacy [20] because of its use of n-fold cross validation for evaluating the performance of its detector. Like DroidNative, DroidSift and AndroSimilar [27]


Anti-malware    Detection Rate (%)
                NOP     CND     FND    RDI     RVD     RNF     RNM     Overall
DroidNative     100     100     100    100     96.43   100     100     99.48
Kaspersky       92.85   92.85   92     92.59   96.43   93.1    92.59   93.22
Sophos          92.85   92.85   92     85.18   96.43   93.1    92.59   92.18
AVG             78.57   85.71   92     92.59   78.57   93.1    92.59   87.5
BitDefender     85.71   92.58   84     85.18   85.71   86.2    85.18   86.45
DrWeb           85.71   53.57   84     85.18   89.28   79.31   85.18   80.31
ESET            89.28   53.57   88     88.88   17.85   89.65   85.18   72.91
Microsoft       39.28   75      32     33.33   32.14   34.48   33.33   40.1
VIPRE           17.85   57.14   16     14.81   14.28   13.79   14.81   21.35
Symantec        14.28   53.57   8      3.7     7.14    10.34   7.4     15.1
Qihoo-360       10.71   53.57   8      7.4     7.14    6.89    7.4     14.58
Fortinet        10.71   53.57   8      7.4     7.14    6.89    7.4     14.58
TrendMicro      3.57    53.57   4      7.4     3.57    3.57    3.7     11.45
McAfee          7.14    53.57   4      3.7     3.57    3.57    3.7     11.45
TotalDefense    7.14    53.57   0      0       0       0       0       8.85
Malwarebytes    3.57    53.57   0      0       0       0       0       8.33
Panda           3.57    53.57   0      0       0       0       0       8.33

Table 3: Detection results of the 16 commercial anti-malware programs and DroidNative tested with 192 variants of 30 malware families.

Technique           DR       FPR      Dataset size        Runtime        Android native
                                      (benign/malware)    per sample     binary
DroidNative         99.16%   1.3%     1000 / 1240         26.87 sec      yes
DroidSift [50]      93%      5.15%    2100 / 193          175.8 sec      no
DroidLegacy [20]    92.73%   20.83%   48 / 743            not reported   no
AndroSimilar [27]   76.48%   1.46%    21132 / 455         not reported   no

Table 4: Comparison with other malware detection techniques discussed in Section 5. Out of the 1240 malware samples, 1000 were used for training and 240 for testing.

also test their techniques on malware variants generated through obfuscations.

Table 4 gives a comparison of DroidNative with these three approaches. For DroidSift, we have only reported the DR of their malware variant detector, because their anomaly detector does not provide automated malware detection. In DroidLegacy the same 48 applications were used in each of the 10 folds (confirmed through e-mail by the authors). During the n-fold testing performed in the paper, a total of 10 different benign applications were detected as malware; therefore, according to our computation the actual FPR = 10/48 = 0.2083.

The applicability of DroidSift and AndroSimilar to general malware, and their impartiality towards the testing dataset, is not validated through n-fold cross validation or other such testing approaches; their testing may also suffer from the possibility of data overfitting. DroidLegacy needs a larger number of benign samples (more than 48) than tested in the paper for further validation. None of the other techniques can analyze Android native binaries. Of the three techniques, only AndroSimilar has the potential to be used in a real-time detector, but it has poor DR results. DroidNative, on the other hand, is automated, fast, platform independent, and can analyze and detect Android native code malware.

The techniques proposed in DroidSift and DroidLegacy are very similar to DroidNative-ACFG. The difference is that DroidNative-ACFG captures control flow with patterns from the Android application's native binary code, whereas DroidSift and DroidLegacy capture the API call graph from the Android application's bytecode, and may not be able to detect malware applications that obfuscate by using fake API calls, hiding API calls (e.g., using class loading to load an API at runtime), inlining APIs (e.g., instead of calling an external API, including the complete API code inside the application), or reflection (i.e., creating class instances and/or invoking methods programmatically using literal strings; a subsequent encryption of the class/method name can make it impossible for any static analysis to recover the call).

In DroidSift, during training the malware graphs that are common to the benign graphs are removed, i.e., there can be malware applications whose API calls are similar to those of benign applications; hence their technique may not detect such malware applications. The anomaly detector proposed in DroidSift is based on the behavior of known benign applications. Keeping track of benign applications is impractical in real life, and also makes the approach more complex in terms of the resources (time and memory) required than the technique proposed in this paper. Hence, DroidSift is much less efficient than DroidNative from a performance perspective. The concept used in DroidSift for anomaly detection is similar to the use of whitelists, and will suffer from the same issues, such as difficulty of collection and delay between discovery and updating. Our technique is based on the behavior of known malware applications, and hence is more practical and lighter in terms of runtime. Unlike DroidNative, however, their anomaly detector may be able to detect zero-day malware.

DroidNative-SWOD is similar to AndroSimilar [27], which is based on SDHash [38]. SDHash is most likely to detect only very similar objects, which is why AndroSimilar produces low DRs. DroidNative-SWOD is a trade-off between accuracy and efficiency, detecting similar objects at a coarser level, and therefore produces high DRs.

In terms of variant detection, DroidSift was tested with 23 variants of DroidDream, only 1 family (listed in Table 2) of malware, and achieved an overall DR = 100%. They used one of the obfuscations described in [37] to generate these variants, but it is not clear which specific obfuscation was used. AndroSimilar was tested with 1013 variants of 33 families (most of them listed in Table 2) of malware, and achieved an overall DR = 80.65%. They used the following 4 obfuscations to generate these variants: method renaming, junk method insertion, goto insertion, and string encryption. DroidNative was tested with 192 variants of 30 families of malware generated using 7 types of obfuscation, and achieved an overall DR = 99.48%.

In terms of performance, only DroidSift reported runtime results. Although we acknowledge that it is less than ideal to directly compare performance numbers of two different systems tested on completely different hardware, the fact that DroidNative is ∼6.5 times faster is a strong indicator of the system's significant performance advantage. More thorough performance testing would require access to the source code of the other systems, which was not available to us.

4.6 Limitations

DroidNative has some limitations, discussed in this section.

Like all static analysis techniques, DroidNative requires that the application's malicious code be available for static analysis. Techniques such as packing (compressing and encrypting the code) and applications that download their malicious code upon initial execution cannot be properly analyzed by the system.

DroidNative may not be able to detect true zero-day malware. DroidNative excels at detecting variants of malware that have been previously seen, and will only detect a zero-day malware if its control structure is similar (up to a threshold) to an existing malware sample in the saved/training database.

DroidNative may not be able to detect malware employing excessive control flow obfuscations. Excessive here means changing the control flow of a program beyond a certain percentage (threshold). In general, such a threshold is difficult to find, but it is calculated for each different dataset. These control flow obfuscations include: control flow flattening, obscuring the control flow logic of a program by flattening the CFG; irreducible flow graphs, where reducing these flow graphs loses the original CFG; method inlining, replacing a method call with the entire method body, which may increase the number of CFGs in the program; branch functions/code, obscuring the CFG by replacing branches with the address of a function/code; and jump tables, where an artificial jump table or jumps in the table are created to change the CFG.

The pattern matching of two ACFGs may fail if the malware variant obfuscates a statement in a basic block in a way that changes its MAIL pattern. DroidNative performs exact (100%) pattern matching for two ACFGs to match. To improve DroidNative's resilience to such obfuscations, in the future we will use a threshold for pattern matching. We will also investigate other pattern matching techniques, such as a statement dependency graph or assigning one pattern to multiple statements of different types, to improve this resiliency.

5 Related Work

5.1 Native Code Analysis for Other Platforms

There are several efforts for detecting malicious native binaries on standard PCs. We briefly discuss some of these techniques, as well as the use of intermediate languages for detecting malware in native binaries.

In [44], a method is presented that uses model checking to detect malware variants. They build a CFG of a binary program which contains information about the register and memory location values at each control point of the program. Model checking is time consuming and can run out of memory; times reported in the paper range from a few seconds (10 instructions) to over 250 seconds (10,000 instructions). [24] presents a method that uses CFGs for visualizing the control structure and representing the semantic aspects of a program. They extend the CFG with extracted API calls to provide more information about the executable program. [33] proposes a technique that checks the similarity of call graphs (semantic signatures) to detect malware variants. Only system calls are extracted from the binary to build the call graph. The call graph is reduced to cut the processing time, but this also reduces the accuracy of the detector. [32] uses value set analysis (VSA) to detect malware variants. Value set analysis is a static analysis technique that keeps track of the propagation and changes of values throughout an executable. VSA is an expensive technique and cannot be used for real-time malware detection.

These techniques are compute intensive, and attempts to reduce their processing time tend to produce poor DRs, cannot handle smaller malware, and are not suitable for real-time detection. Other techniques [14, 36, 39, 42] that mostly use opcode-based analysis for detecting malware variants have the potential to be used for real-time malware detection, but have several issues: the frequencies of opcodes can change with different compilers, compiler optimizations, architectures, and operating systems; the obfuscations introduced can change the opcode distributions; and selecting too many features (patterns) for detection results in a high detection rate but also increases the runtime.

5.1.1 Intermediate Languages

REIL [21] is used for manual malware analysis and detection. Unhandled native instructions are replaced with NOP instructions, which may introduce inaccuracies in disassembly. Furthermore, REIL does not translate FPU, MMX, or SSE instructions, nor any privileged instructions, because according to the authors these instructions are not yet being used to exploit security vulnerabilities. SAIL [17] represents a CFG of the program under analysis. A node in the CFG contains only a single SAIL instruction, which can make the number of nodes in the CFG extremely large and therefore make analysis excessively slow for larger binary programs. In VINE [43], the final translated instructions have all side effects explicitly exposed as VINE instructions; this makes the approach general but also makes it difficult to maintain platform independence and less efficient for specific security applications such as malware detection. CFGO-IL [13] simplifies the transformation of a program in x86 assembly language to a CFG. By exposing all the side effects of an instruction, CFGO-IL faces the same problem as VINE-IL. Furthermore, the size of a CFGO-IL program tends to be much larger than the original assembly program. WIRE [16] is a new intermediate language


and, like MAIL, is specifically designed for malware analysis. WIRE does not explicitly specify indirect jumps, making malware detection more complicated. Furthermore, the authors do not mention anything about the side effects of WIRE instructions, and it is not clear how the language is used to automate the malware analysis and detection process.

In contrast to these other languages: side effects are avoided in MAIL, making the language much simpler and providing the basis for efficient malware detection; control flow with patterns in MAIL provides further optimization and automation of malware detection; and MAIL's publicly available formal model and tool make it easier to use.

5.2 Detection of Android Malware Variants

Because of the popularity of Android devices, there are several research efforts on malware detection for Android. For a comprehensive survey of Android malware detection techniques, the reader is referred to [25]. Here, we only discuss the three research efforts [20, 27, 50] that are also used for comparison in Section 4.5.

[50] detects unknown malware using anomaly detection. They generate an API call graph and assign weights based on the data dependency and frequency of security-related API calls. The graph of an application is matched against a graph database of benign applications, and if a matching graph is not found then an anomaly is reported. These reports are then sent to a human expert for verification. For known malware, a signature is generated for each malware sample using the similarity score of the graphs. A classifier is used for signature detection, and is called the malware variant detector.

In general, changing (obfuscating) control flow patterns is more difficult (i.e., it requires comprehensive changes to the program) than changing just the API call patterns of a program to evade detection. ACG (API call graph) based techniques look for specific API call patterns (including call sequences) in Android malware programs for their detection, patterns which may also be present in benign Android applications that are protected against reverse engineering attacks. These API call patterns can include packing/unpacking, calling a remote server for encryption/decryption, dynamic loading, and system calls. Moreover, sometimes, e.g., for stripped native binaries on Android, it becomes impossible to build a correct ACG of a program.

[20] uses API call graph signatures and machine learning to identify piggybacked applications. Piggybacking can be used to inject malicious code into a benign application. First a signature is generated for a benign application, and then it is used to identify whether another application is a piggyback of this application or not. The technique has the same limitations as discussed above for detection schemes based on API calls.

[27] is based on SDHash [38], a statistical approach for selecting fingerprinting features. SDHash relies on entropy estimates and an empirical study, and is most likely to pick features unique to a data object; therefore the results in the paper have a better false positive rate. Although the FPR reported in [27] is low (1.46%), because the main goal of the SDHash technique is to detect very similar data objects, its ability to detect malware variants is much lower than that of the technique proposed in this paper.

5.3 App Analysis on iOS

One other effort [22] targets Apple's iOS, and performs static binary analysis by building a CFG from Objective-C binaries. The main goal of that work is to detect privacy leaks, not malware analysis and detection. They perform complex data flow analyses, such as backward slicing and others, which are time consuming and not suitable for real-time malware detection. There are some differences between Android applications and Apple iOS applications, such as the use of C, C++, and Java in Android versus Objective-C in iOS, and the use of components in Android versus message passing in iOS. Because of these differences, binary analysis of these applications has significant differences and challenges.


6 Conclusion

In this paper we have proposed DroidNative for the detection of both bytecode and native code Android malware variants. DroidNative uses control flow with patterns; implements and adapts the two techniques ACFG and SWOD to reduce the effect of obfuscations; uses the language MAIL to provide automation and platform independence; and has the potential to be used for real-time malware detection. To the best of our knowledge, none of the existing static analysis techniques deal with the detection of Android native code malware, and this is the first research effort to detect such malware. Through n-fold cross validation and other test evaluations, DroidNative shows superior results for the detection of Android native code and malware variants compared to other research efforts and commercial tools.

References

[1] Android Runtime (ART). http://en.wikipedia.org/wiki/Android_Runtime.

[2] Dalvik Virtual Machine. http://en.wikipedia.org/wiki/Android_Runtime.

[3] IDA Pro. https://www.hex-rays.com/products/ida/index.shtml.

[4] Removed Due to Double Blind Submission.

[5] Symantec cybercrime and Internet security threat report, 2012.

[6] Symantec cybercrime and Internet security threat report, 2014.

[7] F-Secure mobile threat report, Q1 2014.

[8] A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ull-man. Compilers: Principles, Techniques, andTools (2nd Edition). Addison-Wesley, Inc., 2006.

[9] S. Alam. MAIL: Malware Analysis IntermediateLanguage. http://web.uvic.ca/~salam/PhD/

TR-MAIL.pdf, 2014.

[10] S. Alam, R. N. Horspool, and I. Traore. MAIL:Malware Analysis Intermediate Language - AStep Towards Automating and Optimizing Mal-ware Detection. In Security of Information andNetworks. ACM SIGSAC, 2013.

[11] S. Alam, I. Sogukpinar, I. Traore, andR. Nigel Horspool. Sliding Window and ControlFlow Weight for Metamorphic Malware Detec-tion. Journal of Computer Virology and HackingTechniques, 2014.

[12] S. Alam, I. Traore, and I. Sogukpinar. Anno-tated Control Flow Graph for Metamorphic Mal-ware Detection. The Computer Journal - SectionD: Security in Computer Systems and Networks,2014.

[13] S. Anju, P. Harmya, N. Jagadeesh, andR. Darsana. Malware detection using assemblycode and control flow graph optimization. InACM-W. ACM, 2010.

[14] T. H. Austin, E. Filiol, S. Josse, and M. Stamp.Exploring Hidden Markov Models for VirusAnalysis: A Semantic Approach. In HICSS,2013.

[15] I. Burguera, U. Zurutuza, and S. Nadjm-Tehrani. Crowdroid: Behavior-based MalwareDetection System for Android. In SPSM. ACM,2011.

[16] S. Cesare and Y. Xiang. Wire–a formal interme-diate language for binary analysis. In TrustCom,2012. IEEE, 2012.

[17] M. Christodorescu, S. Jha, S. Seshia, D. Song,R. E. Bryant, et al. Semantics-aware malwaredetection. In Security and Privacy, 2005 IEEESymposium on. IEEE, 2005.

[18] M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E. Bryant. Semantics-Aware Malware Detection. In Security and Privacy. IEEE Computer Society, 2005.

[19] C. Collberg, C. Thomborson, and D. Low. A Taxonomy of Obfuscating Transformations. Technical report, University of Auckland, 1997.

[20] L. Deshotels, V. Notani, and A. Lakhotia. DroidLegacy: Automated Familial Classification of Android Malware. In SIGPLAN. ACM, 2014.

[21] T. Dullien and S. Porst. REIL: A platform-independent intermediate representation of disassembled code for static code analysis. Proceedings of CanSecWest, 2009.

[22] M. Egele, C. Kruegel, E. Kirda, and G. Vigna. PiOS: Detecting Privacy Leaks in iOS Applications. In NDSS, 2011.

[23] W. Enck, P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth. TaintDroid: an information flow tracking system for real-time privacy monitoring on smartphones. Communications of the ACM, 2014.

[24] M. Eskandari and S. Hashemi. A Graph Mining Approach for Detecting Unknown Malwares. Journal of Visual Languages and Computing, 2012.

[25] P. Faruki, A. Bharmal, V. Laxmi, V. Ganmoor, M. Gaur, M. Conti, and R. Muttukrishnan. Android Security: A Survey of Issues, Malware Penetration and Defenses. Communications Surveys & Tutorials, IEEE, 2015.

[26] P. Faruki, A. Bharmal, V. Laxmi, M. Gaur, M. Conti, and M. Rajarajan. Evaluation of Android Anti-malware Techniques against Dalvik Bytecode Obfuscation. In TrustCom, 2014.

[27] P. Faruki, V. Laxmi, A. Bharmal, M. Gaur, and V. Ganmoor. AndroSimilar: Robust signature for detecting variants of Android malware. Journal of Information Security and Applications, 2014.

[28] M. Fredrikson, S. Jha, M. Christodorescu, R. Sailer, and X. Yan. Synthesizing Near-Optimal Malware Specifications from Suspicious Behaviors. In Security and Privacy. IEEE Computer Society, 2010.

[29] J. L. Gross and J. Yellen. Graph Theory and Its Applications, Second Edition. Chapman & Hall, London, UK, 2005.

[30] J. Huang, X. Zhang, L. Tan, P. Wang, and B. Liang. AsDroid: detecting stealthy behaviors in Android applications by user interface and program behavior contradiction. In ICSE, 2014.

[31] C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang. Effective and Efficient Malware Detection at the End Host. In USENIX, 2009.

[32] F. Leder, B. Steinbock, and P. Martini. Classification and Detection of Metamorphic Malware Using Value Set Analysis. In MALWARE, 2009.

[33] J. Lee, K. Jeong, and H. Lee. Detecting Metamorphic Malwares Using Code Graphs. In SAC. ACM, 2010.

[34] M. Parkour. Mobile Malware Dump. http://contagiominidump.blogspot.com.

[35] S. Poeplau, Y. Fratantonio, A. Bianchi, C. Kruegel, and G. Vigna. Execute This! Analyzing unsafe and malicious dynamic code loading in Android applications. In NDSS, 2014.

[36] B. Rad, M. Masrom, and S. Ibrahim. Opcodes Histogram for Classifying Metamorphic Portable Executables Malware. In ICEEE, 2012.

[37] V. Rastogi, Y. Chen, and X. Jiang. DroidChameleon: Evaluating Android Anti-malware Against Transformation Attacks. In ASIA CCS. ACM, 2013.

[38] V. Roussev. Data Fingerprinting with Similarity Digests. In K.-P. Chow and S. Shenoi, editors, Advances in Digital Forensics VI. Springer, 2010.

[39] N. Runwal, R. M. Low, and M. Stamp. Opcode Graph Similarity and Metamorphic Detection. Journal in Computer Virology, 2012.

[40] B. Schwarz, S. Debray, and G. Andrews. Disassembly of Executable Code Revisited. In WCRE, Washington, DC, USA, 2002. IEEE Computer Society.

[41] S.-H. Seo, A. Gupta, A. Mohamed Sallam, E. Bertino, and K. Yim. Detecting mobile malware threats to homeland security through static analysis. Journal of Network and Computer Applications, 2014.

[42] G. Shanmugam, R. M. Low, and M. Stamp. Simple substitution distance and metamorphic detection. Journal of Computer Virology and Hacking Techniques, 2013.

[43] D. Song, D. Brumley, H. Yin, J. Caballero, I. Jager, M. G. Kang, Z. Liang, J. Newsome, P. Poosankam, and P. Saxena. BitBlaze: A new approach to computer security via binary analysis. In Information Systems Security. Springer, 2008.

[44] F. Song and T. Touili. Efficient Malware Detection Using Model-Checking. In D. Giannakopoulou and D. Méry, editors, Formal Methods. Springer, 2012.

[45] G. Suarez-Tangil, J. E. Tapiador, P. Peris-Lopez, and J. Blasco. Dendroid: A text mining approach to analyzing and classifying code structures in Android malware families. Expert Systems with Applications, 2014.

[46] M. Sun and G. Tan. NativeGuard: Protecting Android Applications from Third-party Native Libraries. In WiSec. ACM, 2014.

[47] X. Sun, Y. Zhongyang, Z. Xin, B. Mao, and L. Xie. Detecting Code Reuse in Android Applications Using Component-Based Control Flow Graph. In ICT. Springer, 2014.

[48] L. K. Yan and H. Yin. DroidScope: Seamlessly Reconstructing the OS and Dalvik Semantic Views for Dynamic Android Malware Analysis. In USENIX, 2012.

[49] C. Yang, Z. Xu, G. Gu, V. Yegneswaran, and P. Porras. DroidMiner: Automated mining and characterization of fine-grained malicious behaviors in Android applications. In Computer Security - ESORICS. Springer, 2014.

[50] M. Zhang, Y. Duan, H. Yin, and Z. Zhao. Semantics-Aware Android Malware Classification Using Weighted Contextual API Dependency Graphs. In CCS. ACM, 2014.

[51] M. Zheng, P. P. Lee, and J. C. Lui. ADAM: An automatic and extensible platform to stress test Android anti-virus systems. In DIMVA, 2013.

[52] M. Zheng, M. Sun, and J. Lui. Droid Analytics: A Signature Based Analytic System to Collect, Extract, Analyze and Associate Android Malware. In TrustCom. IEEE, 2013.

[53] Y. Zhou and X. Jiang. Dissecting Android malware: Characterization and evolution. In Security and Privacy. IEEE, 2012.

[54] Y. Zhou, Z. Wang, W. Zhou, and X. Jiang. Hey, You, Get Off of My Market: Detecting Malicious Apps in Official and Alternative Android Markets. In NDSS, 2012.
