  • Motivation SMARTGEN Design Applications Evaluation Related Work Conclusion References

    SMARTGEN: Exposing Server URLs of Mobile Apps with Selective Symbolic Execution

    Chaoshun Zuo, Zhiqiang Lin

    Department of Computer Science, University of Texas at Dallas

    April 6th, 2017

    Server URLs

    https://www.google.com/search?q=www+2017

    A URL includes
      1 Domain name
      2 Resource path
      3 Query parameters
      4 ...

    Security Applications
      1 Hidden service identification
      2 Malicious website detection
      3 Server vulnerability fuzzing
      4 ...
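As a small illustration of the three URL parts listed above, the following sketch splits the example URL with Python's standard library. The helper name `url_parts` is an assumption made for this example and is not part of SMARTGEN.

```python
from urllib.parse import urlsplit, parse_qs


def url_parts(url):
    """Split a URL into the domain name, resource path, and query parameters."""
    parts = urlsplit(url)
    return {
        "domain": parts.netloc,           # e.g. www.google.com
        "path": parts.path,               # e.g. /search
        "query": parse_qs(parts.query),   # '+' in a query value decodes to a space
    }


parts = url_parts("https://www.google.com/search?q=www+2017")
print(parts)
```

Each of the three parts is a separate input surface for the security applications on the slide, which is why exposing the full URL matters and not only the domain.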

    Browsers’ URLs vs. Mobile Apps’ URLs

    Source: cloudxtension.com

    Security Implications of the URLs in Mobile Apps

    Source: cloudxtension.com

    1 Hiding the URLs may allow the servers to collect private, sensitive information

    2 Mobile apps may talk to unwanted services (e.g., malicious ad sites)

    3 A false sense of security (security through obscurity) for app developers, who assume their services are safe because the server URLs are hidden, so no one knows about them and no one will attack (or fuzz) them

    It is imperative to expose the server URLs from mobile apps

    A Motivating Example: ShopClues

    Figure: The password reset activity of ShopClues (between 10 million and 50 million installs).

    PUT /api/v9/forgotpassword?key=d12121c70dda5edfgd1df6633fdb36c0 HTTP/1.1
    Content-Type: application/json
    Connection: close
    User-Agent: Dalvik/1.6.0 (Linux; Android 4.2)
    Host: sm.shopclues.com
    Accept-Encoding: gzip
    Content-Length: 73

    {"user_email":"[email protected]","key":"d12121c70dda5edfgd1df6633fdb36c0"}

    There was an SQL injection vulnerability at this password reset interface

    Which Analysis Should We Use?
    Static Analysis vs. Dynamic Analysis vs. Symbolic Execution

    PUT /api/v9/forgotpassword?key=d12121c70dda5edfgd1df6633fdb36c0 HTTP/1.1
    Content-Type: application/json
    Connection: close
    User-Agent: Dalvik/1.6.0 (Linux; Android 4.2)
    Host: sm.shopclues.com
    Accept-Encoding: gzip
    Content-Length: 73

    {"user_email":"[email protected]","key":"d12121c70dda5edfgd1df6633fdb36c0"}

    Static Analysis
      String concatenation
      Crypto keys

    Dynamic Analysis
      Random inputs
      Incompleteness
      ...

    Symbolic Execution
      Systematic
      Automated
      ...

    Symbolic Execution: Generating Inputs Based on Program Code

    package com.shopclues;

    class y implements View$OnClickListener {
        EditText b;
        ...
        public void onClick(View arg5) {
            // Read and trim the text typed into the email box.
            String v0 = this.b.getText().toString().trim();
            if(v0.equalsIgnoreCase("")) {
                Toast.makeText(this.a, "Email Id should not be empty", 1).show();
            }
            else if(!al.a(v0)) {
                // al.a() is the email format check defined below.
                Toast.makeText(this.a, "The email entered is not a valid email", 1).show();
            }
            else if(al.b(this.a)) {
                // Only this branch starts the task that sends the request.
                this.a.c = new ac(this.a, v0);
                this.a.c.execute(new Void[0]);
            }
            else {
                Toast.makeText(this.a, "Please check your internet connection", 1).show();
            }
        }
    }

    package com.shopclues.utils;

    public class al {
        ...
        // Returns true only if arg1 is non-null and matches the email pattern.
        public static boolean a(String arg1) {
            boolean v0;
            if(arg1 == null) {
                return false;
            }
            else {
                v0 = Patterns.EMAIL_ADDRESS.matcher(((CharSequence)arg1)).matches();
            }
            return v0;
        }
    }
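To make the branch structure of the `onClick` handler above concrete, here is a minimal Python model of it. This is only a sketch under stated assumptions: `EMAIL_RE` is a simplified stand-in for Android's `Patterns.EMAIL_ADDRESS`, the `online` flag stands in for `al.b(...)`, and the candidate list is hypothetical. A symbolic executor would derive a satisfying input from the constraints instead of trying candidates one by one.

```python
import re

# Assumption: simplified stand-in for Android's Patterns.EMAIL_ADDRESS.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9+._%-]+@[A-Za-z0-9][A-Za-z0-9-]*(\.[A-Za-z0-9][A-Za-z0-9-]*)+"
)


def branch_taken(text, online=True):
    """Mirror the four branches of onClick and report which one runs."""
    v0 = text.strip()
    if v0 == "":
        return "toast: empty"
    if EMAIL_RE.fullmatch(v0) is None:
        return "toast: invalid email"
    if online:  # stands in for al.b(this.a), the connectivity check
        return "send request"
    return "toast: no connection"


def first_input_reaching(target, candidates):
    """Return the first candidate whose execution reaches the target branch."""
    for text in candidates:
        if branch_taken(text) == target:
            return text
    return None


candidates = ["", "   ", "hello", "user@", "user@example.com"]
chosen = first_input_reaching("send request", candidates)
print(chosen)
```

Only inputs that are non-empty and match the email pattern reach the request-sending branch, which is exactly the conjunction of path constraints a solver would be asked to satisfy.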

    Various Constraints in Mobile Apps

    Various Constraints
      1 Two text-box inputs need to be equivalent
      2 The "age" needs to be greater than 18
      3 A "zip code" needs to be a five-digit sequence
      4 A "phone number" needs to be a phone number
      5 A file name extension needs to be of some type (e.g., jpg)
      6 ...
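The constraints listed above can be written down as simple predicates over a form. The sketch below is illustrative only: the field names (`password`, `confirm_password`, `age`, `zip`, `photo`) are hypothetical, and the phone-number constraint is left out because its exact format is app-specific.

```python
import re


def violated_constraints(form):
    """Return the names of the example constraints that `form` violates."""
    checks = {
        "passwords match": form["password"] == form["confirm_password"],
        "age > 18": form["age"] > 18,
        "zip is five digits": re.fullmatch(r"\d{5}", form["zip"]) is not None,
        "photo is a .jpg": form["photo"].lower().endswith(".jpg"),
    }
    return [name for name, ok in checks.items() if not ok]


good = {
    "password": "s3cret",
    "confirm_password": "s3cret",
    "age": 19,
    "zip": "75080",
    "photo": "me.JPG",
}
bad = dict(good, age=18, zip="7508")

print(violated_constraints(good))
print(violated_constraints(bad))
```

Random input generators rarely satisfy all such checks at once, which is the motivation for solving them systematically.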

    Introducing SMARTGEN

    [Figure: SMARTGEN overview. An APK goes through Static Analysis (Building ECG), then Selective Symbolic Execution (Extracting Path Constraints, Solving the Constraints), then Dynamic Analysis on a Real Phone (Runtime Instrumentation, Request Message Generation), which outputs Request Messages.]

    Automated, systematic, scalable

    Static analysis, selective symbolic execution, dynamic analysis

    Static Analysis

      Using the soot [soo] framework
      Building the extended call graph (ECG)
      EdgeMiner [CFB+15] for callbacks
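The idea of extending a call graph with callback edges can be sketched in a few lines of Python. This is not Soot or EdgeMiner output: the method names are hypothetical (loosely modeled on the ShopClues snippet), and the assumption is that a framework call such as `AsyncTask.execute` implicitly invokes the app's `doInBackground`.

```python
from collections import defaultdict, deque


def build_ecg(call_edges, callback_edges):
    """Merge explicit call edges and implicit callback edges into one graph."""
    ecg = defaultdict(set)
    for caller, callee in [*call_edges, *callback_edges]:
        ecg[caller].add(callee)
    return ecg


def reaches(ecg, source, target):
    """Breadth-first search: can `target` be reached from `source`?"""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in ecg.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical edges for illustration only.
call_edges = [
    ("y.onClick", "ac.<init>"),
    ("y.onClick", "AsyncTask.execute"),
    ("ac.doInBackground", "HttpURLConnection.connect"),
]
callback_edges = [("AsyncTask.execute", "ac.doInBackground")]

plain = build_ecg(call_edges, [])
extended = build_ecg(call_edges, callback_edges)
print(reaches(plain, "y.onClick", "HttpURLConnection.connect"))
print(reaches(extended, "y.onClick", "HttpURLConnection.connect"))
```

Without the callback edge the click handler appears disconnected from the network call, so the path to the request would be missed.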

    Selective Symbolic Execution

      Data flow analysis (w/ FlowDroid [ARF+14])
      Extract the path constraints
      Solve them w/ Z3-str [ZZG13]

    Why selective: symbolic execution is applied only on the execution paths to network-sending APIs (to trigger the request messages)
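A rough sketch of the "selective" part: instead of exploring every path, collect branch conditions only along paths that end at a network-sending call. The control-flow graph, node names, and condition strings below are hypothetical and modeled on the ShopClues handler; the real system extracts constraints from app code and hands them to a solver.

```python
def constraints_to_target(cfg, entry, target):
    """Yield the branch conditions along each acyclic path from entry to target."""
    stack = [(entry, [], {entry})]
    while stack:
        node, conds, visited = stack.pop()
        if node == target:
            yield conds
            continue
        for cond, succ in cfg.get(node, []):
            if succ not in visited:
                new_conds = conds + ([cond] if cond else [])
                stack.append((succ, new_conds, visited | {succ}))


# Each node maps to (branch condition, successor) pairs; names are illustrative.
cfg = {
    "onClick": [('v0 == ""', "toast_empty"), ('v0 != ""', "check_email")],
    "check_email": [("!isEmail(v0)", "toast_invalid"), ("isEmail(v0)", "check_net")],
    "check_net": [("online", "send_request"), ("!online", "toast_offline")],
}

paths = list(constraints_to_target(cfg, "onClick", "send_request"))
print(paths)
```

Paths that end in a toast never reach the target and contribute no constraints, which keeps the solving effort focused on inputs that trigger a request.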

    Runtime Instrumentation

    Existing options
      System code static rewriting
      Repackaging the apps
      System debugging tool adb

    SMARTGEN: a new approach that leverages API hooking and Java reflection
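The combination of API hooking and reflection can be illustrated with a Python analogue. This is not SMARTGEN's Android implementation: the classes, the `/forgotpassword` path, and the helper names are all hypothetical. `hook` wraps a method at runtime to log its arguments, and `drive` uses `getattr`/`setattr` in the way Java reflection would be used to fill UI fields with solved values and invoke a handler.

```python
class EditText:
    def __init__(self):
        self.text = ""


class Network:
    def send(self, method, path, body):
        return "sent"


class ResetActivity:
    def __init__(self, net):
        self.email_box = EditText()
        self.net = net

    def on_click(self):
        # The request is sent only if the input passes the app's check.
        if "@" in self.email_box.text:
            self.net.send("PUT", "/forgotpassword", self.email_box.text)


def hook(obj, name, log):
    """Replace obj.name with a wrapper that records arguments, then calls through."""
    original = getattr(obj, name)

    def wrapper(*args, **kwargs):
        log.append(args)
        return original(*args, **kwargs)

    setattr(obj, name, wrapper)


def drive(activity, field_values, handler_name):
    """Set UI fields by name and invoke the handler by name."""
    for field, value in field_values.items():
        setattr(getattr(activity, field), "text", value)
    getattr(activity, handler_name)()


log = []
net = Network()
hook(net, "send", log)
activity = ResetActivity(net)
drive(activity, {"email_box": "user@example.com"}, "on_click")
print(log)
```

Because the app itself is left unmodified, this style of instrumentation avoids repackaging and lets the analysis observe requests in the app's own execution context.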

    Security Applications

      SQL Injection
      Cross Site Scripting
      Others (e.g., malicious URL detection)

    SQL Injection

    Fuzzing (SQL Injection)

      "SELECT PG_SLEEP(5);", "SELECT PG_SLEEP(10);"
      "';WAITFOR DELAY '0:0:5'-"
      ";SELECT COUNT(*) FROM SYSIBM.SYSTABLES"
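The sleep-style payloads above suggest timing-based detection: substitute a payload into one request field and check whether the response is delayed by roughly the requested time. The sketch below is an assumption about how such a check could look, not the authors' fuzzer. `FakeServer` is a simulated backend with its own clock, so the example runs without any network traffic; the `slack` threshold is arbitrary.

```python
import re
import time

# Payloads from the slide, paired with the delay each one asks the database for.
PAYLOADS = {
    "SELECT PG_SLEEP(5);": 5.0,
    "SELECT PG_SLEEP(10);": 10.0,
    "';WAITFOR DELAY '0:0:5'-": 5.0,
}


def timed(send, params, clock):
    """Return how long one request takes according to `clock`."""
    start = clock()
    send(params)
    return clock() - start


def find_time_based_injection(send, params, payloads, slack=0.8, clock=time.monotonic):
    """Report (field, payload) pairs whose response is delayed as requested."""
    baseline = timed(send, params, clock)
    hits = []
    for field in params:
        for payload, delay in payloads.items():
            mutated = dict(params, **{field: payload})
            if timed(send, mutated, clock) - baseline >= slack * delay:
                hits.append((field, payload))
    return hits


class FakeServer:
    """Simulated PostgreSQL-like backend that is injectable via user_email only."""

    def __init__(self):
        self.now = 0.0

    def clock(self):
        return self.now

    def send(self, params):
        self.now += 0.1  # normal processing time
        match = re.search(r"PG_SLEEP\((\d+)\)", params.get("user_email", ""))
        if match:
            self.now += int(match.group(1))


server = FakeServer()
seed = {"user_email": "user@example.com", "key": "k"}
hits = find_time_based_injection(server.send, seed, PAYLOADS, clock=server.clock)
print(hits)
```

In practice the seed parameters would come from a request message generated by the dynamic analysis, and such testing should only be run against servers one is authorized to test.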

    Malicious URL Detection

    URLs Classification

      Malware sites
      Compromised sites
      VirusTotal provides services for these detections

    Overall Experimental Results

    Item                                         Value
    # Apps                                       5,000
    Size of the Dataset (G-bytes)                126.2
    Time of the first two phases analyses (s)    90,143 (25 hours)
    # Targeted API Calls                         147,327
    # Constraints                                47,602
    # UI Configuration files generated           25,030
    Time of Dynamic Analysis (s)                 486,446 (135 hours)
    # Request Messages                           257,755
    # Exposed URLs                               297,780
    # Unique Domains                             18,193
    Logged Message Size (G-bytes)                24.0
    Σ Malicious URLs                             8,634

    Statistics on the Extracted String Constraints

    Constraints Name            # Constraints
    Not null                    25,855
    String_length               13,858
    String_isEmpty              377
    String_contains             196
    String_contentEquals        43
    String_equals               3,087
    String_equalsIgnoreCase     991
    String_matches              448
    String_endsWith             11
    String_startsWith           64
    TextUtils_isEmpty           2,355
    Matcher_matches             317

    Comparison w/ Monkey [mon]

    WWW’17, April 2017, Perth, Australia. Chaoshun Zuo and Zhiqiang Lin

    Table 1: Summary of the Performance of SmartGen.

    Item                                         Value
    # Apps                                       5,000
    Size of the Dataset (G-bytes)                126.2
    Time of the first two phases analyses (s)    90,143
    # Targeted API Calls                         147,327
    # Constraints                                47,602
    # UI Configuration files generated           25,030
    Time of Dynamic Analysis (s)                 486,446
    # Request Messages                           257,755
    # Exposed URLs                               297,780
    # Unique Domains                             18,193
    Logged Message Size (G-bytes)                24.0

    Table 2: Statistics of the Extracted String Constraints

    Constraints Name            # Constraints
    Not null                    25,855
    String_length               13,858
    String_isEmpty              377
    String_contains             196
    String_contentEquals        43
    String_equals               3,087
    String_equalsIgnoreCase     991
    String_matches              448
    String_endsWith             11
    String_startsWith           64
    TextUtils_isEmpty           2,355
    Matcher_matches             317

    ... seconds on average. During this analysis, it identified 147,327 calls to the targeted APIs, extracted 47,602 constraints, and generated 25,030 UI configuration files based on the solved constraints.

    Meanwhile, the details of the extracted string constraints are presented in Table 2. (Note that we also encountered other integer constraints, such as when a value needs to be greater than 18; the details of these constraints are not presented here.) We notice that, interestingly, there are many "Not null" constraints. This is because, during an app's execution, a NullPointer may cause crashes, and developers (or the system code) thus check for it very often. Meanwhile, to validate whether a UI item contains a user input, we noticed developers also often use a String_length constraint (to make sure it is not 0). Some apps also use String_length to validate phone number input. Also, we found that some apps just use String_contains with "@" to validate an email address input, and some other apps use sophisticated regular expressions (e.g., Matcher_matches) for the matching.

    With the solved constraints, we then performed dynamic analysis on each app on our Galaxy phone. In total, it took 486,446 seconds (i.e., 135 hours) to execute these 5,000 apps (each app needed 97 seconds on average). Note that among the 97 seconds, the installation and uninstallation time is on average 17 seconds. However, if we execute an app inside an emulator, the installation time for a 25M app will take about 60 seconds. That is one of the reasons why we designed SmartGen to use a real phone. During the dynamic analysis, we observed 257,755 request messages (55.7% use the HTTP protocol, and 44.3% use HTTPS) generated by the tested apps, and in total 297,780 URLs in both request and response messages. Among them, there are 18,193 unique domains in these URLs. The final size of all the traced request and response messages collected at our proxy is 24.0 GB.

    Comparison with Monkey. To understand the contribution of our selective symbolic execution, we compare SmartGen with a widely used dynamic analysis tool, Monkey [7]. At a high level, Monkey is a program, executed in an emulator or on a real phone, which can generate pseudo-random streams of user events, such as clicks, touches, or gestures, as well as a number of system-level events, all for app testing. For a fair comparison, we also run Monkey on our real Galaxy phone to test each of our apps, and configure Monkey to generate 2,000 events at a time interval of 100 milliseconds. That is, for each app, Monkey will take up to 200 seconds just to test it.

    Figure 6: Comparison between SmartGen and Monkey. (Bar chart of relative performance, 0% to 300%, w/ Monkey vs. w/ SmartGen, for execution time, # request messages, # exposed URLs, # unique domains, and logged message size.)

    In total it took 1,083,530 seconds (i.e., 301 hours) to process these apps. Each app took on average 216.7 seconds (of which around 200 seconds were for testing and 17 seconds for installation and uninstallation). We also have to note that testing with Monkey is not 100% automated. This is because Monkey randomly sends events to the system without specifying the recipients. These random inputs may click system buttons, which may lock the screen, turn off the network connection, and even shut down the phone. Therefore, we disabled the screen locking functionality, and also developed a daemon program to constantly check the Internet connectivity and turn on the networking if necessary, but we cannot disable the phone power-off event and must manually power on the phone. This is the only event Monkey cannot handle automatically, and we encountered 17 phone power-off events. We excluded the power-off and restart time from our evaluation in this case. For all these tested apps, with Monkey they generated 79,778 request messages, with 6,384 domain names. The total size of the logged messages is 12.8 GB.

    A detailed comparison between SmartGen and Monkey for these tested apps is presented in Fig. 6. We compare them based on their execution time, the total number of request messages generated, the total number of domains in the request messages, and finally the total size of the request messages. We can see that SmartGen only took 53%, i.e., (90,143 + 486,446)/1,083,530, of the execution time of Monkey, but it generates 3.2X request messages, 2.3X unique URLs, 1.9X unique domains, and 1.9X logged message size, compared to the result from Monkey.
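The ratios quoted in this comparison can be rechecked directly from the reported totals. The short calculation below uses only numbers stated in the text; the variable names are chosen for this example.

```python
# Totals reported in the text (seconds, message counts, gigabytes).
smartgen_static_s = 90_143
smartgen_dynamic_s = 486_446
monkey_s = 1_083_530
apps = 5_000

# SmartGen's total time as a fraction of Monkey's total time.
time_ratio = (smartgen_static_s + smartgen_dynamic_s) / monkey_s
# Request messages: SmartGen vs. Monkey.
request_ratio = 257_755 / 79_778
# Logged message size: SmartGen vs. Monkey.
size_ratio = 24.0 / 12.8
# Average Monkey time per app.
monkey_per_app_s = monkey_s / apps

print(f"time: {time_ratio:.0%}, requests: {request_ratio:.1f}X, "
      f"size: {size_ratio:.1f}X, Monkey per app: {monkey_per_app_s:.1f} s")
```

These reproduce the 53% execution time, the 3.2X request messages, the 1.9X logged message size, and the 216.7 seconds per app stated above.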

    Security Application: Malicious URL detection

    SmartGen: Exposing Server URLs of Mobile Apps With Selective Symbolic Execution. WWW’17, April 2017, Perth, Australia

    5.3 Harmful URL Detection

    Having so many URLs from the top 5,000 mobile apps, we are then interested in whether there are any harmful URLs. To this end, we submitted all of the exposed 297,780 URLs to the harmful URL detection service at VirusTotal, which then further identified 8,634 unique harmful URLs. Note that VirusTotal has integrated 68 malicious URL scanners (as of the time of this experiment), and each submitted brand new URL is analyzed by all of the scanners. The scanners that have identified at least one harmful URL are reported in the first column of Table 3, followed by the number of phishing sites, the number of malware sites (i.e., the URL is identified as malware), and the number of malicious sites from the 2nd to the 4th columns, respectively. The last column reports the total number of harmful URLs identified by the corresponding scanners, and the last row reports the number of unique URLs in each category. The total number of unique malicious URLs is 8,634 because there are 387 sites detected as both malware and malicious. Also, note that one harmful URL can be identified by several engines. That is, there are some overlapping URLs in the last column of Table 3. To clearly show those overlaps, we present the number of harmful URLs and the number of engines that recognize those harmful URLs in Table 4. Interestingly, we can see that most harmful URLs are detected by just one of the engines, and only one URL is detected simultaneously by 8 engines. Based on the timestamp of the queried result from VirusTotal, we notice that 83% of the URLs were analyzed by VirusTotal for the first time. Among the detected 8,634 URLs, we also notice that 84% of them are new harmful URLs (because of our research).

    Table 3: Statistics of Harmful URLs Detected by Each Engine

    Detection Engine          #Phishing Sites   #Malware Sites   #Malicious URLs   Σ #Harmful
    ADMINUSLabs               0                 0                4                 4
    AegisLab WebGuard         0                 0                1                 1
    AutoShun                  0                 0                863               863
    Avira                     2062              941              0                 3003
    BitDefender               0                 191              0                 191
    Blueliv                   0                 0                5                 5
    CLEAN MX                  0                 0                14                14
    CRDF                      0                 0                150               150
    CloudStat                 0                 0                1                 1
    Dr.Web                    0                 0                2330              2330
    ESET                      0                 75               0                 75
    Emsisoft                  1                 43               0                 44
    Fortinet                  8                 469              0                 477
    Google Safebrowsing       0                 13               2                 15
    Kaspersky                 0                 2                0                 2
    Malwarebytes hpHosts      0                 1103             0                 1103
    ParetoLogic               0                 800              0                 800
    Quick Heal                0                 0                2                 2
    Quttera                   0                 0                6                 6
    SCUMWARE.org              0                 8                0                 8
    Sophos                    0                 0                56                56
    Sucuri SiteCheck          0                 0                248               248
    ThreatHive                0                 0                8                 8
    Trustwave                 0                 0                80                80
    Websense ThreatSeeker     0                 0                56                56
    Yandex Safebrowsing       0                 173              0                 173
    Σ #Harmful URLs           2071              3818             3826              9715
    Σ #Unique Harmful URLs    2071              3722             3228              8634

    Table 4: # Engines of Harmful URLs

    Detected by # Engines     # Unique Harmful URLs
    8                         1
    7                         1
    6                         2
    5                         13
    4                         63
    3                         33
    2                         751
    1                         7770
    Σ Unique Harmful URLs     8634

    While we could just trust the detection result from VirusTotal, to confirm that these URLs are indeed malicious we manually examined the one that was identified by 8 engines. Interestingly, this URL actually points to an APK file. We then visited this URL and downloaded the APK. We also submitted this suspicious APK file to VirusTotal, and this time, 14 out of 55 file scanners reported that this APK is malicious. We reverse engineered this file and found it tried to acquire the root privilege of the phone by exploiting kernel vulnerabilities, which undoubtedly proved it is a harmful URL.
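The overlap counting behind Table 4 (how many URLs are flagged by exactly n engines) can be sketched as follows. The `detections` list is hypothetical sample data, not data from the paper; only the `table4` dictionary copies the reported counts, and it is used to recheck that they sum to the 8,634 unique harmful URLs.

```python
from collections import Counter


def engines_per_url_histogram(detections):
    """Map 'number of engines flagging a URL' to 'number of such URLs'.

    `detections` is an iterable of (engine, url) pairs.
    """
    engines_by_url = {}
    for engine, url in detections:
        engines_by_url.setdefault(url, set()).add(engine)
    return dict(Counter(len(engines) for engines in engines_by_url.values()))


# Hypothetical detections for illustration only.
detections = [
    ("EngineA", "http://a.example/x"),
    ("EngineB", "http://a.example/x"),
    ("EngineA", "http://b.example/y"),
    ("EngineC", "http://c.example/z"),
    ("EngineC", "http://c.example/z"),  # a repeated report counts once
]
histogram = engines_per_url_histogram(detections)
print(histogram)

# Counts reported in Table 4: engines -> unique harmful URLs.
table4 = {8: 1, 7: 1, 6: 2, 5: 13, 4: 63, 3: 33, 2: 751, 1: 7770}
table4_total = sum(table4.values())
print(table4_total)
```

Grouping by URL first is what makes the per-engine totals in Table 3 (9,715) larger than the unique count, since one URL may be reported by several engines.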

    6 LIMITATIONS AND FUTURE WORK

    SmartGen clearly has limitations. First, there might be some missing paths in the ECG (if an edge is missed by EdgeMiner [16]), or infeasible paths that cannot be solved (currently our solver terminates if it cannot provide any result after 300 seconds). Second, not all of the app activities have been explored, especially if there is access control in the app. More specifically, certain app activities are only displayed if the user has successfully logged in. However, SmartGen did not perform any automatic registration with these 5,000 apps, and it is certainly not able to trigger these activities. Therefore, how to trigger these activities for a given mobile app is one of our immediate future works.

    Currently, we have only demonstrated how to use the exposed URLs to detect whether an app communicates with any malicious sites. There are certainly many other applications, such as server vulnerability identification [31]. For instance, we can use the generated server request messages as seeds to perform penetration testing to see whether the server contains any exploitable vulnerabilities such as SQL injection, cross-site scripting (XSS), cross-site request forgery (CSRF), etc. We leave the study of vulnerability fuzzing to future work.

    We can also apply the selective symbolic execution of SmartGen to solve other problems. For instance, by changing the targeted APIs to security-sensitive ones (e.g., TelephonyManager.getDeviceId), we can collect and solve the constraints along the execution path to trigger these APIs. Through this, we are likely to be able to further observe how sensitive data is collected and perhaps find privacy leakage vulnerabilities in real apps. Part of our future work will also explore these applications.

    7 CONCLUSION

    We have presented SmartGen, a tool to automatically generate server request messages and expose the server URLs from a mobile app by using selective symbolic execution, and demonstrated how to use SmartGen to detect malicious sites based on the exposed URLs for the top 5,000 Android apps in Google Play. Unlike prior efforts, SmartGen focuses on the constraints from the UI elements and solves them to trigger the networking APIs. Built atop API hooking and Java reflection, it also features a new runtime app instrumentation technique that is able to more efficiently instrument an app and perform an in-context analysis. Our evaluation with the top 5,000 ranked mobile apps has demonstrated that with SmartGen we are able to find 297,780 URLs, and among them 8,634 are actually malicious sites according to the URL classification result from VirusTotal.

    Related Work

    1 Dynamic Analysis. Monkey [mon] automatically executes and randomly navigates an app. AppsPlayground [RCE13] and SMV-Hunter [SSG+14] are more intelligent. A3E [AN13] performs a targeted exploration of mobile apps. DynoDroid [MTN13] instruments the Android framework and uses adb to monitor UI interaction and generate UI events.

    2 Symbolic Execution. Symbolic execution has been used in app testing in general [MMP+12], path exploration [ANHY12], and malware analysis [WL16]. IntelliDroid is closely related work, but it focuses only on malware and lacks the generality needed for analyzing UI-rich mobile apps.

    Figure labels: HTTPS; encryption, hashing, signing

    3 Mobile App Vulnerability Discovery. A large body of efforts has focused on discovering vulnerabilities in mobile apps: TaintDroid [EGC+10], PiOS [EKKV11], CHEX [LLW+12], SMV-Hunter [SSG+14].

    4 Remote Server Vulnerability Discovery. Few efforts (e.g., AUTOFORGE [ZWWL16] and SMARTGEN [ZL17]) have focused on identifying vulnerabilities on the app's server side.

    SMARTGEN [ZL17]
    A Fully Automated, Symbolic Execution Based, Mobile App Execution Framework

    SMARTGEN
      A fully automated mobile app execution framework via symbolic execution
      Can be used to test various security vulnerabilities in mobile systems

    Experimental Result w/ 5,000 apps
      Each app has 1,000,000 installs
      These apps actually talk to 2,071 phishing sites, 3,722 malware sites, and 3,228 malicious sites

    Thank You

    Acknowledgement
      AFOSR, NSF
      VirusTotal (premium services)

    [email protected]

    References I

    Tanzirul Azim and Iulian Neamtiu, Targeted and depth-first exploration for systematic testing of android apps, Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications (New York, NY, USA), OOPSLA ’13, ACM, 2013, pp. 641–660.

    Saswat Anand, Mayur Naik, Mary Jean Harrold, and Hongseok Yang, Automated concolic testing of smartphone apps, Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (New York, NY, USA), FSE ’12, ACM, 2012, pp. 59:1–59:11.

    Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel, Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps, Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (New York, NY, USA), PLDI ’14, ACM, 2014, pp. 259–269.

    Marshall Beddoe, The protocol informatics project, http://www.4tphi.net/~awalters/PI/PI.html.

    Yinzhi Cao, Yanick Fratantonio, Antonio Bianchi, Manuel Egele, Christopher Kruegel, Giovanni Vigna, and Yan Chen, Edgeminer: Automatically detecting implicit control flow transitions through the android framework, Proceedings of the 20th Annual Network and Distributed System Security Symposium (NDSS’15), 2015.

    Weidong Cui, Jayanthkumar Kannan, and Helen J. Wang, Discoverer: Automatic protocol reverse engineering from network traces, Proceedings of the 16th USENIX Security Symposium (Security’07) (Boston, MA), August 2007.

    Juan Caballero, Pongsin Poosankam, Christian Kreibich, and Dawn Song, Dispatcher: Enabling active botnet infiltration using automatic protocol reverse-engineering, Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS’09) (Chicago, Illinois, USA), 2009, pp. 621–634.

  • Motivation SMARTGEN Design Applications Evaluation Related Work Conclusion References

    References II

    Weidong Cui, Vern Paxson, Nicholas Weaver, and Randy H. Katz, Protocol-independent adaptive replay ofapplication dialog, Proceedings of the 13th Annual Network and Distributed System Security Symposium(NDSS’06) (San Diego, CA), February 2006.

    Juan Caballero and Dawn Song, Polyglot: Automatic extraction of protocol format using dynamic binary analysis, Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS’07) (Alexandria, Virginia, USA), 2007, pp. 317–329.

    W. Enck, P. Gilbert, B.G. Chun, L.P. Cox, J. Jung, P. McDaniel, and A.N. Sheth, TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones, OSDI, 2010.

    M. Egele, C. Kruegel, E. Kirda, and G. Vigna, Pios: Detecting privacy leaks in ios applications, NDSS, 2011.

    Zhiqiang Lin, Xuxian Jiang, Dongyan Xu, and Xiangyu Zhang, Automatic protocol format reverse engineering through context-aware monitored execution, Proceedings of the 15th Annual Network and Distributed System Security Symposium (NDSS’08) (San Diego, CA), February 2008.

    Long Lu, Zhichun Li, Zhenyu Wu, Wenke Lee, and Guofei Jiang, Chex: statically vetting android apps for component hijacking vulnerabilities, Proceedings of the 2012 ACM conference on Computer and communications security, ACM, 2012, pp. 229–240.

    Justin Ma, Kirill Levchenko, Christian Kreibich, Stefan Savage, and Geoffrey M. Voelker, Unexpected means of protocol inference, Proceedings of the 6th ACM SIGCOMM on Internet measurement (IMC’06) (Rio de Janeiro, Brazil), ACM Press, 2006, pp. 313–326.

    Nariman Mirzaei, Sam Malek, Corina S Păsăreanu, Naeem Esfahani, and Riyadh Mahmood, Testing android apps through symbolic execution, ACM SIGSOFT Software Engineering Notes 37 (2012), no. 6, 1–5.


    References III

    Ui/application exerciser monkey, https://developer.android.com/tools/help/monkey.html.

    Aravind Machiry, Rohan Tahiliani, and Mayur Naik, Dynodroid: An input generation system for android apps, Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ACM, 2013, pp. 224–234.

    Paolo Milani Comparetti, Gilbert Wondracek, Christopher Kruegel, and Engin Kirda, Prospex: Protocol Specification Extraction, IEEE Symposium on Security & Privacy (Oakland, CA), 2009, pp. 110–125.

    James Newsome, David Brumley, Jason Franklin, and Dawn Song, Replayer: Automatic protocol replay by binary analysis, Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS’06), 2006.

    Vaibhav Rastogi, Yan Chen, and William Enck, Appsplayground: Automatic security analysis of smartphone applications, Proceedings of the Third ACM Conference on Data and Application Security and Privacy (New York, NY, USA), CODASPY ’13, ACM, 2013, pp. 209–220.

    A framework for analyzing and transforming java and android apps, https://sable.github.io/soot/.

    David Sounthiraraj, Justin Sahs, Garrett Greenwood, Zhiqiang Lin, and Latifur Khan, Smv-hunter: Large scale, automated detection of ssl/tls man-in-the-middle vulnerabilities in android apps, Proceedings of the 21st Annual Network and Distributed System Security Symposium (NDSS’14) (San Diego, CA), February 2014.

    Michelle Y Wong and David Lie, Intellidroid: A targeted input generator for the dynamic analysis of android malware, Proceedings of the 21st Annual Network and Distributed System Security Symposium (NDSS’16) (San Diego, CA), February 2016.



    References IV

    Gilbert Wondracek, Paolo Milani, Christopher Kruegel, and Engin Kirda, Automatic network protocol analysis, Proceedings of the 15th Annual Network and Distributed System Security Symposium (NDSS’08) (San Diego, CA), February 2008.

    Chaoshun Zuo and Zhiqiang Lin, Exposing server urls of mobile apps with selective symbolic execution, Proceedings of the 26th World Wide Web Conference (WWW’17) (Perth, Australia), April 2017.

    Chaoshun Zuo, Wubing Wang, Rui Wang, and Zhiqiang Lin, Automatic forgery of cryptographically consistent messages to identify security vulnerabilities in mobile services, Proceedings of the 21st Annual Network and Distributed System Security Symposium (NDSS’16) (San Diego, CA), February 2016.

    Yunhui Zheng, Xiangyu Zhang, and Vijay Ganesh, Z3-str: A z3-based string solver for web application analysis, Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ACM, 2013, pp. 114–124.
