Transcript
Page 1: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

SILVIO CESARE AND YANG XIANG

DEAKIN UNIVERSITY

Simseer and Bugwise

Web Services for Binary-level Software Similarity and Defect Detection

Page 2: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Introduction

Defect detection: finds software bugs, e.g., buffer overflows, divide-by-zeros, use-after-frees.

Malware variant detection: discovers obfuscated, evolved, or mutated copies of malware.

Software theft detection: discovers illegitimate copies of software.

Plagiarism detection: discovers unauthorized copying of software code, e.g., student assignments.

Page 3: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Motivation

Defect detection: external auditing; verification of compilation and linkage.

Malware variant detection: increases the predictive power of signatures; most new malware are variants of existing malware.

Software theft detection: protects intellectual property; automated detection reduces the cost of investigation.

Plagiarism detection: provides a deterrent through automated detection; the manual approach is not scalable.

Page 4: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Innovation

This research makes the following contributions:

We propose an online web service, Bugwise, to perform binary-level defect detection.

We propose an online web service, Simseer, to address malware variant detection, software theft detection and plagiarism detection.

We use state-of-the-art algorithms in novel applications.

We implement our services and make them public.

Page 5: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Related Work

Defect detection: formal methods, program analysis, abstract interpretation, data flow analysis.

Software similarity: features make a birthmark (fingerprint); a similarity function compares birthmarks (Euclidean distance, cosine similarity, etc.).

Birthmarks: represented as vectors, strings, sets, trees, graphs, etc.; built from byte-level content, instructions, basic blocks, control flow, API calls, etc. Our system uses control flow.
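The two similarity functions named on this slide can be sketched for vector birthmarks as follows. This is a minimal illustration, not the Simseer implementation: the function names are hypothetical, and the birthmarks are assumed to be equal-length numeric feature vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: an empty birthmark matches nothing
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Cosine similarity grows as birthmarks become more alike, while Euclidean distance shrinks, so a threshold on one has the opposite sense to a threshold on the other.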

Page 6: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Our Approach

Bugwise and Simseer use a unified backend from our previous work – Malwise.

We implement two web services using cloud-based virtual private servers.

Simseer: uses control flow as a feature to generate a signature (birthmark).

Bugwise: combines decompilation with traditional data flow analysis to detect several bug classes.

Page 7: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Web Services Workflow

[Workflow diagram. Components shown: Web Frontend; Scan Server; Script Scheduler; Script; Malwise; Evolutionary Tree Creation; SSH Tunnel; SSH Tunnel (Simseer); SSH Tunnel (Bugwise); Store and Display Results.]

Page 8: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

The Web Frontend

Accepts submission of archives and executables.

Implemented in server-side PHP.

PHP launches a script to process the submitted binary.

The script performs validation, e.g., filenames have no special characters.

Launches a C++ network client to submit the binary to the scan server.
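The filename check mentioned above might look like the sketch below. The slides do not state which characters are allowed, so the whitelist (letters, digits, dot, underscore, hyphen) and the 255-character limit are assumptions, and `is_safe_filename` is a hypothetical name.

```python
import re

# Assumed whitelist: letters, digits, dot, underscore, hyphen.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_filename(name):
    """Accept only short names made entirely of whitelisted characters."""
    if name in (".", "..") or len(name) > 255:
        return False
    return SAFE_NAME.fullmatch(name) is not None
```

A whitelist is used rather than a blacklist so that path separators, shell metacharacters, and whitespace are rejected without having to enumerate them.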

Page 9: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

The Web Frontend

Page 10: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

The Scheduling Work Queue

Listens on a TCP port on the scan server.

Connects to web frontend via SSH tunnel.

Accepts binaries from web frontend.

Queues jobs so that only one runs at any time.

Launches Simseer or Bugwise script to process binary.
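The one-job-at-a-time behaviour can be sketched with a single worker thread draining a queue. This is an illustration only: the TCP listener and SSH tunnel are omitted, and `run_jobs_serially` and `handler` are hypothetical names standing in for the scheduler and the Simseer or Bugwise script.

```python
import queue
import threading


def run_jobs_serially(jobs, handler):
    """Process jobs in arrival order with exactly one worker thread."""
    job_queue = queue.Queue()
    results = []

    def worker():
        while True:
            job = job_queue.get()
            try:
                if job is None:  # sentinel: no more work
                    break
                results.append(handler(job))
            except Exception as exc:  # keep the worker alive if one job fails
                results.append(exc)
            finally:
                job_queue.task_done()

    thread = threading.Thread(target=worker)
    thread.start()
    for job in jobs:
        job_queue.put(job)
    job_queue.put(None)
    job_queue.join()
    thread.join()
    return results
```

Because only one thread ever calls `handler`, at most one scan runs at a time, and submissions that arrive while it is busy wait in the queue.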

Page 11: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Malwise Backend

Malwise is a native C++ application of ~100,000 LOC.

Plugin-based modular system.

Simseer and Bugwise differ by their configuration and plugins.

Configuration is specified in XML.

Page 12: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

The Simseer Backend

Performs unpacking to remove malware obfuscation.

Decompiles the control flow.

1st pass generates signatures.

2nd pass shows similarity between signatures.

Page 13: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

The Bugwise Backend

Performs decompilation of local variables.

Performs compiler-style optimisations (dead code elimination, copy propagation, constant folding, etc.).

Performs data flow analysis (reaching definitions, upwards exposed uses, etc.).

Detects double frees (deallocating the same memory twice) using the data flow analysis results.
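To make the idea concrete, here is a much-simplified sketch of double-free detection. It is not the Bugwise algorithm: it assumes straight-line code with no branches, and a hypothetical three-operation IR of `alloc`, `copy`, and `free` tuples. It tracks which allocation each variable holds, so frees through aliases are caught.

```python
def find_double_frees(stmts):
    """Return indices of 'free' statements that release an already-freed allocation."""
    points_to = {}  # variable -> id of the allocation it currently holds
    freed = set()   # allocation ids that have already been released
    reports = []
    for index, stmt in enumerate(stmts):
        op = stmt[0]
        if op == "alloc":
            _, dst = stmt
            points_to[dst] = index  # the statement index serves as the allocation id
        elif op == "copy":
            _, dst, src = stmt
            if src in points_to:
                points_to[dst] = points_to[src]  # dst now aliases src
            else:
                points_to.pop(dst, None)  # unknown source: forget dst
        elif op == "free":
            _, ptr = stmt
            site = points_to.get(ptr)
            if site is None:
                continue  # pointer of unknown origin: stay silent
            if site in freed:
                reports.append(index)
            else:
                freed.add(site)
    return reports


program = [("alloc", "p"), ("copy", "q", "p"), ("free", "p"), ("free", "q")]
print(find_double_frees(program))  # [3]
```

Reallocating into the same variable creates a fresh allocation id, so `alloc p; free p; alloc p; free p` is not reported. Handling branches and loops would require merging states across a control flow graph, which is where a full data flow analysis comes in.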

Page 14: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Configuration - Simseer (first block), Bugwise (second block)

<ModuleGroup>

<Name>Scan</Name>

<Run>Packer Detection Using Entropy</Run>

<Run>Unpacker Using Application Level Emulation</Run>

<Run>Structuring</Run>

<Run>NGram Structuring</Run>

</ModuleGroup>

<ModuleGroup>

<Name>Scan</Name>

<Run>Code Optimsation 1</Run>

<Run>Linux Arch</Run>

<Run>Pre Decompiler Data Flow Analysis</Run>

<Run>X86 Decompiler Data Flow Analysis</Run>

<Run>Decompiler Data Flow Analysis</Run>

<Run>Code Optimsation 2</Run>

<Run>IRDataFlowAnalysis</Run>

<Run>Double Free Detection</Run>

</ModuleGroup>
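A configuration in this shape can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on the Simseer block shown above. How Malwise itself loads its configuration is not described in the slides, and `load_pipeline` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SIMSEER_CONFIG = """
<ModuleGroup>
  <Name>Scan</Name>
  <Run>Packer Detection Using Entropy</Run>
  <Run>Unpacker Using Application Level Emulation</Run>
  <Run>Structuring</Run>
  <Run>NGram Structuring</Run>
</ModuleGroup>
"""


def load_pipeline(xml_text):
    """Return the group name and the ordered list of modules to run."""
    group = ET.fromstring(xml_text.strip())
    name = group.findtext("Name")
    modules = [run.text for run in group.findall("Run")]
    return name, modules


name, modules = load_pipeline(SIMSEER_CONFIG)
print(name, modules)
```

`findall` returns elements in document order, which matters here because the `Run` entries describe a pipeline whose stages execute in sequence.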

Page 15: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Simseer Evolutionary Tree Visualization

Phylogenetic tree – e.g. tree of life.

The closer nodes are in the tree, the more similar those nodes are.

Simseer backend generates distance/similarity matrix.

PHYLIP software package takes matrix and generates tree.

Tree is rendered to an image.
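The hand-off to PHYLIP can be sketched as writing the matrix in PHYLIP's square distance-matrix text format: a first line with the number of samples, then one row per sample whose name occupies the first ten characters. The conversion from similarity to distance as `1 - similarity` is an assumption made for illustration, and `to_phylip` is a hypothetical name.

```python
def to_phylip(names, similarity):
    """Format a square similarity matrix as a PHYLIP distance matrix."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, similarity):
        label = name[:10].ljust(10)  # PHYLIP names are exactly ten characters
        distances = " ".join(f"{1.0 - s:.4f}" for s in row)  # assumed conversion
        lines.append(f"{label} {distances}")
    return "\n".join(lines) + "\n"


print(to_phylip(["sample_a", "sample_b"], [[1.0, 0.8], [0.8, 1.0]]))
```

Names longer than ten characters are truncated by this sketch, so callers would need to make sure the truncated labels stay unique.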

Page 16: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Program Relationships Visualization

Page 17: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Results Processing

Parse XML output from Malwise using a PHP parser.

Simseer: displays the evolutionary tree and similarity matrix.

Bugwise: displays a table showing the addresses of double frees.

Page 18: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Efficiency of Malwise as a Web Service

Does a web service incur much overhead compared to command line usage?

Test case is 9 samples submitted to Simseer.

Python script sends samples and waits for results.

We compare the times of command line versus the web service.

Mean overhead is 0.64 seconds.

Page 19: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Processing times: Simseer Web Service (l), Malwise Command Line (r)

Page 20: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Availability

http://www.FooCodeChu.Com

Rate limiting of submissions.

Limit of sample sizes and the number of samples in archives.

We intend to relax these restrictions as we migrate to more scalable infrastructure.

Page 21: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Future Work

Enterprise messaging to perform load balancing and queuing?

More scan options to exploit the Malwise plugin system.

Any-time clustering to cluster new samples incrementally in real-time?

Bug detection could be developed into a bug management system.

Page 22: Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection

Conclusion

We make available new services for bug detection and software similarity.

Our backend Malwise is versatile and allows plugins to implement these services.

Bugwise has found real bugs in Linux.

The web service overhead is minimal.

We believe web services in these applications will have future growth.

