
iBinHunt: Binary Hunting with Inter-Procedural Control Flow

Jiang Ming (1), Meng Pan (2), and Debin Gao (3)

(1) College of Information Sciences and Technology, Penn State University
(2) D’Crypt Pte Ltd
(3) School of Information Systems, Singapore Management University


Introduction

• Binary hunting: automatically finding semantic differences in binary programs
• Need to capture semantic differences
– Differences in functionality (input-output behavior)
• Syntactic differences cause false positives
– Differences in instructions
– Register allocation
– Basic-block reordering
– Variable renaming
– ….


An example: gzip

• Different instructions in the two versions, but with the same semantics
    xor eax, eax
    and ebx, 0
• A patch with 5 lines of code
– All 75 non-empty functions are changed

Gzip Long File Name Buffer Overflow Vulnerability: http://www.securityfocus.com/bid/3712
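The gap between a syntactic and a semantic comparison can be illustrated with a minimal Python sketch. It is not from the slides: the function names and the sample values are hypothetical, and agreement on a handful of samples is only an illustration, not a proof of equivalence. Both instructions are modeled as functions on a 32-bit register value.

```python
MASK32 = 0xFFFFFFFF

def xor_self(reg: int) -> int:
    """Model of `xor r, r`: the destination register is XORed with itself."""
    return (reg ^ reg) & MASK32

def and_zero(reg: int) -> int:
    """Model of `and r, 0`: the destination register is ANDed with constant 0."""
    return (reg & 0) & MASK32

def same_on_samples(f, g, samples) -> bool:
    """Return True if f and g agree on every sampled register value."""
    return all(f(v) == g(v) for v in samples)

# Hypothetical sample of starting register values.
SAMPLES = [0, 1, 0x7FFFFFFF, 0x80000000, MASK32, 0xDEADBEEF]

# A textual comparison sees two different instructions ...
syntactically_equal = "xor eax, eax" == "and ebx, 0"
# ... while both leave zero in their destination register.
semantically_equal = same_on_samples(xor_self, and_zero, SAMPLES)
```

A syntactic differ would report a change here, whereas a semantic comparison (modulo the register name) would not.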


Importance of Binary Hunting

Security applications of binary hunting
• Finding security vulnerabilities using patched binaries
– “BinHunt: Automatically finding semantic differences in binary programs”, ICICS 2008
• Automatic patch-based exploit (1-day exploit) generation
– “Automatic Patch-Based Exploit Generation is Possible”, IEEE S&P 2008
• Software plagiarism detection
– “GPLAG: detection of software plagiarism by program dependence graph analysis”, KDD 2006
• Adapting trained anomaly detectors to software patches
– “Automatically adapting a trained anomaly detector to software patches”, RAID 2009
• Malware analysis
– “Polymorphic worm detection using structural information of executables”, RAID 2005
– “Large-scale malware indexing using function-call graphs”, CCS 2009


Challenge

• Source code of the binary files is not available
• Function names extracted from these binary files are unreliable
• A variety of obfuscation techniques
• ……
• Latest solutions: find similarities/differences in control flow structure rather than in binary instructions
– Resistant to “superficial” changes
– Examples: BinDiff, BinHunt, DarunGrim, SMIT


Intra-procedural control flow vs. inter-procedural control flow

• Intra-procedural control flow
– Most previous work focuses on intra-procedural control flow.
– The sub-graph isomorphism problem is NP-complete.
– Example: 96% of the non-empty functions of thttpd have fewer than 30 basic blocks.
– Graph isomorphism is therefore practical for analyzing intra-procedural control flow.
• Inter-procedural control flow
– No function boundaries
– A huge graph with a large number of nodes, where graph isomorphism is impractical
– Example: thttpd-2.25 has more than 4,300 basic blocks in total, so a single basic block has more than 4,000 candidate matchings.
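As a rough illustration of what "no function boundary" means, the following Python sketch merges per-function control flow graphs into one inter-procedural graph. It is an assumption-laden toy, not iBinHunt's construction: the block names, the `CALLS` table, and the choice to replace a call site's fall-through edge with a call edge plus a return edge are all hypothetical.

```python
# Hypothetical toy input: one CFG per function, as adjacency lists of block names.
FUNCTION_CFGS = {
    "main": {"m0": ["m1"], "m1": ["m2"], "m2": []},
    "parse": {"p0": ["p1"], "p1": []},
}
# Hypothetical call table: call-site block -> (callee entry block, callee exit block).
CALLS = {"m1": ("p0", "p1")}

def build_icfg(function_cfgs, calls):
    """Merge per-function CFGs into a single graph without function boundaries."""
    icfg = {}
    for cfg in function_cfgs.values():
        for block, succs in cfg.items():
            icfg[block] = list(succs)
    for call_site, (entry, exit_block) in calls.items():
        return_targets = icfg[call_site]          # blocks reached after the call returns
        icfg[call_site] = [entry]                 # call edge into the callee
        icfg[exit_block] = icfg[exit_block] + return_targets  # return edge
    return icfg

ICFG = build_icfg(FUNCTION_CFGS, CALLS)
# Without any filtering, every block of one binary is a candidate for every block
# of the other, which is what makes whole-program graph isomorphism impractical.
naive_candidates_per_block = len(ICFG)
```

In this toy, the path m0, m1, p0, p1, m2 is a single chain in the merged graph, so inlining `parse` into `main` would not change its shape.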


Function Transformation Obfuscation

• Function transformation obfuscation is well-studied
– Inlining functions
– Outlining functions
– Cloning functions
– Interleaving functions
• Performing such obfuscation is simple and requires no intensive analysis of the binaries.

Figure: Inlining and outlining transformations [1]

[1] C. Collberg, C. Thomborson, and D. Low. A taxonomy of obfuscating transformations. Technical Report 148, Department of Computer Sciences, The University of Auckland, July 1997.


Advanced control flow obfuscation

• Control flow flattening
– “Protection of software-based survivability mechanisms”, DSN 2001
– “An Approach to the Obfuscation of Control-Flow of Sequential Computer Programs”, ISC 2001
• Redirecting control flow with exceptions
– “Binary Obfuscation Using Signals”, USENIX Security 2007
– “binOb+: a framework for potent and stealthy binary obfuscation”, AsiaCCS 2010
• Function boundary information (intra-procedural control flow) is not reliable!


Overview of iBinHunt

• iBinHunt: binary diffing with inter-procedural control flow graphs
• iBinHunt provides practical solutions to the large number of candidate basic block matchings
– Dynamic tainting: monitor the execution of the two binary programs under a common input and use taint analysis to record all basic blocks involved in processing the input.
– Deep taint: assign different taint tags to various parts of the input; only basic blocks from the two binary programs that are marked with the same taint tags are considered matching candidates (a reduction factor of up to 74%).
– Basic block comparison: symbolic execution is first used to represent the outputs of the basic blocks in terms of their input symbols, and a theorem prover is then used to check whether the outputs of the two basic blocks are semantically equivalent.
– Automatic input generation: increases the coverage of tainted basic blocks by automatically generating inputs that result in different execution traces.
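The candidate reduction from deep taint can be sketched in a few lines of Python. This is a hypothetical illustration, not iBinHunt's code: the block names, the tag names ("method", "uri", "host"), and the representation of a block's taint as a frozenset of tags are all assumptions. Blocks that were never tainted carry the empty tag set and are therefore only candidates for each other.

```python
from collections import defaultdict

def candidates_by_taint(blocks_a, blocks_b):
    """Map each block of program A to the blocks of program B with identical taint tags.

    blocks_a / blocks_b: dict of block name -> frozenset of taint tags.
    """
    by_tags = defaultdict(list)
    for name, tags in blocks_b.items():
        by_tags[tags].append(name)
    return {name: sorted(by_tags.get(tags, [])) for name, tags in blocks_a.items()}

# Hypothetical taint results for two versions of a program under the same input.
BLOCKS_A = {
    "A1": frozenset({"method"}),
    "A2": frozenset({"uri"}),
    "A3": frozenset({"uri", "host"}),
}
BLOCKS_B = {
    "B1": frozenset({"method"}),
    "B2": frozenset({"uri"}),
    "B3": frozenset({"uri"}),
    "B4": frozenset({"host"}),
}

CANDIDATES = candidates_by_taint(BLOCKS_A, BLOCKS_B)
naive_pairs = len(BLOCKS_A) * len(BLOCKS_B)
filtered_pairs = sum(len(c) for c in CANDIDATES.values())
```

In this toy, the filter leaves 3 candidate pairs out of 12, and only those pairs would need the more expensive semantic comparison.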


Deep taint for basic block comparison

[Figure with parts labeled: Inter-Procedural Control Flow Graphs, Deep taint execution trace, Deep Taint, Basic block comparison]


An example: thttpd

• Input and its taint tag colors
• Dynamic execution traces with deep taint


Basic block comparison

• Symbolic execution and theorem proving
– Use symbolic execution to represent the final values of outputs (registers and variables)
– Use a theorem prover to test whether the outputs of two basic blocks are always the same given the same inputs
• Context aware
– The permutation of outputs of the equivalent basic blocks is the permutation of inputs of the successor blocks.
• Obtain the matching strength based on the result from the theorem prover
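The equivalence test can be illustrated with a small Python sketch. It swaps in a different technique from the one named on the slide: instead of symbolic execution plus a theorem prover, it enumerates every input exhaustively on a 4-bit machine, which is only feasible for toy sizes. The blocks, their register layout, and the function names are hypothetical. The sketch also searches for a permutation of output registers, in the spirit of the "context aware" bullet.

```python
from itertools import permutations, product

BITS = 4                      # toy register width so exhaustive enumeration is cheap
MASK = (1 << BITS) - 1

def block_a(x, y):
    """Hypothetical block: outputs (r1, r2) = (x + y, x xor x)."""
    return ((x + y) & MASK, (x ^ x) & MASK)

def block_b(x, y):
    """Hypothetical block with other instructions and swapped output registers."""
    return ((y & 0) & MASK, (y + x) & MASK)

def block_c(x, y):
    """Hypothetical block that really differs: it adds one to the sum."""
    return ((x + y + 1) & MASK, 0)

def equivalent_up_to_permutation(f, g, n_inputs, n_outputs):
    """Return an output permutation making g agree with f on all inputs, else None."""
    domain = list(product(range(MASK + 1), repeat=n_inputs))
    for perm in permutations(range(n_outputs)):
        if all(f(*v) == tuple(g(*v)[i] for i in perm) for v in domain):
            return perm
    return None

perm_ab = equivalent_up_to_permutation(block_a, block_b, 2, 2)
perm_ac = equivalent_up_to_permutation(block_a, block_c, 2, 2)
```

Here `perm_ab` records which output of `block_b` corresponds to each output of `block_a`; a successor block would have to read its inputs under the same mapping. `perm_ac` is `None` because no mapping makes the two blocks agree.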


Basic block matching

We need to consider two other groups of blocks when finding matched blocks.

• Blocks that are not semantically equivalent but have the same taint tags
– These could very likely be the differences between the two programs that iBinHunt is trying to locate. E.g., BB_13232 and BB_16184 are the location of a binary difference.
• Blocks that are not tainted but are on the dynamic execution trace
– These are untainted for various reasons, including limitations of taint analysis and code that does not directly process program inputs (e.g., signal processing).


Matching Strength

Basic blocks B1 and B2 are considered matched to one another if B1 and B2 have the same taint tags (possibly non-tainted) and
• B1 and B2 are semantically equivalent (evaluated by symbolic execution and a theorem prover); or
• a predecessor of B1 and a predecessor of B2 match; or
• a successor of B1 and a successor of B2 match.

[Figure: blocks B1 and B2, each shown with its predecessor and successor]
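The matching rule on this slide can be written down as a predicate plus a simple propagation loop. The Python below is a sketch under stated assumptions, not iBinHunt's algorithm: the graph encoding (dicts with "tags", "preds", "succs"), the block names, and the set of pairs already judged semantically equivalent are hypothetical, and the loop does not resolve competing candidates for the same block.

```python
def blocks_match(b1, b2, g1, g2, equivalent, matched):
    """Apply the rule: same taint tags AND (equivalent OR matched preds OR matched succs)."""
    if g1["tags"][b1] != g2["tags"][b2]:
        return False
    if (b1, b2) in equivalent:
        return True
    pred_match = any((p1, p2) in matched
                     for p1 in g1["preds"][b1] for p2 in g2["preds"][b2])
    succ_match = any((s1, s2) in matched
                     for s1 in g1["succs"][b1] for s2 in g2["succs"][b2])
    return pred_match or succ_match

def propagate(g1, g2, equivalent):
    """Repeat the rule until no new pair is added (a fixed point)."""
    matched = set()
    changed = True
    while changed:
        changed = False
        for b1 in g1["tags"]:
            for b2 in g2["tags"]:
                if (b1, b2) not in matched and blocks_match(b1, b2, g1, g2, equivalent, matched):
                    matched.add((b1, b2))
                    changed = True
    return matched

# Hypothetical three-block chains; the last block of each is untainted.
G1 = {"tags": {"a1": frozenset({"uri"}), "a2": frozenset({"uri"}), "a3": frozenset()},
      "preds": {"a1": [], "a2": ["a1"], "a3": ["a2"]},
      "succs": {"a1": ["a2"], "a2": ["a3"], "a3": []}}
G2 = {"tags": {"b1": frozenset({"uri"}), "b2": frozenset({"uri"}), "b3": frozenset()},
      "preds": {"b1": [], "b2": ["b1"], "b3": ["b2"]},
      "succs": {"b1": ["b2"], "b2": ["b3"], "b3": []}}
EQUIVALENT = {("a1", "b1")}   # assumed result of the semantic comparison

MATCHED = propagate(G1, G2, EQUIVALENT)
```

Only (a1, b1) is semantically equivalent in this toy; (a2, b2) and the untainted pair (a3, b3) are matched through their already-matched predecessors.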


Automatic Input Generation

[Figure: automatic input generation. Labels: Initial Input (GET index.html HTTP/1.1, Host: .), Concrete Execution, Symbolic Execution, Symbolic Formula, Constraint Solver (STP), New Input]
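The general idea of generating inputs that follow different execution traces can be sketched as a loop that records branch decisions, flips one decision, and asks for an input that takes the flipped path. The Python below is a toy under loud assumptions: `toy_program` is a hypothetical target, and a brute-force search over a small candidate pool stands in for the symbolic formula and the STP constraint solver named in the figure.

```python
def toy_program(data: bytes):
    """Hypothetical target: returns its branch decisions as a trace."""
    trace = []
    is_get = data[:3] == b"GET"
    trace.append(("is_get", is_get))
    if is_get:
        has_slash = data[4:5] == b"/"
        trace.append(("has_slash", has_slash))
    return tuple(trace)

def explore(program, seed, pool):
    """Flip each branch decision of every known trace and look for a matching input."""
    covered = {program(seed): seed}
    worklist = [program(seed)]
    while worklist:
        trace = worklist.pop()
        for i, (branch, taken) in enumerate(trace):
            target = trace[:i] + ((branch, not taken),)   # same prefix, one decision negated
            for cand in pool:                             # stand-in for a constraint solver
                t = program(cand)
                if t[:i + 1] == target and t not in covered:
                    covered[t] = cand
                    worklist.append(t)
                    break
    return covered

SEED = b"GET index.html HTTP/1.1"
POOL = [SEED, b"GET /index.html HTTP/1.1", b"POST /index.html HTTP/1.1"]  # hypothetical
COVERED = explore(toy_program, SEED, POOL)
```

Starting from the seed alone, the loop ends with three distinct traces, each paired with an input that triggers it.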


Evaluation

• We applied iBinHunt to find semantic differences between several versions of thttpd and gzip. We evaluate two main aspects:
– Efficiency: how many basic blocks can be matched under our definition of matching strength, how many matchings are identified by deep taint, and how long it takes to find these matchings.
– Accuracy: we confirm the reported differences by comparing them to the ground truth (program source code).

• Different versions of thttpd and gzip (number of lines changed / total number of lines)

thttpd   2.20        2.20c       2.21        2.25
2.19     252/6059    254/5843    1483/6641   2908/7271

gzip     1.3.12      1.3.13      1.40
1.2.4    1317/4959   1351/4929   1446/4841


Matching basic blocks

We evaluate:
• Matched basic blocks that are semantically the same;
• Matched basic blocks that are not semantically equivalent but have both a predecessor and a successor matched;
• Matched basic blocks that are not semantically equivalent but have either a predecessor or a successor matched;
• The time taken by input generation and deep taint.


Effectiveness of deep taint

• Results show that more than 34% of the matched basic blocks in thttpd and more than 67% in gzip contain the same taint tags.
– A large number of these matchings do contain the same taint tags;
– Even though many basic blocks are not tainted by our limited number of program inputs, their neighbors are tainted in most cases, and the tainted neighbors help matchings to be identified.

• Percentage of matched basic blocks with the same taint representation

thttpd   2.20     2.20c    2.21     2.25
2.19     34.8%    38.2%    39.9%    37.4%

gzip     1.3.12   1.3.13   1.40
1.2.4    67.9%    72.2%    72.6%


Accuracy

• BB_1371 from thttpd-2.19 should match with BB_1689 in thttpd-2.25, both of which deal with the “-i” argument.

• However, BB_1687 in thttpd-2.25 also contains the same (type of) instructions, which confuses the binary diffing tool in the matching.


Discussions

• Limitations
– The power of iBinHunt is limited by imperfect basic block coverage.
– In our experiments with thttpd and gzip, some basic blocks are not covered even if we continue to generate new program inputs.
– Performance
• Future work
– More optimization of the code to improve efficiency.
– Parallelizing dynamic taint tracking.
– More in-depth binary difference analysis, in which (parts of) the programs are semantically equivalent only on a certain subset of the inputs.


Conclusion

• Introduce function obfuscation attacks on existing binary diffing tools that analyze the intra-procedural control flow of programs.

• Propose a novel binary diffing tool called iBinHunt which analyzes the inter-procedural control flow.

• iBinHunt makes use of a novel technique called deep taint.