Safe and Efficient Instrumentation
Paradyn Project
Paradyn / Dyninst Week, Madison, Wisconsin, April 12-14, 2010
Andrew Bernat
Binary Instrumentation
• Instrumentation modifies the original code
  • Moves original code
  • Allocates new memory
  • Overwrites original code
• This affects the behavior of:
  • Moved code
  • Code that references moved code
  • Code that references changed memory
Sensitivity Models
• A program is sensitive to a particular modification if that modification changes the program’s behavior
• Current binary instrumenters rely on fixed sensitivity models
• Compensating for sensitivity imposes overhead
Original:
    ret
Emulated:
    pop %eax
    call addr_translate
    jmp %eax
Efficiency vs Sensitivity
[Figure: a Sensitivity axis (Conventional Code, Optimized Code, Malware) plotted against an Efficiency axis. Labeled points: Pin, Valgrind, …; Dyninst; Safe and Efficient Approach.]
How do we do this?
• Formal model of code relocation
  • Visible behavior
  • Instruction sensitivity
  • External sensitivity
• Implementation in Dyninst
  • Analysis phase
  • Transformation phase
• Performance Results
Three Questions
• What program behavior do we wish to preserve?
• How does modification affect instructions?
• How do instructions change program behavior?
Approach
• Preserve visible behavior
  • Relationship of input to output
• Identify sensitive instructions
  • Those whose behavior is changed
• Emulate only externally sensitive instructions
  • Those whose sensitivity affects visible behavior
Visible Behavior
• Intuition: we can change anything that does not affect the output of the program
• Formalization: in terms of denotational semantics
  • Briefly: two programs P, P’ are equivalent if:
Visibly Equivalent Programs
[Figure: the Original Binary maps input X to output Y. The Instrumented Binary maps input X + A to output Y + B, where A is the Instrumentation Input and B is the Instrumentation Output.]
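The picture above can be read as a testable property: strip the instrumentation's own input A and output B, and the instrumented program should still map X to Y. Below is a minimal Python sketch of that check. The two toy "programs" and every name in it (`original`, `instrumented`, `visibly_equivalent`) are hypothetical stand-ins for illustration, not part of Dyninst.

```python
from typing import Iterable, Tuple


def original(x: str) -> str:
    """Toy 'original binary': maps input X to output Y."""
    return x.upper()


def instrumented(x: str, a: str) -> Tuple[str, str]:
    """Toy 'instrumented binary': takes X plus instrumentation input A and
    returns the program output Y and the instrumentation output B separately."""
    y = x.upper()                 # the same visible computation as original()
    b = f"{a}:calls=1"            # output produced only by the instrumentation
    return y, b


def visibly_equivalent(inputs: Iterable[str], a: str = "trace") -> bool:
    """True if, for every tested input X, the instrumented program produces
    the original output Y once the instrumentation output B is set aside."""
    return all(original(x) == instrumented(x, a)[0] for x in inputs)


result = visibly_equivalent(["abc", "", "Dyninst"])
```

Testing a finite set of inputs can only refute equivalence, never prove it, which is why the talk formalizes the property instead of testing for it.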
Sensitivity
• What does instrumentation change?
  • Addresses of instructions
  • Contents of memory
  • Shape of the address space
• Sensitive instructions are directly affected
  • Access the PC (and are moved)
  • Read modified memory
  • Test allocated memory
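The three ways an instruction can be directly affected can be written as a small classifier. This is a sketch over a made-up instruction record, assuming the facts about each instruction (does it access the PC, which addresses does it read or probe) are already known. The `Insn` type, its fields, and the addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    text: str
    accesses_pc: bool = False            # e.g. a call pushes its return address
    moved: bool = False                  # the instruction was relocated
    reads: frozenset = frozenset()       # addresses the instruction loads from
    probes: frozenset = frozenset()      # addresses whose mapping it tests


def sensitivity_reasons(insn, modified, allocated):
    """Return the reasons (possibly none) why insn is sensitive."""
    reasons = []
    if insn.accesses_pc and insn.moved:
        reasons.append("moved and accesses the PC")
    if insn.reads & modified:
        reasons.append("reads modified memory")
    if insn.probes & allocated:
        reasons.append("tests allocated memory")
    return reasons


modified = {0x8049756}     # hypothetical: bytes overwritten by the instrumenter
allocated = {0x9000000}    # hypothetical: memory the instrumenter allocated

call = Insn("call get_pc_thunk", accesses_pc=True, moved=True)
add = Insn("add %eax, %ebx", moved=True)
load = Insn("mov (%esi, %ebx, 4), %eax", reads=frozenset({0x8049756}))

call_reasons = sensitivity_reasons(call, modified, allocated)
add_reasons = sensitivity_reasons(add, modified, allocated)
load_reasons = sensitivity_reasons(load, modified, allocated)
```

A moved `add` is not sensitive here because it never observes its own address. Only instructions that observe something the instrumenter changed are flagged.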
Sensitivity Examples

Call/Return pair:
main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret
worker:
    push %ebp
    mov %esp, %ebp
    …
    ret

Jumptable:
jumptable:
    push %ebp
    mov %esp, %ebp
    call get_pc_thunk
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx
get_pc_thunk:
    mov (%esp), %ebx
    ret

Self-Unpacking Code (Simplified):
protect:
    call initialize
initialize:
    pop %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp $0x42, %ebx
    jnz loop_top
    jmp $(unpacked_base)
External Sensitivity
• An instruction is externally sensitive if it causes a visible change in behavior
  • Approximation: or changes control flow
• This requires:
  • The sensitive instruction must produce different values
  • These differences must reach an instruction that affects output (or control flow)
  • … and change its behavior
Program Modification
[Figure: Original Binary (Code) → Analysis → Compensation → Modified Binary (Code, Relocated Code)]
Analysis Phase
• Identify sensitive instructions
  • InstructionAPI: used and defined sets
• Determine affected instructions
  • DepGraphAPI: forward slice
• Analyze effects of modification
  • SymEval: symbolic expansion of the slice
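The middle step, a forward slice built from used and defined sets, can be illustrated on a straight-line trace of the jumptable example. This is a simplified Python sketch, not the DepGraphAPI: it ignores branches and aliasing, and the `Insn` record and the abstract location names (`"[esp]"`, `"ebx"`, `"pc"`) are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    text: str
    uses: frozenset    # abstract locations the instruction reads
    defs: frozenset    # abstract locations the instruction writes


def forward_slice(trace, seed):
    """Instructions in a straight-line trace whose inputs depend,
    directly or transitively, on values defined by trace[seed]."""
    tainted = set(trace[seed].defs)
    out = [trace[seed].text]
    for insn in trace[seed + 1:]:
        if insn.uses & tainted:
            out.append(insn.text)
            tainted |= insn.defs
        else:
            tainted -= insn.defs   # overwritten with an unaffected value
    return out


def I(text, uses, defs):
    return Insn(text, frozenset(uses), frozenset(defs))


# Execution-order trace of the jumptable example; "[esp]" is the stack top.
trace = [
    I("push %ebp", {"ebp", "esp"}, {"esp"}),
    I("mov %esp, %ebp", {"esp"}, {"ebp"}),
    I("call get_pc_thunk", {"pc"}, {"[esp]"}),       # pushes the return address
    I("mov (%esp), %ebx", {"[esp]"}, {"ebx"}),
    I("ret", {"[esp]"}, {"pc"}),
    I("add $(offset), %ebx", {"ebx"}, {"ebx"}),
    I("mov (%ebx, %eax, 4), %ecx", {"ebx", "eax"}, {"ecx"}),
    I("jmp *%ecx", {"ecx"}, {"pc"}),
]

slice_from_call = forward_slice(trace, seed=2)
```

The slice starts at the sensitive `call` and follows the return address through `%ebx` and `%ecx` to the indirect jump. These are the instructions that symbolic expansion then has to examine.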
Analysis Example: Call/Return Pair

Call/Return pair:
main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret
worker:
    push %ebp
    mov %esp, %ebp
    …
    ret

Sensitivity: call (moved, uses PC)
Slice: call, ret
Symbolic Expansion: call: ret:
Analysis Example: Jumptable

Jumptable:
jumptable:
    push %ebp
    mov %esp, %ebp
    call get_pc_thunk
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx
get_pc_thunk:
    mov (%esp), %ebx
    ret

Sensitivity: call (moved, uses PC)
Slice:
    call
    mov (%esp), %ebx
    add $0x42, %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx
Symbolic Expansion: call: ret: jmp:
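To see why the jumptable's call is externally sensitive, the slice can be evaluated twice: once with the original return address and once with the address the call would push after being moved. The Python sketch below does this for a toy memory image. All addresses and table contents are invented for illustration, and the offset 0x42 is borrowed from the slice above.

```python
def jump_target(ret_addr, offset, index, memory):
    """Evaluate the slice: the pushed return address flows into %ebx,
    is offset to the table base, and selects the indirect jump target."""
    ebx = ret_addr                        # mov (%esp), %ebx   (in get_pc_thunk)
    ebx += offset                         # add $(offset), %ebx
    return memory.get(ebx + 4 * index)    # mov (%ebx, %eax, 4), %ecx ; jmp *%ecx


# Hypothetical layout: the table sits 0x42 bytes past the original return address.
ORIG_RET = 0x8048010
RELOC_RET = 0x9000010      # return address pushed if the call is simply moved
OFFSET = 0x42
memory = {
    0x8048052: 0x8048100,  # table entry 0
    0x8048056: 0x8048200,  # table entry 1
}

original_target = jump_target(ORIG_RET, OFFSET, 1, memory)
relocated_target = jump_target(RELOC_RET, OFFSET, 1, memory)

# The values differ and reach an instruction that changes control flow.
externally_sensitive = original_target != relocated_target
```

In this toy layout the relocated run reads from an address that holds no table entry, so the indirect jump would go somewhere else. That is exactly the "different values reach control flow" condition for external sensitivity.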
Analysis Example: Unpacking Code

Self-Unpacking Code (Simplified):
protect:
    call initialize
    …
initialize:
    pop %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp $0x42, %ebx
    jnz loop_top
    jmp $(unpacked_base)

Sensitivity: call (moved, uses PC)
Slice:
    call initialize
    pop %esi
    mov (%esi, %ebx, 4), %eax
    call unpack
    …
Symbolic Expansion: call: pop: mov:
Compensation Phase
• Generates the relocated code
• Two approaches:
  • Instruction transformation
  • Group transformation
Instruction Transformation
• Emulate each externally sensitive instruction
  • Replace some instructions (e.g., calls) with sequences
• Straightforward to implement
• Some sequences impose high overhead
  • e.g., address translation
Original:
    ret
Emulated:
    pop %eax
    call addr_translate
    jmp %eax
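The emulated `ret` above pops the return address, translates it, and jumps to the result. A minimal Python model of that sequence is sketched here, assuming the stack holds original addresses and that translation is a lookup from original to relocated addresses. The map contents and the rule that unmapped addresses pass through unchanged are assumptions of this sketch.

```python
def addr_translate(addr, reloc_map):
    """Map an original code address to its relocated copy.
    Addresses with no relocated copy are returned unchanged (an assumption)."""
    return reloc_map.get(addr, addr)


def emulated_ret(stack, reloc_map):
    """Model of: pop %eax ; call addr_translate ; jmp %eax."""
    eax = stack.pop()                       # pop %eax
    eax = addr_translate(eax, reloc_map)    # call addr_translate
    return eax                              # jmp %eax (returns the jump target)


# Hypothetical mapping from an original return address to relocated code.
reloc_map = {0x804391: 0xE49900}

stack = [0x804391]
target = emulated_ret(stack, reloc_map)

stack_unmoved = [0x8050000]
target_unmoved = emulated_ret(stack_unmoved, reloc_map)
```

Every emulated return pays for a lookup. This is the overhead the slide attributes to address translation, and the motivation for emulating only externally sensitive instructions.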
Group Transformation
• Emulate the behavior of a group of instructions
  • Motivating example: thunks
• Open questions:
  • Which instructions are included in the group?
  • How is the replacement sequence determined?
  • Current status: hand-crafted templates
Transformation: Call/Return Pair

Original Code
main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret
worker:
    push %ebp
    mov %esp, %ebp
    …
    ret

Relocated Code
main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret
worker:
    push %ebp
    mov %esp, %ebp
    …
    ret
Transformation: Jumptable

Original Code
jumptable:
    push %ebp
    mov %esp, %ebp
    call get_pc_thunk
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx
get_pc_thunk:
    mov (%esp), %ebx
    ret

Relocated Code
jumptable:
    push %ebp
    mov %esp, %ebp
    mov $(orig_ret_addr), %ebx
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx
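The jumptable rewrite is an instance of a hand-crafted template: the call to the thunk and the thunk body are replaced as a group by one `mov` of the original return address. Below is a minimal Python sketch of such a template applied to a list of instruction strings. The function name and the string-matching approach are hypothetical and only illustrate the idea.

```python
def apply_thunk_template(insns, thunk="get_pc_thunk", reg="%ebx"):
    """Replace 'call <thunk>' with a load of the original return address.
    Assumes the thunk only copies its return address into reg and returns."""
    out = []
    for insn in insns:
        if insn == f"call {thunk}":
            out.append(f"mov $(orig_ret_addr), {reg}")
        else:
            out.append(insn)
    return out


original_code = [
    "push %ebp",
    "mov %esp, %ebp",
    "call get_pc_thunk",
    "add $(offset), %ebx",
    "mov (%ebx, %eax, 4), %ecx",
    "jmp *%ecx",
]

relocated_code = apply_thunk_template(original_code)
```

Compared with emulating the call and the return separately, the group rewrite needs no run-time address translation. It is only valid when the whole group matches the template's assumption about what the thunk does.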
Transformation: Unpacking Code

Original Code
protect:
    call initialize
    …
initialize:
    pop %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp $0x42, %ebx
    jnz loop_top
    jmp $(unpacked_base)

Relocated Code
protect:
    jmp initialize
    …
initialize:
    mov $(orig_addr), %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp $0x42, %ebx
    jnz loop_top
    jmp $(unpacked_base)
Results
Percentage of PC-Sensitive Instructions (32-bit, GCC, static analysis)

Type of Binary        % PC Sensitive    % Externally Sensitive    % Unanalyzable
Executable (a.out)    9.0%              1.1%                      6.6%
Library (.so)         7.9%              6.9%                      9.1%
Instrumentation Overhead (go, 32-bit, 12.3s base time)

                          Dyninst          S&E (no memory)    S&E (memory)
go (uninstrumented)       21.3s (73.2%)    12.4s (0.8%)       15.0s (22.0%)
go (basic block count)    23.4s (90.2%)    16.3s (32.5%)      19.5s (58.5%)
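The percentages in the overhead table follow from the run times and the 12.3s base time as (time - base) / base. The short Python check below reproduces them from the figures on the slide. The dictionary layout and names are just for this illustration.

```python
BASE_SECONDS = 12.3   # base time of go, from the slide


def overhead_pct(seconds, base=BASE_SECONDS):
    """Overhead relative to the base time, as a percentage with one decimal."""
    return round((seconds - base) / base * 100, 1)


# Run times in seconds, copied from the table above.
times = {
    "go (uninstrumented)": {"Dyninst": 21.3, "S&E (no memory)": 12.4, "S&E (memory)": 15.0},
    "go (basic block count)": {"Dyninst": 23.4, "S&E (no memory)": 16.3, "S&E (memory)": 19.5},
}

overheads = {
    row: {tool: overhead_pct(t) for tool, t in cols.items()}
    for row, cols in times.items()
}
```

For example, `overhead_pct(21.3)` gives 73.2 and `overhead_pct(12.4)` gives 0.8, matching the first row of the table.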
Future Work
• Memory sensitivity and compensation
  • Improved pointer analysis
  • Useful user intervention?
• Investigate group transformations
  • Widen range of input binaries
  • Expand supported platforms
Questions?
ASProtect code loop

8049756: call 8049761

8049761: mov EDX, ECX
8049763: pop EDI
8049764: push EAX
8049765: pop ESI
8049766: add EDI, 2183
804976c: mov ESI, EDI
804976e: push 0
8049773: jz 804977c

8049779: adc DH, 229

804977c: pop EBX
804977d: mov EAX, 2015212641

8049782: mov ECX, EBX(EDI)
8049785: jmp 804979c

804979c: add ECX, 1586986316
80497a2: xor ESI, 314333756
80497a8: xor ECX, 594915733
80497ae: jmp 80497c3

80497c3: sub ECX, 594948778
80497c9: sub ESI, 64260
80497ce: push ECX, ESP
80497cf: mov EAX, 884377321
80497d4: pop EBX(EDI)
80497d7: jmp 80497ed

80497ed: adc AL, 100
80497f0: sub EBX, 1595026050
80497f6: xor EAX, 34778
80497fb: add EBX, 1595026046
8049801: call 804980c

804980c: mov AX, 2783
8049810: pop ESI
8049811: cmp EBX, 4294965344
8049817: jnz 8049834

804981d: or ESI, 839181910
8049823: jmp 8049847

8049834: mov ESI, 1287570375
8049839: jmp 8049782
Emulation Examples

Original                      Emulated
add %eax, %ebx                add %eax, %ebx
jnz 0xf3e                     jnz 0xe498d3
call fprintf                  push $804391
                              jmp fprintf
mov (%esi, %ebx, 4), %eax     lea (%esi, %ebx, 4), %eax
                              call mem_addr_translate
                              mov (%eax), %eax
ret                           pop %eax
                              call addr_translate
                              jmp %eax