Sébastien Bardin et al., pré-GDR Sécurité 2017

Joint work with

Richard Bonichon, Robin David, Adel Djoudi & many other people

BINARY-LEVEL SECURITY:

SEMANTIC ANALYSIS TO THE RESCUE

Sébastien Bardin (CEA LIST)

IN A NUTSHELL

• Binary-level security analysis

• Many applications, many challenges

• Current syntactic and dynamic methods are not enough

• Formal methods can change the game … but must be strongly adapted

• Complement existing methods

• Need robustness and scalability!

• Acceptable to lose both correctness & completeness, in a controlled way

• New challenges and variations, many things to do!

• Achievements

• Vulnerability detection [with Josselin Feist, Laurent Mounier and Marie-Laure Potet]

• Malware deobfuscation [with Jean-Yves Marion]

• BINSEC platform: open-source, still in its infancy, under heavy refactoring


ABOUT MY LAB @CEA


ABOUT FORMAL METHODS

Success in safety-critical systems


NEXT BIG CHALLENGE


NOW: BINARY-LEVEL SECURITY


BENEFITS

• No source code

• More precise analysis

• Malware

What for: vulnerabilities, reverse engineering (malware, legacy), protection evaluation, etc.


EXAMPLE: COMPILER BUG

Our goal here:

• Check the code after compilation


EXAMPLE: MALWARE COMPREHENSION

The day after: malware comprehension

• understand what has been going on

• mitigate, fix and clean

• improve defense

Highly challenging [obfuscation]

APT: highly sophisticated attacks

• Targeted malware

• Written by experts

• Attack: 0-days

• Defense: stealth, obfuscation

• Sponsored by states or mafia

USA elections: DNC Hack


CHALLENGE: CORRECT DISASSEMBLY

Basic reverse problem

• aka model recovery

• aka CFG recovery


CAN BE TRICKY!

• code vs. data

• dynamic jumps (jmp eax)


STATE-OF-THE-ART TOOLS ARE NOT ENOUGH

• Static (syntactic): too fragile

• Dynamic: too incomplete

Just add

mov %eax,%ecx

mov %ecx,%eax

and break results
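
The fragility claim above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the four-byte "signature", the toy interpreter, and the choice of payload are hypothetical and not taken from the talk. It shows a byte-pattern (syntactic) match failing once the two `mov` instructions are inserted, while the observable result in `eax` stays the same. Note that the pair overwrites `ecx`, so it is only neutral when `ecx` is dead at that point.

```python
# x86 (32-bit) encodings, AT&T syntax in the comments.
SIGNATURE = bytes([0x31, 0xC0, 0x40, 0xC3])  # xor %eax,%eax ; inc %eax ; ret
JUNK = bytes([0x89, 0xC1, 0x89, 0xC8])       # mov %eax,%ecx ; mov %ecx,%eax


def syntactic_match(code: bytes) -> bool:
    """Hypothetical syntactic detector: plain byte-pattern search."""
    return SIGNATURE in code


def run(code: bytes) -> int:
    """Toy interpreter for the four instructions used here; returns final eax."""
    regs = {"eax": 0xDEAD, "ecx": 0xBEEF}
    i = 0
    while i < len(code):
        if code[i:i + 2] == b"\x31\xC0":      # xor %eax,%eax
            regs["eax"] = 0
            i += 2
        elif code[i:i + 2] == b"\x89\xC1":    # mov %eax,%ecx
            regs["ecx"] = regs["eax"]
            i += 2
        elif code[i:i + 2] == b"\x89\xC8":    # mov %ecx,%eax
            regs["eax"] = regs["ecx"]
            i += 2
        elif code[i] == 0x40:                 # inc %eax
            regs["eax"] = (regs["eax"] + 1) & 0xFFFFFFFF
            i += 1
        elif code[i] == 0xC3:                 # ret
            return regs["eax"]
        else:
            raise ValueError(f"unsupported byte at offset {i}")
    raise ValueError("no ret reached")


original = SIGNATURE
# Insert the two movs between "xor" and "inc".
mutated = original[:2] + JUNK + original[2:]

print(syntactic_match(original), syntactic_match(mutated))
print(run(original) == run(mutated))
```

A semantic comparison (here, simply running both variants) is unaffected by the inserted instructions, which is the point the slide makes against purely syntactic tools.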


[See later] CAN BECOME A NIGHTMARE


EXAMPLE: VULNERABILITY DETECTION

Find vulnerabilities before the bad guys

• On the whole program

• At binary-level

• Know only the entry point and program input format


CHALLENGE: In-depth exploration

(example: use after free)

Dynamic analysis

• Too incomplete


BONUS: (MULTI-)ARCHITECTURE SUPPORT


THE SITUATION

• Binary-level security analysis is necessary

• Binary-level security analysis is highly challenging (*)

• Standard tools are not enough

Some challenges

• correct disassembly (code identification)

• multi-architecture support

• + all problems from source-level analysis

(*) i.e., more challenging than source code analysis


SOLUTION? BINARY-LEVEL SEMANTIC ANALYSIS


THE HARD JOURNEY FROM SOURCE TO BINARY

Wanted

• robustness

• precision

• scale


OUR APPROACH

• A simple Intermediate Representation

• A set of « standard » analyses

• Adapted to binary

• Robustness in mind

• Keep correct/complete as much as possible

• Domain-dedicated analysis [combination]


IN PRACTICE

Can recover useful semantic information

• More precise disassembly

• Exact semantics of instructions

• Input of interest

• …


KEY 1: INTERMEDIATE REPRESENTATION

[CAV’11]

• Concise

• Well-defined

• Clear, side-effect free
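
To make "side-effect free" concrete, here is a minimal sketch of what lifting one machine instruction into explicit assignments can look like. This is not the DBA syntax from the cited paper: the tuple encoding, the names `lift_add` and `execute`, and the choice of flags are all hypothetical, and the overflow flag is left out for brevity. The idea shown is that the implicit flag updates of `add dst, src` become ordinary, separately visible assignments.

```python
MASK32 = 0xFFFFFFFF


def lift_add(dst: str, src: str):
    """Hypothetical lifting of `add dst, src` into explicit assignments.

    Each entry is (target, expression over the current state); no
    expression modifies the state on its own.
    """
    return [
        ("tmp", lambda s: s[dst] + s[src]),         # sum before truncation
        ("CF", lambda s: int(s["tmp"] > MASK32)),   # carry out of bit 31
        (dst, lambda s: s["tmp"] & MASK32),         # truncated 32-bit result
        ("ZF", lambda s: int(s[dst] == 0)),         # zero flag
        ("SF", lambda s: (s[dst] >> 31) & 1),       # sign flag
    ]


def execute(block, state):
    """Run a block of assignments on a copy of the state."""
    s = dict(state)
    for target, expr in block:
        s[target] = expr(s)
    return s


after = execute(lift_add("eax", "ebx"), {"eax": 0xFFFFFFFF, "ebx": 1})
print(after["eax"], after["CF"], after["ZF"], after["SF"])
```

With every effect spelled out as its own assignment, later analyses and simplifications can reason about (or discard) each flag update independently.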


KEY 1: INTERMEDIATE REPRESENTATION (2)

Simplifications [TACAS’15]

• DBA level

• Instruction level

• Program level


KEY 2: PRECISE & ROBUST SYMBOLIC REASONING

(DSE, Godefroid 2005)

Perfect for intensive testing

• Correct, relatively complete

• No false alarm

• Robust

• Scale in some ways

// incomplete
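
The DSE loop can be sketched in a few lines. This is a toy under stated assumptions, not the BINSEC engine: the program under test, the `branch` instrumentation helper, and the brute-force `solve` over 8-bit inputs (standing in for an SMT solver) are all hypothetical. It shows the core cycle: run concretely, record the path condition, negate one branch, solve for a new input, repeat.

```python
def program(x, trace):
    """Toy program under test; each branch logs (condition, outcome)."""
    def branch(cond):
        taken = cond(x)
        trace.append((cond, taken))
        return taken

    if branch(lambda v: v > 100):
        if branch(lambda v: v % 7 == 3):
            return "bug"
        return "deep"
    return "shallow"


def solve(constraints):
    """Brute-force stand-in for an SMT solver over 8-bit inputs."""
    for v in range(256):
        if all(cond(v) == wanted for cond, wanted in constraints):
            return v
    return None


def dse(seed):
    """Explore paths by negating each branch of every executed path."""
    worklist, results = [seed], {}
    while worklist:
        x = worklist.pop()
        trace = []
        outcome = program(x, trace)
        path = tuple(taken for _, taken in trace)
        if path in results:
            continue  # path already covered
        results[path] = (x, outcome)
        for i, (cond, taken) in enumerate(trace):
            # Keep the prefix, flip branch i, ask for a matching input.
            new_input = solve(trace[:i] + [(cond, not taken)])
            if new_input is not None:
                worklist.append(new_input)
    return results


found = dse(seed=0)
for path, (x, outcome) in sorted(found.items()):
    print(path, x, outcome)
```

Every reported input actually drives the program down the reported path, which is why there are no false alarms; the exploration is incomplete as soon as the number of paths is too large to enumerate.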


KEY 2: DSE, about ROBUSTNESS (imo, the major advantage)


And more … (more or less experimental)


APPLICATION: VULNERABILITY DETECTION

[SSPREW 2016, with VERIMAG]

Find vulnerabilities before the bad guys

• On the whole program

• At binary-level

• Know only the entry point and program input format


KEY IDEAS (Josselin Feist)

A Pragmatic 2-step approach

• Static: scale, not complete, not correct

• Symbolic: correct, directed by static

• Combination: scalable and correct


EXPERIMENTAL EVALUATION

On these examples:

• Better than DSE alone

• Better than blackbox fuzzing

• Better than greybox fuzzing with no seed


APPLICATION: MALWARE DEOBFUSCATION

[S&P 2017, with LORIA]

The day after: malware comprehension

• understand what has been going on

• mitigate, fix and clean

• improve defense

Goal: help malware comprehension

• Reverse of heavily obfuscated code

• Identify and simplify protections

APT: highly sophisticated attacks

• Targeted malware

• Written by experts

• Attack: 0-days

• Defense: stealth, obfuscation

• Sponsored by states or mafia

USA elections: DNC Hack


REVERSE CAN BECOME A NIGHTMARE (OBFUSCATION)

Obfuscation: make a code hard to reverse

• self-modification

• encryption

• virtualization

• code overlapping

• opaque predicates

• callstack tampering

• …

Goal: help malware comprehension

• Identify and simplify protections


EXAMPLE: OPAQUE PREDICATE

Constant-value predicates

(always true, always false)

• dead branch points to spurious code

• goal = waste reverser time & efforts
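
A minimal illustration of an always-true predicate follows. The predicate `x * (x + 1)` being even is a textbook example chosen here for illustration, not one taken from the talk or from the malware studied; the function names and the 16-bit width are likewise assumptions. The product of two consecutive integers is always even, and truncation to a fixed width preserves parity, so the `else` branch is dead.

```python
def opaque(x: int) -> bool:
    """Always true: x * (x + 1) is even, also after 16-bit truncation."""
    return ((x * (x + 1)) & 0xFFFF) % 2 == 0


def obfuscated(x: int) -> int:
    if opaque(x):
        return x + 1        # real payload
    return x ^ 0x5A5A       # spurious code, never executed


def is_always_true(pred, bits: int = 16) -> bool:
    """Exhaustive check over all values of the given width."""
    return all(pred(v) for v in range(1 << bits))


print(is_always_true(opaque))
print(is_always_true(lambda v: v % 3 != 0))
```

Exhaustive enumeration only works for tiny input spaces; on real code the same question has to be answered symbolically.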


EXAMPLE: STACK TAMPERING

Alter the standard compilation scheme:

ret does not go back to the call

• hide the real target

• return site may be spurious code
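
The effect can be reproduced on a toy machine. The instruction set below (`call`, `tamper`, `ret`, `nop`, `halt`) and the shadow stack used to detect the violation are hypothetical and exist only for this sketch; `tamper` plays the role of something like an addition to the return address stored on top of the stack.

```python
def run(program, entry=0):
    """Execute the toy program; return the address trace and tampered rets."""
    stack, shadow = [], []          # shadow keeps the genuine return sites
    pc, trace, tampered = entry, [], []
    while True:
        op, arg = program[pc]
        trace.append(pc)
        if op == "call":
            stack.append(pc + 1)
            shadow.append(pc + 1)
            pc = arg
        elif op == "tamper":        # modify the return address on the stack
            stack[-1] += arg
            pc += 1
        elif op == "ret":
            target, expected = stack.pop(), shadow.pop()
            if target != expected:
                tampered.append((pc, expected, target))
            pc = target
        elif op == "nop":
            pc += 1
        elif op == "halt":
            return trace, tampered
        else:
            raise ValueError(f"unknown op {op!r}")


PROGRAM = [
    ("call", 4),    # 0: call the function at address 4
    ("nop", None),  # 1: apparent return site, in fact spurious code
    ("nop", None),  # 2: real continuation
    ("halt", None), # 3
    ("tamper", 1),  # 4: shift the saved return address by one
    ("ret", None),  # 5: returns to 2, not to 1
]

trace, tampered = run(PROGRAM)
print(trace)
print(tampered)
```

A disassembler that assumes `ret` goes back to the instruction after the `call` would treat address 1 as reachable code and miss that address 2 is the real continuation.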


STANDARD DISASSEMBLY TECHNIQUES ARE NOT ENOUGH

Static analysis

• too fragile vs obfuscation

• junk instructions, missed instructions

Dynamic analysis

• robust vs obfuscation

• too incomplete


DYNAMIC SYMBOLIC EXECUTION CAN HELP (Debray, Kruegel, …)

For deobfuscation

• find new real paths

• robust

• still incomplete

« dynamic analysis on steroids »


YET … WHAT ABOUT INFEASIBILITY QUESTIONS?

Prove that something is

always true (resp. false)

Many such issues in reverse

• is a branch dead?

• does the ret always return to the call?

• have I found all targets of a dynamic jump?

And more

• does this malicious ret always go there?

• does this expression always evaluate to 15?

• does this self-modification always write this opcode?

• does this self-modification always rewrite this instr.?

• …

Not addressed by DSE

• Cannot enumerate all paths


OUR CHALLENGE

Check infeasibility questions in obfuscated codes

• scale to realistic malware sizes

• robust to obfuscation such as self-modification

• precise

• generic

Rest of the talk:

• opaque predicate

• stack tampering


OUR PROPOSAL: BACKWARD-BOUNDED SYMBOLIC EXECUTION

Insight 1: symbolic reasoning

• precision

• But: need finite #paths

Insight 2: backward-bounded

• pre_k(c) = ∅ => c is infeasible

• finite #paths

• efficient, depends on k

• But: backward on jump eax?

Insight 3: dynamic partial CFG

• solve (partially) dyn. jumps

• robustness

False negative (FN)

• can miss infeasibility

• why: k too small (miss /\-constraints)

False positive (FP)

• wrongly assert infeasibility

• why: CFG too partial (miss \/-constraints)

Low FP/FN rates in practice

• ground truth xp
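
The three insights and the two error sources can be sketched on a toy control-flow graph. This is an illustrative reconstruction, not the algorithm as implemented by the authors: the node names, the 4-bit domain, and the brute-force enumeration (standing in for a solver query) are assumptions. `pre_k_nonempty` walks at most `k` instructions backward from the branch through the known predecessors, treats everything before that window as unconstrained, and checks whether the branch condition can still hold.

```python
from itertools import product

VARS = ("x", "y")
DOMAIN = range(16)  # 4-bit values keep the enumeration tiny

# Instructions as state transformers (hypothetical toy program).
INSTR = {
    "entry": lambda s: s,
    "mul": lambda s: {**s, "y": (s["x"] * (s["x"] + 1)) % 16},
    "inc": lambda s: {**s, "y": (s["x"] + 1) % 16},
}

# Predecessors of the branch "br": the dynamically observed (partial) CFG
# knows only "mul"; the full CFG also has an edge from "inc".
PARTIAL = {"br": ["mul"], "mul": ["entry"], "inc": ["entry"], "entry": []}
FULL = {**PARTIAL, "br": ["mul", "inc"]}


def backward_paths(node, k, preds):
    """All instruction sequences of length <= k leading to `node`."""
    if k == 0 or not preds.get(node):
        return [[]]
    return [prefix + [p]
            for p in preds[node]
            for prefix in backward_paths(p, k - 1, preds)]


def pre_k_nonempty(cond, node, k, preds):
    """True if some state k steps back can reach `node` with cond true."""
    for path in backward_paths(node, k, preds):
        for values in product(DOMAIN, repeat=len(VARS)):
            state = dict(zip(VARS, values))
            for n in path:
                state = INSTR[n](state)
            if cond(state):
                return True
    return False  # empty pre-image: cond reported infeasible


def y_is_odd(s):
    return s["y"] % 2 == 1


print(pre_k_nonempty(y_is_odd, "br", 1, PARTIAL))  # infeasibility detected
print(pre_k_nonempty(y_is_odd, "br", 0, PARTIAL))  # k too small: missed
print(pre_k_nonempty(y_is_odd, "br", 1, FULL))     # extra edge: feasible
```

With `k = 1` on the partial CFG the condition is reported infeasible. With `k = 0` the defining instruction falls outside the window, so the infeasibility is missed (the false-negative case). If the edge from `inc` really exists, the verdict obtained on the partial CFG is a false positive.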


EXPERIMENTAL EVALUATION

• Controlled experiments (ground truth): precision

• Large-scale experiment (packers): scalability, robustness

• Case-study (X-tunnel malware): usefulness


CONTROLLED EXPERIMENTS

• Goal = assess the precision of the technique

• ground truth value

• Experiment 1: opaque predicates (o-llvm)

• 100 core utils, 5x20 obfuscated codes

• k=16: 3.46% error, no false negative

• robust to k

• efficient: 0.02s / query

• Experiment 2: stack tampering (tigress)

• 5 obfuscated codes, 5 core utils

• almost all genuine ret are proved (no false positive)

• many malicious ret are proved « single-targets »

• Very precise results

• Seems efficient


CASE-STUDY: PACKERS

Packers: legitimate software protection tools

(basic malware: the sole protection)


CASE-STUDY: PACKERS (fun facts)


CASE-STUDY: THE XTUNNEL MALWARE (part of DNC hack)

Two heavily obfuscated samples

• Many opaque predicates

Goal: detect & remove protections

• Identify 50% of code as spurious

• Fully automatic, < 3h


CASE-STUDY: THE XTUNNEL MALWARE (fun facts)

• Protection seems to rely only on opaque predicates

• Only two families of opaque predicates

• Yet, quite sophisticated

• original OPs

• interleaving between payload and OP computation

• sharing among OP computations

• possibly long dependency chains (avg 8.7, up to 230)

CONCLUSION & TAKE AWAY

• Binary-level security analysis

• Many applications, many challenges

• Current syntactic and dynamic methods are not enough

• Formal methods can change the game … but must be strongly adapted

• Complement existing approaches

• Need robustness and scalability!

• Acceptable to lose both correctness & completeness, in a controlled way

• New challenges and variations, many things to do!

• First results

• Advanced PoCs for vulnerability detection and malware deobfuscation

• BINSEC: open-source, still in its infancy, under heavy refactoring

// Advertisement: International Summer School on Software Protection @ Saclay


Commissariat à l’énergie atomique et aux énergies alternatives

Institut List | CEA SACLAY NANO-INNOV | BAT. 861 – PC142

91191 Gif-sur-Yvette Cedex - FRANCE

www-list.cea.fr

Établissement public à caractère industriel et commercial | RCS Paris B 775 685 019