1
Shuffler: Fast and Deployable Continuous Code Re-Randomization
David Williams-King, Graham Gobieski, Kent Williams-King, James P. Blake,
Xinhao Yuan, Patrick Colp, Michelle Zheng, Vasileios P. Kemerlis, Junfeng Yang, William Aiello
OSDI 2016
3
Software Remains Vulnerable
● High-profile server breaches are commonplace
● 90% of today’s attacks utilize ROP [1]
7
Return-Oriented Programming
● Reuse fragments of legitimate code (gadgets)
[Figure: a buffer overrun overwrites data and return addresses on the stack; the overwritten return addresses form a ROP gadget chain that points into the program code.]
12
Modern ROP Attacks
● JIT-ROP [2]: iteratively read code at runtime
[Figure: the attacker reads func_1, func_2, and func_3 from the target program, builds a ROP gadget chain from them, and injects the exploit.]
16
The Shuffler Idea
● What if we re-randomize code more rapidly than an attacker discovers gadgets?
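[Figure: by the time the attacker injects the ROP gadget chain, func_1, func_2, and func_3 have moved, so the addresses in the exploit no longer match the code ("??").]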
21
How Is This Possible?
● Re-randomize code before an attacker uses it
– faster than disclosure vulnerability execution time;
– faster than gadget chain computation time;
– or, faster than network communication time
  ● one memory disclosure can only travel 820 miles!
22
What Is Shuffler?
● Defense based on continuous re-randomization
– Defeats all known code reuse attacks
– 20-50 millisecond shuffling, scales to 24 threads
● Fast: bounds attacker’s available time
– Defeats even attackers with zero network latency
● Deployable:
– Binary analysis w/o modifying kernel, compiler, ...
● Egalitarian:
– Shuffler runs in same address space, defends itself
24
Outline
1. Continuous re-randomization
2. Accelerating our randomization
3. Binary analysis and egalitarianism
4. Results and Demo
26
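Continuous Re-Randomization
● Easy to copy code & fix direct references
[Figure: func_1 contains "call func_2"; func_2 is copied to a new location, the call in func_1 is retargeted to the copy, and the old func_2 is deleted.]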
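The copy-and-fix step can be sketched with a toy model. The `Function` record, the helper names, and all addresses and sizes below are hypothetical and chosen only for illustration; Shuffler itself works on machine code, not on Python objects.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    size: int
    calls: list = field(default_factory=list)  # names of directly called functions


def shuffle_layout(functions, base, rng):
    """Place the functions in a random order starting at `base`."""
    order = list(functions)
    rng.shuffle(order)
    layout, addr = {}, base
    for fn in order:
        layout[fn.name] = addr
        addr += fn.size
    return layout


def fix_direct_calls(functions, layout):
    """Recompute every direct call target against the new layout."""
    return {fn.name: [layout[callee] for callee in fn.calls] for fn in functions}


functions = [
    Function("func_1", 0x40, calls=["func_2"]),
    Function("func_2", 0x30),
    Function("func_3", 0x20),
]
layout = shuffle_layout(functions, 0x700000, random.Random(1))
targets = fix_direct_calls(functions, layout)
```

Direct references are the easy case because every call site is known to the rewriter, so each one can be patched when the layout changes.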
31
Continuous Re-Randomization
● Easy to copy code & fix direct references
● What about code pointers?
[Figure: func_1 executes "mov $func_2, ptr" and later "call *ptr"; after func_2 moves and the old copy is deleted, ptr still holds the old &func_2.]
32
Continuous Re-Randomization
● Easy to copy code & fix direct references
● What about code pointers?
● How to update all propagated pointers?
[Figure: many copies of &func_2 have propagated through memory, and all of them still point at the deleted func_2.]
36
Continuous Re-Randomization
● Solution: add extra level of indirection
[Figure: ptr and every propagated copy hold the index f_2_idx; the table reached through %gs holds &func_2 at that index. When func_2 moves and the old copy is deleted, only the table entry changes.]
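A minimal sketch of the extra level of indirection, assuming a simple list-backed table. The class and method names are hypothetical, and the sketch uses small integer indices where the slides use byte offsets into the table reached through %gs.

```python
class CodePointerTable:
    """Toy stand-in for the code pointer table."""

    def __init__(self):
        self._entries = []   # index -> current code address
        self._index_of = {}  # function name -> index

    def register(self, name, address):
        self._index_of[name] = len(self._entries)
        self._entries.append(address)
        return self._index_of[name]

    def relocate(self, name, new_address):
        # Moving code touches exactly one table entry.
        self._entries[self._index_of[name]] = new_address

    def resolve(self, index):
        # Models the extra dereference performed by an indirect call.
        return self._entries[index]


table = CodePointerTable()
f_2_idx = table.register("func_2", 0x40054D)
# The index can propagate anywhere in memory without being tracked.
propagated = [f_2_idx for _ in range(8)]
table.relocate("func_2", 0x7F3A10)
resolved = [table.resolve(i) for i in propagated]
```

After `relocate`, every propagated copy resolves to the new address even though none of the copies was rewritten.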
39
Code Pointer Abstraction
● Transforming *code_ptr into **code_ptr
– Correctness: pointer updates sound & precise
– Disclosure-resilience: code ptr table is hidden
[Figure: ptr holds f_2_idx, which selects the entry in the table reached through %gs that points to func_2.]
Rewrite initialization points:
mov $0x40054d, %rax
=> mov $0x20, %rax
Rewrite call sites:
callq *%rax
=> callq *%gs:(%rax)
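The two rewrites can be illustrated with a text-level sketch. This is an assumption-laden toy: it pattern-matches AT&T-syntax strings, and the `index_of` map (code address to table offset) is hypothetical. A real rewriter operates on machine code and decides which immediates are code pointers from relocation metadata, not from their values.

```python
import re


def rewrite(lines, index_of):
    """Rewrite pointer initializations and indirect call sites."""
    out = []
    for line in lines:
        # Initialization point: replace a code address with its table offset.
        m = re.fullmatch(r"mov \$(0x[0-9a-f]+), (%\w+)", line)
        if m and int(m.group(1), 16) in index_of:
            offset = index_of[int(m.group(1), 16)]
            out.append(f"mov ${offset:#x}, {m.group(2)}")
            continue
        # Call site: dereference through the %gs-based table.
        m = re.fullmatch(r"callq \*(%\w+)", line)
        if m:
            out.append(f"callq *%gs:({m.group(1)})")
            continue
        out.append(line)
    return out


index_of = {0x40054D: 0x20}
rewritten = rewrite(["mov $0x40054d, %rax", "callq *%rax", "ret"], index_of)
```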
42
Return Address Encryption
● Return addresses are code pointers too
● Could use code pointer table, but inefficient
– call/ret instructions highly optimized
● Alternative mechanism: correct and hidden
– Use normal call instructions
– Encrypt return addresses with XOR key
47
Return Address Encryption
● Prevent return address disclosure
● We use binary rewriting (expand basic blocks)
func:
mov %fs:0x28,%r11
xor %r11,(%rsp)
; original code
mov %fs:0x28,%r11
xor %r11,(%rsp)
ret
[Figure: every return address on the thread stack (for func_1, func_2, func_3) is stored encrypted with the XOR key.]
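The effect of the inserted prologue and epilogue can be modelled in a few lines. This is a simulation only: the `Thread` class is hypothetical, the stack is a Python list, and `key` stands in for the value the rewritten code loads from %fs:0x28.

```python
import secrets


class Thread:
    def __init__(self, key=None):
        # Random 64-bit key unless one is supplied (useful for testing).
        self.key = secrets.randbits(64) if key is None else key
        self.stack = []

    def call(self, return_address):
        self.stack.append(return_address)  # a normal call pushes the plain address
        self.stack[-1] ^= self.key         # prologue: xor %r11,(%rsp)

    def ret(self):
        self.stack[-1] ^= self.key         # epilogue: xor %r11,(%rsp)
        return self.stack.pop()            # ret consumes the decrypted address


thread = Thread(key=0x1122334455667788)
thread.call(0x400620)
stored = thread.stack[0]
returned = thread.ret()
```

Because XOR is its own inverse, the same two instructions serve as both encryption and decryption, and an attacker who reads the stack sees only the encrypted value.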
50
Return Address Migration
● Unwind stack and re-encrypt new addresses
[Figure: the old copies of func_1, func_2, and func_3 are deleted; each return address on the thread stack is rewritten to point into the new copies and stored encrypted with the XOR key.]
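Return address migration can be sketched as decrypt, translate, re-encrypt. The sketch assumes the stack is just a list of encrypted return addresses and that function sizes are known; in practice, locating return addresses requires unwinding real stack frames. The layouts, sizes, and key are hypothetical.

```python
def migrate_stack(stack, key, old_layout, new_layout, sizes):
    """Rewrite each encrypted return address to point into the new code copy."""
    migrated = []
    for slot in stack:
        plain = slot ^ key  # decrypt
        # Find the function in the old layout that contains this address.
        name = next(
            n for n, base in old_layout.items()
            if base <= plain < base + sizes[n]
        )
        offset = plain - old_layout[name]  # position inside the function
        migrated.append((new_layout[name] + offset) ^ key)  # translate, re-encrypt
    return migrated


key = 0x5DEECE66D
sizes = {"func_1": 0x40, "func_2": 0x30, "func_3": 0x20}
old_layout = {"func_1": 0x400500, "func_2": 0x400540, "func_3": 0x400570}
new_layout = {"func_1": 0x7A0030, "func_2": 0x7A0000, "func_3": 0x7A0070}
stack = [0x400512 ^ key, 0x400548 ^ key]
new_stack = migrate_stack(stack, key, old_layout, new_layout, sizes)
```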
56
Asynchronous Randomization
● Creating new code copies takes time
● Shuffler prepares new code asynchronously
● Each thread unwinds its own stack in parallel
Shuffling steps: generate permutation, make new code copy, fix call instructions, update code pointer table, stack unwind.
[Figure: timeline of a 20 ms shuffle period. Done synchronously, the steps leave 5 ms of real work and 15 ms of shuffling overhead. With asynchronous preparation, the computations get 19.94 ms of real work and only 0.06 ms is synchronous (99.7% of runtime vs. 0.3%).]
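The percentages follow directly from the timeline figures. A small check, using only the numbers on the slides (the function name is illustrative):

```python
def real_work_fraction(period_ms, blocked_ms):
    """Fraction of a shuffle period left for the program's own computation."""
    return (period_ms - blocked_ms) / period_ms


synchronous = real_work_fraction(20.0, 15.0)   # 15 ms of overhead per 20 ms period
asynchronous = real_work_fraction(20.0, 0.06)  # only 0.06 ms is synchronous
```

With a 20 ms period, synchronous shuffling leaves a quarter of the time for real work, while the asynchronous design leaves about 99.7%.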
62
Augmented Binary Analysis
● Use additional info from unmodified compilers
– Symbols, to distinguish code and data (no -s)
– Relocations, to find all code pointers (--emit-relocs)
  ● ask linker to preserve relocations
Code pointer, or integer?
.section .rodata: .quad 0x400620
.section .text: mov $0x400620, %rax
.section .rodata: .quad 4195872
.section .text: mov $4195872, %rax
Relocations (meta-data) tell the two apart.
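The ambiguity exists because 0x400620 and 4195872 are the same bit pattern. A sketch of how preserved relocations resolve it, assuming the relocation sites have already been extracted into a set of (section, offset) pairs. The `Word` record, the offsets, and the relocation set are hypothetical; real tooling would parse the ELF relocation sections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    section: str
    offset: int
    value: int


def find_code_pointers(words, relocation_sites):
    """Keep only the words whose location is covered by a relocation."""
    return [w for w in words if (w.section, w.offset) in relocation_sites]


words = [
    Word(".rodata", 0x00, 0x400620),  # .quad 0x400620 (address of code)
    Word(".rodata", 0x08, 4195872),   # .quad 4195872 (plain integer)
    Word(".text", 0x11, 0x400620),    # mov $0x400620, %rax
    Word(".text", 0x21, 4195872),     # mov $4195872, %rax
]
relocation_sites = {(".rodata", 0x00), (".text", 0x11)}
pointers = find_code_pointers(words, relocation_sites)
```

Classifying by location rather than by value avoids both missed pointers and integers mistaken for pointers.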
64
Augmented Binary Analysis
● Allows accurate and complete disassembly
● Many special cases, but we handle them
66
Where to Re-Randomize From
● Most defenses operate at higher privilege level
– i.e. kernel, hypervisor, hardware
– Or else declare their own code “trusted”
● Shuffler is egalitarian
– Same level of privilege, no system modifications
– Defends itself from attack
71
Egalitarian Bootstrapping
● Problem: transformations break original code
– e.g. memcpy uses code pointers
Rewrite main, printf, ..., memcpy, ...
memcpy’s code:
mov 0x400620(,%rax,8),%rax
jmpq *%rax
memcpy’s jump table:
0x400620: 0x400508 0x400514
0x400630: 0x400520 0x40052c
0x400640: 0x400538 0x400544
New memcpy code:
mov 0x400620(,%rax,8),%rax
jmpq *%gs:(%rax)
Rewritten jump table:
0x400620: 0x20 0x28
0x400630: 0x30 0x88
0x400640: 0x40 0x48
Rewriting invalidates memcpy’s jump table, but the rewrite process itself uses the (old) memcpy.
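The breakage can be reproduced in a toy model of the jump-table dispatch. The first three table entries and their indices come from the slide; the relocated addresses in `POINTER_TABLE` and the function names are hypothetical.

```python
ORIGINAL_TABLE = [0x400508, 0x400514, 0x400520]
INDEX_OF = {0x400508: 0x20, 0x400514: 0x28, 0x400520: 0x30}
# Hypothetical addresses of the relocated jump targets.
POINTER_TABLE = {0x20: 0x7A0008, 0x28: 0x7A0014, 0x30: 0x7A0020}


def old_memcpy_dispatch(jump_table, case):
    # jmpq *%rax: the table entry is used directly as a code address.
    return jump_table[case]


def new_memcpy_dispatch(jump_table, case):
    # jmpq *%gs:(%rax): the table entry is an index into the pointer table.
    return POINTER_TABLE[jump_table[case]]


rewritten_table = [INDEX_OF[address] for address in ORIGINAL_TABLE]
old_code_after_rewrite = old_memcpy_dispatch(rewritten_table, 0)
new_code_after_rewrite = new_memcpy_dispatch(rewritten_table, 0)
```

Once the shared table holds indices, the old code would jump to 0x20, which is not a code address, so the rewriter cannot keep running on the code it is rewriting.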
79
Egalitarian Bootstrapping
● Problem: transformations break original code
– e.g. memcpy uses code pointers
● Solution: use two copies of Shuffler
– Make new copies
[Figure: the loader loads Shuffler stage 1, Shuffler stage 2, the other libraries, the C library, and the program; arrows labeled "rewrites", "invokes", and "erases" show the bootstrap sequence. Afterwards, new copies of Shuffler stage 2, the other libraries, the C library, and the program are made repeatedly.]
82
Performance Evaluation
● SPEC CPU overhead at 50ms = 14.9%
● Multiprocess Nginx up to 24 workers
84
Security Evaluation
● Two disclosure-based attack methodologies:
– Scan many pages for the desired gadgets
  ● impacted by disclosure time, network latency
– Explore gadget space in small number of pages
  ● impacted by ROP chain computation time (> 40 seconds)
● Published JIT-ROP takes 2300-378000 ms
● We can re-randomize typically every 20-50 ms
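To put the two time scales side by side, the following arithmetic counts how many re-randomizations complete while a published JIT-ROP attack runs. It uses only the figures quoted above; the function name is illustrative.

```python
def shuffles_during(attack_ms, period_ms):
    """Number of complete shuffle periods that elapse during an attack."""
    return attack_ms // period_ms


fastest_attack_ms, slowest_attack_ms = 2300, 378000
counts = {
    period_ms: (
        shuffles_during(fastest_attack_ms, period_ms),
        shuffles_during(slowest_attack_ms, period_ms),
    )
    for period_ms in (20, 50)
}
```

Even the fastest published attack spans dozens of shuffle periods at the slowest quoted re-randomization rate.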
85
Demo
88
Conclusion
● Continuous re-randomization every 20-50 ms
● Fast:
– Defeats all known code reuse attacks
– Asynchronous shuffling offloads overhead
● Deployable:
– Binary analysis w/o modifying kernel, compiler, ...
● Egalitarian:
– No additional privileges required
– Shuffler defends its own code
90
Related Work
● JIT-ROP, IEEE S&P 2013
● Oxymoron, Usenix Sec 2014
● Code Pointer Integrity, OSDI 2014
● Stabilizer, SIGARCH 2013
● Remix, CODASPY 2016
● TASR, CCS 2015
● ...more related work in our paper
[1] https://securityintelligence.com/anti-rop-a-moving-target-defense/
[2] http://www.ieee-security.org/TC/SP2013/papers/4977a574.pdf
91
Future Work
● Translating stack unwind information
– Breaks C++ exceptions, pthread_cancel, etc.
● Cannot shuffle the loader currently
– Breaks dlopen
● If shuffling takes too long, no mechanism to pause target program
92
Shuffler Thread Performance
● Asynchronous shuffling runs quickly
● Synchronous runtime is 0.3% of total runtime
93
Scalability
● Tradeoff for server workers
– Multithreaded => lower performance overhead
– Multiprocess => no disclosures across workers
● Both techniques scale well in practice (up to 24x)
[Figure: in the multithreaded program, each thread unwinds its own stack (unw) between computations and there is 1 common Shuffler thread; in the multiprocess program, each worker does the same and there are n Shuffler threads.]