the s2e platform · 2018. 7. 12. · vitaly chipounov • ph.d. in 2014 at epfl, switzerland dslab,...
TRANSCRIPT
-
The S2E PlatformFinding vulnerabilities in
Linux and Windows programs
Vitaly Chipounov
https://s2e.systemsReady-for-use docker image, demos, tutorials, source code, documentation
-
Vitaly Chipounov
• Ph.D. in 2014 at EPFL, Switzerland DSLAB, George Candea
• S2E started back in 2008 when we wanted to reverse engineer Windows device drivers using symbolic execution
• Realized that we could also use this to find bugs in drivers, do performance analysis, etc. => created the S2E platform
• Co-founder and chief architect at Cyberhaven • Finalists in the Cyber Grand Challenge
-
• Evaluate software for vulnerabilities (Attack) • Defend software against attacks (Defend) • Keep software running and available (Availability)
The World's First All-Machine Hacking Tournament
Two teams used S2E
Team CodeJitsu
Cyberhaven UC Berkeley
Syracuse University
Team Disekt
-
Tutorial Overview (Part 1)
• Overview of S2E and symbolic execution
• Getting started with S2E
• Design and Implementation
• Automated generation of proofs of vulnerability
Finding vulnerabilities in user-space apps with S2E
-
Tutorial Overview (Part 2)
• Dealing with path explosion >12 KLOC driver, large input files
• Parallel symbolic execution, concolic execution, fuzzer integration, symbolic fault injection, function models, etc.
Testing Windows device drivers with S2EFAT12/16/32 driver from the WDK (fastfat.sys)
-
S2E is a platform for in-vivo multi-path analysis of software systems
Bug finding Verification
Testing Security checking
…
Extensible Write your own tools
Symbolic execution Concolic execution
State merging Fuzzing
…
On real OSes, with real apps, libraries, drivers
Pretty much anything that runs on computers
-
void main(int argc, char **argv) { int r = 1, i = 1;
if (i < argc) { if (argv[i][0] == 'n') { r = 0; ++i; } }
Dynamic Symbolic Execution
✓ ✓
-
void main(int argc, char **argv) { int r = 1, i = 1;
if (i < argc) { if (argv[i][0] == 'n') { r = 0; ++i; } }
30 GB disk 4 GB RAM?
Dynamic Symbolic Execution
-
30 GB disk 4 GB RAM?
void main(int argc, char **argv) { int r = 1, i = 1;
if (i < argc) { if (argv[i][0] == 'n') { r = 0; ++i; } }
Dynamic Symbolic Execution
-
#include
int main(int argc, char **argv) { FILE *fp = fopen(argv[1], "rb"); if (!fp) { goto err; }
int data; if (fread(&data, sizeof(data), 1, fp) != 1) { goto err; }
if (data == 0xdeadbeef) { printf("Hello world\n"); } else { *(char *)0xbadcafe = 0; }
err: if (fp) { fclose(fp); }
return 0; }
-
Quick Startssh -CX [email protected]
$ wget -O crash.c https://pastebin.com/raw/B1BbapMV $ gcc -g -O0 —m32 -o ./crash ./crash.c $ source s2e/s2e_activate $ s2e new_project ./crash @@ $ cd s2e/projects/crash $ ./launch-s2e.sh …
mailto:[email protected]://pastebin.com/raw/B1BbapMV
-
Analysis Output$ ls -1 s2e/projects/crash/s2e-last/ assembly.ll debug.txt ExecutionTracer.dat info.txt module.bc run.stats tbcoverage-0.json tbcoverage-1.json testcase-crash:0x50f:0x804858a-0-_tmp_input testcase-kill-0-_tmp_input testcase-kill-1-_tmp_input warnings.txt
-
Reproducing crashes with concrete test cases
~/s2e/projects/crash$ gdb --args ./crash s2e-last/testcase-crash:0x50f:0x804858a-0-_tmp_input
(gdb) r Starting program: /home/vitaly/s2e/projects/crash/crash s2e-last/testcase-crash:0x50f:0x804858a-0-_tmp_input
Program received signal SIGSEGV, Segmentation fault. 0x0804858a in main (argc=2, argv=0xffffcec4) at crash.c:18 18 *(char *)0xbadcafe = 0; (gdb)
-
Profiling Path Explosion$ s2e forkprofile crash … 01295 crash:0x08048571 1 /home/ubuntu/s2e/env/crash.c:15 (main)
if (data == 0xdeadbeef) { printf("Hello world\n"); } else { *(char *)0xbadcafe = 0; }
More on that in the 2nd part of the tutorial…
-
Displaying Code Coverage
$ s2e coverage lcov --html crash
Overall coverage rate: lines......: 84.6% (11 of 13 lines) functions..: no data found
Line coverage saved to /home/vitaly/s2e/env/projects/crash/s2e-last/crash.info. An HTML report is available in /home/vitaly/s2e/env/projects/crash/s2e-last/crash_lcov
-
$ s2e coverage lcov --html crash
-
OverviewFinding vulnerabilities in user-space apps with S2E
• Overview of S2E and symbolic execution
• Getting started with S2E
• Design and Implementation
• Automated generation of proofs of vulnerability
-
Valgrind AddressSanitizer KLEE Astrée TEMU BitBlaze VMware CoreDet SyncFinder ESD Parfait Saturn Calysto LLBMC Isabelle
CUTE EXE SJPF Coverity CodeSonar DTrace Oprofile BAP IDApro Jakstab Angr SimOS Pin PinOS LFI
ThreadSanitizer Manticore Dingo SFI Nooks CFI Dimmunix VeriSoft Eraser SAGE DART Cloud9 BitScope bddbddb …
Testing
Verification
Profiling
Simulation Isolation
Immunit
y
Debugging
…
Check properties on execution paths
-
✖
./prog
argc != 2
./prog 123
argc == 2
int main(argc, argv) { if (argc == 2) { printf(“%c”, *argv[2]); … } return 0; }
Checking Properties on Execution Paths
-
Enumerating All Execution Paths
int main(argc, argv) { void *p = malloc(…); if (!p) { exit(-1); } … return 0; }
Program
Kernel
Libraries
Hardware
~2 pathssystem size
Off-limits
-
Environment Modelling
int main(argc, argv) { void *p = malloc(…); if (!p) { exit(-1); } … return 0; }
Program
Environment Model
~ 2 pathsprogram sizeKLEE Cloud9 SLAM LLBMC Coverity …
Impractical
-
Calling the Environment
int main(argc, argv) { void *p = malloc(…); if (!p) { exit(-1); } … return 0; }
Program
Kernel
Libraries
Hardware
False negatives
False positives
KLEE EXE DART Fuzzing …
~ 2 pathsprogram size
-
Analysis Tools Make Trade-offs
• Accuracy vs. performance False positives vs. false negatives
• Types of analysis Testing, verification, profiling, etc.
• Software stack level Applications, libraries, kernel modules, etc.
• Source code vs. binaries
-
S2E is a Platform that Enables Flexible Trade-offs
• Accuracy vs. performance Symbolic execution, state merging, fuzzing, etc. You can write models, but don’t have to!
• Types of analysis Wide range of analyses
• Software stack level In-vivo, at any level of software stack
• Source code vs. binaries Works on binaries
-
Kernel
Libraries
Hardware
Applications
Distributed systemsReverse
engineering
RevNIC [EUROSYS’10]
Profiling
PROFs [ASPLOS’11]
Testing
BIOS Testing [WOOT’15] Avatar [NDSS’14] CHEF [ASPLOS’14] Achilles [ASPLOS’13] SWIFT [EUROSYS’12] SymDrive [OSDI’12] SymNet [WRIPE’12] DDT [USENIX’11] …
VerificationSoftware router dataplanes [NSDI’14]
Security
CVE-2015-1536
-
Analysis Plugins
Symbolic Execution
Engine
Dynamic Binary
Translator
Instrumentation Engine
Path Selection Plugins
S2E
Applications
Libraries
Kernel Drivers
Virtual Hardware
VM
What input to make symbolicWhat input to make concreteSearch heuristics
Check for crashes, vulnerability conditions, performance metrics, etc.
-
• S2E uses QEMU
• S2E and QEMU are completely decoupled
• S2E is contained in libs2e.so
• libs2e.so intercepts and replaces /dev/kvm functionality
• Need a few simple KVM extensions to intercept DMA, disk R/W, and device state snapshotting
• You don’t have to use QEMU with S2E
KVM Extensions for Symbolic Execution
Analysis Plugins
Symbolic Execution
Engine
Dynamic Binary
Translator
Instrumentation Engine
Path Selection Plugins
S2E
Applications
Libraries
Kernel Drivers
Virtual Hardware
VMli
bs2e
.so
KVM-compatible interface/dev/kvm
-
• We refactored QEMU’s translator to make it standalone
• libcpu, libtcg: code translation and generation libraries
• libs2ecore, libs2eplugins, klee, libvmi, etc.
• You can reuse these in your own projects
• You can swap out the symbolic execution engine with your own if you want
Modular Architecture
Analysis Plugins
Symbolic Execution
Engine
Dynamic Binary
Translator
Instrumentation Engine
Path Selection Plugins
S2E
Applications
Libraries
Kernel Drivers
Virtual Hardware
VMli
bs2e
.so
KVM-compatible interface/dev/kvm
-
Dynamic Binary Translation
0x80000000: mov [ebx], eax
void tb_0x80000000(cpu) { tmp1 = cpu->regs[EBX]; tmp2 = cpu->regs[EAX];
__stl_mmu(tmp1, tmp2); }
while(true) { tb = translate(cpu->pc) tb->func(cpu); }}
-
Dynamic Binary Translation
0x80000000: mov [ebx], eax
void tb_0x80000000(cpu) { tmp1 = cpu->regs[EBX]; tmp2 = cpu->regs[EAX];
__stl_mmu(tmp1, tmp2); }
Host-independent micro-operations
Frontend
Host instructions (x86, arm, mips, etc.)
Backend (one per target architecture)
while(true) { tb = translate(cpu->pc) tb->func(cpu); }} translate()
-
Dynamic Binary Translation
translate(pc) { do { ins = disassemble(pc); emit_uops(ins); pc += ins.size; } while (ins != jmp); }}
while(true) { tb = translate(cpu->pc) tb->func(cpu); }}
translate(pc) { do { if (s2e_instrument_ins(pc)) { emit_uops_s2e(); }} ins = disas(pc); emit_uops(ins); pc += ins.size; } while (ins != jmp); }}
-
Code Instrumentationtranslate(pc) { do { ins = disassemble(pc); emit_uops(ins); pc += ins.size; } while (ins != jmp); }}
translate(pc) { do { if (s2e_instrument_ins(pc)) { emit_uops_s2e(); }} ins = disas(pc); emit_uops(ins); pc += ins.size; } while (ins != jmp); }}
0x80000000: mov [ebx], eax
void tb_0x80000000(cpu) { s2e_call_plugins();
tmp1 = cpu->regs[R_EBX]; tmp2 = cpu->regs[R_EAX];
__stl_mmu(tmp1, tmp2); }
void tb_0x80000000(cpu) { tmp1 = cpu->regs[EBX]; tmp2 = cpu->regs[EAX];
__stl_mmu(tmp1, tmp2); }
-
Dynamic Binary Translation
0x80000000: mov [ebx], eax
void tb_0x80000000(cpu) { tmp1 = cpu->regs[EBX]; tmp2 = cpu->regs[EAX];
__stl_mmu(tmp1, tmp2); }
Host-independent micro-operations
Frontend
Host instructions (x86, arm, mips, etc.)
Backend (one per target architecture)
while(true) { tb = translate(cpu->pc) tb->func(cpu); }} translate()
+LLVM
-
define i64 @tb_0x80000000(i64*) #12 { entry: %loc_18ptr = alloca i32 %loc_19ptr = alloca i32 %1 = getelementptr i64, i64* %0, i32 0 %2 = load i64, i64* %1 %eax_ptr = getelementptr %struct.CPUX86State, …, i32 0, i32 0 %ebx_ptr = getelementptr %struct.CPUX86State, …, i32 0, i32 2 %eax = load i32, i32* %eax_ptr %ebx = load i32, i32* %ebx_ptr call void @__stl_mmu(i32 %ebx, i32 %eax) … }
Symbolic Execution0x80000000: mov [ebx], eax
KLEE
-
Applications
Libraries
Kernel Drivers
Analysis Plugins
Symbolic Execution
Engine
Dynamic Binary
Translator
Instrumentation Engine
Virtual Hardware
Path Selection Plugins
S2E
VM
Core execution events
onTranslateInstructionStart onMemoryAccess onTimer onStateFork onStateKill onCustomInstruction …
Writing Plugins
-
void MyPlugin::initialize() { s2e()->getCorePlugin()->onTranslateInstructionStart-> connect(MyPlugin::onInstructionTranslation); }
void MyPlugin::onInstructionTranslation(signal, pc) { if (pc == crashHandlerPc) { signal->connect(MyPlugin::onInstructionExecution, …); } }
void MyPlugin::onInstructionExecution(state) { s2e()->getExecutor()->terminateState(…); }
Where do I get the crash handler pc? Is it going to change between OSes? Where do I get information about the crash (pid, pc, etc.)? Is the crash handler pc in the correct address space?
=> Requires complex virtual machine introspection
-
void MyPlugin::initialize() {}
void MyPlugin::handleOpcodeInvocation(state, data, size) { my_plugin_data_t data; s2e()->mem()->read(&data, data, size…); s2e()->getExecutor()->terminateState(data.pid, “crashed”); }}
// linux-4.9.3/arch/x86/mm/fault.c static void force_sig_info_fault(int si_signo, int si_code, unsigned long address, …) { unsigned lsb = 0; siginfo_t info;
my_plugin_data_t data; data.pid = current->pid; data.address = address; s2e_invoke_plugin(“MyPlugin”, &data, sizeof(data); … }}
Do as much work as possible inside the guest!
-
Applications
Libraries
Kernel Drivers
Analysis Plugins
Symbolic Execution
Engine
Dynamic Binary
Translator
Instrumentation Engine
Virtual Hardware
Path Selection Plugins
S2E
VM
s2e_invoke_plugin();
OS Monitoring Plugins
onProcessLoad() onProcessExit() onThreadCreate() onThreadExit() onSegFault() …
S2E Guest Tools
#!/bin/bash
s2ecmd kill -1 “error” s2eget file.txt …
-
Overview
• Overview of S2E and symbolic execution
• Getting started with S2E Finding simple crashes and getting code coverage
• Design and Implementation
• Automated generation of proofs of vulnerability
Finding vulnerabilities in user-space apps with S2E
-
What is a proof of vulnerability (PoV)?
• Input to the program that can control the program’s state arbitrarily
• Set the program counter and a general purpose register to an arbitrary agreed upon value
• Exfiltrate data from memory
• Useful for developers to triage bugs (a crash in itself may not be exploitable).
$ gdb —args ./vulnerable-program pov_input.txt SEGFAULT at 0xdeadbeef
-
int main(int argc, char **argv) { char buf[4]; receive(buf, 12); return 0; }
pushebpmovebp,espsubesp,4;Allocate4bytesonthestackforbufleaeax,[ebp-4];Computeaddressofbufpush12;Push12for2ndparameterofreceivepusheax;Pushaddressofbuffor1stparamofreceivecallreceivexoreax,eax;Setreturnvalueto0leave;Cleanthestackframeret;Returnfromthemainfunction
echo AAAAAAAABBBBBBBBCCCCCCCC | xxd -r -p | ./program
Segfault at address 0xCCCCCCCC
-
pushargvpushargccallmain
main:pushebpmovebp,espsubesp,4leaeax,[ebp-4]push12pusheaxcallreceivexoreax,eaxleaveret
0xf8:0xabc0;argv0xf4:0xdef0;argc0xf0:0x800231;returnaddress0xec:0x1000;savedframepointer0xe8:????????;spaceforthebuffer0xe4:12;sizeoftheparameterpassedtoreceive0xe0:0xe8;addressofthebufferonthestack
0xf8:0xabc0;argv0xf4:0xdef0;argc0xf0:CCCCCCCC;returnaddress0xec:BBBBBBBB;savedframepointer0xe8:AAAAAAAA;spaceforthebuffer0xe4:12;sizeoftheparameterpassedtoreceive0xe0:0xe8;addressofthebufferonthestack
-
int main(int argc, char **argv) { char buf[4]; receive(buf, 12); return 0; }
echo λ0λ1λ2λ3λ4λ5λ6λ7λ8λ9λ10λ11 | ./program
Segfault at address λ11λ10λ9λ8
-
pushargvpushargccallmain
main:pushebpmovebp,espsubesp,4leaeax,[ebp-4]push12pusheaxcallreceivexoreax,eaxleaveret
0xf8:0xabc0;argv0xf4:0xdef0;argc0xf0:0x800231;returnaddress0xec:0x1000;savedframepointer0xe8:????????;spaceforthebuffer0xe4:12;sizeoftheparameterpassedtoreceive0xe0:0xe8;addressofthebufferonthestack
0xf8:0xabc0;argv0xf4:0xdef0;argc0xf0:λ11λ10λ9λ8;returnaddress0xec:λ4λ5λ6λ7;savedframepointer0xe8:λ0λ1λ4λ3;spaceforthebuffer0xe4:12;sizeoftheparameterpassedtoreceive0xe0:0xe8;addressofthebufferonthestack
-
0xf8:0xabc0;argv0xf4:0xdef0;argc0xf0:λ11λ10λ9λ8;returnaddress0xec:λ4λ5λ6λ7;savedframepointer0xe8:λ0λ1λ4λ3;spaceforthebuffer0xe4:12;sizeoftheparameterpassedtoreceive0xe0:0xe8;addressofthebufferonthestack
onSymbolicAddress(S2EExecutionState *state, klee::ref address, …)
pushargvpushargccallmain
main:pushebpmovebp,espsubesp,4leaeax,[ebp-4]push12pusheaxcallreceivexoreax,eaxleaveret
1. Add constraint address == 0xbadcafe 2. Solve constraints 3. Get concrete test case
Segfault at address 0xbadcafe
echo 0000000000000000fecaad0b | xxd -r -p | ./program
-
Practice Time!ssh -CX [email protected]
$ wget -O vulnerable.c https://pastebin.com/raw/rBYFGEJS
$ gcc -g -w -fno-stack-protector -D_FORTIFY_SOURCE=0 -O0 -m32 \ -o ./vulnerable vulnerable.c
$ source s2e/s2e_activate $ s2e new_project —-enable-pov-generation ./vulnerable @@ $ cd s2e/projects/vulnerable $ ./launch-s2e.sh …
mailto:[email protected]://pastebin.com/raw/rBYFGEJS
-
Testcases for PoVs and Crashes
$ls~/s2e/projects/vulnerable/s2e-last_tmp_input-crash@0x0-pov-unknown-0_tmp_input-crash@0x0-pov-unknown-3_tmp_input-recipe-type1_i386_generic_reg_ebp.rcp@0x8048760-pov-type1-0_tmp_input-recipe-type1_i386_generic_reg_edi.rcp@0x8048760-pov-type1-0_tmp_input-recipe-type1_i386_generic_shellcode_eax.rcp@0x804883e-pov-type1-3...
-
Validating PoVs$gdb--args./vulnerables2e-last/_tmp_input-recipe-type1_i386_generic_shellcode_eax.rcp@0x804883e-pov-type1-3(gdb)r
Startingprogram:/home/user/s2e/env/projects/vuln-lin32-32/vulne…
Demoingfunctionpointeroverwrite
ProgramreceivedsignalSIGSEGV,Segmentationfault.0x44556677in??()(gdb)inforegisterseax0xccddeeff-857870593ecx0x445566771146447479edx0x4064ebx0x00esp0xffffcd1c0xffffcd1cebp0xffffcd680xffffcd68esi0x804b008134524936…
-
Controlling Register Values
---------------------------------------------------------------------------------ThispluginwritesPoVsasinputfiles.Thisissuitableforprogramsthat--taketheirinputsfromfiles(insteadofstdinorothermethods).add_plugin("FilePovGenerator")pluginsConfig.FilePovGenerator={--ThegeneratedPoVwillsettheprogramcounter--ofthevulnerableprogramtothisvaluetarget_pc=0x0011223344556677,
--ThegeneratedPoVwillsetageneralpurposeregister--ofthevulnerableprogramtothisvalue.target_gp=0x8899aabbccddeeff}
s2e-config.lua
-
Recipes for PoVs• Recipes tell S2E how to constrain the input in order
to get interesting PoVs
• The recipe plugin monitors execution for symbolic pointers
• When the recipe plugin detects a symbolic pointers, it looks at the available recipes and applies those that satisfy constraints
-
$ ls ~/s2e/projects/vulnerable/recipe type1_i386_generic_reg_eax.rcp type1_i386_generic_reg_ebp.rcp type1_i386_generic_reg_esi.rcp … type1_i386_generic_reg_esp.rcp type1_i386_generic_shellcode_eax.rcp type1_i386_generic_shellcode_ebp.rcp
Recipes for PoVs
-
# Set GP and EIP with shellcode # mov eax, $gp # mov ecx, $pc # jmp ecx :type=1 :reg_mask=0xffffffff :pc_mask=0xffffffff :arch=i386 :platform=generic :gp=EAX :exec_mem=EIP [EIP+0] == 0xb8 [EIP+1] == $gp[0] [EIP+2] == $gp[1] [EIP+3] == $gp[2] [EIP+4] == $gp[3] [EIP+5] == 0xb9 [EIP+6] == $pc[0] [EIP+7] == $pc[1] [EIP+8] == $pc[2] [EIP+9] == $pc[3] [EIP+10] == 0xff [EIP+11] == 0xe
-
# Set GP and EIP with shellcode # mov eax, $gp # mov ecx, $pc # jmp ecx :reg_mask=0xffffffff :pc_mask=0xffffffff :type=1 :arch=i386 :platform=generic :gp=EAX :exec_mem=EIP [EIP+0] == 0xb8 [EIP+1] == $gp[0] [EIP+2] == $gp[1] [EIP+3] == $gp[2] [EIP+4] == $gp[3] [EIP+5] == 0xb9 [EIP+6] == $pc[0] [EIP+7] == $pc[1] [EIP+8] == $pc[2] [EIP+9] == $pc[3] [EIP+10] == 0xff [EIP+11] == 0xe
[EIP+0] == 0xb8 [EIP+1] == 0xaa [EIP+2] == 0xaa [EIP+3] == 0xaa [EIP+4] == 0xaa [EIP+5] == 0xb9 [EIP+6] == 0xbb [EIP+7] == 0xbb [EIP+8] == 0xbb [EIP+9] == 0xbb [EIP+10] == 0xff [EIP+11] == 0xe
EIP=λ11λ10λ9λ8
Look for address ranges that are - executable - contain symbolic dataWhen such a range is found (e.g, at address 0xb8001200) - add constraint EIP == 0xb8001200 - add constraints [EIP+i] == 0x…If constraints satisfiable => we have a PoV
-
Challenges• Unreplayable PoVs
Randomness (/dev/random, ASLR, etc.)
• Arbitrary memory reads and writes Require plugin support to track interesting memory regions
• Interactive CGC-style PoVs We have seen file-based PoVs so far
More details on s2e.systems/docs
-
Conclusion
• S2E enables flexible trade-offs Source/binary, environment, SW stack level, etc.
• Using S2E to generate proofs of vulnerabilities
https://s2e.systemsReady-for-use docker image, demos, tutorials, source code, documentation
We are hir
ing!