[cb16] cofi break – breaking exploits with processor trace and practical control flow integrity by...

Anti exploitation and Control Flow Integrity with Processor Trace

Brought to you by

Shlomi Oberman, independent security researcher

Ron Shina, independent security researcher

Tracing – what executed and when?

Code optimization and profiling

◦ Sampling

◦ Instrumentation

Intel Processor Trace (PT)

Intel PT: a processor feature enabling instruction tracing with low overhead – documentation says about 5%

◦ Tens of times faster than the previous option

Available on Intel Broadwell and Skylake processors

A similar feature, Real Time Instruction Trace, exists on certain Intel Atom processors

Intel PT

Packets: the processor writes the trace to memory as packets

Packet Types

◦ Taken / Not Taken packets for conditional branches

◦ IP packets for indirect branches

◦ Timestamp packets

◦ …

Binary is needed to recreate the instruction trace
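A minimal sketch of why the binary is needed: the packets only record decisions (taken / not taken, indirect targets), so the decoder has to walk the program's code to know which decision each packet belongs to. The block names, the `BLOCKS` table and the tuple encoding of packets below are hypothetical stand-ins, not the real PT packet format.

```python
# Toy "binary": each basic block ends in one kind of control transfer.
# Block names and the packet encoding are hypothetical.
BLOCKS = {
    "entry": ("cond", "loop", "exit"),  # conditional: (taken target, fall-through)
    "loop": ("indirect",),              # indirect jump: target comes from an IP packet
    "body": ("direct", "entry"),        # direct jump: no packet is emitted
    "exit": ("end",),
}


def reconstruct(start, packets):
    """Replay packets against BLOCKS and return the list of visited blocks."""
    packets = iter(packets)
    path, block = [start], start
    while True:
        kind, *targets = BLOCKS[block]
        if kind == "end":
            return path
        if kind == "direct":
            block = targets[0]
        elif kind == "cond":
            tag, taken = next(packets)
            assert tag == "TNT"
            block = targets[0] if taken else targets[1]
        elif kind == "indirect":
            tag, target = next(packets)
            assert tag == "TIP"
            block = target
        path.append(block)


trace = [("TNT", True), ("TIP", "body"), ("TNT", False)]
print(reconstruct("entry", trace))
```

Three packets are enough to recover a five-block path here, because the direct jump out of `body` is read from the code rather than from the trace.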

[Figure: decoded trace packets, showing a call to foo and branch taken / not taken outcomes]

User and/or kernel tracing

Filter by process

Starting or stopping the trace based on address ranges (only in later processors)

Configuration options

Atom processors supporting RTIT – tracing guests possible, but not the hypervisor

Broadwell – no support at all

Skylake – full support

Tracing VM guests and hypervisors

Intel PT output + Traced Program’s Binary → Instruction Trace

Linux kernel 4.1 comes with integrated PT support

Linux kernel 4.3 supports tracing using perf user tools

An open source PT decoding library – libipt

GDB 7.10 supports using PT for tracing

simple-pt – an open source implementation of PT on Linux (used to create the trace pictures on the previous slide)

* processor supporting PT included separately ;)

Want to use Processor Trace right now? *

Exploitation and the NX Bit

[Figure: a pdf file displaying “Hi!” with shellcode embedded in it]

When the pdf is opened, the shellcode will be in memory that isn’t executable – NX bit

How do attackers run the code to make their shellcode executable?

◦ Use code that is already executable (the program’s code)

This exploitation technique comes in many forms, most notably, ROP – Return Oriented Programming

Using executable memory already in the program usually involves moving around the process rather strangely

for example:

◦ Not returning to a function’s caller

◦ Calling addresses in the middle of functions, instead of at the beginning

◦ …

“Jump Around, Jump around…” / House of Pain


Establish rules for how the code flows in the process

◦ Functions return to their callers

◦ Calls are made to the beginning of functions

◦ …

How can those rules be enforced?

◦ Add rule checking to the program’s binary

◦ Trace the program while running and go over the log (this work)

◦ Use other CPU features to detect “surprising” branches

“Control Flow Integrity Principles, Implementations, and Applications”, Abadi, Budiu, Erlingsson, Ligatti, 2005

Control Flow Integrity (CFI)

“Security Breaches as PMU Deviation”, Yuan, Xing, Chen, Zang 2011

“kBouncer: Efficient and Transparent ROP Mitigation” – Pappas, Winner of Microsoft BlueHat competition 2012, uses previous CPU branch tracing capabilities

“CFIMon: Detecting Violation of Control Flow Integrity using Performance Counters” – Xia, Liu, Chen, Zang 2012

“Taming ROP on Sandy Bridge”, Wicherski of Crowdstrike, 2013

“Transparent ROP Detection using CPU Performance Counters”, Li, Crouse, THREADS 2014

and more…

Prior Work

Anti exploitation system to scan files based on CFI (think pdf on Adobe Reader)

Detects whether “illegal” returns were made, like in ROP

◦ Easy to add other CFI mitigations, such as checking the targets of calls (no calls to the middle of functions, …)

(Soon to be) Open Source

Developed in 2015

Our Implementation

Verifying CFI via Processor Trace

Was the flow OK? Just follow the arrows and calls using the PT generated packets
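The return check can be sketched with a shadow stack kept by the verifier while it follows the trace. This is only an illustration: the event tuples and addresses are made up, and a real verifier would also have to handle the legitimate cases where functions do not return to their callers.

```python
def find_illegal_return(events):
    """events: ("call", return_address) or ("ret", target_address) tuples.

    Returns the index of the first return that does not go back to its
    caller, or None if every return matches.
    """
    shadow = []  # return addresses pushed by the calls seen so far
    for i, (kind, addr) in enumerate(events):
        if kind == "call":
            shadow.append(addr)
        elif kind == "ret":
            if not shadow or shadow.pop() != addr:
                return i
    return None


# Hypothetical addresses, for illustration only.
benign = [("call", 0x1005), ("call", 0x2009), ("ret", 0x2009), ("ret", 0x1005)]
rop = [("call", 0x1005), ("ret", 0x7F001234), ("ret", 0x7F005678)]

print(find_illegal_return(benign))
print(find_illegal_return(rop))
```

A ROP chain shows up as a return whose target was never pushed by a matching call.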

What information is needed to follow the execution and verify it?

Control Flow Graph (CFG)

◦ Location of functions

◦ Location of basic blocks

◦ …

Need this for all the libraries loaded by the process – Adobe Reader dlls, Windows dlls

◦ If not – false positives

All we have is debugging symbols, pdb files, for the Windows binaries

We used IDA to recover the CFG

IDA didn’t do a good enough job

◦ Some of the functions and basic blocks in the Adobe Reader / Windows binaries weren’t detected

Static Analysis

When supporting a new version of Adobe Reader, IDA is used to get the initial CFG (static analysis)

Afterwards, many pdf files are traced with PT

◦ When a new basic block or function is discovered while following the trace – the CFG is updated

Repeat

◦ Run IDA on the new CFG

◦ Run the pdf files on IDA’s output

◦ If the CFG was updated in the last iteration, repeat
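The loop above is a fixed-point iteration, which can be sketched as follows. `static_analysis` is a hypothetical stand-in for the disassembler (here it just follows direct edges), and the block names are invented for the example.

```python
def static_analysis(seeds, direct_edges):
    """Stand-in for the disassembler: everything reachable from the seeds
    through direct edges."""
    known, work = set(), list(seeds)
    while work:
        block = work.pop()
        if block not in known:
            known.add(block)
            work.extend(direct_edges.get(block, ()))
    return known


def build_cfg(entry_points, direct_edges, traced_blocks):
    """Alternate static analysis and tracing until the traces add nothing new."""
    seeds = set(entry_points)
    iterations = 0
    while True:
        iterations += 1
        cfg = static_analysis(seeds, direct_edges)
        missing = set(traced_blocks) - cfg  # blocks seen only in the traces
        if not missing:
            return cfg, iterations
        seeds |= missing  # feed the discoveries back to the static pass


direct_edges = {"main": ["parse"], "parse": ["render"], "handler": ["cleanup"]}
traced = ["main", "parse", "handler", "render"]
cfg, iterations = build_cfg({"main"}, direct_edges, traced)
print(sorted(cfg), iterations)
```

In this toy run `handler` is only reachable indirectly, so it is found by tracing, and the second static pass then picks up `cleanup` from it.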

Dynamic Analysis

Most of the edges in the CFG are:

◦ Calls relative to the current IP (no packet for those)

◦ Conditional branches

When traversing the CFG during trace verification, fetching the next node in these cases has to be (very) fast

Since the CFG is fixed and built in preprocessing, this isn’t a problem

Optimization

Ideally, no disassembly and CFG modification (slow) would be done during verification

However, some of the code analyzed is created dynamically – as long as it doesn’t change, this can be dealt with in preprocessing

In cases where it changes every time “Adobe Reader” is run to open a file, preprocessing isn’t enough

◦ The code is disassembled and the CFG is updated during verification

Optimization

Following the execution trace is done on a per thread basis

How to know which thread was executing at each part of the trace?

◦ PT packets give timing information, but only output the current process

Thread information

Event Tracing for Windows (ETW)

◦It should be possible to get the thread context switching times from the CSwitch events provided by ETW as TSC

◦Then these timestamps could be synched with the TSC packets from PT to determine which thread was running in different parts of the trace
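A sketch of that synchronization, assuming both sources have already been reduced to plain lists: context switches as `(tsc, thread_id)` pairs and trace chunks as `(tsc, payload)` pairs. The data layout and thread names are assumptions for illustration, not the ETW or PT formats.

```python
from bisect import bisect_right


def thread_at(cswitches, tsc):
    """cswitches: (tsc, thread_id) pairs sorted by tsc, one per switch-in.

    Returns the thread running at the given tsc, or None if the tsc is
    earlier than the first recorded switch.
    """
    times = [t for t, _ in cswitches]
    i = bisect_right(times, tsc) - 1
    return cswitches[i][1] if i >= 0 else None


def split_by_thread(cswitches, packets):
    """Group (tsc, payload) trace chunks by the thread that produced them."""
    per_thread = {}
    for tsc, payload in packets:
        per_thread.setdefault(thread_at(cswitches, tsc), []).append(payload)
    return per_thread


cswitches = [(100, "T1"), (250, "T2"), (400, "T1")]
packets = [(120, "a"), (260, "b"), (300, "c"), (410, "d")]
print(split_by_thread(cswitches, packets))
```

Each per-thread list can then be verified on its own, since following the execution trace is done per thread.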

Thread Information

What about getting a callback every time a thread in the traced process is switched in?

◦ AFAWK, no direct way

◦ We hooked the Windows context switch function - don’t do that

◦ Endgame presented a way to achieve this via Asynchronous Procedure Calls (Blackhat 2016)

Thread Information

Need to know the executable memory ranges at all points in the trace – what modules are loaded

Knowing when the PT trace reached ntdll!LdrLoadDll and ntdll!LdrUnloadDll isn’t enough

◦ Module name is needed to update the current memory map

ETW was used to retrieve module load / unload name and time (tsc) and this is then synched with the times of the load/unload functions in the trace
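One way to picture the resulting memory map is as a timeline of load / unload events that is replayed up to the point of interest. The event tuple layout, `plugin.dll`, and the base addresses below are hypothetical.

```python
def module_at(events, tsc, address):
    """events: (tsc, "load" | "unload", name, base, size) tuples sorted by tsc.

    Returns the name of the module that contains address at time tsc,
    or None if no loaded module covers it.
    """
    loaded = {}
    for when, kind, name, base, size in events:
        if when > tsc:
            break
        if kind == "load":
            loaded[name] = (base, base + size)
        else:
            loaded.pop(name, None)
    for name, (start, end) in loaded.items():
        if start <= address < end:
            return name
    return None


events = [
    (10, "load", "ntdll.dll", 0x7000_0000, 0x10_0000),
    (20, "load", "plugin.dll", 0x5000_0000, 0x2000),
    (50, "unload", "plugin.dll", 0x5000_0000, 0x2000),
]
print(module_at(events, 30, 0x5000_0100))
print(module_at(events, 60, 0x5000_0100))
```

The same address resolves to a module at one time and to nothing at another, which is why the load and unload times have to be matched against the trace.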

Module load / unload

For example:

◦ Exception dispatching code

◦ User mode callbacks

◦ …

When going over the trace, when suspected mismatches occur, the above special cases are checked via binary signatures

This mostly needs to be done per operating system, not per-application

Still not done – functions don’t always return to their callers

(almost entirely) Not dealt with by our implementation

For PT tracing, the code being executed is needed

One obvious problem is pages that get written to and executed from simultaneously

(maybe) One could remove the write permission every time a page becomes writable and executable and handle the access violation when it gets written to, in order to obtain the code’s new version

Dynamically generated code

A case of dynamically generated code that was dealt with:

Applications that hook themselves… with identical hooks, at the same locations and same time

To the trace verifier, the code is essentially static

Dynamically generated code

Benign, non-malicious files

◦ Run on 10000 pdf, 3000 ppt/x, 3000 doc/x without false positives

Malicious files containing a ROP chain

◦ Run on 5 such files, detecting the exploit and displaying the CFI violation

Scanning Results

You’d still need

◦ Module load / unload information

◦ Thread context switch times

but could somewhat do without

◦ The CFG – a partial CFG can be built from the trace (it doesn’t need to be built in advance)

Forget CFI and anti-exploitation… What if I just want to trace a process quickly with Processor Trace?

Control-flow Enforcement Technology announced by Intel June 2016. Release date ?

Processors will directly support:

◦ Shadow (call) stack tracking – a return that doesn’t match the shadow stack raises a control protection exception

◦ Indirect branch tracking – an indirect branch to a target whose instruction is anything other than ENDBRANCH raises a control protection fault
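Indirect branch tracking can be modelled in a few lines. This is a software simulation of the rule only: the `code` dictionary, the addresses and the `ControlProtectionFault` class are invented for the example and say nothing about how the hardware implements it.

```python
class ControlProtectionFault(Exception):
    """Raised by this toy model when an indirect branch lands on a bad target."""


def indirect_branch(code, target):
    """code: dict mapping address -> instruction text (toy disassembly).

    An indirect call or jump is only allowed to land on an ENDBRANCH.
    """
    if code.get(target) != "ENDBRANCH":
        raise ControlProtectionFault(hex(target))
    return target


# Hypothetical function: a legitimate entry at 0x1000, a ROP-style gadget at 0x1010.
code = {0x1000: "ENDBRANCH", 0x1004: "push rbp", 0x1010: "pop rdi", 0x1011: "ret"}

indirect_branch(code, 0x1000)  # function entry: allowed
try:
    indirect_branch(code, 0x1010)  # middle of the function: faults
except ControlProtectionFault as fault:
    print("control protection fault at", fault)
```

This is the hardware counterpart of the rule that calls are made to the beginning of functions.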

Coming soon to a motherboard near you

ARM has a feature similar to Processor Trace called CoreSight

Tracing on Linux has been integrated with perf

Open source decoding library exists – OpenCSD

http://www.linaro.org/blog/core-dump/coresight-perf-and-the-opencsd-library/

What about tracing quickly on ARM?

“Control Jujutsu” – Evans, Long, Otgonbaatar, Shrobe, Rinard, Okhravi, Sidiroglou-Douskos, CCS 2015

Uses indirect call sites with controllable targets and arguments (via vulnerability) to achieve arbitrary code execution (e.g., call exec or system)

Bypasses CFI because the target functions are legal in the CFG

Bypassing CFI

“Write Once, Pwn Anywhere”, Yu, Black Hat USA 2014

◦Sometimes applications have security critical information in one variable

◦ Pseudo-code from Internet Explorer’s JavaScript engine:

if ((safemode & 0xB) == 0) { turn_on_god_mode(); }

Bypassing CFI with “data attacks”

“Control Flow Bending”, Carlini, Barresi, Payer, Wagner, Gross, USENIX 2015

◦printf-oriented-programming – if you control the arguments, printf can do arbitrary computation


“Data oriented programming” – Hu, Shinde, Sendroiu, Chua, Saxena, Liang, S&P 2016

goal: perform arbitrary computation while adhering to the CFG

Similar to ROP in spirit – use parts of the original program as “instructions” of a “VM” controlled by the attacker

“data gadgets” are used to perform computation on data


gadgets are executed one after the other by using constructs already in the vulnerable program – such as loops

the vulnerability being exploited is used to determine which data gadget gets run and on what data
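A toy illustration of the idea, not taken from the paper: `server_loop` plays the vulnerable program, its two branches play the data gadgets, and the `program` list stands for memory the attacker has corrupted. All names are hypothetical.

```python
def server_loop(requests, state):
    """A benign-looking dispatcher loop: every path through it is legal in the CFG."""
    for op, dst, src in requests:
        if op == "add":     # data gadget: addition
            state[dst] += state[src]
        elif op == "copy":  # data gadget: assignment
            state[dst] = state[src]
    return state


# Attacker-chosen "instructions" that compute acc = 3 * x using only the
# gadgets above; the loop itself never leaves its normal control flow.
program = [("copy", "acc", "x"), ("add", "acc", "x"), ("add", "acc", "x")]
state = server_loop(program, {"x": 7, "acc": 0})
print(state["acc"])
```

A return or call-target check sees nothing unusual here, because only the data driving the loop has changed.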

“data oriented programming” (cont)

any questions?