CPU SIDE-CHANNELS VS. VIRTUALIZATION MALWARE: THE GOOD, THE BAD, OR THE UGLY

Yuriy Bulygin

Security Center of Excellence

Intel Corporation

04/22/23

AGENDA

• RSB based micro-architectural side-channel
• Hyper-channel: detecting hypervisor with uArch side-channel
• Demo
• Conclusion

RSB BASED μARCH SIDE-CHANNEL

μARCH SIDE-CHANNELS

• Cache based side-channel attacks
• (Simple) Branch Prediction Analysis (BPA)
• Instruction cache analysis
• Shared FU attack (shared multiplier in SMT capable CPU)

• Crypto + Spy threads (software or hardware) share some CPU resource
• Spy puts the shared resource in a known state and monitors if and how it was corrupted by crypto
• Crypto may corrupt spy’s state depending on the secret (key)
• Information about the secret leaks through this CPU resource and can be measured by the spy to recover the key

RETURN STACK BUFFER (RSB)

• Internal hardware “stack” in CPU
  – Typically a simple push/pop stack structure with 16 entries
  – May be more complicated than a simple stack on modern CPUs
• Predicts target address of RET instruction before it’s available from memory
  – CALL instruction drives next linear IP (return address) into the RSB
  – Target address of RET instruction is derived from the topmost RSB entry
  – RSB is a circular buffer with respect to CALLs: if the RSB is full, the oldest return address is overwritten
• Mispredict penalty if it’s later determined that the prediction doesn’t match the return address popped from the program stack
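The push/pop behavior above can be modeled in a few lines. This is a simplified software sketch, not a description of any real core: the `RSB` class, its method names, and the integer "addresses" are illustrative assumptions. Only the 16-entry depth and the overwrite-oldest rule come from the slide.

```python
from collections import deque


class RSB:
    """Toy model of a return stack buffer (hypothetical, for illustration)."""

    def __init__(self, depth=16):
        # deque(maxlen=...) silently drops the oldest entry when full,
        # which mimics the circular overwrite on CALL.
        self.entries = deque(maxlen=depth)
        self.misses = 0

    def call(self, return_address):
        """CALL pushes the next linear IP (the return address)."""
        self.entries.append(return_address)

    def ret(self, actual_return_address):
        """RET predicts from the topmost entry; a mismatch counts as a miss."""
        predicted = self.entries.pop() if self.entries else None
        hit = predicted == actual_return_address
        if not hit:
            self.misses += 1
        return hit


# 17 nested calls overflow a 16-entry RSB: the oldest address is lost,
# so the outermost return has no matching prediction.
rsb = RSB()
for addr in range(17):
    rsb.call(addr)
results = [rsb.ret(addr) for addr in reversed(range(17))]
print(results.count(False), "miss out of", len(results), "returns")
```

An empty buffer is treated as a miss here for simplicity; what real hardware predicts on underflow depends on the core.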

USING RSB TO SPY ON CRYPTO CODE

• Spy thread executes 16 nested CALL instructions to fill RSB with spy’s return addresses
• Crypto thread executes code (e.g. End Reduction (ER) step in Montgomery modular reduction algorithm)
• Spy thread then executes 16 RET instructions and measures time taken to execute them
  – Or directly measures “number of RSB misses” performance counter
• Spy observes increased time due to RSB mispredictions corresponding to one or more of spy’s return addresses replaced with crypto’s return addresses
• What if crypto implementation replaced a different # of RSB entries depending on key bit or result of mod multiplication ??
• Spy would be able to differentiate key bit value based on # of RSB mispredictions

FILLING RSB WITH SPY’S RET’urns

[Figure: RSB filled, top to bottom, with the spy’s return addresses for call func15, call func14, ..., call func0; then crypto executes]

• Crypto thread executes square-and-multiply modular exponentiation or Montgomery modular multiplication (MMM)
• Let’s take a look at this Montgomery reduction:

// Montgomery modular reduction
crypto_montgomery_reduction {
  ..
  // End Reduction step
  if( crypto_cmp(a, N) >= 0 ) {
    crypto_sub(a, a, N);
  }
  ..
}

CRYPTO CORRUPTS SPY’S RSB DEPENDING ON THE SECRET

1. No End Reduction (A < N)

if( crypto_cmp(a, N) >= 0 ) { crypto_sub(a, a, N); }

[Figure: RSB; the rest of spy’s return addresses are not corrupted]

2. End Reduction is carried out (A ≥ N)

if( crypto_cmp(a, N) >= 0 ) { crypto_sub(a, a, N); }

[Figure: RSB; crypto_sub replaces additional entries]

SPY OBSERVES RSB MISPREDICTIONS

Spy can distinguish if crypto executed:

• crypto_cmp only (1 RSB miss): MMM w/o End Reduction
or
• crypto_cmp/crypto_sub (4 RSB misses): MMM with ER step

[Figure: spy executes rdtsc, then 16 RETs (func15 down to func0), then rdtsc; four of the RETs are marked “RSB miss”]
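The fill / victim / probe sequence can be sketched as a small simulation. Everything here is an assumption made for illustration: the `probe` function, the tuple "addresses", and the idea that the victim's effect is fully described by its deepest chain of nested CALLs. The slide gives only the observed miss counts (1 and 4), not the actual call nesting inside crypto_cmp or crypto_sub.

```python
from collections import deque

RSB_DEPTH = 16


def probe(victim_call_depth):
    """Return the number of RSB misses the spy sees after the victim runs.

    victim_call_depth is the deepest chain of nested CALLs the victim makes
    between the spy's fill and the spy's returns (an assumed input).
    """
    rsb = deque(maxlen=RSB_DEPTH)

    # 1. Spy fills the RSB with its own 16 return addresses.
    spy_addrs = [("spy", i) for i in range(RSB_DEPTH)]
    for addr in spy_addrs:
        rsb.append(addr)

    # 2. Victim makes nested calls, then returns from all of them.
    for level in range(victim_call_depth):
        rsb.append(("victim", level))  # evicts the oldest spy entry when full
    for _ in range(victim_call_depth):
        if rsb:
            rsb.pop()

    # 3. Spy unwinds its 16 calls and counts mispredicted returns.
    misses = 0
    for addr in reversed(spy_addrs):
        predicted = rsb.pop() if rsb else None
        if predicted != addr:
            misses += 1
    return misses


for depth in (0, 1, 4):
    print(depth, "nested victim calls ->", probe(depth), "RSB misses")
```

In this model the miss count equals the victim's nesting depth (capped at 16), so a spy that sees 1 miss versus 4 misses can tell the two code paths apart.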

HYPER-CHANNEL: USING RSB BASED μARCH SIDE-CHANNEL TO SPY ON HYPERVISOR

OOPS. LET’S DO IT AGAIN

1. Spy populates RSB by executing 16 nested CALLs
2. Spy executes CPUID or any other instruction that causes #VMEXIT
   • If OS is in non-root (guest) mode then CPUID is trapped by hypervisor

[Figure: RSB filled, top to bottom, with the spy’s return addresses for call func15, call func14, ..., call func0; CPUID causes #VMEXIT]

HYPERVISOR CORRUPTS SPY RSB CONTENTS

3. #VMEXIT handler is likely to “corrupt” 1 or more of spy’s RSB entries, replacing them with its own entries
   • It is enough for #VMEXIT handler to make 1 CALL to a subfunction

VMExit_Handler:   call vmexit_subfunc
vmexit_subfunc:   call vmexit_subfunc1
vmexit_subfunc1:  call vmexit_subfunc11

[Figure: RSB; the handler’s return addresses replace spy entries; 13 hyper-channel return addresses are not corrupted]
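The count of 13 surviving entries follows from the handler's nesting depth. A minimal sketch, assuming the handler's call graph is exactly the three-level chain shown on the slide and that each level of nesting costs one spy entry; `CALL_GRAPH` and `max_call_depth` are illustrative names:

```python
# Call graph of the example #VMEXIT handler, transcribed from the slide.
CALL_GRAPH = {
    "VMExit_Handler": ["vmexit_subfunc"],
    "vmexit_subfunc": ["vmexit_subfunc1"],
    "vmexit_subfunc1": ["vmexit_subfunc11"],
    "vmexit_subfunc11": [],
}

RSB_DEPTH = 16


def max_call_depth(graph, func):
    """Deepest chain of nested CALLs starting from func (acyclic graph assumed)."""
    callees = graph.get(func, [])
    if not callees:
        return 0
    return 1 + max(max_call_depth(graph, callee) for callee in callees)


depth = max_call_depth(CALL_GRAPH, "VMExit_Handler")
corrupted = min(depth, RSB_DEPTH)
print(f"corrupted entries: {corrupted}, intact entries: {RSB_DEPTH - corrupted}")
```

Sequential (non-nested) calls reuse the same RSB slot in this model, which is why only the deepest chain matters.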

SPY OBSERVES RSB MISPREDICTIONS

4. After #VMEXIT spy executes 16 RET’urns
   – RSB hit: < 3 clk cycles
   – RSB miss penalty: 10-15 clk cycles
5. Experiment:
   – Clear: 83 cycles
   – Rootkit-ed: 123 cycles
   – Can be >300 cycles if #VMEXIT handler is slightly modified

[Figure: spy executes rdtsc, then 16 RETs (func15 down to func0), then rdtsc; three of the RETs are marked “RSB miss”]
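As a sanity check, the experimental numbers can be compared against the quoted miss penalty. This is a rough estimate under the assumption that all of the extra time comes from RSB misses; the function name and the bounding approach are illustrative, while the cycle counts are the ones on the slide.

```python
CLEAN_CYCLES = 83              # 16 RETs on a clean system (from the slide)
ROOTKIT_CYCLES = 123           # 16 RETs with the hypervisor present (from the slide)
MISS_PENALTY_RANGE = (10, 15)  # RSB miss penalty in cycles (from the slide)


def estimate_misses(measured, baseline, penalty_range=MISS_PENALTY_RANGE):
    """Return the (fewest, most) RSB misses consistent with the extra time."""
    extra = measured - baseline
    if extra <= 0:
        return (0, 0)
    low_penalty, high_penalty = penalty_range
    # Fewest misses if each one costs the most (ceiling division),
    # most misses if each one costs the least (floor division).
    fewest = -(-extra // high_penalty)
    most = extra // low_penalty
    return (fewest, most)


print(estimate_misses(ROOTKIT_CYCLES, CLEAN_CYCLES))
```

The 40 extra cycles are consistent with 3 to 4 misses; 3 misses at roughly 13.3 cycles each falls inside the 10-15 cycle range.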

CLOSER LOOK AT THE RSB SPY ..

func15() {
    cpuid          ; #VMEXIT on VT
    rdtsc          ; start measurement
    ret            ; start 16 returns
}
func14() {
    call func15
    ret
}
..
func0() {
    call func1
    ret
}

spy() {
    cli
    call func0
    rdtsc          ; end measurement
    sti
}

DEMO: HYPER-CHANNEL DETECTOR

DEMO: HYPER-CHANNEL

PROPERTIES

• No false negatives !! A single RSB entry corruption is detectable
  – Hyper-channel needs to know the time taken by 16 RETs to execute on a non-virtualized OS (noticed the 100 on the command line ??)
  – “# of RSB misses” perf. counter is always 0 on a non-virtualized OS !!
• The RSB side-channel detection is probabilistic
  – RSB can be flushed due to multiple events
  – So the detector needs to make multiple measurements to decrease the likelihood of a false positive
  – Experimental probability of a false positive is ~ 1/1000 (RSB was flushed during hyper-channel’s measurement)
  – Make as few as 10 measurements
• #VMEXIT behavior related to RSB depends on the core
  – RSB may be entirely flushed by #VMEXIT microcode
  – This is easily detectable but the detector cannot tell anything about the hypervisor
• Timing and TLB profiling are also side-channels
  – But there’s no externally published uArch side-channel using TLBs
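One way to combine several measurements is to look at the fastest run only. This is an assumed decision rule, not necessarily what the hyper-channel detector does: a stray RSB flush can only add misses, so the minimum over the runs discards disturbed samples. The function names and sample values are hypothetical, and independence between runs is assumed.

```python
FLUSH_PROBABILITY = 1 / 1000  # per-measurement false positive rate (from the slide)


def is_virtualized(samples, baseline_cycles, tolerance=0):
    """Flag a hypervisor only if even the fastest run exceeds the baseline."""
    return min(samples) > baseline_cycles + tolerance


def false_positive_probability(runs, p=FLUSH_PROBABILITY):
    """Chance that every one of `runs` independent measurements was disturbed."""
    return p ** runs


# Hypothetical samples: one clean-system run was disturbed by a flush.
clean_samples = [83, 83, 140, 83, 83, 83, 83, 83, 83, 83]
guest_samples = [123, 123, 124, 123, 300, 123, 123, 123, 123, 123]

print(is_virtualized(clean_samples, baseline_cycles=83))
print(is_virtualized(guest_samples, baseline_cycles=83))
print(false_positive_probability(10))
```

Under these assumptions, 10 runs push the false positive probability far below the single-run 1/1000.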

EVADING HYPER-CHANNEL

• Hypervisor may not make any calls inside VMExit handler
  – In this case hyper-channel detector will be useless
  – But this is a painful restriction !!
  – It’s similar to requiring crypto implementations to not make any key-dependent calls (what about recursive Karatsuba sqr/mul ??)
• Clearly a malicious hypervisor can masquerade as a legitimate VMM by making the same # of nested calls
  – It cannot evict all 16 entries as it’s suspicious !! Which legitimate VMM calls more than 16 nested subroutines ?? shoot it..

CONCLUSION

• Side-channels are good..
• Yeah, I know.. this conclusion sucks
• Although many are tired of virtualization competition, let’s respect awesome research in virtualization rootkits and their detection
• With the widespread adoption of HW virtualization, exploits targeting legitimate hypervisors may become as common as OS kernel exploits are now
• We can detect that OS is virtualized, probably can detect malicious hypervisor by all known heuristics
• So what ?? Can we remove it ??

PLUG: DeepWatch

• DeepWatch is a Proof of Concept hardware based detector of virtualization malware
• that uses an embedded microcontroller in the chipset
• to detect a malicious hypervisor and remove it from the system
• I hope you’ll see its demo soon..

THANK YOU !! QUESTIONS ??

• Thanks to researchers of virtualization rootkits, their detection methods, and uArch side-channel analysis

• I’d also like to acknowledge Sagar Dalvi and Mark Davis from Intel

[email protected]

http://www.intel.com/security



REFERENCES

• Nate Lawson, Peter Ferrie, Thomas Ptacek: http://www.matasano.com/log/930/side-channel-detection-attacks-against-unauthorized-hypervisors/, https://www.blackhat.com/presentations/bh-usa-07/Ptacek_Goldsmith_and_Lawson/Presentation/bh-usa-07-ptacek_goldsmith_and_lawson.pdf, http://www.matasano.com/log/
• Joanna Rutkowska, Alexander Tereshkin: http://bluepillproject.org, http://www.invisiblethingslab.com
• Dino A. Dai Zovi: http://www.blackhat.com/presentations/bh-usa-06/BH-US-06-Zovi.pdf
• Peter Ferrie. Attacks on More Virtual Machine Emulators: http://pferrie.tripod.com/papers/attacks2.pdf
• Edgar Barbosa: http://rapidshare.com/files/42452008/detection.rar.html
• Tal Garfinkel, Keith Adams, Andrew Warfield, Jason Franklin: http://www.cs.cmu.edu/~jfrankli/hotos07/vmm_detection_hotos07.pdf, http://x86vmm.blogspot.com/2007/07/bluepill-detection-in-two-easy-steps.html
• Michael Myers, Stephen Youndt: http://www.crucialsecurity.com//index.php?option=com_content&task=view&id=94&Itemid=136/
• bugcheck: vrdtsc
