virtunoid : breaking out of kvm
DESCRIPTION
Virtunoid : Breaking out of KVM. Nelson Elhage Black Hat USA 2011. Outline. Introduction Related work Background K nowledge Attack Detailed CVE 2011-1751 Bug Detailed Exploit Detailed (Take Control of %rip) Inject Shellcode into host Disable non executable page Bypassing ASLR - PowerPoint PPT PresentationTRANSCRIPT
Nelson ElhageBlack Hat USA 2011
Virtunoid: Breaking out of KVM
IntroductionRelated workBackground KnowledgeAttack Detailed
CVE 2011-1751 Bug DetailedExploit Detailed (Take Control of %rip)Inject Shellcode into host Disable non executable pageBypassing ASLR
ConclusionsReference
Outline
It was found that the PIIX4 Power Management emulation layer in qemu-kvm did not properly check for hot plug eligibility during device removals. A privileged guest user could use this flaw to crash the guest or, possibly, execute arbitrary code on the host. (CVE-2011-1751)
Introduction
a generic and open source machine emulator and virtualizer.
Three components:Kvm.koKvm-intel.ko or kvm-amd.koQemu-kvm
QEMU-KVM Background Knowledge
The core KVM kernel moduleProvides ioctls for communicating the kernel
modulePrimarily responsible for emulating the
virtual CPU and MMUEmulates a few devices in-kernel for
efficiencyContains an emulator for a subset of x86 used
in handling certain traps
Kvm.ko [3]
Provides support for Intel’s VMX and AMD’s SVM virtualization extensions
Relatively small compared to the rest of KVM
Kvm-intel.ko/kvm-amd.ko
Provides the most direct user interface to KVM
Based on the classic x86 emulatorImplements the bulk of the virtual devices a
VM usesImplements a wide variety of possible devices
and busesAn order of magnitude more code than the
kernel module
Qemu-kvm
QEMU-KVM Background Knowledge
Static QEMUTimer *active_timers[QEMU_NUM_CLOCKS] Struct QEMUTimer { QEMUClock *clock; int64_t expire_time; QEMUTimerCB *cb; /* call back function*/ void *opaque; /* parameter */ struct QEMUTimer *next; /* link list */}
QEMUTimer (qemu_src/qemu_common.h)
QEMUTimer
Active_timers QEMUTimer
Related functions:Qemu_new_timer: allocate a memory region for the new
timer.Qemu_mod_timer: modify the current timer add it to link
list. Qemu_run_timers: loop through the link list and execute the
timer structure call back function with the opaque as the parameter
QEMUTimer(qemu_src/vl.c)
The main_loop_wait function will iterate through the active_timers and call qemu_run_timers()
QEMUTimer(qemu_src/vl.c)
A computer clock that keep track of the current time
MC146818 RTC hardware manual can be found
http://wiki.qemu.org/File:MC146818AS.pdf
Real Time Clock
RTCState structureStruct RTCState { ….. QEMUTimer *second_timer; QEMUTimer *second_timer2;}
Real Time Clock Emulations [2]
Related functions:Rtc_initfn : initialize the RTCRtc_update_second : update the expire time of
the QEMUTimer and add it to the link list.
Real Time Clock Emulations
rtc_initfn : RTCState *s = …. s->second_timer = qemu_new_timer(rtc_clock, rtc_updated_second, s) s->second_timer2 = qemu_new_timer(rtc_clock, rtc_update_second2, s) qemu_mod_timer(s->second_timer2, s->next_second_time)
Real Time Clock Emulations
RTCState and QEMUTimer
………
Second_timer
Second_timer2 opaqueNext
RTCState
Active_timer
QEMUTimer
QEMUTimer
opaqueNext
Cb
Cb
……….………..……….
……….………..……….
Rtc_update_second
Rtc_update_second2
Real Time Clock Emulations
A south bridge chip.Default south bridge chip used by qemu-kvmInclude ACPI, PCI-ISA, and an embeded MC146818
RTC.Support PCI device hotplug, write values to IO port
0xae08Qemu use qdev_free to emulate device hotplug.Certain devices don’t support device hotplug but qemu
didn’t check this.
PIIX 4
It should not be possible to unplug the ISA bridge
KVM’s emulated RTC is not designed to be unplugged.
PCI-ISA hot plug
Bug Detailed
Did not checkThe device can Be unplug or not
Being dealloc
Add the second timer to link list.
#include <sys/io.h>Int main(){ iopl(3); outl(2, 0xae08); return 0;}
Bug reproducer
………
Second_timer
opaqueNext
Cb
opaqueNext
Cb
……….………..……….
RTCState
QEMUTimer
QEMUTimer
Rtc_update_second
Active_timerUnplug RTC
opaqueNext
Cb
opaqueNext
Cb
……….………..……….
RTCState
QEMUTimer
QEMUTimer
Rtc_update_second
Active_timerUnplug RTC
Second_timer
Dummy memory region
………………
opaqueNext
Cb
opaqueNext
Cb
……….………..……….
RTCState
QEMUTimer
QEMUTimer
Rtc_update_second
Active_timerReturn to main_loop_waitCall qemu_run_timers
Second_timer
Dummy memory region
………………
opaqueNext
Cb
opaqueNext
Cb
……….………..……….
RTCState
QEMUTimer
QEMUTimer
Rtc_update_second
Active_timerQEMUTimer call backRtc_update_second(opaque)
Second_timer
Dummy memory region
………………
opaqueNext
Cb
opaqueNext
Cb
……….………..……….
RTCState
QEMUTimer
QEMUTimer
Rtc_update_second
Active_timerNext Main_loop_wait
Second_timer
Dummy memory region
………………
1. Inject a Controlled QEMUTimer into qemu-kvm
2. Eject ISA bridge3. Force an allocation into the freed
RTCState, with second timer point to our fake QEMUTimer
Exploit Detailed ( Take Control of %rip)
The guest RAM is backed by mmap()ed region inside the qemu-kvm process.
Allocate in the guest RAM and calculate the the host address by the following formula:Hva = physmem_base + gpagpa = page_traslation(gva) <= linux kernel project 1
Gva = guest virtual addressGpa = guest physical addressHva = host virtual addressPhysmem_base = mmap start regionFor now assume we know physmem_base(no aslr)
Inject a Fake QEMUTimer
Force qemu to call mallocUtilize the qemu-kvm user-mode networking
stackQemu-kvm implement DHCP server, DNS
server and NAT gateway in user-mode networking stack
User-mode stack normally handle packets synchronously
To prevent recursion, if a second packet is emitted while handling a first packet, the second packet is queued using malloc.
ICMP ping.
Force allocation into the freed RTCState
1. Allocate a Fake QEMUTimer2. calculate the Fake timer address3. unplug ISA bridge4. ping the gateway containing pointers to
your fake timer.
Putting it together
………
Second_timer
opaqueNext
Cb
opaqueNext
Cb
……….………..……….
RTCState
QEMUTimer
QEMUTimerRtc_update_second
Active_timerAllocate Fake QMEUTimer
opaqueNext
CbFake QEMUTimer
……….………..……….
Evil function(Shellcode)
opaqueNext
Cb
opaqueNext
Cb
……….………..……….
RTCState
QEMUTimer
QEMUTimerRtc_update_second
Active_timerUnplug ISA bridge
opaqueNext
CbFake QEMUTimer
……….………..……….
Evil function(Shellcode)
Second_timer
Ping the gateway
opaqueNext
Cb
opaqueNext
Cb
……….………..……….
RTCState
QEMUTimer
QEMUTimerRtc_update_second
Active_timerFirst Main_loop_wait
opaqueNext
CbFake QEMUTimer
……….………..……….
Evil function(Shellcode)
Second_timer
opaqueNext
Cb
RTCState
QEMUTimer
Active_timerSecond Main_loop_wait
opaqueNext
CbFake QEMUTimer
……….………..……….
Evil function(Shellcode)
Second_timer
1. we have %rip control 2. Where is the Evil function
Inject shellcode to host virtual memoryHost virtual memory has page protection(NX
bit)3. Solutions:
A. ROP B. something clever
What’s Next
1. we can control the QEMUTimer data structure.
2. create multiple QEMUTimer object and chain them together.
Multiple QEMUTimer Chaining
opaqueNext
CbopaqueNext
CbopaqueNext
Cb
……….………..……….
……….………..……….
……….………..……….
QEMUTimer
F1(X) F2(Y) F3(Z)
We now have multiple on argument function calls.
We want to do more arguments function calls. For example, mprotect take three arguments.
Multiple QEMUTimer Chaining
Arguments of types Bool, char, short, int, long, long long, and pointers are in the INTEGER class.
If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used
More detailed check out the reference 7
System V AMD64 ABI [7]
Suppose we can find a function with the following property.
Let f1(x) be set_rsi%rsi register will not be modified during
qemu_run_timer() in most qemu version.Therefore, F2(y) becomes F2(y,x) since we control the
%rsi from f1(x)
Multiple QEMUTimer Chaining
Set_rsi: movl %rdi, %rsi; return
This function will copy its first parameter to the second parameter of ioport_write
%rdi is the first parameter and %rsi is the second parameter. Therefore we get a function with the previous property. (Movl %rdi, %rsi)
Set_rsi for QEMUTimer Chaining
Void cpu_outl(pio_addr_t addr, uint32_t val) { ioport_write(2, addr, val);}
Mprotect prototype:Mprotect(addr, lens, prot)PROT_EXEC = 4
Use the following function
We control the “opaque/ioport” by QEMUTimer and control the “addr” by set_rsi()
Seems like we control everything in this function
Mprotect for QEMUTimer Chaining
Structure of IORange and IORangeOps
Allocate a fake IORangeOps with fake_ops->read = mprotect
Allocate a page-aligned IORange with Fake_ioport->ops = fake_ops Fake_ioport->base = -PAGE_SIZE
Copy shellcode following the IORangeConstruct a timer chain that calls
Cpu_outl(0, *)Ioport_readl_thunk(fake_ioport, 0)Fake_ioport + 1
Putting it All Together
opaqueNext
CbopaqueNext
CbopaqueNext
Cb
IORange (PAGE_ALIGN)
ops
Fill with shellcode
IORangeOps
Read
…….…….…….
…….…….…….
…….…….…….
Cpu_outl Ioport_readl_thunk
mprotectQEMUTimer Chain
The base address of the qemu-kvm binary, to find code address(such as mprotect ….)
Physmem_base, the address of the physical memory mapping inside kvm
Solutions:Find an information leakAssume non-PIE. Every major distribution compile qemu-
kvm as non position independent executable. How about physmem_base
Address
Emulated IO ports 0x510 (address) and 0x511 (data)
Used to communicate various tables to the qemu BIOS (e820 map, ACPI tables, etc)
Also provides support for exporting writable tables to the BIOS
However, fw_cfg_write doesn’t check if the target table is supposed to be writable
Fw_cfg
Several fw_cfg areas are backed by statically-allocated buffers.
Net result: nearly 500 writable bytes inside static variables.
Static data
Mprotect needs a page-aligned address, so these aren’t suitable for our shellcode
We can construct fake timer chains in this space to build a read4() primitive. (Create Information Leak)
Follow pointers from static variables to find physmem_base
Proceed as before
Read4 your way to victory
Sandbox qemu-kvmBuild qemu-kvm as PIELazily mmap/mprotect guest RAMXOR-encode key function pointersMore auditing and fuzzing of qemu-kvm
Possible hardening directions
VM breakouts aren’t magicHypervisors are just as vulnerable as
anything elseDevice drivers are the weak spot.
Conclusions
[1] http://qemu.weilnetz.de/qemu-tech.html [2] http://qemu.weilnetz.de/doxygen/structRTCState.html [3] http://www.linuxinsight.com/files/kvm_whitepaper.pdf [4] https://www.ibm.com/developerworks/cn/linux/l-virtio/ [5] http://smilejay.com/kvm_theory_practice/ [6] http://www.linux-kvm.org/page/Documents [7] http://www.cs.tufts.edu/comp/40/readings/amd64-abi.pdf [8]http://
linuxfromscratch.xtra-net.org/hlfs/view/unstable/glibc-2.4/chapter02/pie.html
qemu source code
Reference