Implements BIOS emulation support for BHyVe
Before talking about BIOS emulation on BHyVe
Let's take a quick look at BHyVe's internal structure and Intel VT-x
Sunday, March 17, 2013
BHyVe Overview
• bhyveload loads the guest OS
• bhyve is the userland part of the hypervisor; it emulates devices
• bhyvectl is a management tool
• libvmmapi is the userland API
• vmm.ko is the kernel part of the hypervisor
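[Diagram: on top of the FreeBSD kernel, bhyveload, bhyve, and bhyvectl use libvmmapi (mmap/ioctl) to reach /dev/vmm/${vm_name} (vmm.ko), which holds the guest kernel. Steps: 1. Create the VM instance and load the guest kernel. 2. Run the VM instance; the guest's HD, NIC, and console are backed by a disk image, a tap device, and stdin/stdout. 3. Destroy the VM instance.]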
vmm.ko
• Provides /dev/vmm/${vmname}
• Each vmm device file holds the state of one VM instance
• The device file is created via sysctl: hw.vmm.create
• It is destroyed via sysctl: hw.vmm.destroy
/dev/vmm/${vmname} interfaces
• read/write/mmap: guest memory can be accessed with standard syscalls (which means you can even dump guest memory with the dd command)
• ioctl: provides various operations on the VM
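As a rough illustration of the read/write point, the sketch below reads a range of bytes at a given offset from a file with plain open/pread. The function name read_guest_memory is hypothetical, and passing a /dev/vmm/${vmname} node as the path is an assumption based on the slide, not a tested recipe.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read `len` bytes starting at byte offset `offset` from the file at `path`.
 * For a VM, `path` would be the vmm device node (assumption for illustration).
 * Returns 0 on success, -1 on error or if the range runs past the end.
 */
static int
read_guest_memory(const char *path, off_t offset, void *buf, size_t len)
{
	assert(path != NULL && buf != NULL);

	int fd = open(path, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return -1;
	}

	uint8_t *dst = buf;
	size_t done = 0;
	while (done < len) {
		/* pread may return fewer bytes than requested, so loop. */
		ssize_t n = pread(fd, dst + done, len - done, offset + (off_t)done);
		if (n <= 0) {		/* error, or end of the readable area */
			close(fd);
			return -1;
		}
		done += (size_t)n;
	}
	close(fd);
	return 0;
}
```

The same call works on an ordinary file, which makes the helper easy to try out without a running VM.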
/dev/vmm/${vmname} ioctls
• VM_MAP_MEMORY: Maps a guest memory area of the requested size
• VM_SET/GET_REGISTER: Accesses guest registers
• VM_RUN: Runs the guest machine until a virtual device is accessed (or some other trap happens)
bhyveload
• The FreeBSD bootloader ported to userland: userboot
• bhyveload loads userboot.so as a dynamic library and calls its loader_main function
• Once called, it does the following:
  • Parses UFS on the disk image and finds the kernel
  • Loads the kernel into the guest memory area (using mmap)
  • Sets the initial guest register values (using the VM_SET_REGISTER ioctl)
    • RIP = kernel entry point
    • CR0 = Paging enable | Protected mode enable
    • EFER = Long mode enable | Long mode active
  • Initializes the page table and sets its address in CR3
  • Creates the GDT, IDT, and LDT and sets their addresses in GDTR, IDTR, and LDTR
  • Initializes TR
• The guest machine starts at the kernel entry point with 64-bit mode enabled
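The register values listed above can be written out as a small sketch. The bit positions are the architectural ones for CR0 and EFER; the struct and function names are hypothetical, and the real bhyveload sets these through the VM_SET_REGISTER ioctl rather than through a struct like this.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural control bits. */
#define CR0_PE   (1ULL << 0)	/* protected mode enable */
#define CR0_PG   (1ULL << 31)	/* paging enable */
#define EFER_LME (1ULL << 8)	/* long mode enable */
#define EFER_LMA (1ULL << 10)	/* long mode active */

/* Hypothetical container for the values a loader would hand to the VM. */
struct initial_regs {
	uint64_t rip;
	uint64_t cr0;
	uint64_t cr3;
	uint64_t efer;
};

/* Build the register state for entering a 64-bit kernel directly. */
static struct initial_regs
long_mode_entry_state(uint64_t kernel_entry, uint64_t page_table_root)
{
	/* The top-level page table must be 4 KiB aligned. */
	assert((page_table_root & 0xfff) == 0);

	struct initial_regs r = {
		.rip = kernel_entry,
		.cr0 = CR0_PG | CR0_PE,
		.cr3 = page_table_root,
		.efer = EFER_LME | EFER_LMA,
	};
	return r;
}
```

Because the guest starts with paging and long mode already active, it never runs any 16-bit or 32-bit setup code, which is why this path needs no BIOS.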
bhyve
• The bhyve command runs a loop like the following:
while (1) {
	ioctl(VM_RUN);
	device_io_emulation();
}
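To make the shape of that loop concrete, here is a self-contained mock. Nothing in it talks to vmm.ko: mock_vm_run stands in for the VM_RUN ioctl and replays a scripted list of exit reasons, and every name (vmexit_reason, mock_vm, run_loop) is invented for illustration. A shutdown exit is added only so the mock terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Invented exit reasons; real exit codes are defined by the hypervisor. */
enum vmexit_reason { VMEXIT_INOUT, VMEXIT_MMIO, VMEXIT_SHUTDOWN };

struct mock_vm {
	const enum vmexit_reason *script;	/* exits to replay */
	size_t len;
	size_t pos;
	unsigned inout_emulated;
	unsigned mmio_emulated;
};

/* Stand-in for ioctl(VM_RUN): "runs" the guest until the next exit. */
static enum vmexit_reason
mock_vm_run(struct mock_vm *vm)
{
	assert(vm->pos < vm->len);
	return vm->script[vm->pos++];
}

/* Run the guest, emulate whatever caused the exit, and re-enter. */
static void
run_loop(struct mock_vm *vm)
{
	for (;;) {
		switch (mock_vm_run(vm)) {
		case VMEXIT_INOUT:
			vm->inout_emulated++;	/* port I/O device emulation */
			break;
		case VMEXIT_MMIO:
			vm->mmio_emulated++;	/* memory-mapped device emulation */
			break;
		case VMEXIT_SHUTDOWN:
			return;
		}
	}
}
```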
Intel VT-x: Hardware assisted virtualization
• New CPU modes: VMX root mode (hypervisor) / VMX non-root mode (guest)
• When an event occurs that must be emulated by the hypervisor, the CPU stops the guest and exits to the hypervisor → VMExit
[Diagram: VMX root mode and VMX non-root mode each contain User (Ring 3) and Kernel (Ring 0); VMEntry switches from root mode to non-root mode, and VMExit switches back.]
VT-x configuration
• Which events should be handled by the hypervisor? It depends on the hypervisor implementation!
• VT-x is configurable! You can enable or disable the VMExit for each event
• You can also change some CPU behavior
BHyVe BIOS emulation project
• Google Summer of Code '12: “BHyVe BIOS emulation to boot legacy systems”
• Project goal: implement BIOS emulation on the BHyVe hypervisor so that BHyVe can support more guest OSes
Limitation of bhyveload
• It’s legacy free! yay!
• But...
• Only supports FreeBSD/amd64
• You need to implement a kernel loader for each OS
• We want to run more OSes on BHyVe!
Why don't you just implement an OS loader?
• Better than supporting the ugly legacy BIOS? True! But...
• An OS loader depends heavily on the kernel implementation
• You would need to implement an OS loader for each OS, e.g. a Linux loader, a NetBSD loader, an OpenBSD loader...
• It may be very hard to implement a loader for a proprietary OS
• Even if the OS loader works, the guest OS may call a BIOS interrupt handler → DIE! This is common on 32-bit x86 OSes; most 64-bit OSes are legacy free.
BIOS interrupt call
• Ex: sys/boot/i386/mbr/mbr.s
main.5:	movw %sp,%di			# Save stack pointer
	movb 0x1(%si),%dh		# Load head
	movw 0x2(%si),%cx		# Load cylinder:sector
	movw $LOAD,%bx			# Transfer buffer
	testb $FL_PACKET,flags		# Try EDD?
	jz main.7			# No.
	pushw %cx			# Save %cx
	pushw %bx			# Save %bx
	movw $0x55aa,%bx		# Magic
	movb $0x41,%ah			# BIOS: EDD extensions
	int $0x13			# present?
	↑ BIOS interrupt call
What happens when it is called?
int 13h
→ Software interrupt (INTx)
→ CPU reads the interrupt vector
→ Executes the BIOS call handler (on the ROM)
→ Performs I/O by in/out or MMIO
→ Hardware
How Linux KVM handles BIOS
• KVM uses QEMU as its userland process
• QEMU ships a real BIOS called “SeaBIOS”, an open-source BIOS
• SeaBIOS performs I/O with in/out instructions or MMIO
• KVM handles these I/O accesses and emulates the devices
BIOS call handling on KVM
Guest: int 13h → software interrupt (INTx) → CPU reads the interrupt vector → executes the interrupt handler → SeaBIOS performs I/O to virtual HW
→ VMExit by in/out or MMIO
Hypervisor: QEMU HW emulation (QEMU emulates HW I/O)
Bring SeaBIOS into BHyVe?
• I wanted to use it
• But we can't bring the code into FreeBSD
• Because it's LGPLv3 licensed
OK then, is there a BSD-licensed BIOS?
• Unfortunately, we haven't found any BSD-licensed BIOS
• But there is a BSD-licensed DOS emulator in Ports: doscmd
• It has a DOS and BIOS interrupt call emulator that runs on FreeBSD/i386
How doscmd works
• Maps pages in the low memory area (<1MB) to hold the DOS app
• Sets up the interrupt vector and the interrupt handler (the handler just issues HLT; IRET)
• Loads the DOS app into the low memory area
• Enters virtual 8086 mode (i386_vm86(2)) at the DOS app's entry address
• The CPU executes the DOS app in virtual 8086 mode
• When the DOS app makes a DOS/BIOS interrupt call, the interrupt handler runs and issues a HLT instruction
• Once the HLT instruction is issued, the CPU leaves virtual 8086 mode
• doscmd emulates the DOS/BIOS interrupt call
• doscmd returns to virtual 8086 mode
How doscmd works
DOS app in v8086 mode: int 13h → software interrupt (INTx) → CPU reads the interrupt vector → executes the interrupt handler → issues a HLT instruction
→ HLT instruction trap
doscmd on FreeBSD/i386: BIOS emulation (doscmd emulates the BIOS call)
Difference in BIOS handling: QEMU vs doscmd
• QEMU
  • Runs a real BIOS in the guest machine
  • The interrupt handler handles the BIOS interrupt call
  • QEMU just emulates the hardware devices
• doscmd
  • Has no real BIOS
  • The interrupt handler exists only to trap out of the vm86 machine
  • doscmd emulates the BIOS interrupt call handler
Plan to emulate BIOS on BHyVe
• Extract only the necessary code from doscmd and make it a library that exports two functions: biosemul_init() / biosemul_call()
• biosemul_init() performs BIOS-compatible initialization (initializing register values, loading the boot sector, initializing the interrupt vector, installing the interrupt handler)
• The interrupt handler uses the VMCALL instruction instead of HLT, because the guest OS may also use HLT and we don't want the BIOS emulation code to handle that
• biosemul_call() handles a BIOS interrupt call by running the BIOS interrupt call emulation from the doscmd code
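One way to picture the "install interrupt handler" step is to place a tiny real-mode stub in guest memory and point an interrupt vector at it. The stub bytes below encode VMCALL; STI; RETF 2, the handler sequence quoted later in this talk. The function name install_bios_stub, the stub location, and the flat guest-memory array are assumptions made for this sketch and are not taken from the actual biosemul_init() code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Real-mode handler stub: trap to the hypervisor, then return to the caller. */
static const uint8_t bios_stub[] = {
	0x0f, 0x01, 0xc1,	/* vmcall */
	0xfb,			/* sti */
	0xca, 0x02, 0x00,	/* retf $2 */
};

/*
 * Copy the stub to seg:off in guest memory and make real-mode interrupt
 * `vector` point at it. `mem` mirrors guest physical memory from address 0.
 */
static void
install_bios_stub(uint8_t *mem, size_t mem_size, uint8_t vector,
    uint16_t seg, uint16_t off)
{
	uint32_t linear = ((uint32_t)seg << 4) + off;

	assert(mem != NULL && mem_size >= 0x400);
	/* Keep the stub outside the vector table and inside guest memory. */
	assert(linear >= 0x400 && linear + sizeof(bios_stub) <= mem_size);

	memcpy(mem + linear, bios_stub, sizeof(bios_stub));

	/* Vector table entry: 16-bit offset, then 16-bit segment. */
	uint8_t *entry = mem + (size_t)vector * 4;
	entry[0] = (uint8_t)(off & 0xff);
	entry[1] = (uint8_t)(off >> 8);
	entry[2] = (uint8_t)(seg & 0xff);
	entry[3] = (uint8_t)(seg >> 8);
}
```

With a stub like this installed for vector 0x13, an int $0x13 in the guest lands on the VMCALL, and the hypervisor side can then dispatch to the emulation code.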
How to handle a BIOS interrupt call in BHyVe
Guest: int 13h → software interrupt (INTx) → CPU reads the interrupt vector → executes the interrupt call handler → issues a VMCALL instruction
→ VMExit by VMCALL
Hypervisor: BIOS emulation (doscmd emulates the BIOS call)
Why don't you trap the interrupt directly?
• Intel VT-x can trap interrupts directly (no need to issue a VMCALL instruction in the interrupt handler)
• Why shouldn't we use it for BIOS emulation? Because the guest OS may reuse the BIOS interrupt vector numbers for different software interrupts after entering protected mode
• Bootloaders may also invoke an interrupt handler by jumping to its address (btx does this)
Problems (1)
• doscmd is not 64-bit safe! Some type definitions need rewriting, e.g. u_long → uint32_t
• doscmd maps the guest memory area at 0x0. Maybe we could also mmap the guest memory area at 0x0 on BHyVe, but I rewrote the code instead, e.g.:
*(char *)(0x400) = 0;
↓
*(char *)(0x400 + guest_mem) = 0;
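The address rewrite shown above amounts to adding the host address of the guest memory mapping to every guest physical address. A hedged sketch of a helper that does this with a bounds check follows; struct guest_mem and guest_ptr are invented names, and the slide's own code simply adds guest_mem inline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side view of the guest's memory mapping (illustrative). */
struct guest_mem {
	uint8_t *base;	/* host address where guest physical 0 is mapped */
	size_t size;	/* size of the mapping in bytes */
};

/*
 * Translate guest physical address `gpa` to a host pointer, or return
 * NULL if the `len`-byte access would fall outside the mapping.
 * A fixed-width uint32_t is used for the guest address so the code
 * behaves the same on 32-bit and 64-bit hosts.
 */
static void *
guest_ptr(const struct guest_mem *gm, uint32_t gpa, size_t len)
{
	assert(gm != NULL && gm->base != NULL);

	if (gpa > gm->size || len > gm->size - gpa)
		return NULL;
	return gm->base + gpa;
}
```

With such a helper, the example write becomes `*(char *)guest_ptr(&gm, 0x400, 1) = 0;`.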
Problems (2)
• Guest register storage: doscmd stores register values in its own structure, but BHyVe requires an ioctl to get or set each guest register
• I decided to copy all the registers first, then emulate the BIOS interrupt call, and write back the modified registers afterwards
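The copy-in, emulate, write-back pattern can be sketched against a mock register interface. Here vm_get_register and vm_set_register stand in for the VM_GET/SET_REGISTER ioctls, the register set is cut down to four 16-bit registers, and writing back only the registers whose values changed is one reading of "write back the modified registers"; all of these are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum reg { REG_AX, REG_BX, REG_CX, REG_DX, REG_COUNT };

/* Mock VM: holds the "real" guest registers and counts accesses. */
struct reg_vm {
	uint16_t regs[REG_COUNT];
	unsigned get_calls;
	unsigned set_calls;
};

/* Stand-in for the VM_GET_REGISTER ioctl. */
static uint16_t
vm_get_register(struct reg_vm *vm, enum reg r)
{
	assert(r < REG_COUNT);
	vm->get_calls++;
	return vm->regs[r];
}

/* Stand-in for the VM_SET_REGISTER ioctl. */
static void
vm_set_register(struct reg_vm *vm, enum reg r, uint16_t value)
{
	assert(r < REG_COUNT);
	vm->set_calls++;
	vm->regs[r] = value;
}

/* Local register copy that the emulation code works on. */
struct reg_copy {
	uint16_t r[REG_COUNT];
};

typedef void (*bios_handler)(struct reg_copy *);

/* Copy all registers in, run the emulation, write back what changed. */
static void
emulate_bios_call(struct reg_vm *vm, bios_handler handler)
{
	struct reg_copy before, work;

	for (int i = 0; i < REG_COUNT; i++)
		before.r[i] = work.r[i] = vm_get_register(vm, (enum reg)i);

	handler(&work);

	for (int i = 0; i < REG_COUNT; i++)
		if (work.r[i] != before.r[i])
			vm_set_register(vm, (enum reg)i, work.r[i]);
}

/* Heavily simplified stand-in for an EDD presence check: only sets BX. */
static void
fake_edd_check(struct reg_copy *regs)
{
	regs->r[REG_BX] = 0xaa55;
}
```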
Debugging BIOS emulator
• When I started implementing BIOS emulation, I inserted a register dump at each BIOS interrupt call
• In practice, a dump at each BIOS interrupt call is too coarse to determine what is going on
• The emulation didn't work: the guest eventually jumped to a strange EIP and died, and I had no idea why
• I haven't found a way to run BHyVe on an emulator and get an instruction-level trace
• BHyVe can run on VMware, but I haven't found a way to do tracing there
• So I decided to implement an instruction-level tracer in BHyVe
Implement instruction level tracer on BHyVe (1)
• If the guest CPU is emulated, dumping each instruction is very easy: just dump everything when the instruction decoder is called
• But on BHyVe the guest program runs natively, because BHyVe uses VT-x
• This means you have no way to inspect instructions or dump registers until a VMExit occurs
• So we can raise an exception on every instruction
• You could insert instructions that raise an exception, but x86 already has a flag for single-step debugging (the TF bit in EFLAGS)
Implement instruction level tracer on BHyVe (2)
• At first, I implemented the following:
  • Set the TF bit in EFLAGS and enable VMExit on the #DB exception
  • bhyve handles the #DB exception, disassembles the instruction at EIP, steps the EIP address forward, and does VMEnter again
• I suddenly realized the VMExit occurs BEFORE the instruction executes! USELESS!!
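The two settings named in this first attempt are both single bits. TF is bit 8 of EFLAGS, and #DB is exception vector 1, so trapping it means setting bit 1 in the 32-bit VT-x exception bitmap. The helper names below are invented, and actually writing the bitmap into the VMCS is outside this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_TF (1u << 8)	/* trap flag: single-step after each instruction */
#define VECTOR_DB 1u		/* #DB, the debug exception */

/* Return `eflags` with single-stepping turned on. */
static uint32_t
enable_single_step(uint32_t eflags)
{
	return eflags | EFLAGS_TF;
}

/* Return `bitmap` with a VMExit requested for exception `vector` (0..31). */
static uint32_t
trap_exception(uint32_t bitmap, unsigned vector)
{
	assert(vector < 32);
	return bitmap | (1u << vector);
}
```

An EFLAGS value of 0x2 becomes 0x102 once TF is set, which matches the flags:102 values in the tracer output shown later.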
Implement instruction level tracer on BHyVe (3)
• I changed my mind and handled it the same way as a BIOS interrupt call (the interrupt handler issues a VMCALL instruction → VMExit)
• EIP and some other registers have already been pushed onto the stack because the handler has not returned yet, so they need to be fetched from the stack for the dump
  • OLD_EIP = *(uint16_t *)(ESP)
  • OLD_CS = *(uint16_t *)(ESP + 2)
  • OLD_EFLAGS = *(uint16_t *)(ESP + 4)
  • OLD_ESP = *(uint16_t *)(ESP + 6)
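A sketch of fetching the interrupted context from the guest stack in real mode follows. It reads only the three words a real-mode interrupt pushes (IP, CS, FLAGS, from the top of the stack upward) and resolves SS:SP to a linear address itself. The struct and function names are hypothetical, and mem is assumed to mirror guest physical memory from address 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The words a real-mode interrupt pushes, as seen by the handler. */
struct int_frame {
	uint16_t ip;
	uint16_t cs;
	uint16_t flags;
};

/* Read a little-endian 16-bit word at seg:off from guest memory. */
static uint16_t
read16(const uint8_t *mem, size_t mem_size, uint16_t seg, uint16_t off)
{
	uint32_t lo = ((uint32_t)seg << 4) + off;
	/* The offset wraps within the 64 KiB segment. */
	uint32_t hi = ((uint32_t)seg << 4) + (uint16_t)(off + 1);

	assert(lo < mem_size && hi < mem_size);
	return (uint16_t)(mem[lo] | (mem[hi] << 8));
}

/* Fetch the saved IP, CS, and FLAGS from the stack at ss:sp. */
static struct int_frame
read_int_frame(const uint8_t *mem, size_t mem_size, uint16_t ss, uint16_t sp)
{
	struct int_frame f;

	f.ip = read16(mem, mem_size, ss, sp);
	f.cs = read16(mem, mem_size, ss, (uint16_t)(sp + 2));
	f.flags = read16(mem, mem_size, ss, (uint16_t)(sp + 4));
	return f;
}
```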
Instruction level tracer output
[trace] 16bit ip:7c3e cs:0 flags:102 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:cld
[trace] 16bit ip:7c3f cs:0 flags:102 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:xor %cx, %cx
[trace] 16bit ip:7c41 cs:0 flags:146 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov %cx, %es
[trace] 16bit ip:7c43 cs:0 flags:146 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov %cx, %ds
[trace] 16bit ip:7c45 cs:0 flags:146 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov %cx, %ss
[trace] 16bit ip:7c4a cs:0 flags:146 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov %sp, %si
[trace] 16bit ip:7c4c cs:0 flags:146 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov $0x700, %di
[trace] 16bit ip:7c4f cs:0 flags:146 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:incb %ch
[trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:100 edx:80 insn:rep movsw
[trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:ff edx:80 insn:rep movsw
[trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fe edx:80 insn:rep movsw
[trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fd edx:80 insn:rep movsw
[trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fc edx:80 insn:rep movsw
[trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fb edx:80 insn:rep movsw
[trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fa edx:80 insn:rep movsw
[trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:f9 edx:80 insn:rep movsw
[trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:f8 edx:80 insn:rep movsw
[trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:f7 edx:80 insn:rep movsw
Tracing suddenly stops!(1)
• The TF bit in EFLAGS can be cleared under some conditions
• popf clears it: a #DB exception is still raised immediately after the popf instruction is issued, so setting the TF bit in OLD_EFLAGS (on the stack) solves the issue (the guest machine restores EFLAGS with IRET)
Tracing suddenly stops!(2)
• The TF bit in EFLAGS can be cleared under some conditions
• BIOS interrupt call VMExit: it looks like the CPU clears the TF flag when it takes an interrupt. doscmd uses the following handler for BIOS interrupt calls: VMCALL; STI; RETF 2. RETF 2 pops IP and CS but discards the saved FLAGS instead of restoring them, so changing OLD_EFLAGS (on the stack) has no effect. Setting the TF bit directly in EFLAGS solves the issue
• But we must not set the TF bit in EFLAGS when the interrupt is a #DB exception: that causes an infinite loop
Tracing suddenly stops!(3)
• lidt is executed just before switching to protected mode
• After IDTR changes, the #DB exception can no longer be handled
• Because the #DB handler is installed only in the real-mode interrupt vector, not in the IDT
• I modified the IDT and implemented a #DB handler in btx
• The #DB exception was not raised in real mode after the lidt instruction
• Probably because the IDT for protected mode is not valid in real mode
• After switching to protected mode, tracing could be resumed by setting the TF flag in EFLAGS
Exception causes exception
• I'm not really sure, but it looks like an exception is raised inside an exception handler
• Because of this, the error can't be printed on the console
• I inserted a VMCALL at the beginning of the exception handler to dump everything
BTX interrupt call causes exception
[trace] 32bit-kern eip:9332 cs:18 eflags:106 ss:10 esp:17b8 ds:10 cr0:31 eax:31 ebx:9357 ecx:0 edx:70000 insn:decb %al
[trace] 32bit-kern eip:9334 cs:18 eflags:106 ss:10 esp:17b8 ds:10 cr0:31 eax:30 ebx:9357 ecx:0 edx:70000 insn:mov %eax, %cr0
[trace] 32bit-kern eip:9097 cs:8 eflags:146 ss:0 esp:1800 ds:0 cr0:31 eax:102 ebx:2820 ecx:0 edx:708ee insn:mov $0x10, %cl
[trace] 32bit-kern eip:9099 cs:8 eflags:146 ss:0 esp:1800 ds:0 cr0:31 eax:102 ebx:2820 ecx:10 edx:708ee insn:mov %ecx, %ss
[trace] 32bit-kern eip:909d cs:8 eflags:146 ss:10 esp:1800 ds:0 cr0:31 eax:102 ebx:2820 ecx:38 edx:708ee insn:ltr %cx
[except] 32bit-kern exception:13 error_code:38 eip:909d cs:8 eflags:10146 ss:10 esp:1800 insn:ltr %cx ds:0 cr0:31 eax:102 ebx:2820 ecx:38 edx:708ee
• INT 0x31 (a BIOS call from a BTX app) causes an exception at the LTR instruction
• I have no idea why... → I tried skipping all BIOS calls in boot2 and loader and using in/out instead
rep causes exception in loader
[trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0200 ebx:201000 ecx:52f edx:50000a insn:rep movsb
[trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0200 ebx:201000 ecx:52e edx:50000a insn:rep movsb
[trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0200 ebx:201000 ecx:52d edx:50000a insn:rep movsb
[trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0200 ebx:201000 ecx:52c edx:50000a insn:rep movsb
[trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0290 ebx:201000 ecx:52b edx:50000a insn:rep movsb
[trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a027b ebx:201000 ecx:52a edx:50000a insn:rep movsb
[except] 32bit-kern exception:3 error_code:0 eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc insn:rep movsb ds:10 cr0:31 eax:a0236 ebx:201000 ecx:529 edx:50000a
• I really don't have a good idea here...
Demonstration
Conclusion
• A test implementation of a BIOS emulator for BHyVe has been implemented
• An instruction-level tracer was implemented on it for debugging
• It reaches the /boot/loader stage, but dies before loading a kernel
• Advice from bootloader developers is really needed
• Advice on better debugging methods is also needed (Is there a hardware debugger for x86? Or maybe VMware has a cool debugging feature?)