userfaultfd and post-copy migration

Userfaultfd and Post-Copy Migration Mike Rapoport

Upload: kernel-tlv

Post on 14-Jan-2017

TRANSCRIPT

Page 1: Userfaultfd and Post-Copy Migration

Userfaultfd and Post-Copy Migration

Mike Rapoport

Page 2: Userfaultfd and Post-Copy Migration

Outline
● Migration background
● Userfaultfd
● Post-copy migration

Page 3: Userfaultfd and Post-Copy Migration

Migration: why?
● Spectacular
● Stateful applications with no downtime
    ○ Hardware upgrades
    ○ Software upgrades requiring a reboot
● Load balancing

Page 4: Userfaultfd and Post-Copy Migration

Migration: how?
● Very simple
    ○ Save state on source
    ○ Copy state to destination
    ○ Restore state on destination
● Memory is the heaviest part
    ○ Pre-copy vs post-copy

Page 5: Userfaultfd and Post-Copy Migration

Migration flows

Pre-copy
● Track memory, copy inactive part
● Freeze on source
● Copy state and remaining memory
● Unfreeze on destination

Post-copy
● Freeze on source
● Copy state except memory
● Enable “remote swap”
● Unfreeze on destination
● Bring memory on demand

Page 6: Userfaultfd and Post-Copy Migration

Pre-copy

[Timeline figure. Phases along the time axis: prepare, memory copy 1 ... memory copy n, freeze, state copy, unfreeze. Application status: running on source, stopped, running on dest.]

Page 7: Userfaultfd and Post-Copy Migration

Post-copy

[Timeline figure. Phases along the time axis: prepare, freeze, state copy, unfreeze, rest of the memory (with remote page faults). Application status: running on source, stopped, running on dest.]

Page 8: Userfaultfd and Post-Copy Migration

Pre-copy vs post-copy

https://youtu.be/lo2JJ2KWrlA

Pre-copy
+ Less vulnerable to node failures
+ High performance in “UP” state
- Longer downtime
- Might diverge

Post-copy
- More vulnerable to node failures
- Slowdown after migration
+ Shorter downtime
+ Predictable downtime
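The "might diverge" point for pre-copy can be illustrated with a toy model. This is only a sketch under assumptions that are not in the talk: a constant link bandwidth, a constant page-dirtying rate, and a hypothetical function name `precopy_rounds`. Each round sends the pages that were dirtied while the previous round was being sent.

```c
/*
 * Toy model of iterative pre-copy (an illustration, not taken from the talk).
 * dirty_rate and bandwidth are in pages per second and assumed constant.
 * Returns the number of copy rounds after which the remaining dirty set is
 * at most freeze_threshold pages, or -1 if that does not happen within
 * max_rounds (the "might diverge" case).
 */
int precopy_rounds(double total_pages, double dirty_rate, double bandwidth,
                   double freeze_threshold, int max_rounds)
{
    double to_send = total_pages;

    for (int round = 1; round <= max_rounds; round++) {
        double seconds = to_send / bandwidth;  /* time to send this round */

        to_send = dirty_rate * seconds;        /* pages dirtied meanwhile */
        if (to_send > total_pages)
            to_send = total_pages;             /* cannot exceed all memory */
        if (to_send <= freeze_threshold)
            return round;
    }
    return -1;
}
```

In this model the dirty set shrinks by the factor dirty_rate / bandwidth per round, so it converges only when the application dirties memory more slowly than the link can carry it. Post-copy has no such loop, which is one way to read the "predictable downtime" line above.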

Page 9: Userfaultfd and Post-Copy Migration

Userfaultfd highlights
● Delegation of page faults to userspace
● File descriptor with ioctls for control
● Poll and read to get page fault notifications
● mcopy_atomic to “map” the page
    ○ Can handle zero pages

Page 10: Userfaultfd and Post-Copy Migration

Userfaultfd setup
● Create the userfaultfd file descriptor
    ○ uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
● API handshake
    ○ uffdio_api.api = UFFD_API;
    ○ ioctl(uffd, UFFDIO_API, &uffdio_api);
● Register range
    ○ uffdio_register.range.start = (unsigned long) start;
    ○ uffdio_register.range.len = nr_pages * page_size;
    ○ uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
    ○ ioctl(uffd, UFFDIO_REGISTER, &uffdio_register);
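The three setup steps can be put together with error handling roughly as follows. This is a minimal sketch, assuming a Linux system with `<linux/userfaultfd.h>`; the helper name `uffd_open_and_register` is hypothetical. The syscall can fail with EPERM or ENOSYS depending on kernel configuration and privileges, so every step is checked.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Open a userfaultfd and register [start, start + len) for missing-page
 * faults. Returns the descriptor, or -1 with errno set.
 * (Hypothetical helper, not part of any library.)
 */
int uffd_open_and_register(void *start, size_t len)
{
    struct uffdio_api api;
    struct uffdio_register reg;
    int uffd, saved_errno;

    uffd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0)
        return -1;

    /* API handshake: ask for the base API with no optional features. */
    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = 0;
    if (ioctl(uffd, UFFDIO_API, &api) < 0)
        goto fail;

    /* The range must be page aligned and non-empty. */
    memset(&reg, 0, sizeof(reg));
    reg.range.start = (unsigned long)start;
    reg.range.len = len;
    reg.mode = UFFDIO_REGISTER_MODE_MISSING;
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
        goto fail;

    /* On success reg.ioctls holds the operations allowed on this range. */
    return uffd;

fail:
    saved_errno = errno;
    close(uffd);
    errno = saved_errno;
    return -1;
}
```

A caller would typically pass a region obtained from an anonymous private mmap() and keep the returned descriptor for the fault-handling thread.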

Page 11: Userfaultfd and Post-Copy Migration

Page fault handling
● Wait for event
    ○ pollfd[0].fd = uffd;
    ○ pollfd[0].events = POLLIN;
    ○ poll(pollfd, 1, -1);
● Read the event
    ○ read(uffd, &msg, sizeof(msg));
    ○ if (msg.event != UFFD_EVENT_PAGEFAULT)
    ○     oops...
    ○ faulting_address = msg.arg.pagefault.address;
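The wait-and-read step above can be wrapped so that interrupted calls and spurious wakeups are retried. This is a sketch only; `uffd_wait_event` is a hypothetical name, and the caller is still expected to check `msg->event` before using `msg->arg.pagefault.address`.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <unistd.h>

/*
 * Block until one complete message can be read from uffd and store it in
 * *msg. Returns 0 on success, -1 on error or end of file.
 * With an O_NONBLOCK descriptor, a wakeup without data shows up as EAGAIN
 * and is retried.
 */
int uffd_wait_event(int uffd, struct uffd_msg *msg)
{
    struct pollfd pfd = { .fd = uffd, .events = POLLIN };

    for (;;) {
        ssize_t n;
        int ready = poll(&pfd, 1, -1);

        if (ready < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (pfd.revents & (POLLERR | POLLNVAL))
            return -1;

        n = read(uffd, msg, sizeof(*msg));
        if (n == (ssize_t)sizeof(*msg))
            return 0;
        if (n < 0 && (errno == EAGAIN || errno == EINTR))
            continue;
        return -1;  /* end of file or short read */
    }
}
```

Because the function only relies on poll() and read(), it can be exercised against a pipe carrying a hand-built `struct uffd_msg`, which is convenient on systems where userfaultfd itself is unavailable.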

Page 12: Userfaultfd and Post-Copy Migration

Page fault handling
● “Map” normal page
    ○ uffdio_copy.dst = faulting_address;
    ○ uffdio_copy.src = source_page_address;
    ○ uffdio_copy.len = page_size;
    ○ uffdio_copy.mode = 0;
    ○ uffdio_copy.copy = 0;
    ○ ioctl(uffd, UFFDIO_COPY, &uffdio_copy);
● “Map” zero page
    ○ uffdio_zeropage.range.start = faulting_address;
    ○ uffdio_zeropage.range.len = page_size;
    ○ uffdio_zeropage.mode = 0;
    ○ ioctl(uffd, UFFDIO_ZEROPAGE, &uffdio_zeropage);
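Both ways of resolving a fault can be folded into one helper. The sketch below is an assumption-laden illustration: `page_base` and `uffd_resolve` are hypothetical names, and a NULL source pointer is used here as the convention for "zero page". The destination is rounded down to a page boundary because both ioctls require page-aligned ranges.

```c
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>

/* Round an address down to the start of its page (page_size: power of 2). */
unsigned long page_base(unsigned long addr, unsigned long page_size)
{
    return addr & ~(page_size - 1);
}

/*
 * Resolve a missing-page fault at fault_addr.
 * src_page != NULL: copy one page from src_page with UFFDIO_COPY.
 * src_page == NULL: map a zero page with UFFDIO_ZEROPAGE.
 * Returns 0 on success, -1 with errno set on failure.
 */
int uffd_resolve(int uffd, unsigned long fault_addr, const void *src_page,
                 unsigned long page_size)
{
    unsigned long dst = page_base(fault_addr, page_size);

    if (src_page != NULL) {
        struct uffdio_copy copy;

        memset(&copy, 0, sizeof(copy));
        copy.dst = dst;
        copy.src = (unsigned long)src_page;
        copy.len = page_size;
        copy.mode = 0;  /* wake the faulting thread when done */
        return ioctl(uffd, UFFDIO_COPY, &copy) < 0 ? -1 : 0;
    } else {
        struct uffdio_zeropage zero;

        memset(&zero, 0, sizeof(zero));
        zero.range.start = dst;
        zero.range.len = page_size;
        zero.mode = 0;
        return ioctl(uffd, UFFDIO_ZEROPAGE, &zero) < 0 ? -1 : 0;
    }
}
```

A handler would call this with `msg.arg.pagefault.address` once the page content has arrived from the source. An EEXIST error means the page was already populated, for example by a background transfer, and can usually be treated as benign.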

Page 13: Userfaultfd and Post-Copy Migration

Under the hood
● syscall(__NR_userfaultfd)
    ○ Allocate userfault context
    ○ Create a file hooked to an anonymous inode
    ○ Wait for API handshake
● ioctl(UFFDIO_API)
    ○ Verify that userspace and kernel talk the same language
● ioctl(UFFDIO_REGISTER)
    ○ Find VMA covering desired range
    ○ Make sure the VMA can “user fault”
    ○ Add userfault context to the VMA

Page 14: Userfaultfd and Post-Copy Migration

Under the hood
● Page fault
    ○ Faulting address covered by VMA with userfault context
    ○ Add “page fault” message to file poll queue
    ○ Wake up process polling the uffd
    ○ Return VM_FAULT_UFFD_RETRY to mm core
● UFFDIO_COPY/UFFDIO_ZEROPAGE
    ○ Allocate a page
    ○ Create a page table entry for faulting address
    ○ Copy the page content from user, or
    ○ Map to zero page

Page 15: Userfaultfd and Post-Copy Migration

VM post-copy migration
● Guest memory is a part of QEMU address space
● Combine pre- and post-copy
● Straightforward flow
    ○ Start a thread for user fault handling
    ○ Register guest memory areas with userfaultfd
    ○ Guest page fault causes UFFD_EVENT_PAGEFAULT
        ■ Request the page from source
        ■ Copy/zero guest memory upon response
    ○ Fetch non-faulting pages in the background
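Serving faults on demand while also fetching the rest in the background needs some record of which pages have already arrived. The sketch below shows one possible bookkeeping scheme, a bitmap with a scan cursor. It is purely illustrative: the `page_tracker` type and its functions are hypothetical and do not describe how QEMU implements this. It is also single-threaded; a real handler would need locking or atomics.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical bookkeeping for pages already received from the source. */
struct page_tracker {
    unsigned long *bits;  /* one bit per page, set when the page arrived */
    size_t nr_pages;
    size_t cursor;        /* background scan position */
};

int tracker_init(struct page_tracker *t, size_t nr_pages)
{
    size_t words = (nr_pages + BITS_PER_WORD - 1) / BITS_PER_WORD;

    t->bits = calloc(words ? words : 1, sizeof(unsigned long));
    t->nr_pages = nr_pages;
    t->cursor = 0;
    return t->bits ? 0 : -1;
}

/* Mark a page as received. Returns 1 if it had already been received. */
int tracker_mark(struct page_tracker *t, size_t page)
{
    unsigned long mask = 1UL << (page % BITS_PER_WORD);
    unsigned long *word = &t->bits[page / BITS_PER_WORD];
    int was_set = (*word & mask) != 0;

    *word |= mask;
    return was_set;
}

/* Next page for the background fetcher, or nr_pages when all have arrived. */
size_t tracker_next_missing(struct page_tracker *t)
{
    while (t->cursor < t->nr_pages) {
        size_t page = t->cursor;
        unsigned long mask = 1UL << (page % BITS_PER_WORD);

        if (!(t->bits[page / BITS_PER_WORD] & mask))
            return page;
        t->cursor++;
    }
    return t->nr_pages;
}
```

The fault path calls `tracker_mark` for each page it installs, so the background loop skips pages the guest has already pulled in on demand.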

Page 16: Userfaultfd and Post-Copy Migration

CRIU + post-copy migration
● Different address spaces
    ○ Restore controller
    ○ Restored processes
● Basic flow similar to VMs
    ○ Start a daemon for user fault handling
    ○ Register restored process areas with userfaultfd
        ■ Might be quite a few uffds
    ○ Handle page faults
    ○ Fetch non-faulting memory in the background
● BUT

Page 17: Userfaultfd and Post-Copy Migration

Non-cooperative userfaultfd
● Page fault cannot block restorer
    ○ Use UFFDIO_WAKE ioctl
● Processes change mappings on the fly
    ○ fork()
    ○ madvise(..., MADV_DONTNEED)
    ○ mremap()
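The two points above can be sketched in code. The first helper issues UFFDIO_WAKE on a range. The second extracts the affected range from events that change the monitored layout. It assumes kernel headers that define UFFD_EVENT_REMAP and UFFD_EVENT_REMOVE, which the talk lists as kernel work in progress, so older headers may not compile it. Both function names are hypothetical.

```c
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/*
 * Wake threads blocked on faults in [start, start + len) without
 * installing a page. Returns 0 on success, -1 with errno set.
 */
int uffd_wake(int uffd, unsigned long start, unsigned long len)
{
    struct uffdio_range range = { .start = start, .len = len };

    return ioctl(uffd, UFFDIO_WAKE, &range);
}

/*
 * For events that change the monitored address space, report the range
 * that the handler's bookkeeping must drop or move.
 * Returns 1 and fills *start and *len for UFFD_EVENT_REMAP (old location)
 * and UFFD_EVENT_REMOVE, 0 for every other event.
 */
int uffd_layout_change(const struct uffd_msg *msg,
                       unsigned long long *start, unsigned long long *len)
{
    switch (msg->event) {
    case UFFD_EVENT_REMAP:
        *start = msg->arg.remap.from;
        *len = msg->arg.remap.len;
        return 1;
    case UFFD_EVENT_REMOVE:
        *start = msg->arg.remove.start;
        *len = msg->arg.remove.end - msg->arg.remove.start;
        return 1;
    default:
        return 0;
    }
}
```

For a remap the new location is available in `msg->arg.remap.to`, and for a fork event the descriptor for the child arrives in `msg->arg.fork.ufd`, which is one reason a CRIU-style daemon can end up with quite a few uffds.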

Page 18: Userfaultfd and Post-Copy Migration

Future
● Kernel WIP
    ○ Write protected pages
    ○ fork, madvise, mremap events
    ○ hugetlbfs
    ○ tmpfs
● CRIU
    ○ Make it work? ;-)