a reimplementation of netbsd based on a microkernel by andrew s. tanenbaum

63
1 A REIMPLEMENTATION OF NETBSD BASED ON A MICROKERNEL Andrew S. Tanenbaum and a team of students and programmers who actually did all the work. Three of them, Ben Gras, Lionel Sambuc, and Arun Thomas are here! Vrije Universiteit Amsterdam, The Netherlands

Upload: eurobsdcon

Post on 24-Jul-2015

1.221 views

Category:

Technology


0 download

TRANSCRIPT

1

A REIMPLEMENTATION OF NETBSDBASED ON A MICROKERNEL

Andrew S. Tanenbaum

and a team of students and programmers who actually did all the work. Three of them,

Ben Gras, Lionel Sambuc, and Arun Thomas are here!

Vrije UniversiteitAmsterdam, The Netherlands

2

GOAL OF OUR WORK: BUILD A RELIABLE OS

Tanenbaum’s definition of a reliable OS:

“An operating system is said to be reliable when a typical user has never experienced even a single failure in his or her lifetime and does not know anybody who has ever experienced a failure.”

In engineering terms, this is probably mean time to failure > 50 years

I don’t think we are there yet

3

THE TELEVISION MODEL (BEFORE SMART TVs)

1. You buy the television

2. You plug it in

3. It works perfectly for the next 10 years

4

THE COMPUTER MODEL (WINDOWS EDITION)

5

THE COMPUTER MODEL (WINDOWS EDITION)

1. You buy the computer

2. You plug it in

3. You install service packs 1 through 9f

4. You install 18 new emergency security patches

5. You find and install 7 new device drivers

6. You install antivirus software

7. You install antispyware software

8. You install antihacker software (firewall)

9. You install antispam software

10. You reboot the computer

6

THE COMPUTER MODEL (2)

11. It doesn’t work

12. You call the helpdesk

13. You wait on hold for 30 minutes

14. They tell you to reinstall Windows

7

TYPICAL USER REACTION

The New York Times recently reported that 25% of computer users have gotten so angry at their computerthat they physically hit it.

IS RELIABILITY SO IMPORTANT?

• Annoying

• Lost work• But also think about

– Industrial control systems in factories

– Power grids

– Hospital operating rooms

– Banking and e-commerce servers

– Emergency phone centers

– Control software in cars, airplanes, etc.

8

IS THIS FEASIBLE?

• We won’t find out if we don’t try

• Dutch Royal Academy gave me €2 million to try

• European Union gave me €2.5 million to give it a shot

• So, we’re trying

9

10

IS RELIABILITY ACHIEVABLE AT ALL?

• Systems can survive hardware failures!– RAIDs can survive failed disks

– ECC memory can survive parity errors in memory

– TCP/IP can survive lost packets

– CD-ROM drives can correct many simultaneous errors

• We need to be able to survive software failures, too

11

A NEED TO RETHINK OPERATING SYSTEMS

• Operating systems research need to be refocused– We have powerful hardware on PC-class machines– Plenty of CPU cycles, RAM, bandwidth– Current software has tons of (useless) features– Consequently, the software is slow, bloated, and buggy

• To achieve the TV model, future OSes, must be– Small– Simple– Modular– Reliable– Secure– Self-healing

BRIEF HISTORY OF OUR WORK

• (1976) John Lions wrote a book on UNIX V6

• (1979) AT&T released V7 and forbade books on it

• (1985) I started to write a UNIX-like OS from scratch

• (1987) MINIX 1 + book for teaching OS classes released

• (1997) MINIX 2 (POSIX) & 2nd edition of book released

• (2000) MINIX 2 license changed to BSD

• (2004) MINIX 3: start of work making a reliable OS

• (2006) 3rd edition of book

• (2008) European grant

• (2010) Focus moved towards embedded systems

• (2013) MINIX 3.3.0 moves to NetBSD “compatibility”12

THREE EDITIONS OF THE BOOK

13

1 2 3

14

INTELLIGENT DESIGN

• Microkernel (13,000 LoC vs. > 15 million for Linux)– Bugs per 1000 LoC: Most S/W (1-10)

– MINIX 3 at least 13 kernel bugs; Linux has > 15,000

– Drivers have 3-7x more bugs than rest of kernel

– About 70% of the code is drivers

• Highly modular

• OS runs as multiple user-mode server processes

AS APPLIED TO OPERATING SYSTEMS

15

STEP 1: ISOLATE COMPONENTS

• Move all loadable modules out of the kernel– includes all device drivers, file systems, mem mgt, etc.

• Run each module as a separate process with POLA

(Principle Of Least Authority)

16

STEP 2: ISOLATE I/O

• Isolate I/O devices

• Limit access to I/O ports

• Constrain DMA (needs hardware assistance)

17

STEP 3: ISOLATE COMMUNICATION

• Limit interprocess communication

• Restrict IPC on a ‘need-to-communicate’ basis

• Restrict kernel calls on a per-component basis

• Make sure faulty receiver cannot hang sender

18

ARCHITECTURE OF MINIX 3

Microkernel handles interrupts, processes, scheduling, IPC

Clock

FS 1 FS 2 Proc. Other... ServersUsermode

Disk TTY Net Print Other... Drivers

Shell Make User...

Process

19

USER-MODE DEVICE DRIVERS

• Each driver runs as a user-mode process

• No superuser privileges

• Protected by the MMU

• Do not have access to I/O ports, privileged instrs

• They have to ask the microkernel

20

USER-MODE SERVERS

• Each server runs as a separate process• Some key servers

– Virtual file server (implements VFS)– Actual file servers (e.g., MINIX FS, EXT2, ISO 9660)– Process manager– Memory manager– Network server– Reincarnation server– Part of the scheduler

21

A SIMPLIFIED EXAMPLE: DOING A READ

Usermode Servers

Drivers

Users

Kernel

File access when the block is in the FS cache

1

2 3

4

User

Disk

FSFS

22

FILE SERVER (2)

File access when the block is NOT in the FS cache

ServersUsermode

Drivers

Users

Kernel

1

2

3

9

4

6

7,85

Notification

FS

User

Disk

23

REINCARNATION SERVER

• Parent of all the drivers and servers

• When a driver or server dies, RS collects it

• RS checks a table for action to take e.g., restart it

• RS also pings drivers and servers frequently

24

DISK DRIVER RECOVERY

ServersUsermode

Drivers

Users

Kernel

User1

FS

2

DiskdriverX 3. Crash!New

driver

4

RSRS

5

System is self healing—this is how we hope to make it reliable

25

KERNEL RELIABILITY/SECURITY

• Fewer LoC means fewer kernel bugs

• Small kernel (13,000 LoC) means reduced TCB

• NO foreign code (e.g., drivers) in the kernel

• Static data structures (no malloc in kernel)

• Moving bugs to user space puts them in cages

26

IPC RELIABILITY/SECURITY

• Fixed-length messages (no buffer overruns)• Rendezvous system was simple

– No lost messages– No buffer management

– We had to add asynchronous messages

• Interrupts and messages are unified

27

DRIVER RELIABILITY/SECURITY

• Untrusted code: heavily isolated• Bugs, viruses cannot spread to other modules• Cannot touch kernel data or other drivers/servers• Bad pointers, buffer overruns crash only one driver• Infinite loops detected and driver restarted• Access to I/O ports is restricted• Access to kernel calls is limited and checked• Restricted power to do damage (not superuser)

28

OTHER ADVANTAGES OF USER COMPONENTS

• Short development cycle

• Normal programming model

• No down time for crash and reboot

• Easy debugging

• Good flexibility (applies to drivers, file systems, etc.)

29

FAULT INJECTION EXPERIMENT

• We injected 800,000 faults into each of 3 drivers• Done on the binary drivers• Examples, change src addr, dest addr, loop condition• 100 faults were injected on each experiment• Waited 1 sec to see if the driver crashed• If no crash, inject another 100 faults and repeat• The driver crashed in 18,038 trials• The operating system NEVER crashed

30

PORT OF MINIX 3 TO ARM

• Restructured source tree for multiple architectures• Changed booting to support u-boot for ARM• Rewrote the low-level code dealing with hardware• Changed code for context switching, paging, etc.• Removed x86 segmentation code• Imported NetBSD ARM headers and libraries• Ported build.sh for cross-toolchain support• Wrote drivers for SD card and other Beagle devices

31

EMBEDDED SYSTEMS

9 cm

5 cm

BeagleBone Black

32

BBB CHARACTERISTICS

Item BeagleBone Black

CPU ARM v7

Clock 1 GHz

RAM 512 MB

Flash 4 GB

Video HDMI/1080p

GPIO pins 92

Ethernet 10/100 Mbps

USB 1

Open source/documentation Yes

Price (quantity 1) $55

I ADMIT I WAS WRONG

• On 29 Jan 1992 I posted to comp.os.minix this:

• “Don’t get me wrong, I am not unhappy with LINUX. It will get all the people who want to turn MINIX in BSD UNIX off my back.”

• I Apologize. Now I do want to turn MINIX into BSD. It just took me 20 years to realize this.

33

MINIX 3 MEETS BSD

34

+ =

BSD Daemon is copyright 1988 by Marshall Kirk McKusick and is used with permission.

OR MAYBE

35

WHY BSD?

• MINIX 3 didn’t have enough application software• BSD is a proven, portable, quality product• BSD has better code quality than Linux• Pkgsrc handles packages better than what we had• Thousands of excellent packages available• Active community• License compatibility• Why NetBSD?• Mostly due to its emphasis on portability

36

NETBSD FEATURES IN MINIX 3.3.0

• Clang/LLVM compiler• NetBSD build system (build.sh)• ELF file format• Source code tree modeled on NetBSD• Headers and libraries are from NetBSD• Pkgsrc works and builds >4200 NetBSD packages• Nevertheless, built on MINIX 3 kernel & servers• X11 coming back soon (almost there)

37

NETBSD FEATURES MISSING IN MINIX 3.3.0

• Kernel threads (we do have userland pthreads)

• Some system calls:– All _LWP*, MSG*, SEM* calls

– CLONE

– Some GET, IOCTL calls

– KQUEUE, KTRACE

– Job control

• Some weird socket options

• Nevertheless, we can build over 4200 packages

38

KYUA TESTS

39

Conclusion: 2139 out of 2651 passed (81%)

40

SYSTEM ARCHITECTURE

Servers

…Drivers

Microkernel (this is the only part running in kernel mode)

Net

VFS

TTYDisk USB

FS MM Reincarnat OS

(MINIX)

Clang Pkgsrc (libc) …Pkg 1 Pkg n

Users User- Land(NetBSD)

MINIX 3 ON THE THREE BEAGLE BOARDS

41

YOUR ROLE

• MINIX 3 is an open-source project

• I hope some of you will join and help us• Things to do

– Add crucial missing system calls– Port more packages (Java, a browser, etc.)– Write the missing drivers for the Beagle series– Get it running on Raspberry Pi & other platforms– Port Rump– Port required libraries and then port a GUI– Much more

• Get involved!

42

43

MINIX 3 IN A NUTSHELL

• Microkernel reimplementation of NetBSD• Fully open source with BSD license• Highly compatible with NetBSD• Supports both LLVM and gcc• Uses NetBSD pkgsrc• Over 4200 packages build• Live update coming soon• Good security by design (modularity, randomization)• Go get it at www.minix3.org and try it

44

POSITIONING OF MINIX

• Show that multiserver systems are – reliable

– secure

– practical

• Demonstrate that drivers belong in user mode

• High-reliability and fault-tolerant applications

• $50 single-chip, small-RAM laptops for 3rd world

• Embedded systems

45

MINIX 3 LOGO

• Why a raccoon?– Small– Cute– Clever– Agile– Eats bugs– More likely to visit your house than a penguin

46

WEBSITE: www.minix3.org

DOCUMENTATION IS IN A WIKI

• Wiki.minix3.org

• You can help

document the system

47

48

TRAFFIC TO WWW.MINIX3.ORG

Total visits to the main page since 2004: 2.9 millionTotal visits to the download page since 2004: 1.1 millionActual downloads since 2007: 600,000

Visits to download pageVisits to www.minix3.org

49

MINIX 3 GOOGLE NEWSGROUP

50

CONCLUSION

• Current OSes are bloated• MINIX 3 is an attempt at a reliable, secure OS• Kernel is very small (13,000 LoC)• OS runs as a collection of user processes• Each driver is a separate process• Each OS component has restricted privileges• Faulty drivers can be replaced automatically• Live update is possible (not in current release)• Uses NetBSD userland and packages

SURVEY

• Please download from www.minix3.org

• Give it a try

• Fill out the survey on the www.minix3.org page

• We had 14,000 downloads in Sept. but we don’t know who they are or what they are doing

• We are trying to build an active community

51

MASTERS DEGREE AT THE VU

• If you are interested in a Masters in systems

• Google me, go to my home page

• Click on the YouTube video there

52

53

THE END

54

WEBSITE: www.minix3.org

55

56

DISK PERFORMANCE

57

THE COST OF DRIVER RECOVERY

• We killed the Ethernet driver every t sec to simulate repeated driver crashes

Driver recovery takes about 360 msec

58

RESEARCH: LIVE UPDATE

• We can update almost the entire system on the fly

• How: create new process (driver or server)

• New one contacts old one to get the state

• After state is transferred, third process is created

• It runs the state transfer backwards as a check

• If it is OK, connections are rerouted to the new one

• Applications don’t even notice this has happened

59

MUCH BETTER THAN KSPLICE

• KSPLICE can handle only small security patches

• SPLICE patches the running process

• Over time, crud accumulates in the process

• If the update fails, there is no recovery

60

RESEARCH: MULTICORE CHIPS

• Network stack has components

• Chips may be heterogeneous

• Where to put each component?

• Experiments scaling frequencies

• Sometimes slower is faster!• Sleep/wakeup is expensive

TCP IP

Ether Kernel

Multicore chip

Core

61

RESEARCH: NEW FILE SYSTEM--LORIS

• Better reliabilty

• Better flexibility

• Handles heterogeneity better

• File rather than block oriented

• Uses checksums to detect corruption

VFS

Naming

Cache

Logical

Physical

Driver

Introduces concept of a logical file (1 or more phys files spread or striped over possiblyheterogeneous devices)

62

RESEARCH: FAULT INJECTION

Inject fault?

Originalunmodifiedbasic block

Basic blockwith faultinjected

This structure is created automatically by the LLVM compiler

63

NEW PROGRAM STRUCTURE