Plan 9 from Bell Labs meets TinyCore Linux

oraccha, Plan9日記 (Plan9 Diary, http://d.hatena.ne.jp/oraccha/)

August 22, 2010, 5th Kernel/VM Expedition (カーネル／VM探検隊)

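What is Plan 9? This one

What is Plan 9?

• A distributed OS developed at Bell Labs

• An OS that is more UNIX-like than today's UNIX

• Open-sourced, and development continues

• A workshop is held once a year

• Attracting new developers: six projects were accepted for GSoC 2010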

Something that struck a chord with me recently

Plan 9 on SL-C3100

http://server.hemiola.co.uk/zaurus.jpg




Do Google engineers use Plan 9?

• Many Plan 9 developers have moved to Google

• Rob Pike, Ken Thompson, Dave Presotto, Russ Cox, ...

• "I haven't used 'real' Plan 9 for several years now. These days it's Plan9port on Mac OS X."

• On the desktop: 9term, acme, sam

• venti as a file server, too

Plan 9 environments that run on UNIX

• Plan9port (a.k.a. Plan 9 from User Space)

• http://swtch.com/plan9port/

• 9vx (a.k.a. Plan 9 Virtual eXecutable)

• http://swtch.com/9vx/

Russ Cox is the main developer of both


Plan9port

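• A port of the Plan 9 command set

• Implements OS-equivalent functionality almost entirely in userland

• Works together with v9fs (the Linux 9P2000 file system implementation)

• Extras

• Ported to NetBSD/evbarm (LinkStation)

• The MINIX 3 port is unfinished; there are problems around Pthreads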

rio (Xnest)

acme

Plan9port on MINIX 3

9vx (1)

• A lightweight binary emulator for Plan 9 a.out executables

• A native-binary sandbox built on vx32 (cf. Google NaCl)

• Segmentation (for data) + dynamic code translation (for code)

9vx (2)

• The Plan 9 kernel runs in userland

• Interrupts and exceptions become signals

• Page fault → SIGSEGV

• Timer interrupt → SIGALRM

• Device drivers are backed by the host OS

• #A (audio), #i (draw), #I (inet), #m (mouse), #Z (local filesystem)
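
The timer-interrupt mapping above can be tried out on any Unix host. The following is a minimal sketch, not 9vx code: it assumes a Unix system, must run in the main thread, and the names `count_timer_ticks` and `clock_interrupt` are made up for illustration. A periodic host timer delivers SIGALRM, and the handler plays the role of a guest kernel's clock interrupt routine.

```python
import signal
import time


def count_timer_ticks(interval=0.01, duration=0.1):
    """Count SIGALRM deliveries over `duration` seconds (Unix, main thread only)."""
    ticks = 0

    def clock_interrupt(signum, frame):
        # Hypothetical stand-in for a guest kernel's clock interrupt handler.
        nonlocal ticks
        ticks += 1

    previous = signal.signal(signal.SIGALRM, clock_interrupt)
    # Ask the host for a periodic real-time timer; each expiry raises SIGALRM.
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    try:
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            pass  # "guest code" running between timer interrupts
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)  # restore the old handler
    return ticks


if __name__ == "__main__":
    print("timer interrupts seen:", count_timer_ticks())
```

Page faults cannot be shown as safely from Python, so only the SIGALRM half of the mapping is sketched here.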

Structure of 9vx

Plan 9 a.out

Plan 9 a.out

Plan 9 a.out

vx32 sandbox library

Modified Plan 9 kernel

Host OS (Linux, *BSD, Mac OS X)

1 process

The vx32 address space

• Guest Data Segment

• Guest Control Segment

• vx32 sandbox library

[...] separate signal stack, passing vx32 the full saved register state when such a signal occurs. Again, all widely-used x86 operating systems have this capability.

Finally, vx32 can benefit from being able to map disk files into the host application’s address space and to control the read/write/execute permissions on individual pages in the mapping. Although these features are not strictly required by vx32, they are, once again, provided by all widely-used x86 operating systems.

On modern Unix variants such as Linux, FreeBSD, and OS X, specific system calls satisfying the above requirements are modify_ldt/i386_set_ldt, sigaction, sigaltstack, mmap, and mprotect. Windows NT, 2000, and XP support equivalent system calls, though we have not ported vx32 to Windows. We have not examined whether Windows Vista retains this functionality.

Guest code. Although vx32 uses x86 segmentation for data sandboxing, it assumes that guest code running in the sandbox conforms to the 32-bit “flat model” and makes no explicit reference to segment registers. In fact, vx32 rewrites any guest instructions referring to segment registers so that they raise a virtual illegal instruction exception. This “flat model” assumption is reasonable for practically all modern, compiled 32-bit x86 code; it would typically be a problem only if, for example, the sandboxed guest wished to run 16-bit DOS or Windows code or wished to run a nested instance of vx32 itself.

Some modern multithreading libraries use segment registers to provide quick access to thread-local storage (TLS); such libraries cannot be used in guest code under the current version of vx32, but this is not a fundamental limitation of the approach. Vx32 could be enhanced to allow guest code to create new segments using emulation techniques, perhaps at some performance cost.

Host applications may impose further restrictions on guest code through configuration flags that direct vx32 to reject specific classes of instructions. For example, for consistent behavior across processor implementations, the VXA archiver described in Section 5.1 disallows the non-deterministic 387 floating-point instructions, forcing applications to use deterministic SSE-based equivalents.

3.2 Data sandboxing: segmentation

In the x86 architecture, segmentation is an address translation step that the processor applies immediately before page translation. In addition to the eight general-purpose registers (GPRs) accessible in user mode, the processor provides six segment registers. During any memory access, the processor uses the value in one of these segment registers as an index into one of two segment translation tables, the global descriptor table (GDT) or local descriptor table (LDT). The GDT traditionally describes segments shared by all processes, while the LDT contains segments specific to a particular process. Upon finding the appropriate descriptor table entry, the processor checks permission bits (read, write, and execute) and compares the virtual address of the requested memory access against the segment limit in the descriptor table, throwing an exception if any of these checks fail. Finally, the processor adds the segment base to the virtual address to form the linear address that it subsequently uses for page translation. Thus, a normal segment with base b and limit l permits memory accesses at virtual addresses between 0 and l, and maps these virtual addresses to linear addresses from b to b+l. Today’s x86 operating systems typically make segmentation translation a no-op by using a base of 0 and a limit of 2^32 - 1. Even in this so-called “flat model,” the processor continues to perform segmentation translation: it cannot be disabled.

Figure 1: Guest and Host Address Space Structure

Vx32 allocates two segments in the host application’s LDT for each guest instance: a guest data segment and a guest control segment, as depicted in Figure 1.

The guest data segment corresponds exactly to the guest instance’s address space: the segment base points to the beginning of the address space (address 0 in the guest instance), and the segment size is the guest’s address space size. Vx32 executes guest code with the processor’s ds, es, and ss registers holding the selec- [...]

Bryan Ford and Russ Cox, “Vx32: Lightweight User-level Sandboxing on the x86,” USENIX 2008 (Best student paper award)
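
The base-and-limit rule quoted from the paper is easy to model. The sketch below is an illustration only, not vx32 code: `segment_translate`, `SegmentFault`, and the guest base and size are hypothetical, permission bits are left out, and the limit is assumed to be the segment size minus one.

```python
class SegmentFault(Exception):
    """Raised where the processor would raise a protection exception."""


def segment_translate(base, limit, vaddr):
    """Return the linear address for vaddr in a segment with this base and limit."""
    if not 0 <= vaddr <= limit:
        raise SegmentFault(f"address {vaddr:#x} is outside the segment limit {limit:#x}")
    return base + vaddr


# The "flat model": base 0 and limit 2^32 - 1 make translation a no-op.
FLAT_BASE, FLAT_LIMIT = 0, 2**32 - 1

# Hypothetical guest instance: 16 MiB of address space placed at 0x40000000.
GUEST_BASE, GUEST_SIZE = 0x40000000, 16 * 1024 * 1024
GUEST_LIMIT = GUEST_SIZE - 1  # assumption: limit is the last valid offset

if __name__ == "__main__":
    print(hex(segment_translate(FLAT_BASE, FLAT_LIMIT, 0x1234)))
    print(hex(segment_translate(GUEST_BASE, GUEST_LIMIT, 0x1000)))
    try:
        segment_translate(GUEST_BASE, GUEST_LIMIT, GUEST_SIZE)
    except SegmentFault as err:
        print("fault:", err)
```

Under this model, guest address 0 lands on the segment base, and any access past the guest's address space faults before it can reach host memory, which is the data-sandboxing property the excerpt describes.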

Emulating system calls

Plan 9 a.out

Plan 9 a.out

Plan 9 a.out

vx32 sandbox library

Modified Plan 9 kernel

Host OS (Linux, *BSD, Mac OS X)

1 process

int 0x64

virtual trap

Example of dynamic code translation

08048160 int 0x64

b7d8d0f9 mov ebx, fs:[0x2c]
b7d8d100 mov fs:[0x20], eax
b7d8d106 mov eax, 0x264 ; 0x264 = int 0x64
b7d8d10b mov fs:[0x40], 0x8048162
b7d8d116 jmp vxrun_gentrap

The case of a software interrupt (system call)

(fragment index table)

code fragment cache

(code fragment)

Endpoint hash table

Fixed execution state, register save area

Guest Control Segment

Unsafe code is translated and stored in the cache area of the guest control segment (GCS)
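
The translate-then-cache idea can be illustrated with a toy model. This is a sketch under loud assumptions, not how vx32 is implemented: guest instructions are plain strings, `FragmentCache` and its methods are hypothetical, and the `0x200 | vector` trap code is only a guess generalized from the single `0x264 = int 0x64` example on the slide.

```python
class FragmentCache:
    """Toy translator: copy safe instructions, rewrite `int N` into a trap record."""

    def __init__(self, guest_code):
        # guest_code maps guest address -> (instruction text, size in bytes)
        self.guest_code = guest_code
        self.fragments = {}    # guest entry address -> translated fragment
        self.translations = 0  # how many fragments were actually translated

    def translate(self, addr):
        """Translate one fragment starting at addr, ending at the first trap."""
        out = []
        while addr in self.guest_code:
            insn, size = self.guest_code[addr]
            if insn.startswith("int "):
                vector = int(insn.split()[1], 16)
                # Record the trap code and the guest address to resume at.
                out.append(("gentrap", 0x200 | vector, addr + size))
                break
            out.append(("copy", insn))  # safe instruction: copied unchanged
            addr += size
        return out

    def lookup(self, addr):
        """Return the cached fragment for addr, translating it on first use."""
        if addr not in self.fragments:
            self.fragments[addr] = self.translate(addr)
            self.translations += 1
        return self.fragments[addr]


if __name__ == "__main__":
    code = {0x804815B: ("mov eax, 0x14", 5), 0x8048160: ("int 0x64", 2)}
    cache = FragmentCache(code)
    for step in cache.lookup(0x804815B):
        print(step)
```

A second `lookup` of the same entry address returns the cached fragment without translating again, which is the point of keeping a fragment cache in the guest control segment.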

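Source code conversion

• To absorb the differences between kencc and gcc, ed scripts are used to the full!

For example, converting an anonymous field:

,s!Lock;!Lock lk;!g

Before (kencc):

struct
{
	Lock;
	int fid;
	Chan *free;
	Chan *list;
}chanalloc;

After (gcc):

struct
{
	Lock lk;
	int fid;
	Chan *free;
	Chan *list;
}chanalloc;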
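
For readers who do not use ed, the same substitution can be written in Python. This is only an equivalent sketch of the one ed command shown above; the function name `name_anonymous_lock` is made up here, and the real conversion presumably involves further scripts that are not reproduced in the slides.

```python
import re


def name_anonymous_lock(source):
    """Apply the equivalent of ed's `,s!Lock;!Lock lk;!g` to a whole file's text."""
    # Every "Lock;" on every line becomes "Lock lk;", as with ed's `,` range and `g` flag.
    return re.sub(r"Lock;", "Lock lk;", source)


BEFORE = "struct\n{\n\tLock;\n\tint fid;\n\tChan *free;\n\tChan *list;\n}chanalloc;\n"
AFTER = "struct\n{\n\tLock lk;\n\tint fid;\n\tChan *free;\n\tChan *list;\n}chanalloc;\n"

if __name__ == "__main__":
    print(name_anonymous_lock(BEFORE))
```

Like the ed pattern, this matches on a plain substring, so a field whose type name merely ends in "Lock" would be rewritten as well.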

gccgo apparently includes the kencc extensions

http://gcc.gnu.org/ml/gcc-patches/2009-04/msg00727.html

Benchmarks

Figure 9: Normalized run times for VXA decoders running under vx32. Each bar plots run time using vx32 divided by run time for the same benchmark running natively (smaller bars mark faster vx32 runs). Section 5.1 gives more details. The jpeg test runs faster because the vx32 translation has better cache locality than the original code.

Machine                   zlib  bz2   jpeg  jp2   vorbis  flac
Core 2 Duo, OS X          0.99  0.94  0.71  1.07  1.21    0.99
Pentium M, Linux          0.95  0.97  0.73  1.22  0.92    0.92
Pentium 4, Linux          1.00  1.00  0.68  1.18  1.02    1.16
Xeon, Linux               1.00  1.00  0.75  1.10  0.98    1.03
Athlon64 x86-32, Linux    1.08  1.06  0.91  1.28  1.02    1.13
Opteron x86-32, Linux     1.06  1.04  0.89  1.27  0.97    1.09

Figure 10: Normalized run times for cryptographic hash functions running under vx32. Each bar plots run time using vx32 divided by run time for the same benchmark running natively (smaller bars mark faster runs).

Machine                   md5   sha1  sha512  ripemd  whirlpool
Core 2 Duo, OS X          0.92  1.03  0.85    0.98    0.74
Pentium M, Linux          1.11  1.14  1.02    1.07    1.03
Pentium 4, Linux          1.23  1.08  1.06    1.07    1.21
Xeon, Linux               1.08  1.04  1.04    1.03    1.10
Athlon64 x86-32, Linux    1.18  1.15  1.11    1.11    1.16
Opteron x86-32, Linux     1.17  1.07  1.14    1.11    1.17

Figure 11: Normalized run times for simple Plan 9 benchmarks. The four bars correspond to Plan 9 running natively, Plan 9 VX, Plan 9 under VMware Workstation 6.0.2 on Linux, and Plan 9 under QEMU on Linux using the kqemu kernel extension. Each bar plots run time divided by the native Plan 9 run time (smaller bars mark faster runs). The tests are: swtch, a system call that reschedules the current process, causing a context switch (sleep(0)); pipe-byte, two processes sending a single byte back and forth over a pair of pipes; pipe-bulk, two processes (one sender, one receiver) transferring bulk data over a pipe; rdwr, a single process copying from /dev/zero to /dev/null; sha1zero, a single process reading /dev/zero and computing its SHA1 hash; du, a single process traversing the file system; and mk, building a Plan 9 kernel. See Section 5.3 for performance explanations.

Test (native = 1)   vx32  VMware  QEMU
syscall             1.69  4.8     23
pipe-byte           2.7   3.8     21
pipe-bulk           2.5   2.8     22
rdwr                0.93  2.6     18
sha1zero            1.00  1.90    1.90
du                  0.57  2.7     9.1
mk                  0.63  1.32    3.9


Overwhelmingly fast compared with VMware and QEMU!

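Tiny Core Linux (1)

• Developed mainly by Robert Shingledecker, a Damn Small Linux developer

• A tiny Linux distribution (10 MB including the GUI)

• BusyBox, Tiny X, FLTK

• The latest version is 3.0 (released July 31)

• Today, though, I am using 2.6.11, with apologies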

Tiny Core Linux (2)

• The system can be extended by installing extensions

• An extension is a compressed file in squashfs format

1. Downloaded to tmpfs

2. Loopback-mounted

3. Symbolically linked from under /usr/local
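
Step 3 can be sketched without root privileges. The code below is an illustration only and not Tiny Core's actual tooling: `link_extension`, `demo`, and all paths are hypothetical, and steps 1 and 2 (download and loopback mount) are replaced by an ordinary temporary directory standing in for the mounted extension.

```python
import os
import tempfile


def link_extension(mount_point, root):
    """Mirror mount_point's directory tree under root, symlinking each file."""
    linked = []
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        rel = os.path.relpath(dirpath, mount_point)
        target_dir = os.path.normpath(os.path.join(root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if not os.path.lexists(dst):  # leave existing files alone
                os.symlink(src, dst)
                linked.append(dst)
    return linked


def demo():
    """Build a fake mounted extension and link it into a fake root."""
    with tempfile.TemporaryDirectory() as tmp:
        mount_point = os.path.join(tmp, "loop", "tvx")  # stands in for the loop mount
        root = os.path.join(tmp, "root")                # stands in for /
        os.makedirs(os.path.join(mount_point, "usr", "local", "bin"))
        src = os.path.join(mount_point, "usr", "local", "bin", "9vx")
        with open(src, "w") as f:
            f.write("#!/bin/sh\n")
        link_extension(mount_point, root)
        dst = os.path.join(root, "usr", "local", "bin", "9vx")
        return os.path.islink(dst), os.readlink(dst) == src


if __name__ == "__main__":
    print(demo())
```

Because only symlinks are created, the extension's files stay inside the compressed image and the root file system holds nothing but links.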

tvx: 9vx package

• A project started by Ron Minnich and packaged by John (EBo) David

• Is Tiny Core Linux (TCL) about the only distribution dedicated enough to package 9vx?

• Just install it from Appbrowser and click the icon in wbar!

Demo: boot screen

Demo: launching Appbrowser

Search for “tvx”

Install

Demo: launching tvx

Demo: launching 9vx

Today's proposal: how about putting "TinyCore Linux + 9vx" on a spare USB stick?

Plan9日記 (Plan9 Diary, http://d.hatena.ne.jp/oraccha/)
