lxc(linux container) lightweight virtual system mechanism gao feng [email protected] 1

26
LXC(Linux Container) Lightweight virtual system mechanism Gao feng [email protected] 1

Upload: casey-martens

Post on 14-Dec-2015

248 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

LXC(Linux Container)

Lightweight virtual system mechanismGao feng

[email protected]

1

Page 2: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

Outline

IntroductionNamespaceSystem APILibvirt LXC

ComparisonProblemsFuture work

2

Page 3: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

Introduction Container: Operation System Level virtualization

method for Linux

Kernel

P1

Guest1

P2

ContainerManagement

Tools

Namespace Set 1

P1

Guest2

P2

NamespaceSet 2

API/ABI

3

Page 4: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

Introduction

Why ContainerBetter Performance

Easy to set up Multi-Tenancy environment

Kvm

Host-OS

Emulator-Lay

Guest-OS

App

Container

Host-OS

App

Page 5: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

Namespace

Namespace isolates the resources of system, currently there are 6 kinds of namespaces in linux kernel.Mount namespaceUTS namespaceIPC namespaceNet namespacePid namespaceUser namespace

5

Page 6: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

P3P2P1

MountNamespace2

MountNamespace1

Mount Namespace

Each mount namespace has its own filesystem layout.

/proc/<p1>/mounts

/ /dev/sda1/home /dev/sda2

/proc/<p3>/mounts

/ /dev/sda3/boot /dev/sda4

6

/proc/<p2>/mounts

/ /dev/sda1/home /dev/sda2

Page 7: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

UTS Namespace

Every uts namespace has its own uts related information.

UTS namespace1

ostype: Linuxosrelease: 3.8.6version: …

hostname: uts1domainname: uts1

UTS namespace2

ostype: Linuxosrelease: 3.8.6version: …

hostname: uts2domainname: uts2

Unalterable

alterable

7

Page 8: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

P3P2P1 P4

IPCnamespace2

IPCnamespace1

IPC Namespace

IPC namespce isolates the interprocess communication resource(shared memory, semaphore, message queue)

8

Page 9: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

Net Namespace

Net namespace isolates the networking related resources

Net Namespace1

Net devices: eth0IP address: 1.1.1.1/24RouteFirewall ruleSocketsProcsysfs…

Net Namespace2

Net devices: eth1IP address: 2.2.2.2/24RouteFirewall ruleSocketsProcsysfs…

9

Page 10: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

PID Namespace PID namespace isolates the Process ID, implemented as a

hierarchy.

PID namespace1 (Parent) (Level 0)

PID Namespace2 (Child)(Level 1)

PID Namespace3 (Child)(Level 1)

P2

pid:1

pid:2

P3

P4

ls /proc1 2 3 4

ls /proc1

ls /proc1

pid:4

P1

pid:1

pid:3

pid:1

10

Page 11: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

User Namespace

kuid/kgid: Original uid/gid, Globaluid/gid: user id in user namespace, will be

translated to kuid/kgid finallyOnly parent User NS has rights to set map

User namespace1

uid:10-14

uid_map10 2000 5

kuid: 2000-2004

User namespace2

uid:0-9

uid_map0 1000 10

kuid: 1000-1009

11

Page 12: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

User Namespace

Create and stat file in User namesapce

Usernamespace

root#touch

/file

Disk /file (kuid:1000)

uid_map:0 1000 10

root#stat /file

File : “/file”Access: uid (0/root)

12

Page 13: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

System API/ABI

Proc/proc/<pid>/ns/

System Callcloneunsharesetns

13

Page 14: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

Proc

/proc/<pid>/ns/ipc: ipc namespace /proc/<pid>/ns/mnt: mount namespace /proc/<pid>/ns/net: net namespace /proc/<pid>/ns/pid: pid namespace /proc/<pid>/ns/uts: uts namespace /proc/<pid>/ns/user: user namespace

If the proc file of two processes is the same, these two processes must be in the same namespace.

14

Page 15: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

System Call

cloneint clone(int (*fn)(void *), void *child_stack,

int flags, void *arg, …);

6 new flags:

CLONE_NEWIPC,CLONE_NEWNET,CLONE_NEWNS,CLONE_NEWPID,

CLONE_NEWUTS,CLONE_NEWUSER

15

Page 16: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

System Call

clonecreate process2 and IPC namespace2

Mount1

P1 P2IPC2(new

created)

Others1

16

IPC1clone(,, CLONE_NEWIPC,)

Mount1

Others1

Page 17: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

System Call

unshareint unshare(int flags);

Namespace extends the system call unshare too. User space can use unshare to createnew namespace and the caller will run in this new created namespace.

17

Page 18: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

System Call

unshare create net namespace2

18

Mount1

P1 P1Net2(new

created)

Others1

Net1unshare(CLONE_NEWNET)

Mount1

Others1

Page 19: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

System Call

setnsint setns(int fd, int nstype);

setns is a new added system call for namespace.Process can use setns to set which namespace the process will belong to.

@fd: file descriptor of namespace(/proc/<pid>/ns/*)@nstype: type of namespace.

19

Page 20: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

System Call

setnsChange the PID namespace of P2

PID1P1

P2

PID2

setns(open(/proc/p1/ns/pid,) , 0)P2

20

PID1 P1

PID2

Page 21: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

Libvirt LXC

Libvirt LXC: userspace container management tool, Implemented as one type of libvirt driver.Manage containersCreate namespaceCreate private filesystem layout for containerCreate devices for containerResources controller by cgroup

21

Page 22: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

Comparison

The feature that host share the same kernel with guest makes container different from other virtualization method

22

Container KVM

performance Great Normal

OS support Linux Only No Limit

Security Normal Great

Completeness Low Great

Page 23: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

Problems

/proc/meminfo, cpuinfo…Kernel space (relate to cgroup)User space (poor efficiency)

New namespaceAudit (assign to user namespace?)Syslog (do we really need it?)

23

Page 24: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

Problems

Bandwidth controlTC Qdisc

On host (How to handle setting nic to container?)On container (user can change it)

NetfilterHow to control Ingress bandwidth

Disk quotaUid/Gid Quota (Many users )Project Quota (xfs only)

24

Page 25: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

Future Work

Improve Libvirt LXCUnchanged systemd in Libvirt LXCUse interface of systemd to set cgroupLibvirt LXC based DockerAudit namespace

25

Page 26: LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com 1

26

Thank you!Q&A