Docker: Principles and Implementation
The technology behind docker. This is for osdc.tw 2014.
果凍
Introduction
● Works at 迎廣科技
  ○ python
  ○ openstack
● http://about.me/ya790206
● http://blog.blackwhite.tw/
● https://github.com/ya790206/call_seq
Agenda
● linux kernel namespace
● seccomp
● cgroup
● lxc
● docker
docker
● lightweight, portable, self-sufficient containers.
● A process running in one container is isolated from the processes running in other containers.
Linux startup process
● Linux startup process
  ○ Boot loader ->
  ○ Kernel ->
  ○ Init process
● Difference between Linux distros:
  ○ package manager
  ○ init
Docker
● aufs
● lxc
● Kernel namespaces
● AppArmor and SELinux profiles
● Seccomp policies
● Control groups
● Kernel capabilities
● Chroots
● btrfs
kernel namespace
● The purpose of each namespace is to wrap a particular global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.
● Private view
kernel pid namespace
root pid namespace
  pid 1 (pid 1)
  pid 2 (pid 2)
  pid namespace x
    pid 3 (pid 1)
    pid 4 (pid 2)
● black (the first number): the real pid.
● red (the number in parentheses): the pid the process gets from getpid().
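The two views of a pid can be read back from /proc on newer kernels, which report an NSpid line in /proc/<pid>/status listing the pid in each nested namespace, outermost first. The following is a minimal sketch; the function name parse_nspid and the sample text are made up for illustration, and kernels without the NSpid field simply yield an empty list.

```python
def parse_nspid(status_text):
    """Return the pids from the NSpid line of /proc/<pid>/status, outermost first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []  # the kernel did not report NSpid


# Hypothetical status excerpt for the process drawn as "pid 3 (pid 1)".
sample = "Name:\tinit\nPid:\t3\nNSpid:\t3\t1\n"
real_pid, namespaced_pid = parse_nspid(sample)
print(real_pid, namespaced_pid)
```

On a real system the same function can be fed the contents of /proc/self/status.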
kernel namespace
● Mount namespaces
● UTS namespaces
● PID namespaces
● Network namespaces
● User namespaces
● IPC namespaces
int child_pid = clone(child_main, child_stack+STACK_SIZE, CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | SIGCHLD, NULL);
● https://gist.github.com/ya790206/9855021
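The tail is still showing (/proc is not isolated yet)
int child_pid = clone(child_main, child_stack+STACK_SIZE, CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
/* presumably inside child_main: remount /proc so it reflects the new pid namespace */
mount("proc", "/proc", "proc", 0, NULL);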
● https://gist.github.com/ya790206/9855094
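To check which namespaces a process actually ended up in, one option is to read the symlinks under /proc/<pid>/ns, whose targets look like pid:[4026531836]; two processes share a namespace when the numbers match. This is a rough sketch rather than part of the gists above, and the helper names are invented for illustration.

```python
import os


def parse_ns_link(target):
    """Split a /proc/<pid>/ns symlink target such as 'pid:[4026531836]'."""
    kind, _, rest = target.partition(":")
    return kind, int(rest.strip("[]"))


def namespaces_of(pid="self"):
    """Map each entry in /proc/<pid>/ns to its namespace inode number."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
        result[name] = inode
    return result


if __name__ == "__main__":
    if os.path.isdir("/proc/self/ns"):
        for name, inode in namespaces_of().items():
            print(name, inode)
```

Running it once on the host and once inside the cloned child should show different numbers for uts, ipc, pid, and mnt.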
seccomp
● A process running in seccomp mode is severely limited in what it can do.
● It can make only four system calls: exit(), sigreturn(), and read() and write() on already-open file descriptors.
libseccomp example
https://gist.github.com/ya790206/9579145
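The allowlist idea can be pictured with a toy model. This is not real seccomp and not the libseccomp API; it only mimics the decision made for each system call name, and the names STRICT_MODE_ALLOWED and strict_mode_verdict are invented for illustration.

```python
# Toy model only: the four calls permitted in strict seccomp mode.
STRICT_MODE_ALLOWED = frozenset({"read", "write", "exit", "sigreturn"})


def strict_mode_verdict(syscall_name):
    """Return 'allow' for an allowlisted call, 'kill' for anything else."""
    return "allow" if syscall_name in STRICT_MODE_ALLOWED else "kill"


for name in ("write", "open", "exit"):
    print(name, strict_mode_verdict(name))
```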
cgroup
● This work was started by engineers at Google
● Resource limiting
● Prioritization
● Accounting
● Control
cgroup
○ blkio: this subsystem sets limits on input/output access to and from block devices such as physical drives (disk, solid state, USB, etc.).
○ cpu: this subsystem uses the scheduler to provide cgroup tasks access to the CPU.
○ cpuacct: this subsystem generates automatic reports on CPU resources used by tasks in a cgroup.
○ cpuset: this subsystem assigns individual CPUs (on a multicore system) and memory nodes to tasks in a cgroup.
○ devices: this subsystem allows or denies access to devices by tasks in a cgroup.
○ freezer: this subsystem suspends or resumes tasks in a cgroup.
○ memory: this subsystem sets limits on memory use by tasks in a cgroup, and generates automatic reports on memory resources used by those tasks.
○ net_cls: this subsystem tags network packets with a class identifier (classid) that allows the Linux traffic controller (tc) to identify packets originating from a particular cgroup task.
○ net_prio: this subsystem provides a way to dynamically set the priority of network traffic per network interface.
○ ns: the namespace subsystem.
cgroup freezer
● The cgroup freezer is useful to batch job management systems, which start and stop sets of tasks in order to schedule the resources of a machine according to the desires of a system administrator.
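$ mount -t cgroup -ofreezer freezer /<path>/freezer
/<path>/freezer: root cgroup
  tasks  otherfile  my
/<path>/freezer/my: sub cgroup
  tasks  otherfile
$ mkdir /<path>/freezer/my
[Diagram: the tasks file of the root cgroup holds all processes; a pid is placed in the tasks file of the sub cgroup.]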
cgroup freezer
$ mount -t cgroup -ofreezer freezer /<path>/freezer
$ cd /<path>/freezer/; ls
cgroup.clone_children cgroup.event_control cgroup.procs cgroup.sane_behavior notify_on_release release_agent tasks
1. mkdir my_group; cd my_group
2. echo $some_pid > tasks
3. echo FROZEN > freezer.state
4. echo THAWED > freezer.state
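The four steps above are plain file operations, so they can be scripted. The sketch below assumes a cgroup v1 freezer hierarchy mounted at a directory you pass in; the function names are invented for illustration, and on a real mount the writes need root.

```python
import os


def create_group(freezer_root, name):
    """Step 1: create a sub cgroup directory under the freezer mount point."""
    path = os.path.join(freezer_root, name)
    os.makedirs(path, exist_ok=True)
    return path


def write_control(group_dir, filename, value):
    """Write one value to a cgroup control file."""
    with open(os.path.join(group_dir, filename), "w") as handle:
        handle.write(f"{value}\n")


def add_task(group_dir, pid):
    """Step 2: move a process into the group."""
    write_control(group_dir, "tasks", pid)


def freeze(group_dir):
    """Step 3: suspend every task in the group."""
    write_control(group_dir, "freezer.state", "FROZEN")


def thaw(group_dir):
    """Step 4: resume the tasks."""
    write_control(group_dir, "freezer.state", "THAWED")
```

A typical call sequence would be create_group(root, "my_group"), add_task(group, some_pid), freeze(group), and later thaw(group).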
other cgroup
● memory cgroup:
  ○ limit process memory usage.
  ○ show various statistics
● blkio cgroup:
  ○ change weight
  ○ show various statistics
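For the memory cgroup, both bullets again come down to reading and writing files. A minimal sketch, assuming the cgroup v1 file names memory.limit_in_bytes and memory.stat; the function names and the sample text are illustrative only.

```python
import os


def set_memory_limit(group_dir, limit_bytes):
    """Write the limit to memory.limit_in_bytes inside a memory cgroup directory."""
    with open(os.path.join(group_dir, "memory.limit_in_bytes"), "w") as handle:
        handle.write(f"{limit_bytes}\n")


def parse_memory_stat(text):
    """Turn the 'name value' lines of memory.stat into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[name] = int(value)
    return stats


# Hypothetical memory.stat excerpt; real files contain many more counters.
sample = "cache 4096\nrss 8192\n"
print(parse_memory_stat(sample))
```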
lxc
● LXC is a userspace interface for the Linux kernel containment features.
● Container templates
● A set of standard tools to control the containers
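lxc
[Diagram: the host OS runs container A (process 1, process 2), container B (process 3, process 4), and a host process x.]
Legend: an arrow from A to B means A can see B; a double-headed arrow means A can see B and B can see A.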
lxc
1. lxc-create -n test-container -t ubuntu
2. lxc-ls --fancy
3. lxc-start -n test-container
4. lxc-console -n test-container
5. lxc-stop -n test-container
6. lxc-destroy -n test-container
start vs execute
● start:
  ○ boot linux system
● execute:
  ○ execute program directly
  ○ make sure you have "/usr/lib/lxc/lxc-init" in your container
$ sudo lxc-checkpoint --name p1 --statefile a
● output:
  ○ lxc-checkpoint: 'checkpoint' function not implemented
linux aufs
● It allows files and directories from separate filesystems to coexist under a single directory.
/tmp/union
/tmp/a /tmp/b /tmp/c
# apt-get install aufs-tools
# mount -t aufs -o br=/tmp/a:/tmp/b none /tmp/union/
# mount -t aufs -o br=/tmp/a=rw:/tmp/b=rw none /tmp/union
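What the union mount does on lookup can be imitated in a few lines: branches are searched in the order given to br=, and the first branch that has the name wins. This is only a model of the lookup rule, not aufs itself (it ignores whiteouts and writes), and the function names are invented for illustration.

```python
import os


def union_lookup(branches, relative_path):
    """Return the path in the first branch that contains relative_path, else None."""
    for branch in branches:
        candidate = os.path.join(branch, relative_path)
        if os.path.lexists(candidate):
            return candidate
    return None


def union_listdir(branches):
    """List the merged top-level names of all branches, without duplicates."""
    names = set()
    for branch in branches:
        if os.path.isdir(branch):
            names.update(os.listdir(branch))
    return sorted(names)
```

With branches ["/tmp/a", "/tmp/b"], a file present in both directories resolves to the copy in /tmp/a, which mirrors what /tmp/union shows.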
docker vs lxc
● docker is based on lxc
● docker can create an image from a text file (a dockerfile).
● docker seldom boots a full system.
● docker provides a user-friendly interface
● docker uses less disk space (aufs).
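docker
[Diagram: container lifecycle. An image holds a rootfs. "run" creates a running container (process + rootfs) from an image; "stop" turns it into a stopped container (rootfs only); "start" runs it again; "commit" saves a container's rootfs as an image.]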
rootfs in container (layers stacked by aufs, top first)
  image: rw
  ZZZ image: ro
  XXX image: ro
  ubuntu image: ro
rootfs in image (layers stacked by aufs, top first)
  image: ro
  ZZZ image: ro
  XXX image: ro
  ubuntu image: ro
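The layer stacks above behave much like Python's collections.ChainMap: lookups fall through the layers from top to bottom, while writes only ever touch the top layer. The sketch below is an analogy, not how docker stores images; the layer contents and the function name start_container are made up for illustration.

```python
from collections import ChainMap

# Hypothetical read-only image layers, named after the slide.
ubuntu = {"/bin/sh": "dash"}
xxx = {"/usr/bin/python": "python2.7"}
zzz = {"/srv/app.py": "v1"}


def start_container(*image_layers):
    """Stack a fresh writable layer on top of the image layers (top first)."""
    return ChainMap({}, *image_layers)


container = start_container(zzz, xxx, ubuntu)
container["/srv/app.py"] = "v2"  # the write lands in the top rw layer only

print(container["/srv/app.py"])  # value seen inside the container
print(zzz["/srv/app.py"])        # the image layer underneath is unchanged
print(container["/bin/sh"])      # reads fall through to lower layers
```

In this picture a commit corresponds to freezing container.maps[0] and treating it as one more read-only layer.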
taiwan.py site dockerfile
FROM ubuntu:12.10
RUN apt-get update
RUN apt-get install -y python-dev
RUN apt-get install -y python-pip
RUN apt-get install -y git
RUN pip install mynt
RUN git clone https://github.com/lucemia/taiwan.py
RUN mynt gen -f taiwan.py/src/ taiwan.py/build/
EXPOSE 8000
CMD cd taiwan.py/build/ && python -m SimpleHTTPServer
How to run
1. cat dockerfile | sudo docker build -t taiwanpy -
2. docker run -p 8000:8000 taiwanpy
3. docker stop xxx
4. docker start xxx
5. docker stop xxx
6. docker rm xxx
7. docker rmi taiwanpy
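A thin wrapper around these commands mostly has to assemble argument lists. The sketch below only builds and prints the command line; it is not the docker_wrapper tool linked in this deck, and the function names are invented for illustration. Note that -p takes host_port:container_port, so the container side should be the port the server listens on (SimpleHTTPServer defaults to 8000, matching EXPOSE 8000).

```python
import shlex


def docker_build_argv(tag):
    """Arguments for building an image from a dockerfile read on stdin."""
    return ["docker", "build", "-t", tag, "-"]


def docker_run_argv(image, host_port, container_port):
    """Arguments for running an image with one published port."""
    return ["docker", "run", "-p", f"{host_port}:{container_port}", image]


print(shlex.join(docker_build_argv("taiwanpy")))
print(shlex.join(docker_run_argv("taiwanpy", 8000, 8000)))
```

The lists can be handed to subprocess.run unchanged once docker is installed.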
simple docker shell
● https://github.com/ya790206/misc_tools/tree/master/docker_wrapper
Summary
● Namespaces for virtualization.
● Cgroups for controlling a group of processes.
● The container and the host system use the same kernel.
● Docker is similar to lxc, but docker is easier to use.
Question
Thank you
References - kernel namespace
● Namespaces in operation, part 1: namespaces overview
● PaaS under the hood, episode 1: kernel namespaces
● Introduction to Linux namespaces – Part 1: UTS
References - cgroup
● cgroup
● http://en.wikipedia.org/wiki/Cgroups
Bibliography
● Linux Kernel Hacks:改善效能、提昇開發效率及節能的技巧與工具