Finish & publish container blogpost - monochromatic - monochromatic blog: http://blog.z3bra.org

git clone git://z3bra.org/monochromatic
---
commit ed6b12aac58c951e9924bdd1325c2fc9ea68a0b3
parent 076c73eb2cf52b5b1fdac70165a64c1566c4b053
Author: z3bra <willyatmailoodotorg>
Date:   Thu, 24 Mar 2016 22:08:53 +0000

Finish & publish container blogpost

Diffstat:
  M 2016/03/hand-crafted-containers.txt | 248 +++++++++++++++++++++++++++++--
  M config.mk                           |   2 +-
  M css/monochrome.css                  |  22 ++++------------------
  M index.txt                           |   1 +

4 files changed, 242 insertions(+), 31 deletions(-)
---
diff --git a/2016/03/hand-crafted-containers.txt b/2016/03/hand-crafted-containers.txt
@@ -1,7 +1,16 @@
-# [Hand-made containers](#)
+# [Hand-crafted containers](#)
 ## — 18 March, 2016
 
-### 0. intro
+### tl;dr
+
+    # CTNAME=blah
+    # mkdir -p /ns/$CTNAME/bin /ns/$CTNAME/lib
+    # ldd /bin/echo | grep '/' | cut -d'>' -f2 | awk '{print $1}' | xargs -I% cp % /ns/$CTNAME/lib/
+    # cp /bin/echo /ns/$CTNAME/bin/
+    # ip netns add $CTNAME
+    # ip netns exec $CTNAME unshare -fpium --mount-proc env -i container=handcraft chroot /ns/$CTNAME /bin/echo 'Hello, world!'
+
+### 0. Intro
 
 Containers are the latest trend, for a good reason: they leave room for new
 ideas in terms of security, flexibility, performance and much more.
@@ -23,7 +32,7 @@ an application (a complex one). In this regard, there is only a single type of
 containers.
 We can now focus on what's really important, how do they work?
 
-### 1. namespaces
+### 1. Namespaces
 
 That's a keyword, so let's ask our internet god what it means:
 
@@ -37,7 +46,7 @@ to a process.
 When a namespace is created for a process, all its children will be created
 within this namespace, and inherit the "limitations" of the parent.
 
-#### mount
+#### Mount
 The process will be able to mount and unmount filesystems without affecting
 the rest of the system. For example, if you unmount a partition within the
 namespace, all the processes within it will see it as unmounted, while it
@@ -52,7 +61,7 @@ This namespace concern shared memory, System V message queues and sempaphores.
 Processes in the namespace will be unable to communicate with the host's
 processes this way.
 
-#### network
+#### Network
 Processes will have their own network stack. This includes the routing table,
 firewall rules, sockets, and so on.
 
@@ -60,16 +69,231 @@ firewall rules, sockets, and so on.
 Processes' IDs will get a different mapping that they have on the host. They
 will get renumbered, starting from 1.
 
-#### user
+#### User
 The namespaces will have their own set of user and group IDs.
 
-### 2. making containers
+### 2. Making containers
 
 Now that we know what containers are and how they work, it's time to make
-some!
+one!
+For the purpose of this article, we will try and build the simplest container
+capable of printing "Hello, world!".
+
+Here is the program:
+
+    $ more < hello.c
+    #include <unistd.h>
+    int
+    main(int argc, char **argv)
+    {
+        write(1, "Hello, world!\n", 14);
+        return 0;
+    }
+    EOF
+    $ cc hello.c -o hello
+
+#### 2.0 `chroot(1)`
+This one is an old tool that will run a command or spawn an interactive
+shell after changing the root directory.
+It is used to isolate a process, or group of processes from the host's
+filesystem tree. This has long been used for security purposes
+(see [chroot jail](https://en.wikipedia.org/wiki/Chroot)), but escaping from
+chroot is rather easy for someone with root (UID 0) access.
+This is why `chroot` alone cannot be considered secure, but coupled with user
+namespace and privilege dropping, one can turn a chroot into a real jail.
+
+Back to the topic. Let's copy our `hello` binary into the chroot, and try to
+run it:
+
+    $ mkdir rootfs
+    $ cp ./hello ./rootfs/hello
+    # chroot ./rootfs ./hello
+    chroot: failed to run command "./hello": No such file or directory
+
+This is the worst error message you can get. Of course `./hello` exists!
+We just copied it. But what does this error mean then? Let's take a closer
+look at this binary:
+
+    $ file ./hello
+    ./hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-x86-64.so.2, for GNU/Linux 3.12.0, not stripped
+
+The output may differ slightly depending on your system, but the important
+part here is the following:
+
+> dynamically linked, interpreter /lib/ld-linux-x86-64.so.2
+
+Dynamically linked binaries cannot be run on their own. Long story short,
+`/lib/ld-linux-x86-64.so.2` is a program that is implicitly called to run all
+the dynamic binaries on a linux system; it's called the
+[linker](https://en.wikipedia.org/wiki/Dynamic_linker). So in order to have a
+binary run in the chroot, you need to copy over the linker AND all the libraries
+your binary links to. To get a list of these libraries, use the `ldd` command:
+
+    $ ldd hello
+        linux-vdso.so.1 (0x00007ffd3e7dc000)
+        libc.so.6 => /lib/libc.so.6 (0x00007fdc1a482000)
+        /lib/ld-linux-x86-64.so.2 (0x00007fdc1a82a000)
+
+You can ignore the [`vdso`](http://man7.org/linux/man-pages/man7/vdso.7.html)
+line: it is not a file on disk, the kernel maps it into every process.
+Our `hello` binary depends on two files: `/lib/ld-linux-x86-64.so.2`, the linker,
+and `/lib/libc.so.6`, the C library (containing wrappers for system calls like `write(2)`).
+
+In order to run our `hello` program, we'll have to copy them over in place. After
+that, our program should run totally fine:
+
+    $ mkdir -p rootfs/lib
+    $ cp /lib/ld-linux-x86-64.so.2 /lib/libc.so.6 ./rootfs/lib
+    # chroot ./rootfs ./hello
+    Hello, world!
+
+TADAAAA!! That was easy, right?
+Another option is to simply compile our program *statically*. It means that all the
+needed objects from libraries will be linked into the program, removing the need
+for a linker and libc in the chroot:
+
+    $ mkdir rootfs
+    $ cc hello.c -o hello -static -s
+    $ cp hello ./rootfs
+    # chroot ./rootfs ./hello
+    Hello, world!
+
+Let's take a look at the size of this "container". For scale, the
+"[Smallest possible docker container](https://docs.docker.com/articles/baseimages/#creating-a-simple-base-image-using-scratch)"
+weighs 3.6MiB...
+
+    $ du -sh rootfs
+    720K rootfs
+
+That's most likely the lightest container you've seen, right?
+
+#### 2.1 env
+To isolate our process from the host, we'll have to clear the environment
+of all its variables, to make sure the container won't know anything about its
+host. We can do this with the `env` command:
+
+    $ export FOO="bar"
+    $ env -i /bin/sh
+    $ env # we are now in a subshell
+    PWD=/home/z3bra
+
+You can see that the subprocess doesn't have the `$FOO` variable in its
+environment, even though it has been exported earlier.
+You can set the environment by passing variables AFTER the `env -i` command;
+this is useful to set the `$container` variable which has been "standardized" as
+a way to tell processes they are running inside a container.
+
+We now have a way to isolate our `hello` process from the host's environment.
+
+    # env -i container="handcraft" chroot ./rootfs ./hello
+
+#### 2.2 `unshare(1)`
+This tool is the one that will actually isolate containers. It has been created
+especially for this purpose, and will let you run a process unshared from
+different namespaces: mount, user, network, PID, IPC and UTS.
+In the same order, each flag will separate your `command` from the given
+namespace. See `unshare(1)` for more information:
+
+    unshare -m -U -n -p -i -u
+
+We can actually leave the `-n` flag out, as some tools provide a better
+approach to network isolation (see `ip-netns(1)`, described later in this post).
+
+Another point worth mentioning is that if you want to isolate the process from
+the PID namespace, you should consider using the options `--fork --mount-proc`,
+so that the process will see a "virtualized" `/proc` that will represent the
+namespace, and not the host. For example:
+
+    # unshare -p --fork --mount-proc ps -faux
+    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
+    root         1  0.0  0.0  13012  2276 pts/2    R+   23:57   0:00 ps -aux
+
+We just found a way to isolate our program a bit more:
+
+    # unshare -fpiumU --mount-proc env -i container="handcraft" chroot ./rootfs ./hello
+
+For the curious, you can check the `nsenter(1)` program, that will help you
+run a process within another process's namespaces.
+
+#### 2.3 `ip-netns(1)`
+
+The `ip(1)` command includes a `netns` subcommand to manage network namespaces.
+It is useful to give network access to a process while keeping it away from the
+host's network stack.
+
+You need to be familiar with the concept of
+[bridges](https://en.wikipedia.org/wiki/Bridging_\(networking\)), and
+[virtual network interfaces](https://en.wikipedia.org/wiki/Virtual_network_interface)
+(veth) pairs here.
+Virtual ethernet device pairs act like both ends of a tube: when a packet is
+sent on one end, it is received on the other. This simple concept will
+help us get internet access *inside* the container, while using the network
+stack of the host.
+
+The process is easy: we will create a `veth` pair, move one end inside the
+container, and bridge the other side with a physical interface.
+Let's assume your physical interface is named `eth0`. We will create a bridge
+`br0`, add `eth0` on this bridge, and request an IP for this interface:
+
+    # brctl addbr br0
+    # brctl addif br0 eth0
+    # dhcpcd br0
+
+Then, we create a network namespace, a veth pair and move one end of this
+pair inside the namespace (we will name it "handcraft"):
+
+    # ip netns add handcraft
+    # ip link add veth1 type veth peer name eth1
+    # ip link set eth1 netns handcraft
+
+Now that our namespace has an interface able to communicate with the outside
+world, we can bridge it together with `eth0` and request an IP:
+
+    # brctl addif br0 veth1
+    # ip link set veth1 up
+    # ip netns exec handcraft dhcpcd eth1
+
+We now have a namespace 100% isolated from the host, that can reach the
+outside world over ethernet!
+You can run any command inside this namespace, and they will use the network
+stack we just created. For example:
+
+    # ip netns exec handcraft curl -s z3bra.org/slj
+
+We can now run our `hello` program with its own network stack (even though
+it doesn't make any sense!):
+
+    # ip netns exec handcraft unshare -fpiuUm --mount-proc env -i container="handcraft" chroot ./rootfs ./hello
+
+Don't feel ashamed by such a long-ass command, because that is what `lxc`,
+`docker`, and other container applications do behind your back!
+
+### 3. Bonus: cgroups
+
+Control groups are a feature of the kernel used to limit the resources
+used by a process, or a group of processes. Cgroups can limit CPU
+shares, RAM, network usage, disk I/O, ...
+
+I will not cover their usage here, as this article is already long, but
+they are totally worth mentioning as an improvement over our containers.
+
+### 4. Congratz
+
+Containers are a truly awesome concept. They make great use of new
+technologies, and all the tools presented above allow the standard users
+to exploit them in many different ways.
+Applications like LXC and docker both recreate a full operating system,
+even though they are used to run a single process (web server, database, ...).
+
+By knowing how this works under the hood, we will be able to use the
+container technology to isolate the application in a smarter way than
+shipping it along with a full operating system.
+
+For further reading, check out these links:
 
-2.0 chroot
-2.1 unshare / nsenter
-2.2 ip-netns
+* [http://doger.io](http://doger.io)
+* [http://git.r-36.net/ns-tools](http://git.r-36.net/ns-tools)
+* [https://github.com/arachsys/containers](https://github.com/arachsys/containers)
+* [https://github.com/p8952/bocker](https://github.com/p8952/bocker)
 
-3. cgroups
+Now get out there, and make some containers!
diff --git a/config.mk b/config.mk
@@ -1,4 +1,4 @@
-MD = ./markdown
+MD = markdown
 
 NAME = monochromatic
 PREFIX = /var/www/blog.z3bra.org
diff --git a/css/monochrome.css b/css/monochrome.css
@@ -85,27 +85,13 @@ header h1 a:hover {
 /* }}} */
 
 /* Coding style () {{{ */
-code, pre {
-    color: inherit;
+pre {
+    color: #eee;
     font-family: monospace;
     font-size: 90%;
-    padding: 2px;
-    background-color: #eee;
-    border: 1px solid #bbb;
+    background-color: #333;
+    border: 1px solid #eee;
     border-radius: 4px;
-}
-
-/*
- * code:before, code:after {
- *     content: "`";
- * }
- */
-
-pre code:before, pre code:after {
-    content: none;
-}
-
-pre {
     padding: 10px;
     overflow-x: auto;
     overflow-y: hidden;
diff --git a/index.txt b/index.txt
@@ -1,3 +1,4 @@
+* 0x001b - [Hand-crafted containers](/2016/03/hand-crafted-containers.html)
 * 0x001a - [Make your own distro](/2016/01/make-your-own-distro.html)
 * 0x0019 - [Install Alpine at online.net](/2015/08/install-alpine-at-onlinenet.html)
 * 0x0018 - [cross-compiling with PCC and musl](/2015/08/cross-compiling-with-pcc-and-musl.html)
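
A note on the tl;dr added by this commit: its library-copy pipeline (`ldd | grep | cut | awk`) can be wrapped in a small shell helper. The following is only a sketch and is not part of the commit. The names `ldd_paths` and `populate_rootfs` are hypothetical, and it assumes the binary is given as an absolute path and that `ldd` prints lines in the format shown in the post.

```shell
#!/bin/sh
# Hypothetical helper (not from the commit): read `ldd` output on stdin
# and print one library path per line.
ldd_paths() {
    # Keep only lines that contain a path (this drops the vdso line),
    # strip the "name =>" prefix when there is one,
    # then keep the first field, which leaves out the load address.
    grep '/' | cut -d'>' -f2 | awk '{print $1}'
}

# Hypothetical helper: copy a binary and the libraries it links to into
# a root directory, keeping their original paths.
# Assumes $1 is an absolute path such as /bin/echo.
populate_rootfs() {
    bin=$1
    root=$2
    mkdir -p "$root$(dirname "$bin")"
    cp "$bin" "$root$bin"
    ldd "$bin" | ldd_paths | while read -r lib; do
        mkdir -p "$root$(dirname "$lib")"
        cp "$lib" "$root$lib"
    done
}
```

With these definitions loaded, something like `populate_rootfs /bin/echo /ns/blah` would stand in for the `mkdir`, `ldd` and `cp` lines of the tl;dr. Unlike the tl;dr, it mirrors each library's original directory instead of putting everything in `lib/`.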