Recently, I’ve been spending a fair amount of time looking at the OCI runtime specification and at the reference implementation, runc. I tend to learn best by doing, and the low-level bits of how containers work have interested me for a long while now, so I’ve started writing a new non-production OCI runtime to learn more about it.

In the OCI spec, the docker run interface that folks familiar with Docker containers use has been broken up into two steps: create and start. But what’s interesting about runc’s implementation here is that the standard input/output streams (which I tend to refer to as STDIO) for the container’s main process are hooked up to the STDIO streams of runc create rather than those of runc start. This means that if you run runc create in one terminal, and then run runc start in a second terminal, the input and output of your container will be hooked up to the first terminal rather than the second!

I’m not entirely clear on the history of why runc behaves this way, but I think the how is interesting on its own. And that how is through multiple processes synchronizing via a FIFO.

In computer science, FIFO usually stands for first in, first out, and refers to the order in which processing occurs over a set of elements; the oldest is processed first. In Linux (and POSIX-compliant Unix systems), a FIFO is also a special kind of file and is otherwise known as a named pipe.

You might be familiar with a shell pipeline in which the output of one process becomes the input to another. A FIFO/named pipe allows for a similar kind of behavior but without the requirement of running inside something like a shell pipeline; instead, the pipe can exist on the filesystem and be referred to by its name. And just like a shell pipeline, the kernel makes one side wait for the other. By default, opening a FIFO for reading blocks until some process opens it for writing (and vice versa), and a read blocks until data has been written or every writer has closed its end. A write blocks only when the pipe’s buffer is full. This gives us a cheap source of multi-process synchronization mediated by the kernel.

Let’s imagine a set of three processes that interact. Process 1 can be responsible for setting up something to happen later, process 2 (the target) might be responsible for making the thing happen, and process 3 (the trigger) might be responsible for determining when is the right time for the thing to happen. These three processes can use a FIFO as the synchronization mechanism: process 1 creates the FIFO, and the other two processes open opposite ends of it.

In Go, the first program (setup.go) might look like this:

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// do some sort of setup for whatever is interesting

	// set up the FIFO for later triggering
	if err := unix.Mkfifo("my.fifo", 0600); err != nil {
		panic(err)
	}

	// optional: invoke the target program here

	fmt.Println("Setup complete!")
}

The second (target.go) might look like this:

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// this blocks until the fifo is opened for read on the other side
	fd, err := unix.Open("my.fifo", unix.O_WRONLY, 0)
	if err != nil {
		panic(err)
	}

	// with a reader attached, this small write lands in the pipe buffer;
	// it would block only if that buffer were full
	if _, err := unix.Write(fd, []byte("0")); err != nil {
		panic(err)
	}

	// then do something interesting here
	fmt.Println("Something _useful_!")
}

The third (trigger.go) might look like this:

package main

import (
	"fmt"
	"io"
	"os"
)

func main() {
	// this blocks until the fifo is opened for write on the other side
	f, err := os.OpenFile("my.fifo", os.O_RDONLY, 0)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// this blocks until data is written and the writer closes its end
	if _, err := io.ReadAll(f); err != nil {
		panic(err)
	}

	fmt.Println("Triggered!")
}

This code is super simplistic, but hopefully illustrates how a FIFO can provide a reasonable synchronization method. In runc, setup.go corresponds to part of the work that runc create does, target.go corresponds to the helper process that eventually executes the container process, and trigger.go corresponds to part of the work that runc start does.

There are some downsides and nuances to be aware of too. As with any named file on the filesystem, any process with the appropriate permissions can open the file. This means there can be race conditions: maybe another process is able to open the FIFO for writing prior to your target program. Or another process could open the FIFO for reading prior to your trigger program, causing an early trigger to happen. The typical way to protect against this is by setting appropriate file permissions (and having the discipline to not run something malicious as root). runc doesn’t (and can’t really) protect itself against a rogue root process, but it does set the permissions on its state directory and on the FIFO itself such that a root process is required to read from the FIFO (since reading is the trigger).

Named files on the filesystem have some other inherent concerns to keep in mind. For example, the name can change, or the file can be removed and replaced by another with the same name. If you’re using the file for signaling (like above), those changes can happen before your target program opens the FIFO (and depending on what you’re using it for, this might end up being an attack vector for something nefarious). runc mitigates the file-replacement problem by avoiding the “nameness” of the FIFO for the target program. runc is able to do this because it is in control of the target program’s execution; runc create isn’t just responsible for creating the FIFO, it also runs the target program. Instead of the target opening the FIFO by name, runc create opens the FIFO with the O_PATH flag and passes the open file descriptor to the target program. runc records the descriptor number in a well-known environment variable (_LIBCONTAINER_FIFOFD); the target program reads that variable and re-opens the FIFO through the special file /proc/self/fd/${fifofd}.
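
Here is a sketch of the O_PATH and /proc/self/fd trick, squeezed into a single process so it can run on its own. This is not runc's code: the function name reopenByFD and the decoy file are invented for the demonstration, there is no child process or environment variable involved, and I spell out the O_PATH value by hand (the value used on amd64 and arm64 Linux) to avoid a dependency, where golang.org/x/sys/unix provides unix.O_PATH. It assumes Linux with /proc mounted.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// oPath is O_PATH on amd64 and arm64 Linux. Hard-coding it is an assumption
// made to keep this sketch standard-library only.
const oPath = 0x200000

// reopenByFD (a hypothetical helper) pins a FIFO with O_PATH, replaces the
// FIFO's name with a decoy regular file, and then writes to the original FIFO
// by re-opening the pinned descriptor. It returns what the reader received
// and the size of the decoy file.
func reopenByFD() (string, int64, error) {
	dir, err := os.MkdirTemp("", "fifo-opath")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "my.fifo")
	if err := syscall.Mkfifo(path, 0600); err != nil {
		return "", 0, err
	}

	// Pin the FIFO. An O_PATH descriptor refers to the file itself without
	// opening it for reading or writing, so this does not block.
	pinned, err := syscall.Open(path, oPath|syscall.O_CLOEXEC, 0)
	if err != nil {
		return "", 0, err
	}
	defer syscall.Close(pinned)

	// Open the read side without blocking so the demo can stay sequential.
	r, err := syscall.Open(path, syscall.O_RDONLY|syscall.O_NONBLOCK|syscall.O_CLOEXEC, 0)
	if err != nil {
		return "", 0, err
	}
	defer syscall.Close(r)

	// Replace the name with a decoy regular file.
	if err := os.Remove(path); err != nil {
		return "", 0, err
	}
	if err := os.WriteFile(path, nil, 0600); err != nil {
		return "", 0, err
	}

	// Re-open the pinned descriptor for writing. This reaches the original
	// FIFO even though its name now points at the decoy.
	w, err := syscall.Open(fmt.Sprintf("/proc/self/fd/%d", pinned), syscall.O_WRONLY|syscall.O_CLOEXEC, 0)
	if err != nil {
		return "", 0, err
	}
	defer syscall.Close(w)
	if _, err := syscall.Write(w, []byte("go")); err != nil {
		return "", 0, err
	}

	// The bytes are already in the pipe buffer, so this read returns them.
	buf := make([]byte, 16)
	n, err := syscall.Read(r, buf)
	if err != nil {
		return "", 0, err
	}

	info, err := os.Stat(path)
	if err != nil {
		return "", 0, err
	}
	return string(buf[:n]), info.Size(), nil
}

func main() {
	msg, decoySize, err := reopenByFD()
	if err != nil {
		panic(err)
	}
	fmt.Printf("reader got %q, decoy file holds %d bytes\n", msg, decoySize)
}
```

The point of the decoy is that a writer opening my.fifo by name at that moment would have written into the wrong file, while the writer that goes through the pinned descriptor still reaches the FIFO that was created originally.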

runc’s source code isn’t super easy to navigate, but here are some selected sections that you can look at:

Reading through runc to learn how it worked and writing my own FIFO-synchronization code was pretty interesting for me. I hope you found this interesting too!

Note: The source code included in this blog post is licensed under the terms of the MIT-0 license.

Copyright 2020 Samuel Karp

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.