For review: rewritten pivot_root(2) manual page

From: Michael Kerrisk (man-pages)
Date: Mon Sep 23 2019 - 08:04:29 EST


Hello all,

I'm looking for review input for the pivot_root(2) manual
page, which I have substantially rewritten.

The original page was written 19 years ago, and has seen
little revision since that time. It contains a number of
errors. Even at the time it was first released, the
manual page already had some inaccuracies, since it was
written before the final release of the system call, whose
implementation was subsequently changed, but the manual
page was not updated to reflect those changes.

The revised page is more than 2.5 times the size of the
previous page, and now includes an example program.
As well as fixing a number of errors and adding many
missing details, the page also adds a description of the
pivot_root(".", ".") technique.

I would be happy to receive error corrections and notes
on missing details that should be added to the page.

One area of the page that I'm still not really happy with
is the "vague" wording in the second paragraph and the note
in the third paragraph about the system call possibly
changing. These pieces survive (in somewhat modified form)
from the original page, which was written before the
system call was released, and it seems there was some
question about whether the system call might still change
its behavior with respect to the root directory and current
working directory of other processes. However, after 19
years, nothing has changed, and surely it will not in the
future, since that would constitute an ABI breakage.
I'm considering rewriting these pieces to describe exactly
what the system call does (which I already
do in the third paragraph) and remove the "may or may not"
pieces in the second paragraph. I'd welcome comments
on making that change.

The rendered page is shown below. The page source is at
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man2/pivot_root.2
in the Git repo at
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git

Thanks,

Michael

NAME
pivot_root - change the root filesystem

SYNOPSIS
int pivot_root(const char *new_root, const char *put_old);

Note: There is no glibc wrapper for this system call; see NOTES.

DESCRIPTION
pivot_root() changes the root filesystem in the mount namespace of
the calling process. More precisely, it moves the root filesystem
to the directory put_old and makes new_root the new root
filesystem. The calling process must have the CAP_SYS_ADMIN capability
in the user namespace that owns the caller's mount namespace.

pivot_root() may or may not change the current root and the current
working directory of any processes or threads that use the
old root directory and which are in the same mount namespace as
the caller of pivot_root(). The caller of pivot_root() should
ensure that processes with root or current working directory at
the old root operate correctly in either case. An easy way to
ensure this is to change their root and current working directory
to new_root before invoking pivot_root(). Note also that
pivot_root() may or may not affect the calling process's current
working directory. It is therefore recommended to call chdir("/")
immediately after pivot_root().

The paragraph above is intentionally vague because at the time
when pivot_root() was first implemented, it was unclear whether
its effect on the root and current working directories of other
processes, and on the caller's current working directory, might change in
the future. However, the behavior has remained consistent since
this system call was first implemented: pivot_root() changes the
root directory and the current working directory of each process
or thread in the same mount namespace to new_root if they point to
the old root directory. (See also NOTES.) On the other hand,
pivot_root() does not change the caller's current working directory
(unless it is on the old root directory), and thus it should
be followed by a chdir("/") call.
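
The recommendation above can be sketched as a small helper. This is
only an illustration: the name pivot_root_and_chdir() is hypothetical
and is not provided by any library. It invokes the system call
through syscall(2) and then resets the working directory.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not part of glibc): call pivot_root(2) via
   syscall(2), then switch the current working directory to the new
   root. Returns 0 on success, or -1 with errno set by whichever
   call failed. */
int
pivot_root_and_chdir(const char *new_root, const char *put_old)
{
    if (syscall(SYS_pivot_root, new_root, put_old) == -1)
        return -1;

    /* The caller's working directory is not necessarily inside the
       new root at this point, so set it explicitly */

    return chdir("/");
}
```

A caller would typically follow this with umount2(2) on put_old, as
the example program later in the page does.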

The following restrictions apply:

- new_root and put_old must be directories.

- new_root and put_old must not be on the same filesystem as the
current root. In particular, new_root can't be "/" (but can be
a bind mounted directory on the current root filesystem).

- put_old must be at or underneath new_root; that is, adding a
nonnegative number of /.. to the string pointed to by put_old
must yield the same directory as new_root.

- new_root must be a mount point. (If it is not otherwise a
mount point, it suffices to bind mount new_root on top of
itself.)

- The propagation type of the parent mount of new_root and the
parent mount of the current root directory must not be
MS_SHARED; similarly, if put_old is an existing mount point,
its propagation type must not be MS_SHARED. These restrictions
ensure that pivot_root() never propagates any changes to
another mount namespace.

- The current root directory must be a mount point.

RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno
is set appropriately.

ERRORS
pivot_root() may fail with any of the same errors as stat(2).
Additionally, it may fail with the following errors:

EBUSY new_root or put_old is on the current root filesystem.
(This error covers the pathological case where new_root is
"/".)

EINVAL new_root is not a mount point.

EINVAL put_old is not underneath new_root.

EINVAL The current root directory is not a mount point (because of
an earlier chroot(2)).

EINVAL The current root is on the rootfs (initial ramfs) filesystem;
see NOTES.

EINVAL Either the mount point at new_root, or the parent mount of
that mount point, has propagation type MS_SHARED.

EINVAL put_old is a mount point and has the propagation type
MS_SHARED.

ENOTDIR
new_root or put_old is not a directory.

EPERM The calling process does not have the CAP_SYS_ADMIN capability.
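
Two of the EINVAL cases above concern shared propagation. One way a
caller might diagnose them is to inspect /proc/self/mountinfo, where
a mount with shared propagation carries a "shared:N" tag in the
optional fields that precede the " - " separator. The following is a
minimal sketch of a parser for a single line of that file; the
function name is hypothetical, and finding the line that corresponds
to a given mount is left to the caller.

```c
#include <string.h>

/* Hypothetical helper: return 1 if a line from /proc/PID/mountinfo
   describes a mount with shared propagation, 0 if it does not, and
   -1 if the line does not look like a mountinfo record. */
int
mountinfo_line_is_shared(const char *line)
{
    const char *sep = strstr(line, " - ");  /* Ends the optional fields */
    const char *p = line;
    int field;

    if (sep == NULL)
        return -1;

    /* Skip the six fixed fields: mount ID, parent ID, major:minor,
       root, mount point, and per-mount options */

    for (field = 0; field < 6; field++) {
        p = strchr(p, ' ');
        if (p == NULL || p > sep)
            return -1;
        p++;
    }

    /* Scan the optional fields (tags such as "shared:N", "master:N") */

    while (p < sep) {
        if (strncmp(p, "shared:", 7) == 0)
            return 1;
        p = strchr(p, ' ');
        if (p == NULL || p >= sep)
            break;
        p++;
    }

    return 0;
}
```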

VERSIONS
pivot_root() was introduced in Linux 2.3.41.

CONFORMING TO
pivot_root() is Linux-specific and hence is not portable.

NOTES
Glibc does not provide a wrapper for this system call; call it
using syscall(2).

A command-line interface for this system call is provided by
pivot_root(8).

pivot_root() allows the caller to switch to a new root filesystem
while at the same time placing the old root mount at a location
under new_root from where it can subsequently be unmounted. (The
fact that it moves all processes that have a root directory or
current working directory on the old root filesystem to the new
root filesystem frees the old root filesystem of users, allowing
it to be unmounted more easily.)

A typical use of pivot_root() is during system startup, when the
system mounts a temporary root filesystem (e.g., an initrd), then
mounts the real root filesystem, and eventually turns the latter
into the current root of all relevant processes or threads. A
modern use is to set up a root filesystem during the creation of a
container.

The fact that pivot_root() modifies process root and current working
directories in the manner noted in DESCRIPTION is necessary in
order to prevent kernel threads from keeping the old root directory
busy with their root and current working directory, even if
they never access the filesystem in any way.

new_root and put_old may be the same directory. In particular,
the following sequence allows a pivot-root operation without needing
to create and remove a temporary directory:

chdir(new_root);
mount("", ".", "", MS_SLAVE | MS_REC, NULL);
/* Or: MS_PRIVATE | MS_REC */
pivot_root(".", ".");
umount2(".", MNT_DETACH);

This sequence succeeds because the pivot_root() call stacks the
old root mount point (old_root) on top of the new root mount point
at /. At that point, the calling process's root directory and
current working directory refer to the new root mount point
(new_root). During the subsequent umount2() call, resolution of
"." starts with new_root and then moves up the list of mounts
stacked at /, with the result that old_root is unmounted.
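
For illustration, the same sequence with error checking might look
as follows. This is a sketch that mirrors the steps shown above
rather than a definitive implementation; the function name is
hypothetical, and the caller is assumed to be in its own mount
namespace with new_root already a mount point.

```c
#define _GNU_SOURCE
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: pivot to 'new_root' without a temporary
   put_old directory. Returns 0 on success, or -1 with errno set by
   the first call that failed. */
int
pivot_root_in_place(const char *new_root)
{
    if (chdir(new_root) == -1)
        return -1;

    /* Stop mount events under the new root from propagating to
       peer mounts (MS_PRIVATE | MS_REC is the alternative) */

    if (mount(NULL, ".", NULL, MS_SLAVE | MS_REC, NULL) == -1)
        return -1;

    /* Stack the old root on top of the new root at "/" */

    if (syscall(SYS_pivot_root, ".", ".") == -1)
        return -1;

    /* "." now resolves to the old root stacked at "/"; detach it */

    return umount2(".", MNT_DETACH);
}
```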

The rootfs (initial ramfs) cannot be pivot_root()ed. The recommended
method of changing the root filesystem in this case is to
delete everything in rootfs, overmount rootfs with the new root,
attach stdin/stdout/stderr to the new /dev/console, and exec the
new init(1). Helper programs for this process exist; see
switch_root(8).
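
A program that wants to guess in advance whether it is running on
rootfs could look at the filesystem type of "/". The sketch below
is a heuristic only: it assumes that rootfs reports the ramfs or
tmpfs magic number in statfs(2), and a root that is an ordinary
tmpfs mount would also match. The function names are hypothetical.

```c
#include <linux/magic.h>
#include <sys/vfs.h>

/* Hypothetical helper: return 1 if 'magic' is the ramfs or tmpfs
   filesystem magic number, 0 otherwise. */
int
fs_magic_is_ram_backed(unsigned long magic)
{
    return magic == RAMFS_MAGIC || magic == TMPFS_MAGIC;
}

/* Hypothetical helper: return 1 if "/" is on ramfs or tmpfs (and so
   might be the initial rootfs), 0 if not, -1 if statfs(2) fails. */
int
root_is_ram_backed(void)
{
    struct statfs sfs;

    if (statfs("/", &sfs) == -1)
        return -1;

    return fs_magic_is_ram_backed((unsigned long) sfs.f_type);
}
```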

EXAMPLE
The program below demonstrates the use of pivot_root() inside a
mount namespace that is created using clone(2). After pivoting to
the root directory named in the program's first command-line argument,
the child created by clone(2) then executes the program
named in the remaining command-line arguments.

We demonstrate the program by creating a directory that will serve
as the new root filesystem and placing a copy of the (statically
linked) busybox(1) executable in that directory.

$ mkdir /tmp/rootfs
$ ls -id /tmp/rootfs # Show inode number of new root directory
319459 /tmp/rootfs
$ cp $(which busybox) /tmp/rootfs
$ PS1='bbsh$ ' sudo ./pivot_root_demo /tmp/rootfs /busybox sh
bbsh$ PATH=/
bbsh$ busybox ln busybox ln
bbsh$ ln busybox echo
bbsh$ ln busybox ls
bbsh$ ls
busybox echo ln ls
bbsh$ ls -id / # Compare with inode number above
319459 /
bbsh$ echo 'hello world'
hello world

Program source

/* pivot_root_demo.c */

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <limits.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

static int
pivot_root(const char *new_root, const char *put_old)
{
    return syscall(SYS_pivot_root, new_root, put_old);
}

#define STACK_SIZE (1024 * 1024)

static int              /* Startup function for cloned child */
child(void *arg)
{
    char **args = arg;
    char *new_root = args[0];
    const char *put_old = "/oldrootfs";
    char path[PATH_MAX];

    /* Ensure that 'new_root' and its parent mount don't have
       shared propagation (which would cause pivot_root() to
       return an error), and prevent propagation of mount
       events to the initial mount namespace */

    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        errExit("mount-MS_PRIVATE");

    /* Ensure that 'new_root' is a mount point */

    if (mount(new_root, new_root, NULL, MS_BIND, NULL) == -1)
        errExit("mount-MS_BIND");

    /* Create directory to which old root will be pivoted */

    snprintf(path, sizeof(path), "%s/%s", new_root, put_old);
    if (mkdir(path, 0777) == -1)
        errExit("mkdir");

    /* And pivot the root filesystem */

    if (pivot_root(new_root, path) == -1)
        errExit("pivot_root");

    /* Switch the current working directory to "/" */

    if (chdir("/") == -1)
        errExit("chdir");

    /* Unmount old root and remove mount point */

    if (umount2(put_old, MNT_DETACH) == -1)
        perror("umount2");
    if (rmdir(put_old) == -1)
        perror("rmdir");

    /* Execute the command specified in argv[1]... */

    execv(args[1], &args[1]);
    errExit("execv");
}

int
main(int argc, char *argv[])
{
    /* Create a child process in a new mount namespace */

    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        errExit("malloc");

    if (clone(child, stack + STACK_SIZE,
              CLONE_NEWNS | SIGCHLD, &argv[1]) == -1)
        errExit("clone");

    /* Parent falls through to here; wait for child */

    if (wait(NULL) == -1)
        errExit("wait");

    exit(EXIT_SUCCESS);
}

SEE ALSO
chdir(2), chroot(2), mount(2), stat(2), initrd(4),
mount_namespaces(7), pivot_root(8), switch_root(8)


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/