[PATCH RFC v8 06/10] namei: LOOKUP_IN_ROOT: chroot-like path resolution

From: Aleksa Sarai
Date: Mon May 20 2019 - 09:37:27 EST

The primary motivation for this flag is container runtimes, which have
to interact with malicious root filesystems while running in the host
namespaces. One of the first requirements for a container runtime to be
secure against a malicious rootfs is that it correctly scopes symlinks
(that is, they are resolved as though the runtime were chroot(2)ed into
the container's rootfs) and ".."-style paths[*]. The already-existing
LOOKUP_XDEV and LOOKUP_NO_MAGICLINKS help defend against other potential
attacks in a malicious rootfs scenario.

Currently most container runtimes try to do this resolution in
userspace[1], which opens the door to many race conditions. In addition,
the "obvious" alternative (actually performing a {ch,pivot_}root(2))
requires a fork+exec (for some runtimes), which is *very* costly if it
is needed for every filesystem operation involving a container.

[*] At the moment, ".." and magic-link jumping are disallowed for the
same reason they are disabled for LOOKUP_BENEATH -- currently it is not
safe to allow them. Future patches may enable them unconditionally once
we have resolved the possible races (for "..") and semantics (for
magic-link jumping).
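
As a rough illustration of the checks described in [*], the following
userspace sketch models which error a single path component would
produce. It is not kernel code: sketch_check_component() and the
SKETCH_* names are hypothetical, and the value used for
SKETCH_LOOKUP_BENEATH is an assumption (the other two values are the
ones shown in the namei.h hunk of this patch).

```c
#include <errno.h>
#include <string.h>

/* SKETCH_LOOKUP_BENEATH is an assumed placeholder value. The other two
 * values are copied from the namei.h hunk of this patch. */
#define SKETCH_LOOKUP_BENEATH       0x010000 /* assumption */
#define SKETCH_LOOKUP_NO_MAGICLINKS 0x040000
#define SKETCH_LOOKUP_IN_ROOT       0x100000

/*
 * Hypothetical model of the per-component checks: returns 0 if the
 * component may be walked, or a negative errno if it is refused.
 */
static int sketch_check_component(unsigned int flags, const char *name,
				  int is_magiclink)
{
	const unsigned int scoped =
		SKETCH_LOOKUP_BENEATH | SKETCH_LOOKUP_IN_ROOT;

	if (is_magiclink) {
		/* Same order as get_link(): -ELOOP takes precedence. */
		if (flags & SKETCH_LOOKUP_NO_MAGICLINKS)
			return -ELOOP;
		if (flags & scoped)
			return -EXDEV;
	}

	/* ".." is refused for both scoping flags, as in handle_dots(). */
	if (strcmp(name, "..") == 0 && (flags & scoped))
		return -EXDEV;

	return 0;
}
```

With only SKETCH_LOOKUP_IN_ROOT set, both ".." and a magic-link are
refused with -EXDEV, while an ordinary component is allowed.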

The most significant *at(2) semantic change with LOOKUP_IN_ROOT is that
absolute pathnames no longer cause dirfd to be ignored. Instead, they
are resolved relative to dirfd. The rationale is that LOOKUP_IN_ROOT
must necessarily chroot-scope symlinks with absolute paths to dirfd, so
doing the same for the base path seems to be the most consistent
behaviour (and also avoids foot-gunning users who want to scope paths
that are absolute).
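
To make the absolute-path behaviour concrete, here is a purely lexical
userspace model of the intended semantics. It is only a sketch:
sketch_scope_path() is a hypothetical helper, it does not look at
symlinks at all, and a string-based approach like this is exactly the
kind of racy userspace resolution this patch aims to make unnecessary.
It shows only how a base path maps onto a dirfd-relative path.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: rewrite `path` into the dirfd-relative form.
 * Leading and repeated slashes are dropped, "." components are skipped,
 * and any ".." component is refused with -EXDEV (matching the current
 * restriction in this patch). Returns 0 on success, or -ENAMETOOLONG
 * if `out` is too small.
 */
static int sketch_scope_path(const char *path, char *out, size_t outlen)
{
	const char *p = path;
	size_t used = 0;

	if (outlen == 0)
		return -ENAMETOOLONG;
	out[0] = '\0';

	while (*p) {
		size_t len;

		while (*p == '/')	/* absolute paths start at dirfd */
			p++;
		len = strcspn(p, "/");
		if (len == 0)
			break;
		if (len == 2 && p[0] == '.' && p[1] == '.')
			return -EXDEV;
		if (!(len == 1 && p[0] == '.')) {
			size_t need = len + (used ? 1 : 0);

			/* Room for separator, component and trailing NUL. */
			if (used + need + 1 > outlen)
				return -ENAMETOOLONG;
			if (used)
				out[used++] = '/';
			memcpy(out + used, p, len);
			used += len;
			out[used] = '\0';
		}
		p += len;
	}

	if (used == 0) {	/* "/" or "" refers to dirfd itself */
		if (outlen < 2)
			return -ENAMETOOLONG;
		out[0] = '.';
		out[1] = '\0';
	}
	return 0;
}
```

In this model "/etc/passwd" and "etc/passwd" both map to "etc/passwd"
relative to dirfd, "/" maps to ".", and "/../etc" is refused.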

[1]: https://github.com/cyphar/filepath-securejoin

Co-developed-by: Christian Brauner <christian@xxxxxxxxxx>
Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx>
---
fs/namei.c | 6 +++---
include/linux/namei.h | 1 +
2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f997c82eb9c2..d18671a06bdb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1137,7 +1137,7 @@ const char *get_link(struct nameidata *nd, bool trailing)
if (unlikely(nd->flags & LOOKUP_NO_MAGICLINKS))
return ERR_PTR(-ELOOP);
/* Not currently safe. */
- if (unlikely(nd->flags & LOOKUP_BENEATH))
+ if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT)))
return ERR_PTR(-EXDEV);
* For trailing_symlink we check whether the symlink's
@@ -1827,7 +1827,7 @@ static inline int handle_dots(struct nameidata *nd, int type)
* cause our parent to have moved outside of the root and us to skip
* over it.
- if (unlikely(nd->flags & LOOKUP_BENEATH))
+ if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT)))
return -EXDEV;
if (!nd->root.mnt)
@@ -2378,7 +2378,7 @@ static const char *path_init(struct nameidata *nd, unsigned flags)

nd->m_seq = read_seqbegin(&mount_lock);

- if (unlikely(nd->flags & LOOKUP_BENEATH)) {
+ if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT))) {
error = dirfd_path_init(nd);
if (unlikely(error))
return ERR_PTR(error);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 7bc819ad0cd3..4b1ee717cb14 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
#define LOOKUP_NO_MAGICLINKS 0x040000 /* No /proc/$pid/fd/ "symlink" crossing. */
#define LOOKUP_NO_SYMLINKS 0x080000 /* No symlink crossing *at all*.
+#define LOOKUP_IN_ROOT 0x100000 /* Treat dirfd as %current->fs->root. */

extern int path_pts(struct path *path);