[PATCH v4 0/4] namei: O_* flags to restrict path resolution
From: Aleksa Sarai
Date: Mon Nov 12 2018 - 09:27:13 EST
Sorry for not sending a series earlier, I've been busy with assignments.
Patch changelog:
v4:
* Remove AT_* flag reservations, as they require more discussion.
* Switch to path_is_under() over __d_path() for breakout checking.
* Make O_XDEV no longer block openat("/tmp", "/", O_XDEV) -- dirfd
is now ignored for absolute paths to match other flags.
* Improve the dirfd_path_init() refactor and move it to a separate
commit.
* Remove reference to Linux-capsicum.
* Switch "proclink" name to "magic link".
v3: [resend]
v2:
* Made ".." resolution with AT_THIS_ROOT and AT_BENEATH safe(r) with
some semi-aggressive __d_path checking (see patch 3).
* Disallowed "proclinks" with AT_THIS_ROOT and AT_BENEATH, in the
hopes they can be re-enabled once safe.
* Removed the selftests as they will be reimplemented as xfstests.
* Removed stat(2) support, since you can already get it through
O_PATH and fstatat(2).
The need for some sort of control over VFS's path resolution (to avoid
malicious paths resulting in inadvertent breakouts) has been a very
long-standing desire of many userspace applications. This patchset is a
revival of Al Viro's old AT_NO_JUMPS[1,2] patchset (which was a variant
of David Drysdale's O_BENEATH patchset[3] which was a spin-off of the
Capsicum project[4]) with a few additions and changes made based on the
previous discussion within [5] as well as others I felt were useful.
In line with the conclusions of the original discussion of AT_NO_JUMPS,
the flag has been split up into separate flags:
* O_XDEV blocks all mountpoint crossings (upwards, downwards, or
through absolute links). Absolute pathnames alone in openat(2) do
not trigger this.
* O_NOMAGICLINKS blocks resolution through /proc/$pid/fd-style links.
This is done by blocking the usage of nd_jump_link() during
resolution in a filesystem. The term "magic links" is used to match
with the only reference to these links in Documentation/, but I'm
happy to change the name.
It should be noted that this is different to the scope of O_NOFOLLOW
in that it applies to all path components. However, you can do
open(O_NOFOLLOW|O_NOMAGICLINKS|O_PATH) on a "magic link" and it will
*not* fail (assuming that no parent component was a "magic link"),
and you will have an fd for the "magic link".
* O_BENEATH disallows escapes to outside the starting dirfd's tree,
using techniques such as ".." or absolute links. Absolute paths in
openat(2) are also disallowed. Conceptually this flag is to ensure
you "stay below" a certain point in the filesystem tree -- but this
requires some additional checks to protect against various races that
would allow escape using ".." (see patch 4 for more detail).
Currently O_BENEATH implies O_NOMAGICLINKS, because magic links can
trivially beam you around the filesystem (breaking the protection). In future,
there might be similar safety checks as in patch 4, but that
requires more discussion.
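
As a rough illustration of the (O_PATH|O_NOFOLLOW) behaviour mentioned
above, the sketch below uses only flags that already exist in open(2).
It opens the final path component itself and checks whether the
resulting fd refers to a symlink. The helper name fd_is_symlink() is
hypothetical, and the proposed O_NOMAGICLINKS flag is deliberately not
used because it is not part of any released header.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* needed for O_PATH in glibc headers */
#endif
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patchset): open the final
 * component of `path` without following it, and report what the fd
 * refers to.
 *
 * Returns 1 if the fd refers to a symlink, 0 if it refers to anything
 * else, and -1 if open(2) or fstat(2) failed.
 */
int fd_is_symlink(const char *path)
{
	struct stat st;
	int ret;
	/* O_PATH|O_NOFOLLOW yields an fd for the link itself. */
	int fd = open(path, O_PATH | O_NOFOLLOW | O_CLOEXEC);

	if (fd < 0)
		return -1;

	/* fstat(2) on an O_PATH fd describes the opened object. */
	ret = fstat(fd, &st);
	close(fd);
	if (ret < 0)
		return -1;

	return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

On a typical Linux system, calling fd_is_symlink("/proc/self/exe")
should report a symlink, since the fd refers to the "magic link" rather
than to the executable it points at.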
In addition, two new flags were added that expand on the above ideas:
* O_NOSYMLINKS does what it says on the tin. No symlink resolution is
allowed at all, including "magic links". Just as with O_NOMAGICLINKS
this can still be used with (O_PATH|O_NOFOLLOW) to open an fd for
the symlink as long as no parent path had a symlink component.
* O_THISROOT is an extension of O_BENEATH that, rather than blocking
attempts to move past the root, forces all such movements to be
scoped to the starting point. This provides chroot(2)-like
protection but without the cost of a chroot(2) for each filesystem
operation, and is safe against race attacks to which chroot(2) is
vulnerable.
If a race is detected (as with O_BENEATH) then an error is
generated, and similar to O_BENEATH it is not permitted to cross
"magic links" with O_THISROOT.
The primary need for this is from container runtimes, which
currently need to do symlink scoping in userspace[6] when opening
paths in a potentially malicious container. There is a long list of
CVEs that could have been mitigated by having O_THISROOT (such as
CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, to name a few).
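
The difference between O_BENEATH and O_THISROOT in how ".." and
absolute paths are treated can be sketched with a purely lexical model.
This is only an illustration under stated assumptions: it works on
strings, ignores symlinks, mounts and races entirely (which is exactly
what the kernel-side checks are for), and is not how the patches
implement anything. The names scope_mode and scoped_resolve(), and the
choice of -EXDEV as the error, are assumptions made for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical modes for this lexical model only. */
enum scope_mode {
	SCOPE_BENEATH,	/* models O_BENEATH: an escape is an error */
	SCOPE_THISROOT,	/* models O_THISROOT: an escape is clamped */
};

/*
 * Lexically resolve `path` against a notional starting directory and
 * write the result, relative to that directory, into `out` ("." means
 * the starting directory itself).
 *
 * Returns 0 on success or a negative errno value.
 */
int scoped_resolve(const char *path, enum scope_mode mode,
		   char *out, size_t outlen)
{
	size_t len = 0;		/* bytes used in out, excluding the NUL */

	if (outlen < 2)
		return -ENAMETOOLONG;
	out[0] = '\0';

	/* Absolute paths are rejected in the BENEATH model. */
	if (path[0] == '/' && mode == SCOPE_BENEATH)
		return -EXDEV;

	while (*path) {
		size_t n;

		while (*path == '/')
			path++;
		n = strcspn(path, "/");
		if (n == 0)
			break;

		if (n == 1 && path[0] == '.') {
			/* "." leaves the position unchanged. */
		} else if (n == 2 && path[0] == '.' && path[1] == '.') {
			if (len == 0) {
				/* ".." at the starting point. */
				if (mode == SCOPE_BENEATH)
					return -EXDEV;
				/* THISROOT model: stay at the root. */
			} else {
				/* Drop the last component and its '/'. */
				while (len > 0 && out[len - 1] != '/')
					len--;
				if (len > 0)
					len--;
				out[len] = '\0';
			}
		} else {
			size_t need = n + (len ? 1 : 0);

			if (len + need + 1 > outlen)
				return -ENAMETOOLONG;
			if (len)
				out[len++] = '/';
			memcpy(out + len, path, n);
			len += n;
			out[len] = '\0';
		}
		path += n;
	}

	if (len == 0)
		strcpy(out, ".");
	return 0;
}
```

In this model, "a/../../etc/passwd" fails in the BENEATH mode but
resolves to "etc/passwd" inside the starting directory in the THISROOT
mode, which mirrors the "block" versus "scope" distinction described
above.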
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Eric Biederman <ebiederm@xxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: David Howells <dhowells@xxxxxxxxxx>
Cc: Jann Horn <jannh@xxxxxxxxxx>
Cc: Christian Brauner <christian@xxxxxxxxxx>
Cc: David Drysdale <drysdale@xxxxxxxxxx>
Cc: <containers@xxxxxxxxxxxxxxxxxxxxxxxxxx>
Cc: <linux-fsdevel@xxxxxxxxxxxxxxx>
Cc: <linux-api@xxxxxxxxxxxxxxx>
[1]: https://lwn.net/Articles/721443/
[2]: https://lore.kernel.org/patchwork/patch/784221/
[3]: https://lwn.net/Articles/619151/
[4]: https://lwn.net/Articles/603929/
[5]: https://lwn.net/Articles/723057/
[6]: https://github.com/cyphar/filepath-securejoin
Aleksa Sarai (4):
  namei: split out nd->dfd handling to dirfd_path_init
  namei: O_BENEATH-style path resolution flags
  namei: O_THISROOT: chroot-like path resolution
  namei: aggressively check for nd->root escape on ".." resolution

 fs/fcntl.c                       |   2 +-
 fs/namei.c                       | 205 ++++++++++++++++++++++---------
 fs/open.c                        |  13 +-
 include/linux/fcntl.h            |   3 +-
 include/linux/namei.h            |   8 ++
 include/uapi/asm-generic/fcntl.h |  20 +++
 6 files changed, 189 insertions(+), 62 deletions(-)
--
2.19.1