Re: [PATCH v2] vfs: add support for empty path to openat2(2)

From: Christian Brauner

Date: Wed Mar 04 2026 - 09:03:41 EST


On Mon, Mar 02, 2026 at 02:16:50PM +0100, Jori Koolstra wrote:
> To get an operable version of an O_PATH file descriptor, it is possible
> to use openat(fd, ".", O_DIRECTORY) for directories, but other files
> currently require going through open("/proc/<pid>/fd/<nr>") which
> depends on a functioning procfs.
>
> This patch adds the OPENAT2_EMPTY_PATH flag to openat2(2). If passed,
> LOOKUP_EMPTY is set at path resolution time.
>
> Signed-off-by: Jori Koolstra <jkoolstra@xxxxxxxxx>
> ---
> fs/open.c | 9 ++++-----
> include/linux/fcntl.h | 5 ++++-
> include/uapi/linux/openat2.h | 4 ++++
> 3 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/fs/open.c b/fs/open.c
> index 91f1139591ab..4f0a76dc8993 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -1160,7 +1160,7 @@ struct file *kernel_file_open(const struct path *path, int flags,
> EXPORT_SYMBOL_GPL(kernel_file_open);
>
> #define WILL_CREATE(flags) (flags & (O_CREAT | __O_TMPFILE))
> -#define O_PATH_FLAGS (O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC)
> +#define O_PATH_FLAGS (O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC | OPENAT2_EMPTY_PATH)
>
> inline struct open_how build_open_how(int flags, umode_t mode)
> {
> @@ -1185,9 +1185,6 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
> int lookup_flags = 0;
> int acc_mode = ACC_MODE(flags);
>
> - BUILD_BUG_ON_MSG(upper_32_bits(VALID_OPEN_FLAGS),
> - "struct open_flags doesn't yet handle flags > 32 bits");
> -
> /*
> * Strip flags that aren't relevant in determining struct open_flags.
> */
> @@ -1281,6 +1278,8 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
> lookup_flags |= LOOKUP_DIRECTORY;
> if (!(flags & O_NOFOLLOW))
> lookup_flags |= LOOKUP_FOLLOW;
> + if (flags & OPENAT2_EMPTY_PATH)
> + lookup_flags |= LOOKUP_EMPTY;
>
> if (how->resolve & RESOLVE_NO_XDEV)
> lookup_flags |= LOOKUP_NO_XDEV;
> @@ -1362,7 +1361,7 @@ static int do_sys_openat2(int dfd, const char __user *filename,
> if (unlikely(err))
> return err;
>
> - CLASS(filename, name)(filename);
> + CLASS(filename_flags, name)(filename, op.lookup_flags);
> return FD_ADD(how->flags, do_file_open(dfd, name, &op));
> }
>
> diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
> index a332e79b3207..d1bb87ff70e3 100644
> --- a/include/linux/fcntl.h
> +++ b/include/linux/fcntl.h
> @@ -7,10 +7,13 @@
>
> /* List of all valid flags for the open/openat flags argument: */
> #define VALID_OPEN_FLAGS \
> + /* lower 32-bit flags */ \
> (O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC | \
> O_APPEND | O_NDELAY | O_NONBLOCK | __O_SYNC | O_DSYNC | \
> FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \
> - O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE)
> + O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE | \
> + /* upper 32-bit flags (openat2(2) only) */ \
> + OPENAT2_EMPTY_PATH)

I forgot to mention this cautionary little nugget in the last review...

The legacy open(2)/openat(2) codepaths currently aren't able to deal
with flag values in the upper 32 bits of a u64 flags parameter.

Adding OPENAT2_EMPTY_PATH to VALID_OPEN_FLAGS turns that mask into a
u64. That has fun consequences:

	inline struct open_how build_open_how(int flags, umode_t mode)
	{
		struct open_how how = {
			.flags = flags & VALID_OPEN_FLAGS,

flags is a signed int, so it is sign-extended to 64 bits before being
ANDed with the u64 mask. For a negative value that raises bits 32 to 63,
and how.flags ends up with OPENAT2_EMPTY_PATH by pure chance.

That in turn means open(2), openat(2), and io_uring openat can
inadvertently enable OPENAT2_EMPTY_PATH when the flags value has bit 31
set.
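
To make the sign extension concrete, here is a minimal userspace sketch,
not kernel code. DEMO_EMPTY_PATH and DEMO_VALID_FLAGS are made-up
stand-ins (bit 32 is only assumed for illustration), and the (uint32_t)
cast is just one possible way to avoid the problem, not necessarily what
the actual fix should look like:

```c
#include <stdint.h>

/* Hypothetical stand-in: the real OPENAT2_EMPTY_PATH value is assumed here. */
#define DEMO_EMPTY_PATH		(UINT64_C(1) << 32)
/* Stand-in for VALID_OPEN_FLAGS: bits 0-30 plus the new upper-half flag. */
#define DEMO_VALID_FLAGS	(UINT64_C(0x7fffffff) | DEMO_EMPTY_PATH)

/*
 * Mirrors the problematic pattern: a signed int ANDed with a u64 mask.
 * The int is converted to a 64-bit unsigned value first, so a negative
 * value (bit 31 set) arrives with bits 32-63 set as well.
 */
static uint64_t mask_naive(int flags)
{
	return flags & DEMO_VALID_FLAGS;
}

/*
 * One possible fix: go through u32 first so the widening to 64 bits is a
 * zero extension and the upper half can never be set by a legacy caller.
 */
static uint64_t mask_truncated(int flags)
{
	return (uint32_t)flags & DEMO_VALID_FLAGS;
}
```

With flags == -1, mask_naive() returns a value that includes
DEMO_EMPTY_PATH, while mask_truncated() only keeps bits 0-30.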

So that needs to be fixed.

Another thing that I would like to see explicitly mentioned in the
commit message is that OPENAT2_EMPTY_PATH means that it will now be
possible to reopen an O_PATH file descriptor in sandboxes that don't
mount procfs.

IOW, any program out there that - weirdly - relies on being able to
create a container without procfs mounted, drop cap_sys_admin, and then
funnel O_PATH fds into it via SCM_RIGHTS, trusting that they can't be
re-opened read-write, will be surprised.

While I don't necessarily think (anymore) that this is a realistic
scenario, it is worth documenting.
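
For context, the procfs-dependent reopen that works today looks roughly
like the sketch below. reopen_via_procfs() is a hypothetical helper name
used only for illustration. It assumes procfs is mounted at /proc, which
is exactly the dependency the sandbox scenario above is about:

```c
#include <fcntl.h>
#include <stdio.h>

/*
 * Reopen an existing fd (typically an O_PATH fd) with new access flags by
 * going through its /proc/self/fd/<nr> magic link.
 *
 * Returns the new fd, or -1 with errno set (e.g. ENOENT without procfs).
 * Don't pass O_NOFOLLOW in flags: the magic link has to be followed.
 */
static int reopen_via_procfs(int fd, int flags)
{
	char path[64];

	snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
	return open(path, flags);
}
```

A receiver that got an O_PATH fd via SCM_RIGHTS could call
reopen_via_procfs(fd, O_RDWR) to get a read-write fd, subject to the usual
permission checks on the underlying file. Without procfs the open() simply
fails.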