Re: [RFC PATCH 0/1] vfs: transitive upgrade restrictions for fds

From: Jeff Layton

Date: Tue Mar 24 2026 - 08:39:29 EST


On Mon, 2026-03-23 at 23:00 +0100, Jori Koolstra wrote:
> Add upgrade restrictions to openat2(). Extend struct open_how to allow
> setting transitive restrictions on using file descriptors to open other
> files. A use case for this feature is to block services or containers
> from re-opening/upgrading an O_PATH file descriptor through e.g.
> /proc/<pid>/fd/<nr> or OPENAT2_EMPTY_PATH (if upstreamed) as O_WRONLY.
>
> The implementation idea is this: magic paths like /proc/<pid>/fd/<nr>
> (currently the only one of its sort AFAIK) go through nd_jump_link() to
> hard set current->nameidata. To include information about the fd
> yielding the magic link, we add a new struct jump_how as a parameter.
> This struct may include restrictions or other metadata attached to the
> magic link jump other than the struct path to jump to. So far it has
> only one unsigned int field: allowed_upgrades. This is a flag int that
> (for now) may be either READ_UPGRADABLE, WRITE_UPGRADABLE, or
> DENY_UPGRADES.
>
> The idea is that you can restrict what kind of open flags may be used
> to open files in any way using this fd as a starting point
> (transitively). The check is enforced in may_open_upgrade(), which is
> just the old may_open() with an extra test. To keep this state attached
> to the fds, we add a field f_allowed_upgrades to struct file. Then
> in do_open(), after success, we compute:
>
> 	file->f_allowed_upgrades =
> 		op->allowed_upgrades & nd->allowed_upgrades;
>
> where op is the struct open_flags that is built from open_how in
> build_open_flags(), and nd->allowed_upgrades is set during path
> traversal either in path_init() or nd_jump_link().
>
> The implementation and the idea are a bit rough; it is the first bit of
> less trivial work I have done on the kernel, hence the RFC status. I did
> create some self tests already which this patch passes, and nothing
> seems to break on a fresh vng kernel. But obviously there may be MANY
> things I am overlooking.
>
> The original idea for this feature comes from the UAPI group kernel
> feature idea list [1].
>
> [1] https://github.com/uapi-group/kernel-features?tab=readme-ov-file#upgrade-masks-in-openat2
>
> Jori Koolstra (1):
>   vfs: transitive upgrade restrictions for fds
>
>  fs/file_table.c                  |  2 ++
>  fs/internal.h                    |  1 +
>  fs/namei.c                       | 38 ++++++++++++++++++++++++++++----
>  fs/open.c                        |  9 ++++++++
>  fs/proc/base.c                   | 24 ++++++++++++++------
>  fs/proc/fd.c                     |  6 ++++-
>  fs/proc/internal.h               |  4 +++-
>  include/linux/fcntl.h            |  6 ++++-
>  include/linux/fs.h               |  1 +
>  include/linux/namei.h            | 15 ++++++++++++-
>  include/uapi/asm-generic/fcntl.h |  4 ++++
>  include/uapi/linux/openat2.h     |  1 +
>  12 files changed, 96 insertions(+), 15 deletions(-)


It's an interesting idea, but I could see it being difficult to track
the result of this across a long chain of open fds.

If you are going to do this, then at the very least you should add a
mechanism (fcntl() command?) to query the current f_allowed_upgrades
mask, so that this can be debugged in some fashion.
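
To make the tracking concern concrete, here is a minimal userspace
sketch of how the mask would narrow along a chain of opens. It is a
model only, not kernel code. The flag names come from the cover
letter, but their bit values, the idea that the two *_UPGRADABLE bits
can be OR'ed together, and the model_* names are all assumptions made
for illustration. model_get_allowed_upgrades() stands in for whatever
query interface might eventually exist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values: the cover letter names these flags but
 * does not give their values. */
#define DENY_UPGRADES		0u
#define READ_UPGRADABLE		(1u << 0)
#define WRITE_UPGRADABLE	(1u << 1)

/* Userspace stand-in for the proposed struct file field. */
struct model_file {
	unsigned int f_allowed_upgrades;
};

/*
 * Models the do_open() computation from the cover letter: the new fd
 * keeps only the upgrades permitted by both the open request and the
 * fd used as the starting point.
 */
static struct model_file model_open(unsigned int requested,
				    const struct model_file *start)
{
	struct model_file f;

	assert(start != NULL);
	f.f_allowed_upgrades = requested & start->f_allowed_upgrades;
	return f;
}

/* Hypothetical query helper: reports the effective mask of an fd. */
static unsigned int model_get_allowed_upgrades(const struct model_file *f)
{
	return f->f_allowed_upgrades;
}
```

In this model, an fd opened with READ_UPGRADABLE from an unrestricted
starting point and then reopened asking for both bits still reports
only READ_UPGRADABLE. Without a way to read the mask back, that
narrowing is invisible to whoever holds the last fd in the chain.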
--
Jeff Layton <jlayton@xxxxxxxxxx>