[GIT PULL 16/16 for v7.2] vfs procfs

From: Christian Brauner

Date: Fri Jun 12 2026 - 11:22:53 EST


Hey Linus,

/* Summary */

This contains the procfs changes for this cycle:

* Revamp fs/filesystems.c

The file was a mess with a hand-rolled linked list in desperate need
of a cleanup. The filesystems list is now RCU-ified, /proc files can
be marked permanent from outside fs/proc/, and the string emitted
when reading /proc/filesystems is pre-generated and cached instead
of pointer-chasing and printfing entry by entry on every read. The
file is read frequently because libselinux reads it, and libselinux
is linked into numerous frequently used programs (even ones you
would not suspect, like sed!). Scalability also improves since
reference maintenance on open/close is bypassed.

open+read+close cycle single-threaded (ops/s):
before: 442732
after: 1063462 (+140%)

open+read+close cycle with 20 processes (ops/s):
before: 606177
after: 3300576 (+444%)

A follow-up patch adds missing unlocks in some corner cases and
tidies things up.
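
To illustrate the caching idea, here is a toy userspace model in
Python. It is not the kernel code: FilesystemRegistry, its methods and
the renders counter are names made up for this sketch. The only thing
it borrows from the real file is the line format ("nodev" or nothing,
a tab, then the filesystem name). The point is that the string is
rebuilt after the list changes rather than on every read.

```python
class FilesystemRegistry:
    """Toy model of a filesystem list with a cached textual rendering."""

    def __init__(self):
        self._entries = []    # (name, requires_dev) in registration order
        self._cached = None   # rendered string, or None when stale
        self.renders = 0      # how often the string was actually rebuilt

    def register(self, name, requires_dev=False):
        if any(existing == name for existing, _ in self._entries):
            raise ValueError(f"{name} is already registered")
        self._entries.append((name, requires_dev))
        self._cached = None   # the list changed, so drop the cached string

    def unregister(self, name):
        self._entries = [e for e in self._entries if e[0] != name]
        self._cached = None

    def _render(self):
        self.renders += 1
        return "".join(
            f"{'' if requires_dev else 'nodev'}\t{name}\n"
            for name, requires_dev in self._entries
        )

    def read(self):
        """Return the cached string, rebuilding it only if it is stale."""
        if self._cached is None:
            self._cached = self._render()
        return self._cached


if __name__ == "__main__":
    registry = FilesystemRegistry()
    registry.register("sysfs")
    registry.register("ext4", requires_dev=True)
    for _ in range(3):
        registry.read()
    print(registry.read(), end="")
    print("renders:", registry.renders)
```

Repeated reads return the same string object without walking the
list again. In this model only register() and unregister() pay for
invalidation.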

* Relax the mount visibility check for subset=pid mounts

When procfs is mounted with subset=pid, all static files become
unavailable and only the dynamic pid information is accessible. In
that case there is no point in imposing the full mount visibility
restrictions on the mounter - everything that can be hidden in
procfs is already inaccessible. These restrictions prevented procfs
from being mounted inside rootless containers since almost all
container implementations overmount parts of procfs to hide certain
directories.

As part of this, /proc/self/net is only shown in subset=pid mounts
for CAP_NET_ADMIN, reconfiguring subset=pid is rejected, the
SB_I_USERNS_VISIBLE superblock flag is replaced with an
FS_USERNS_MOUNT_RESTRICTED filesystem flag, fully visible mounts are
recorded in a list, and the mount restrictions are finally
documented.
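
For readers who have not used the option, a minimal sketch of
requesting a subset=pid procfs mount from userspace through mount(2)
follows. The helper names (proc_mount_data, mount_procfs) and the
target path are assumptions for this example. Whether the call
succeeds depends on the kernel version and on the user and mount
namespace it is issued from.

```python
import ctypes
import ctypes.util
import os


def proc_mount_data(subset_pid=True):
    """Build the option string passed as the data argument of mount(2)."""
    return b"subset=pid" if subset_pid else b""


def mount_procfs(target, subset_pid=True):
    """Mount a new procfs instance at target; raise OSError on failure."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mount.argtypes = [
        ctypes.c_char_p,   # source
        ctypes.c_char_p,   # target
        ctypes.c_char_p,   # filesystem type
        ctypes.c_ulong,    # mount flags
        ctypes.c_char_p,   # filesystem-specific data
    ]
    libc.mount.restype = ctypes.c_int
    ret = libc.mount(b"proc", os.fsencode(target), b"proc", 0,
                     proc_mount_data(subset_pid))
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)


if __name__ == "__main__":
    # Hypothetical target directory; it must already exist.
    try:
        mount_procfs("/tmp/proc-subset")
    except OSError as exc:
        print("mount failed:", exc)
```

This is the programmatic equivalent of running
"mount -t proc -o subset=pid proc <target>".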

* Protect ptrace_may_access() with exec_update_lock in procfs

Most uses of ptrace_may_access() in procfs should hold
exec_update_lock to avoid TOCTOU issues with concurrent privileged
execve() (like setuid binary execution). This fixes the easy cases -
the owner and visibility checks and the FD link permission checks -
with the gnarlier ones to follow later.
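
The race being closed can be shown with a deliberately simplified
Python model. Nothing here is kernel code: Task, may_access,
read_unlocked and read_locked are invented for the illustration, and a
plain threading.Lock stands in for exec_update_lock. The "between"
hook marks the window between the permission check and the use of the
data, which is where a privileged execve() could slip in.

```python
import threading


class Task:
    """Toy process whose credentials change on a privileged exec."""

    def __init__(self):
        self.exec_update_lock = threading.Lock()  # stand-in for the kernel lock
        self.privileged = False
        self.data = "unprivileged data"

    def execve_setuid(self, blocking=True):
        """Model a privileged execve(): swap state while holding the lock."""
        if not self.exec_update_lock.acquire(blocking=blocking):
            return False  # someone is in the middle of check-and-use
        try:
            self.privileged = True
            self.data = "privileged data"
            return True
        finally:
            self.exec_update_lock.release()


def may_access(task):
    """Stand-in permission check: privileged tasks may not be inspected."""
    return not task.privileged


def read_unlocked(task, between=lambda: None):
    """Check, then use, with nothing preventing an exec in between."""
    if not may_access(task):
        raise PermissionError("access denied")
    between()
    return task.data


def read_locked(task, between=lambda: None):
    """Check and use under the same lock, so the exec has to wait."""
    with task.exec_update_lock:
        if not may_access(task):
            raise PermissionError("access denied")
        between()
        return task.data


if __name__ == "__main__":
    racy = Task()
    print(read_unlocked(racy, between=racy.execve_setuid))

    safe = Task()
    print(read_locked(safe, between=lambda: safe.execve_setuid(blocking=False)))
```

In the unlocked variant the check passes against the old credentials
and the read returns the post-exec data. In the locked variant the
exec attempt cannot take the lock until the reader is done, so the
check and the read see the same state.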

/* Testing */

gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)

No build failures or warnings were observed.

/* Conflicts */

Merge conflicts with mainline
=============================

No known conflicts.

Merge conflicts with other trees
================================

This will have a merge conflict with:
[1]: https://lore.kernel.org/20260612-vfs-misc-v72-13d57389d260@brauner

Both add a new fs_flags define at the same location in
include/linux/fs.h. The bit values don't overlap. It can be resolved
as follows:

diff --cc include/linux/fs.h
index 10d35a68f597,e7ff9f8b1485..dcd0575a3830
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@@ -2281,7 -2281,7 +2281,8 @@@ struct file_system_type
  #define FS_MGTIME 64 /* FS uses multigrain timestamps */
  #define FS_LBS 128 /* FS supports LBS */
  #define FS_POWER_FREEZE 256 /* Always freeze on suspend/hibernate */
+ #define FS_USERNS_MOUNT_RESTRICTED 512 /* Restrict mount in userns if not already visible */
 +#define FS_USERNS_DELEGATABLE 1024 /* Can be mounted inside userns from outside */
  #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
  	int (*init_fs_context)(struct fs_context *);
  	const struct fs_parameter_spec *parameters;
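
As a quick sanity check of the resolution, the flag values visible in
the hunk can be verified not to collide. This is a throwaway Python
sketch: the dictionary only lists the defines quoted in the resolution,
and the helper names are made up.

```python
# Values copied from the conflict resolution; other fs_flags are omitted.
FS_FLAGS = {
    "FS_MGTIME": 64,
    "FS_LBS": 128,
    "FS_POWER_FREEZE": 256,
    "FS_USERNS_MOUNT_RESTRICTED": 512,
    "FS_USERNS_DELEGATABLE": 1024,
    "FS_RENAME_DOES_D_MOVE": 32768,
}


def is_single_bit(value):
    """True if exactly one bit is set in value."""
    return value > 0 and (value & (value - 1)) == 0


def overlapping_pairs(flags):
    """Return pairs of flag names whose values share at least one bit."""
    names = sorted(flags)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if flags[a] & flags[b]
    ]


if __name__ == "__main__":
    print("all single-bit:", all(is_single_bit(v) for v in FS_FLAGS.values()))
    print("overlaps:", overlapping_pairs(FS_FLAGS))
```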

The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:

Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)

are available in the Git repository at:

git@xxxxxxxxxxxxxxxxxxx:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.procfs

for you to fetch changes up to cf30ceccfaec3d2549ff60f7c915625f12dd3a93:

fs: fix ups and tidy ups to /proc/filesystems caching (2026-06-12 14:26:27 +0200)

----------------------------------------------------------------
vfs-7.2-rc1.procfs

Please consider pulling these changes from the signed vfs-7.2-rc1.procfs tag.

Thanks!
Christian

----------------------------------------------------------------
Alexey Dobriyan (1):
proc: allow to mark /proc files permanent outside of fs/proc/

Alexey Gladkov (4):
proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN
proc: prevent reconfiguring subset=pid
proc: handle subset=pid separately in userns visibility checks
docs: proc: add documentation about mount restrictions

Christian Brauner (7):
namespace: record fully visible mounts in list
fs: move SB_I_USERNS_VISIBLE to FS_USERNS_MOUNT_RESTRICTED
fs: RCU-ify filesystems list
sysfs: remove trivial sysfs_get_tree() wrapper
Merge patch series "revamp fs/filesystems.c"
Merge patch series "proc: subset=pid: Relax check of mount visibility"
Merge patch series "proc: protect ptrace_may_access() with exec_update_lock"

Jann Horn (2):
proc: protect ptrace_may_access() with exec_update_lock (part 1)
proc: protect ptrace_may_access() with exec_update_lock (FD links)

Mateusz Guzik (2):
fs: cache the string generated by reading /proc/filesystems
fs: fix ups and tidy ups to /proc/filesystems caching

Documentation/filesystems/proc.rst | 19 ++-
fs/filesystems.c | 330 +++++++++++++++++++++++++------------
fs/mount.h | 4 +
fs/namespace.c | 34 +++-
fs/ocfs2/super.c | 1 -
fs/proc/array.c | 6 +
fs/proc/base.c | 160 ++++++++----------
fs/proc/fd.c | 27 ++-
fs/proc/generic.c | 10 ++
fs/proc/internal.h | 5 +-
fs/proc/namespaces.c | 12 ++
fs/proc/proc_net.c | 8 +
fs/proc/root.c | 24 ++-
fs/sysfs/mount.c | 18 +-
include/linux/fs.h | 3 +-
include/linux/fs/super_types.h | 2 +-
include/linux/proc_fs.h | 13 ++
kernel/acct.c | 2 +-
18 files changed, 429 insertions(+), 249 deletions(-)