Re: [GIT PULL 01/12 for v6.18] misc

From: Christian Brauner

Date: Mon Sep 29 2025 - 05:47:27 EST


On Fri, Sep 26, 2025 at 04:18:55PM +0200, Christian Brauner wrote:
> Hey Linus,
>
> /* Summary */
> This contains the usual selections of misc updates for this cycle.
>
> Features:
>
> - Add "initramfs_options" parameter to set initramfs mount options. This
> allows adding specific mount options to the rootfs, e.g., to limit the
> memory size.
>
> - Add RWF_NOSIGNAL flag for pwritev2()
>
> Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE
> signal from being raised when writing on disconnected pipes or
> sockets. The flag is handled directly by the pipe filesystem and
> converted to the existing MSG_NOSIGNAL flag for sockets.
>
> - Allow passing a pid namespace as a procfs mount option
>
> Ever since the introduction of pid namespaces, procfs has had very
> implicit behaviour surrounding them (the pidns used by a procfs mount
> is auto-selected based on the mounting process's active pidns, and the
> pidns itself is basically hidden once the mount has been constructed).
>
> This implicit behaviour has historically meant that userspace was
> required to do some special dances in order to configure the pidns of
> a procfs mount as desired. Examples include:
>
> * In order to bypass the mnt_too_revealing() check, Kubernetes creates
> a procfs mount from an empty pidns so that user namespaced
> containers can be nested (without this, the nested containers would
> fail to mount procfs). But this requires forking off a helper
> process because you cannot just one-shot this using mount(2).
>
> * Container runtimes in general need to fork into a container before
> configuring its mounts, which can lead to security issues in the
> case of shared-pidns containers (a privileged process in the pidns
> can interact with your container runtime process).
> While SUID_DUMP_DISABLE and user namespaces make this less of an
> issue, the strict need for this due to a minor uAPI wart is kind of
> unfortunate.
>
> Things would be much easier if there was a way for userspace to just
> specify the pidns they want. So this pull request contains changes
> to implement a new "pidns" argument which can be set using
> fsconfig(2):
>
> fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
> fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);
>
> or classic mount(2) / mount(8):
>
> // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
> mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");
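A minimal sketch of the classic mount(2) variant quoted above, assuming a
kernel that carries this series and a caller with the privileges needed to
mount procfs. The helper names build_pidns_option() and
mount_proc_with_pidns() are hypothetical and only serve the illustration.
The mount flags are left to the caller, since the quoted example elides
them.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: format "pidns=<nspath>" into buf.
 * Returns 0 on success, -1 with errno set if nspath is empty
 * or the result does not fit into buf.
 */
int build_pidns_option(char *buf, size_t size, const char *nspath)
{
	int n;

	if (!nspath || strlen(nspath) == 0) {
		errno = EINVAL;
		return -1;
	}

	n = snprintf(buf, size, "pidns=%s", nspath);
	if (n < 0 || (size_t)n >= size) {
		errno = ENAMETOOLONG;
		return -1;
	}
	return 0;
}

/*
 * Hypothetical helper: mount procfs at target for the pid namespace
 * referenced by nspath (e.g. "/proc/self/ns/pid"). The "pidns=" option
 * is only understood by kernels that include this series.
 */
int mount_proc_with_pidns(const char *target, const char *nspath,
			  unsigned long flags)
{
	char opts[4096];

	if (build_pidns_option(opts, sizeof(opts), nspath) < 0)
		return -1;
	return mount("proc", target, "proc", flags, opts);
}
```

On a kernel without the "pidns" option the mount(2) call is expected to
fail, so callers should check errno and fall back to the older
fork-into-the-namespace approach.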
>
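For the RWF_NOSIGNAL item in the feature list above, a rough userspace
sketch might look like the following. It assumes a glibc that provides
pwritev2(). The fallback value for RWF_NOSIGNAL is an assumption for older
uapi headers and should be checked against include/uapi/linux/fs.h of the
target kernel. The helpers write_nosignal() and demo_broken_pipe() are
hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Assumption: fallback definition for headers that predate the flag.
 * Verify the value against the uapi header of the kernel in use.
 */
#ifndef RWF_NOSIGNAL
#define RWF_NOSIGNAL 0x00000100
#endif

/*
 * Write len bytes at the current file position (offset -1) and ask the
 * kernel not to raise SIGPIPE if the peer is gone.
 */
ssize_t write_nosignal(int fd, const void *buf, size_t len)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

	return pwritev2(fd, &iov, 1, -1, RWF_NOSIGNAL);
}

/*
 * Hypothetical demo: write into a pipe whose read end is already closed.
 * Returns the errno of the failed write, or 0 if the write unexpectedly
 * succeeded.
 */
int demo_broken_pipe(void)
{
	int fds[2];
	int err = 0;

	if (pipe(fds) < 0)
		return errno;
	close(fds[0]);

	if (write_nosignal(fds[1], "x", 1) < 0)
		err = errno;

	close(fds[1]);
	return err;
}
```

With the flag supported, the write should fail with EPIPE and no signal is
delivered. Kernels that do not know the flag reject the call up front
(typically with EOPNOTSUPP), so no SIGPIPE is raised in that case either.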
> Cleanups:
>
> - Remove the last references to EXPORT_OP_ASYNC_LOCK.
>
> - Make file_remove_privs_flags() static.
>
> - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used.
>
> - Use try_cmpxchg() in start_dir_add().
>
> - Use try_cmpxchg() in sb_init_done_wq().
>
> - Replace offsetof() with struct_size() in ioctl_file_dedupe_range().
>
> - Remove vfs_ioctl() export.
>
> - Replace the rwlock with a spinlock in the epoll code, as the rwlock
> causes priority inversion on PREEMPT_RT kernels.
>
> - Make ns_entries in fs/proc/namespaces const.
>
> - Use a switch() statement in init_special_inode() just like we do in
> may_open().
>
> - Use struct_size() in dir_add() in the initramfs code.
>
> - Use str_plural() in rd_load_image().
>
> - Replace strcpy() with strscpy() in find_link().
>
> - Rename generic_delete_inode() to inode_just_drop() and
> generic_drop_inode() to inode_generic_drop().
>
> - Remove unused arguments from fcntl_{g,s}et_rw_hint().
>
> Fixes:
>
> - Document @name parameter for name_contains_dotdot() helper.
>
> - Fix spelling mistake.
>
> - Always return zero from replace_fd() instead of the file descriptor number.
>
> - Limit the size for copy_file_range() in compat mode to prevent a signed
> overflow.
>
> - Fix debugfs mount options not being applied.
>
> - Verify the inode mode when loading it from disk in minixfs.
>
> - Verify the inode mode when loading it from disk in cramfs.
>
> - Don't trigger automounts with RESOLVE_NO_XDEV
>
> If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
> through automounts, but could still trigger them.
>
> - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in tracepoints.
>
> - Fix unused variable warning in rd_load_image() on s390.
>
> - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD.
>
> - Use ns_capable_noaudit() when determining net sysctl permissions.
>
> - Don't call path_put() under namespace semaphore in listmount() and statmount().
>
> /* Testing */
>
> gcc (Debian 14.2.0-19) 14.2.0
> Debian clang version 19.1.7 (3+b1)
>
> No build failures or warnings were observed.
>
> /* Conflicts */

There is one issue that was reported after I had generated the pull
request. The mnt_ns_release() function can be passed a NULL pointer and
that case needs to be handled.

I'm appending a patch that I would ask you to please just apply on top
of it. If you would rather have me resend the pull request, please just
tell me!
From 9f11a1a5cab7e70bdb31077e475ab15d86d03682 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner@xxxxxxxxxx>
Date: Mon, 29 Sep 2025 11:41:16 +0200
Subject: [PATCH] mount: handle NULL values in mnt_ns_release()

When called from listmount(), mnt_ns_release() may be passed a NULL
pointer. Handle that case gracefully.

Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 6686c9f54b40..8db446cd7f4a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -180,7 +180,7 @@ static void mnt_ns_tree_add(struct mnt_namespace *ns)
 static void mnt_ns_release(struct mnt_namespace *ns)
 {
 	/* keep alive for {list,stat}mount() */
-	if (refcount_dec_and_test(&ns->passive)) {
+	if (ns && refcount_dec_and_test(&ns->passive)) {
 		fsnotify_mntns_delete(ns);
 		put_user_ns(ns->user_ns);
 		kfree(ns);
--
2.47.3
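
The one-line fix above follows a common pattern: a release helper that
tolerates NULL so that callers on error paths do not need their own check.
A rough userspace analogue is sketched below. It is not the kernel code:
struct demo_ns and the demo_ns_*() functions are hypothetical, and C11
atomics stand in for the kernel's refcount_t.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object with a passive reference count. */
struct demo_ns {
	atomic_int passive;
};

/* Allocate an object holding one passive reference; NULL on failure. */
struct demo_ns *demo_ns_alloc(void)
{
	struct demo_ns *ns = malloc(sizeof(*ns));

	if (ns)
		atomic_init(&ns->passive, 1);
	return ns;
}

/* Take an additional passive reference. */
void demo_ns_get(struct demo_ns *ns)
{
	atomic_fetch_add(&ns->passive, 1);
}

/*
 * Drop one passive reference. A NULL pointer is accepted and ignored,
 * mirroring the "ns &&" check added in the patch. Returns true if this
 * call freed the object.
 */
bool demo_ns_release(struct demo_ns *ns)
{
	/* atomic_fetch_sub() returns the previous value: 1 means last ref. */
	if (ns && atomic_fetch_sub(&ns->passive, 1) == 1) {
		free(ns);
		return true;
	}
	return false;
}
```

Because the NULL check short-circuits before the counter is touched, a
caller that never obtained a reference can still call the release helper
unconditionally during cleanup.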