Re: [PATCH] fs/namespace: notify pollers of legacy propagation changes

From: Christian Brauner

Date: Fri May 29 2026 - 06:25:44 EST

On Fri, May 29, 2026 at 05:54:41PM +0800, Guopeng Zhang wrote:
> From: Guopeng Zhang <zhangguopeng@xxxxxxxxxx>
>
> Changing mount propagation through the legacy mount API changes
> user-visible mountinfo contents, including the shared: and master:
> optional fields.
>
> The mount_setattr() path already touches the mount namespace after
> change_mnt_propagation(), so pollers of /proc/<pid>/mountinfo are woken
> when the namespace event changes.
>
> The legacy mount --make-* path also changes propagation through
> change_mnt_propagation(), and MOVE_MOUNT_SET_GROUP updates the
> propagation relationship of the target mount. Both paths currently
> return without touching the affected mount namespace.
>
> As a result, userspace polling /proc/<pid>/mountinfo can miss these
> propagation-only changes even though mountinfo has changed.
>
> A simple reproducer that polls /proc/self/mountinfo while changing
> propagation shows the inconsistency.
>
> Before this change:
>
> legacy MS_SHARED: poll ret=0 revents=0x0
> mount_setattr MS_SHARED: poll ret=1 revents=0xa
>
> After this change:
>
> legacy MS_SHARED: poll ret=1 revents=0xa
> mount_setattr MS_SHARED: poll ret=1 revents=0xa
>
> Touch the affected mount namespace after successfully changing
> propagation state in do_change_type() and do_set_group(). Take the
> vfsmount lock for write around touch_mnt_namespace(), as required by
> its locking rules.
>
> Signed-off-by: Guopeng Zhang <zhangguopeng@xxxxxxxxxx>
> ---
> fs/namespace.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 9a66a806a9b8..f871c7bf3bc8 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2908,6 +2908,10 @@ static int do_change_type(const struct path *path, int ms_flags)
> for (m = mnt; m; m = (recurse ? next_mnt(m, mnt) : NULL))
> change_mnt_propagation(m, type);
>
> + lock_mount_hash();
> + touch_mnt_namespace(mnt->mnt_ns);
> + unlock_mount_hash();
> +
> return 0;
> }
>
> @@ -3479,6 +3483,11 @@ static int do_set_group(const struct path *from_path, const struct path *to_path
> list_add(&to->mnt_share, &from->mnt_share);
> set_mnt_shared(to);
> }
> +
> + lock_mount_hash();
> + touch_mnt_namespace(to->mnt_ns);
> + unlock_mount_hash();

Doing this would cause seqcount readers to retry on mount propagation
changes when all of them really only care about mount topology changes.
So this can likely use:

	guard(mount_locked_reader)();
	touch_mnt_namespace(mnt_ns);

Even today, observing an unchanged seqcount across mnt->mnt_flags reads
doesn't guarantee that it really wasn't changed.
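
For anyone wanting to check the poll behaviour described in the quoted
commit message, here is a minimal userspace sketch. It is not the
reproducer referred to in the patch, and the helper names
wait_mountinfo_change() and revents_str() are made up for illustration.
It assumes Linux, where a mount namespace event on an already-open
/proc/<pid>/mountinfo is reported as POLLPRI | POLLERR, which is the
revents=0xa shown in the "after" output.

```c
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>

/*
 * Wait up to timeout_ms for a mount namespace event on an open
 * mountinfo fd. Returns poll()'s return value (0 on timeout, 1 when
 * the fd is ready, -1 on error) and stores the reported mask in
 * *revents.
 */
int wait_mountinfo_change(int fd, int timeout_ms, short *revents)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI };
	int ret = poll(&pfd, 1, timeout_ms);

	*revents = pfd.revents;
	return ret;
}

/* Render the bits of interest in a revents mask as text. */
const char *revents_str(short revents, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%s%s%s",
		 (revents & POLLIN)  ? "POLLIN "  : "",
		 (revents & POLLPRI) ? "POLLPRI " : "",
		 (revents & POLLERR) ? "POLLERR " : "");
	return buf;
}
```

To exercise it, open /proc/self/mountinfo with open(2) first, then change
propagation on a scratch mount from another process (as root), and print
the result of wait_mountinfo_change(). The fd has to be opened before the
change, since the event baseline is recorded at open time. A missed
notification shows up as ret=0 revents=0x0 once the timeout expires.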