Re: [PATCH] fs: Teach path_connected to handle nfs filesystems with multiple roots.
From: Al Viro
Date: Thu Mar 15 2018 - 18:34:57 EST
On Wed, Mar 14, 2018 at 06:20:29PM -0500, Eric W. Biederman wrote:
>
> On nfsv2 and nfsv3 the nfs server can export subsets of the same
> filesystem and report the same filesystem identifier, so that the nfs
> client can know they are the same filesystem. The subsets can be from
> disjoint directory trees. The nfsv2 and nfsv3 filesystems provide no
> way to find the common root of all directory trees exported from the
> server with the same filesystem identifier.
>
> The practical result is that in struct super_block, s_root for nfs is
> not necessarily the root of the filesystem. The nfs mount code sets
> s_root to the root of the first subset of the nfs filesystem that the
> kernel mounts.
>
> This affects the dcache invalidation code in generic_shutdown_super,
> currently called shrink_dcache_for_umount. To accommodate nfs, that
> code has for years gone through an additional list of dentries that
> might be dentry trees that need to be freed.
>
> When I wrote path_connected I did not realize nfs was so special, and
> its heuristic for avoiding calling is_subdir can fail.
>
> The practical case where this fails is when there is a move of a
> directory from the subtree exposed by one nfs mount to the subtree
> exposed by another nfs mount. This move can happen either locally or
> remotely. The remote case requires that the moved directory be cached
> before the move, and that after the move someone walks the path
> to where the moved directory now exists and in so doing causes the
> already cached directory to be moved in the dcache through the magic
> of d_splice_alias.
>
> If someone whose working directory is in the moved directory or a
> subdirectory now starts walking .. from the initial mount of nfs
> (where s_root == mnt_root), then path_connected as a heuristic will
> not bother with the is_subdir check. As s_root really is not the root
> of the nfs filesystem this heuristic is wrong, and the path may
> actually not be connected and path_connected can fail.
>
> The is_subdir function might be cheap enough that we can call it
> unconditionally. Verifying that will take some benchmarking and
> the result may not be the same on all kernels this fix needs
> to be backported to. So I am avoiding that for now.
>
> Filesystems with snapshots such as nilfs and btrfs do something
> similar. But as the directory trees of the snapshots are disjoint
> from one another and from the main directory tree, rename won't move
> things between them and this problem will not occur.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Reported-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> Fixes: 397d425dc26d ("vfs: Test for and handle paths that are unreachable from their mnt_root")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> ---
>
> Al, do you want to push this one to Linus or shall I?
Applied; I think there might be a helper lurking in there, but for now
that'll do.
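
For illustration only, the failure described in the quoted message can be modeled in userspace roughly as below. This is a sketch, not the applied patch: the structs are stripped-down stand-ins for the kernel ones, and the per-superblock `multiroot` field is a hypothetical marker for "s_root is not necessarily the real root" (the quoted text does not show how the fix marks nfs). The helper `demo_after_move` is likewise invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel structures; not the real definitions. */
struct dentry { struct dentry *parent; };      /* a root is its own parent */
struct super_block {
	struct dentry *s_root;
	bool multiroot;                        /* hypothetical flag, an assumption */
};
struct vfsmount { struct dentry *mnt_root; struct super_block *mnt_sb; };
struct path { struct vfsmount *mnt; struct dentry *dentry; };

/* True if d is root or a descendant of root (walks parent pointers). */
static bool is_subdir(const struct dentry *d, const struct dentry *root)
{
	for (;;) {
		if (d == root)
			return true;
		if (d->parent == d)            /* reached a root that is not ours */
			return false;
		d = d->parent;
	}
}

/*
 * Model of the check: skip is_subdir only when the mount root is s_root
 * AND the superblock is not marked as having multiple roots.
 * With multiroot == false this reduces to the old heuristic.
 */
static bool path_connected(const struct path *path)
{
	const struct vfsmount *mnt = path->mnt;
	const struct super_block *sb = mnt->mnt_sb;

	if (!sb->multiroot && mnt->mnt_root == sb->s_root)
		return true;
	return is_subdir(path->dentry, mnt->mnt_root);
}

/*
 * Two disjoint exports a and b share one superblock; a was mounted first,
 * so s_root == a. A directory cached under a is then moved under b.
 * Returns what path_connected reports for that directory via a's mount.
 */
bool demo_after_move(bool mark_multiroot)
{
	struct dentry a = { .parent = &a };
	struct dentry b = { .parent = &b };
	struct dentry dir = { .parent = &a };
	struct super_block sb = { .s_root = &a, .multiroot = mark_multiroot };
	struct vfsmount mnt_a = { .mnt_root = &a, .mnt_sb = &sb };
	struct path p = { .mnt = &mnt_a, .dentry = &dir };

	dir.parent = &b;                       /* the rename / d_splice_alias move */
	return path_connected(&p);
}
```

In this model, `demo_after_move(false)` reports the path as connected even though the directory now lives under the other export, which is the heuristic failure described above; `demo_after_move(true)` falls through to is_subdir and reports it as disconnected.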