Re: [PATCH 2/2] kernfs: remove mutex in kernfs_dop_revalidate

From: Tejun Heo
Date: Wed Dec 02 2020 - 13:47:21 EST


Hello,

On Wed, Dec 02, 2020 at 10:58:37PM +0800, Fox Chen wrote:
> There is a big mutex in kernfs_dop_revalidate which slows down the
> concurrent performance of kernfs.
>
> Since kernfs_dop_revalidate only does some checks, the lock is
> largely unnecessary. Also, according to kernel filesystem locking
> document:
> https://www.kernel.org/doc/html/latest/filesystems/locking.html
> locking is not in the protocal for d_revalidate operation.

That's just describing the rules seen from vfs side. It doesn't say anything
about locking rules internal to each file system implementation.

> This patch remove this mutex from
> kernfs_dop_revalidate, so kernfs_dop_revalidate
> can run concurrently.
>
> Signed-off-by: Fox Chen <foxhlchen@xxxxxxxxx>
> ---
> fs/kernfs/dir.c | 9 +++------
> 1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 9aec80b9d7c6..c2267c93f546 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -26,7 +26,6 @@ static DEFINE_SPINLOCK(kernfs_idr_lock); /* root->ino_idr */
>
> static bool kernfs_active(struct kernfs_node *kn)
> {
> - lockdep_assert_held(&kernfs_mutex);
> return atomic_read(&kn->active) >= 0;
> }
>
> @@ -557,10 +556,9 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
>
> /* Always perform fresh lookup for negatives */
> if (d_really_is_negative(dentry))
> - goto out_bad_unlocked;
> + goto out_bad;
>
> kn = kernfs_dentry_node(dentry);
> - mutex_lock(&kernfs_mutex);
>
> /* The kernfs node has been deactivated */
> if (!kernfs_active(kn))
> @@ -579,11 +577,8 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
> kernfs_info(dentry->d_sb)->ns != kn->ns)
> goto out_bad;
>
> - mutex_unlock(&kernfs_mutex);
> return 1;
> out_bad:
> - mutex_unlock(&kernfs_mutex);
> -out_bad_unlocked:
> return 0;
> }

I don't see how this can be safe. Nothing even protects the dentry from
turning negative in the middle and it may end up trying to deref NULL. I'm
sure we can make this not need kernfs_mutex but that'd have to be a lot more
careful.

Thanks.

--
tejun