Re: [PATCH v5 3/6] kernfs: use VFS negative dentry caching
From: Eric W. Biederman
Date: Mon Jun 07 2021 - 14:27:58 EST
Ian Kent <raven@xxxxxxxxxx> writes:
> If there are many lookups for non-existent paths these negative lookups
> can lead to a lot of overhead during path walks.
>
> The VFS allows dentries to be created as negative and hashed, and caches
> them so they can be used to reduce the fairly high overhead alloc/free
> cycle that occurs during these lookups.
>
> Use the kernfs node parent revision to identify if a change has been
> made to the containing directory so that the negative dentry can be
> discarded and the lookup redone.
>
> Signed-off-by: Ian Kent <raven@xxxxxxxxxx>
> ---
> fs/kernfs/dir.c | 53 +++++++++++++++++++++++++++++++----------------------
> 1 file changed, 31 insertions(+), 22 deletions(-)
>
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index b88432c48851f..5ae95e8d1aea1 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -1039,13 +1039,32 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
>  	if (flags & LOOKUP_RCU)
>  		return -ECHILD;
>
> -	/* Always perform fresh lookup for negatives */
> -	if (d_really_is_negative(dentry))
> -		goto out_bad_unlocked;
> -
>  	kn = kernfs_dentry_node(dentry);
>  	mutex_lock(&kernfs_mutex);
>
> +	/* Negative hashed dentry? */
> +	if (!kn) {
> +		struct dentry *d_parent = dget_parent(dentry);
> +		struct kernfs_node *parent;
> +
> +		/* If the kernfs parent node has changed discard and
> +		 * proceed to ->lookup.
> +		 */
> +		parent = kernfs_dentry_node(d_parent);
> +		if (parent) {
> +			if (kernfs_dir_changed(parent, dentry)) {
> +				dput(d_parent);
> +				goto out_bad;
> +			}
> +		}
> +		dput(d_parent);
> +
> +		/* The kernfs node doesn't exist, leave the dentry
> +		 * negative and return success.
> +		 */
> +		goto out;
> +	}
What part of this new negative hashed dentry check needs the
kernfs_mutex?
I guess it is the read of the parent node's dir.rev in kernfs_dir_changed().
Since all you are doing is comparing two fields for equality, it
really should not matter. At most there needs to be a sprinkling of
primitives like READ_ONCE somewhere.
It just seems like such a waste to put all of that under kernfs_mutex
on the off chance kn->dir.rev will change while it is being read.
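Roughly what I have in mind, as a userspace sketch only. The struct
and helper names here are made up for illustration and are not the
kernfs definitions, and the READ_ONCE/WRITE_ONCE macros are simplified
stand-ins for the kernel ones (they rely on the GCC/Clang __typeof__
extension):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel macros: force exactly one
 * access through a volatile lvalue so the compiler cannot re-read
 * or cache the value.
 */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical directory node: rev is bumped on every change. */
struct dir_node {
	unsigned long rev;
};

/* Hypothetical cached negative entry: remembers the directory
 * revision that was current when the failed lookup happened.
 */
struct neg_entry {
	unsigned long cached_rev;
};

/* Writer side. Assumed to be called with the writer's lock held,
 * so the plain read of dir->rev on the right-hand side is safe.
 */
static inline void dir_bump_rev(struct dir_node *dir)
{
	WRITE_ONCE(dir->rev, dir->rev + 1);
}

/* Reader side, no lock taken: a single read of the revision,
 * compared for equality against the cached value.
 */
static inline bool dir_changed(struct dir_node *parent, struct neg_entry *e)
{
	return READ_ONCE(parent->rev) != e->cached_rev;
}

/* Small self-check of the two helpers. */
void revision_demo(void)
{
	struct dir_node dir = { .rev = 1 };
	struct neg_entry e = { .cached_rev = 1 };

	assert(!dir_changed(&dir, &e));	/* nothing changed yet */
	dir_bump_rev(&dir);
	assert(dir_changed(&dir, &e));	/* stale entry is detected */
}
```

A reader racing with dir_bump_rev() sees either the old or the new
revision. Whether acting on the old one is acceptable in the
revalidate path is the part that would need checking.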
Eric