Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.

From: Oleg Drokin
Date: Tue Jul 05 2016 - 02:24:22 EST

Next message: Minchan Kim: "Re: [PATCH 12/31] mm, vmscan: make shrink_node decisions more node-centric"
Previous message: Aaron Lu: "Re: [RFC RESEND PATCH] swap: choose swap device according to numa node"
In reply to: Oleg Drokin: "Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes."
Next in thread: Al Viro: "Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Jul 3, 2016, at 2:29 AM, Al Viro wrote:

> On Sat, Jun 25, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
>
>> Sorry to nag you about this, but did any of those pan out?
>>
>> d_alloc_parallel() sounds like a bit too heavy there, esp. considering we came in with
>> a dentry already (though a potentially shared one, I understand).
>> Would not it be better to try and establish some dentry locking rule for calling into
>> d_splice_alias() instead? At least then the callers can make sure the dentry does
>> not change under them?
>> Though I guess if there's dentry locking like that, we might as well do all the
>> checking in d_splice_alias(), but that means the unhashed dentries would no
>> longer be disallowed which is a change of semantic from now.--
>
> FWIW, the only interesting case here is this:
> * no O_CREAT in flags (otherwise the parent is held exclusive).
> * dentry is found in hash
> * dentry is negative
> * dentry has passed ->d_revalidate() (i.e. in case of
> NFS it had nfs_neg_need_reval() return false).
>
> Only two instances are non-trivial in that respect - NFS and Lustre.
> Everything else will simply fail open() with ENOENT in that case.
>
> And at least for NFS we could bloody well do d_drop + d_alloc_parallel +
> finish_no_open and bugger off in case it's not in_lookup, otherwise do
> pretty much what we do in case we'd got in_lookup from the very beginning.
> Some adjustments are needed for that case (basically, we need to make
> sure we hit d_lookup_done() matching that d_alloc_parallel() and deal
> with refcounting correctly).
>
> Tentative NFS patch follows; I don't understand Lustre well enough, but it
> looks like a plausible strategy there as well.

I think I know why this does not work or at least part of the reason.

> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index d8015a03..5474e39 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1485,11 +1485,13 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> struct file *file, unsigned open_flags,
> umode_t mode, int *opened)
> {
> + DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
> struct nfs_open_context *ctx;
> struct dentry *res;
> struct iattr attr = { .ia_valid = ATTR_OPEN };
> struct inode *inode;
> unsigned int lookup_flags = 0;
> + bool switched = false;
> int err;
>
> /* Expect a negative dentry */
> @@ -1528,6 +1530,17 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> attr.ia_size = 0;
> }
>
> + if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {

So we come racing here from multiple threads (say 3 or more - we have seen this
in the older crash reports, so totally possible)

> + d_drop(dentry);

One lucky one does this first before the others perform the !d_unhashed check above.
This makes the other ones to not enter here.

And we are back to the original problem of multiple threads trying to instantiate
same dentry as before.

Only this time the race is somehow even wider because of some other interactions
I don't really understand, at least removing this patch I feel like I am having
harder time hitting the original issue.

> + switched = true;
> + dentry = d_alloc_parallel(dentry->d_parent,
> + &dentry->d_name, &wq);
> + if (IS_ERR(dentry))
> + return PTR_ERR(dentry);
> + if (unlikely(!d_in_lookup(dentry)))
> + return finish_no_open(file, dentry);
> + }
> +
> ctx = create_nfs_open_context(dentry, open_flags);
> err = PTR_ERR(ctx);
> if (IS_ERR(ctx))
> @@ -1563,14 +1576,23 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
> put_nfs_open_context(ctx);
> out:
> + if (unlikely(switched)) {
> + d_lookup_done(dentry);
> + dput(dentry);
> + }
> return err;
>
> no_open:
> res = nfs_lookup(dir, dentry, lookup_flags);
> - err = PTR_ERR(res);
> + if (switched) {
> + d_lookup_done(dentry);
> + if (!res)
> + res = dentry;
> + else
> + dput(dentry);
> + }
> if (IS_ERR(res))
> - goto out;
> -
> + return PTR_ERR(res);
> return finish_no_open(file, res);
> }
> EXPORT_SYMBOL_GPL(nfs_atomic_open);

Next message: Minchan Kim: "Re: [PATCH 12/31] mm, vmscan: make shrink_node decisions more node-centric"
Previous message: Aaron Lu: "Re: [RFC RESEND PATCH] swap: choose swap device according to numa node"
In reply to: Oleg Drokin: "Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes."
Next in thread: Al Viro: "Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]