Re: [PATCH] afs: Fix speculative status fetches

From: Marc Dionne
Date: Thu Apr 29 2021 - 08:03:37 EST


On Wed, Apr 28, 2021 at 9:36 AM David Howells <dhowells@xxxxxxxxxx> wrote:
>
> The generic/464 xfstest causes kAFS to emit occasional warnings of the
> form:
>
> kAFS: vnode modified {100055:8a} 30->31 YFS.StoreData64 (c=6015)
>
> This indicates that the data version received back from the server did not
> match the expected value (the DV should be incremented monotonically for
> each individual modification op committed to a vnode).
>
> What is happening is that a lookup call is doing a bulk status fetch
> speculatively on a bunch of vnodes in a directory besides getting the
> status of the vnode it's actually interested in. This is racing with a
> StoreData operation (though it could also occur with, say, a MakeDir op).
>
> On the client, a modification operation locks the vnode, but the bulk
> status fetch only locks the parent directory, so no ordering is imposed
> there (thereby avoiding an avenue to deadlock).
>
> On the server, the StoreData op handler doesn't lock the vnode until it's
> received all the request data, and downgrades the lock after committing the
> data until it has finished sending change notifications to other clients -
> which allows the status fetch to occur before it has finished.
>
> This means that:
>
> - a status fetch can access the target vnode either side of the exclusive
> section of the modification
>
> - the status fetch could start before the modification, yet finish after,
> and vice-versa.
>
> - the status fetch and the modification RPCs can complete in either order.
>
> - the status fetch can return either the before or the after DV from the
> modification.
>
> - the status fetch might regress the locally cached DV.
>
> Some of these are handled by the previous fix[1], but that's not sufficient
> because it checks the DV it received against the DV it cached at the start
> of the op, but the DV might've been updated in the meantime by a locally
> generated modification op.
>
> Fix this by the following means:
>
> (1) Keep track of when we're performing a modification operation on a
> vnode. This is done by marking vnode parameters with a 'modification'
> note that causes the AFS_VNODE_MODIFYING flag to be set on the vnode
> for the duration.
>
> (2) Altering the speculation race detection to ignore speculative status
> fetches if either the vnode is marked as being modified or the data
> version number is not what we expected.
>
> Note that whilst the "vnode modified" warning does get recovered from as it
> causes the client to refetch the status at the next opportunity, it will
> also invalidate the pagecache, so changes might get lost.
>
> Fixes: a9e5c87ca744 ("afs: Fix speculative status fetch going out of order wrt to modifications")
> Reported-by: Marc Dionne <marc.dionne@xxxxxxxxxxxx>
> Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
> cc: linux-afs@xxxxxxxxxxxxxxxxxxx
> Link: https://lore.kernel.org/r/160605082531.252452.14708077925602709042.stgit@xxxxxxxxxxxxxxxxxxxxxx/ [1]
> ---
>
> fs/afs/dir.c | 7 +++++++
> fs/afs/dir_silly.c | 3 +++
> fs/afs/fs_operation.c | 6 ++++++
> fs/afs/inode.c | 6 ++++--
> fs/afs/internal.h | 2 ++
> fs/afs/write.c | 1 +
> 6 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/fs/afs/dir.c b/fs/afs/dir.c
> index 117df15e5367..9fbe5a5ec9bd 100644
> --- a/fs/afs/dir.c
> +++ b/fs/afs/dir.c
> @@ -1419,6 +1419,7 @@ static int afs_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
>
> afs_op_set_vnode(op, 0, dvnode);
> op->file[0].dv_delta = 1;
> + op->file[0].modification = true;
> op->file[0].update_ctime = true;
> op->dentry = dentry;
> op->create.mode = S_IFDIR | mode;
> @@ -1500,6 +1501,7 @@ static int afs_rmdir(struct inode *dir, struct dentry *dentry)
>
> afs_op_set_vnode(op, 0, dvnode);
> op->file[0].dv_delta = 1;
> + op->file[0].modification = true;
> op->file[0].update_ctime = true;
>
> op->dentry = dentry;
> @@ -1636,6 +1638,7 @@ static int afs_unlink(struct inode *dir, struct dentry *dentry)
>
> afs_op_set_vnode(op, 0, dvnode);
> op->file[0].dv_delta = 1;
> + op->file[0].modification = true;
> op->file[0].update_ctime = true;
>
> /* Try to make sure we have a callback promise on the victim. */
> @@ -1718,6 +1721,7 @@ static int afs_create(struct user_namespace *mnt_userns, struct inode *dir,
>
> afs_op_set_vnode(op, 0, dvnode);
> op->file[0].dv_delta = 1;
> + op->file[0].modification = true;
> op->file[0].update_ctime = true;
>
> op->dentry = dentry;
> @@ -1792,6 +1796,7 @@ static int afs_link(struct dentry *from, struct inode *dir,
> afs_op_set_vnode(op, 0, dvnode);
> afs_op_set_vnode(op, 1, vnode);
> op->file[0].dv_delta = 1;
> + op->file[0].modification = true;
> op->file[0].update_ctime = true;
> op->file[1].update_ctime = true;
>
> @@ -1987,6 +1992,8 @@ static int afs_rename(struct user_namespace *mnt_userns, struct inode *old_dir,
> afs_op_set_vnode(op, 1, new_dvnode); /* May be same as orig_dvnode */
> op->file[0].dv_delta = 1;
> op->file[1].dv_delta = 1;
> + op->file[0].modification = true;
> + op->file[1].modification = true;
> op->file[0].update_ctime = true;
> op->file[1].update_ctime = true;
>
> diff --git a/fs/afs/dir_silly.c b/fs/afs/dir_silly.c
> index 04f75a44f243..dae9a57d7ec0 100644
> --- a/fs/afs/dir_silly.c
> +++ b/fs/afs/dir_silly.c
> @@ -73,6 +73,8 @@ static int afs_do_silly_rename(struct afs_vnode *dvnode, struct afs_vnode *vnode
> afs_op_set_vnode(op, 1, dvnode);
> op->file[0].dv_delta = 1;
> op->file[1].dv_delta = 1;
> + op->file[0].modification = true;
> + op->file[1].modification = true;
> op->file[0].update_ctime = true;
> op->file[1].update_ctime = true;
>
> @@ -201,6 +203,7 @@ static int afs_do_silly_unlink(struct afs_vnode *dvnode, struct afs_vnode *vnode
> afs_op_set_vnode(op, 0, dvnode);
> afs_op_set_vnode(op, 1, vnode);
> op->file[0].dv_delta = 1;
> + op->file[0].modification = true;
> op->file[0].update_ctime = true;
> op->file[1].op_unlinked = true;
> op->file[1].update_ctime = true;
> diff --git a/fs/afs/fs_operation.c b/fs/afs/fs_operation.c
> index 2cb0951acca6..d222dfbe976b 100644
> --- a/fs/afs/fs_operation.c
> +++ b/fs/afs/fs_operation.c
> @@ -118,6 +118,8 @@ static void afs_prepare_vnode(struct afs_operation *op, struct afs_vnode_param *
> vp->cb_break_before = afs_calc_vnode_cb_break(vnode);
> if (vnode->lock_state != AFS_VNODE_LOCK_NONE)
> op->flags |= AFS_OPERATION_CUR_ONLY;
> + if (vp->modification)
> + set_bit(AFS_VNODE_MODIFYING, &vnode->flags);
> }
>
> if (vp->fid.vnode)
> @@ -225,6 +227,10 @@ int afs_put_operation(struct afs_operation *op)
>
> if (op->ops && op->ops->put)
> op->ops->put(op);
> + if (op->file[0].modification)
> + clear_bit(AFS_VNODE_MODIFYING, &op->file[0].vnode->flags);
> + if (op->file[1].modification && op->file[1].vnode != op->file[0].vnode)
> + clear_bit(AFS_VNODE_MODIFYING, &op->file[1].vnode->flags);
> if (op->file[0].put_vnode)
> iput(&op->file[0].vnode->vfs_inode);
> if (op->file[1].put_vnode)
> diff --git a/fs/afs/inode.c b/fs/afs/inode.c
> index 3a129b9fd9b8..80b6c8d967d5 100644
> --- a/fs/afs/inode.c
> +++ b/fs/afs/inode.c
> @@ -294,8 +294,9 @@ void afs_vnode_commit_status(struct afs_operation *op, struct afs_vnode_param *v
> op->flags &= ~AFS_OPERATION_DIR_CONFLICT;
> }
> } else if (vp->scb.have_status) {
> - if (vp->dv_before + vp->dv_delta != vp->scb.status.data_version &&
> - vp->speculative)
> + if (vp->speculative &&
> + (test_bit(AFS_VNODE_MODIFYING, &vnode->flags) ||
> + vp->dv_before != vnode->status.data_version))
> /* Ignore the result of a speculative bulk status fetch
> * if it splits around a modification op, thereby
> * appearing to regress the data version.
> @@ -911,6 +912,7 @@ int afs_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
> }
> op->ctime = attr->ia_ctime;
> op->file[0].update_ctime = 1;
> + op->file[0].modification = true;
>
> op->ops = &afs_setattr_operation;
> ret = afs_do_sync_operation(op);
> diff --git a/fs/afs/internal.h b/fs/afs/internal.h
> index 52157a05796a..5ed416f4ff33 100644
> --- a/fs/afs/internal.h
> +++ b/fs/afs/internal.h
> @@ -645,6 +645,7 @@ struct afs_vnode {
> #define AFS_VNODE_PSEUDODIR 7 /* set if Vnode is a pseudo directory */
> #define AFS_VNODE_NEW_CONTENT 8 /* Set if file has new content (create/trunc-0) */
> #define AFS_VNODE_SILLY_DELETED 9 /* Set if file has been silly-deleted */
> +#define AFS_VNODE_MODIFYING 10 /* Set if we're performing a modification op */
>
> struct list_head wb_keys; /* List of keys available for writeback */
> struct list_head pending_locks; /* locks waiting to be granted */
> @@ -762,6 +763,7 @@ struct afs_vnode_param {
> bool set_size:1; /* Must update i_size */
> bool op_unlinked:1; /* True if file was unlinked by op */
> bool speculative:1; /* T if speculative status fetch (no vnode lock) */
> + bool modification:1; /* Set if the content gets modified */
> };
>
> /*
> diff --git a/fs/afs/write.c b/fs/afs/write.c
> index dc66ff15dd16..3edb6204b937 100644
> --- a/fs/afs/write.c
> +++ b/fs/afs/write.c
> @@ -377,6 +377,7 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t
>
> afs_op_set_vnode(op, 0, vnode);
> op->file[0].dv_delta = 1;
> + op->file[0].modification = true;
> op->store.write_iter = iter;
> op->store.pos = pos;
> op->store.size = size;
>
>
>
> _______________________________________________
> linux-afs mailing list
> http://lists.infradead.org/mailman/listinfo/linux-afs

Also verified that it gets rid of the warnings while running the
generic/464, so you can add:

Tested-by: Marc Dionne <marc.dionne@xxxxxxxxxxxx>

Marc