Re: [PATCH] 9p: Fix DIO read through netfs
From: Steve French
Date: Mon Aug 19 2024 - 18:36:40 EST
I did some additional testing of this patch after seeing regressions
starting with 6.11-rc4 with xfstests generic/125 and generic/210 and
was able to bisect it to this patch which just went in:
commit e3786b29c54cdae3490b07180a54e2461f42144c
Author: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index b2405dd4d4d4..3f3842e7b44a 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -217,7 +217,8 @@ static void cifs_req_issue_read(struct
netfs_io_subrequest *subreq)
goto out;
}
- __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+ if (subreq->rreq->origin != NETFS_DIO_READ)
+ __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
rc = rdata->server->ops->async_readv(rdata);
out:
It is very simple to repro.
I originally noticed it with tests to shares in Azure, but it also
fails locally to Samba with default cifs.ko mount options.
log message is below:
[ 7884.205037] Workqueue: cifsiod smb2_readv_worker [cifs]
[ 7884.205262] RIP: 0010:netfs_subreq_terminated+0x3f0/0x4b0 [netfs]
[ 7884.205299] Code: 01 00 00 e8 02 b4 07 df 4c 8b 4c 24 08 49 89 d8
4c 89 e9 41 8b b4 24 d4 01 00 00 44 89 f2 48 c7 c7 40 10 65 c1 e8 30
a9 b6 de <0f> 0b 48 8b 7c 24 18 4c 8d bd c0 00 00 00 e8 2d b5 07 df 48
8b 7c
[ 7884.205305] RSP: 0018:ff1100010705fce8 EFLAGS: 00010286
[ 7884.205312] RAX: dffffc0000000000 RBX: 0000000000001000 RCX: 0000000000000027
[ 7884.205317] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ff110004cb1b1a08
[ 7884.205322] RBP: ff11000119450900 R08: ffffffffa03e346e R09: ffe21c0099636341
[ 7884.205326] R10: ff110004cb1b1a0b R11: 0000000000000001 R12: ff11000137b68a80
[ 7884.205330] R13: 000000000000012c R14: 0000000000000001 R15: ff11000126a96f78
[ 7884.205335] FS: 0000000000000000(0000) GS:ff110004cb180000(0000)
knlGS:0000000000000000
[ 7884.205339] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7884.205344] CR2: 00007f0035f0a67c CR3: 000000000f664004 CR4: 0000000000371ef0
[ 7884.205354] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7884.205359] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7884.205363] Call Trace:
[ 7884.205367] <TASK>
[ 7884.205373] ? __warn+0xa4/0x220
[ 7884.205386] ? netfs_subreq_terminated+0x3f0/0x4b0 [netfs]
[ 7884.205423] ? report_bug+0x1d4/0x1e0
[ 7884.205436] ? handle_bug+0x42/0x80
[ 7884.205442] ? exc_invalid_op+0x18/0x50
[ 7884.205449] ? asm_exc_invalid_op+0x1a/0x20
[ 7884.205464] ? irq_work_claim+0x1e/0x40
[ 7884.205475] ? netfs_subreq_terminated+0x3f0/0x4b0 [netfs]
[ 7884.205512] ? netfs_subreq_terminated+0x3f0/0x4b0 [netfs]
[ 7884.205554] process_one_work+0x4cf/0xb80
[ 7884.205573] ? __pfx_lock_acquire+0x10/0x10
[ 7884.205582] ? __pfx_process_one_work+0x10/0x10
[ 7884.205599] ? assign_work+0xd6/0x110
[ 7884.205609] worker_thread+0x2cd/0x550
[ 7884.205622] ? __pfx_worker_thread+0x10/0x10
[ 7884.205632] kthread+0x187/0x1d0
[ 7884.205639] ? __pfx_kthread+0x10/0x10
[ 7884.205648] ret_from_fork+0x34/0x60
[ 7884.205655] ? __pfx_kthread+0x10/0x10
[ 7884.205661] ret_from_fork_asm+0x1a/0x30
[ 7884.205684] </TASK>
[ 7884.205688] irq event stamp: 23635
[ 7884.205692] hardirqs last enabled at (23641): [<ffffffffa022b58b>]
console_unlock+0x15b/0x170
[ 7884.205699] hardirqs last disabled at (23646): [<ffffffffa022b570>]
console_unlock+0x140/0x170
[ 7884.205705] softirqs last enabled at (23402): [<ffffffffa0131a6e>]
__irq_exit_rcu+0xfe/0x120
[ 7884.205712] softirqs last disabled at (23397): [<ffffffffa0131a6e>]
__irq_exit_rcu+0xfe/0x120
[ 7884.205718] ---[ end trace 0000000000000000 ]---
On Fri, Aug 9, 2024 at 10:43 PM Dominique Martinet
<asmadeus@xxxxxxxxxxxxx> wrote:
>
> David Howells wrote on Fri, Aug 09, 2024 at 02:56:09PM +0100:
> > From: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
> >
> > 9p: Fix DIO read through netfs
>
> nitpick: now sure how that ended up here but this is duplicated with the
> subject (the commit message ends up with this line twice)
>
> > If a program is watching a file on a 9p mount, it won't see any change in
> > size if the file being exported by the server is changed directly in the
> > source filesystem, presumably because 9p doesn't have change notifications,
> > and because netfs skips the reads if the file is empty.
> >
> > Fix this by attempting to read the full size specified when a DIO read is
> > requested (such as when 9p is operating in unbuffered mode) and dealing
> > with a short read if the EOF was less than the expected read.
> >
> > To make this work, filesystems using netfslib must not set
> > NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF.
> > I don't want to mandatorily clear this flag in netfslib for DIO because,
> > say, ceph might make a read from an object that is not completely filled,
> > but does not reside at the end of file - and so we need to clear the
> > excess.
> >
> > This can be tested by watching an empty file over 9p within a VM (such as
> > in the ktest framework):
> >
> > while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo
>
> (This is basically the same thing but if one wants to control the read
> timing for more precise/verbose debugging:
> exec 3< /host/tmp/foo
> read -u 3 content && echo $content
> (repeat as appropriate)
> exec 3>&-
> )
>
> > then writing something into the empty file. The watcher should immediately
> > display the file content and break out of the loop. Without this fix, it
> > remains in the loop indefinitely.
> >
> > Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter")
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916
> > Written-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
>
> Thanks for adding extra comments & fixing other filesystems.
>
> I've checked this covers all cases of setting NETFS_SREQ_CLEAR_TAIL so
> hopefully shouldn't have further side effects, this sounds good to me:
>
> Signed-off-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
>
> > Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
> > cc: Eric Van Hensbergen <ericvh@xxxxxxxxxx>
> > cc: Latchesar Ionkov <lucho@xxxxxxxxxx>
> > cc: Christian Schoenebeck <linux_oss@xxxxxxxxxxxxx>
> > cc: Marc Dionne <marc.dionne@xxxxxxxxxxxx>
> > cc: Ilya Dryomov <idryomov@xxxxxxxxx>
> > cc: Steve French <sfrench@xxxxxxxxx>
> > cc: Paulo Alcantara <pc@xxxxxxxxxxxxx>
> > cc: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > cc: v9fs@xxxxxxxxxxxxxxx
> > cc: linux-afs@xxxxxxxxxxxxxxxxxxx
> > cc: ceph-devel@xxxxxxxxxxxxxxx
> > cc: linux-cifs@xxxxxxxxxxxxxxx
> > cc: linux-nfs@xxxxxxxxxxxxxxx
> > cc: netfs@xxxxxxxxxxxxxxx
> > cc: linux-fsdevel@xxxxxxxxxxxxxxx
> > ---
> > fs/9p/vfs_addr.c | 3 ++-
> > fs/afs/file.c | 3 ++-
> > fs/ceph/addr.c | 6 ++++--
> > fs/netfs/io.c | 17 +++++++++++------
> > fs/nfs/fscache.c | 3 ++-
> > fs/smb/client/file.c | 3 ++-
> > 6 files changed, 23 insertions(+), 12 deletions(-)
> >
> > diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
> > index a97ceb105cd8..24fdc74caeba 100644
> > --- a/fs/9p/vfs_addr.c
> > +++ b/fs/9p/vfs_addr.c
> > @@ -75,7 +75,8 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
> >
> > /* if we just extended the file size, any portion not in
> > * cache won't be on server and is zeroes */
> > - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > + if (subreq->rreq->origin != NETFS_DIO_READ)
> > + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> >
> > netfs_subreq_terminated(subreq, err ?: total, false);
> > }
> > diff --git a/fs/afs/file.c b/fs/afs/file.c
> > index c3f0c45ae9a9..ec1be0091fdb 100644
> > --- a/fs/afs/file.c
> > +++ b/fs/afs/file.c
> > @@ -242,7 +242,8 @@ static void afs_fetch_data_notify(struct afs_operation *op)
> >
> > req->error = error;
> > if (subreq) {
> > - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > + if (subreq->rreq->origin != NETFS_DIO_READ)
> > + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > netfs_subreq_terminated(subreq, error ?: req->actual_len, false);
> > req->subreq = NULL;
> > } else if (req->done) {
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index cc0a2240de98..c4744a02db75 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -246,7 +246,8 @@ static void finish_netfs_read(struct ceph_osd_request *req)
> > if (err >= 0) {
> > if (sparse && err > 0)
> > err = ceph_sparse_ext_map_end(op);
> > - if (err < subreq->len)
> > + if (err < subreq->len &&
> > + subreq->rreq->origin != NETFS_DIO_READ)
> > __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > if (IS_ENCRYPTED(inode) && err > 0) {
> > err = ceph_fscrypt_decrypt_extents(inode,
> > @@ -282,7 +283,8 @@ static bool ceph_netfs_issue_op_inline(struct netfs_io_subrequest *subreq)
> > size_t len;
> > int mode;
> >
> > - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > + if (rreq->origin != NETFS_DIO_READ)
> > + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > __clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
> >
> > if (subreq->start >= inode->i_size)
> > diff --git a/fs/netfs/io.c b/fs/netfs/io.c
> > index c179a1c73fa7..5367caf3fa28 100644
> > --- a/fs/netfs/io.c
> > +++ b/fs/netfs/io.c
> > @@ -530,7 +530,8 @@ void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
> >
> > if (transferred_or_error == 0) {
> > if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
> > - subreq->error = -ENODATA;
> > + if (rreq->origin != NETFS_DIO_READ)
> > + subreq->error = -ENODATA;
> > goto failed;
> > }
> > } else {
> > @@ -601,9 +602,14 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq,
> > }
> > if (subreq->len > ictx->zero_point - subreq->start)
> > subreq->len = ictx->zero_point - subreq->start;
> > +
> > + /* We limit buffered reads to the EOF, but let the
> > + * server deal with larger-than-EOF DIO/unbuffered
> > + * reads.
> > + */
> > + if (subreq->len > rreq->i_size - subreq->start)
> > + subreq->len = rreq->i_size - subreq->start;
> > }
> > - if (subreq->len > rreq->i_size - subreq->start)
> > - subreq->len = rreq->i_size - subreq->start;
> > if (rreq->rsize && subreq->len > rreq->rsize)
> > subreq->len = rreq->rsize;
> >
> > @@ -739,11 +745,10 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
> > do {
> > _debug("submit %llx + %llx >= %llx",
> > rreq->start, rreq->submitted, rreq->i_size);
> > - if (rreq->origin == NETFS_DIO_READ &&
> > - rreq->start + rreq->submitted >= rreq->i_size)
> > - break;
> > if (!netfs_rreq_submit_slice(rreq, &io_iter))
> > break;
> > + if (test_bit(NETFS_SREQ_NO_PROGRESS, &rreq->flags))
> > + break;
> > if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) &&
> > test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags))
> > break;
> > diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
> > index bf29a65c5027..7a558dea75c4 100644
> > --- a/fs/nfs/fscache.c
> > +++ b/fs/nfs/fscache.c
> > @@ -363,7 +363,8 @@ void nfs_netfs_read_completion(struct nfs_pgio_header *hdr)
> > return;
> >
> > sreq = netfs->sreq;
> > - if (test_bit(NFS_IOHDR_EOF, &hdr->flags))
> > + if (test_bit(NFS_IOHDR_EOF, &hdr->flags) &&
> > + sreq->rreq->origin != NETFS_DIO_READ)
> > __set_bit(NETFS_SREQ_CLEAR_TAIL, &sreq->flags);
> >
> > if (hdr->error)
> > diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
> > index b2405dd4d4d4..3f3842e7b44a 100644
> > --- a/fs/smb/client/file.c
> > +++ b/fs/smb/client/file.c
> > @@ -217,7 +217,8 @@ static void cifs_req_issue_read(struct netfs_io_subrequest *subreq)
> > goto out;
> > }
> >
> > - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > + if (subreq->rreq->origin != NETFS_DIO_READ)
> > + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> >
> > rc = rdata->server->ops->async_readv(rdata);
> > out:
> >
>
> --
> Dominique Martinet | Asmadeus
>
--
Thanks,
Steve