Re: [PATCH 08/10] nfsd: fix partial-write detection in nfsd_direct_write
From: Jeff Layton
Date: Fri May 29 2026 - 13:17:57 EST
On Fri, 2026-05-29 at 12:57 -0400, Chuck Lever wrote:
>
> On Thu, May 28, 2026, at 5:55 PM, Jeff Layton wrote:
> > From: Chris Mason <clm@xxxxxxxx>
> >
> > nfsd_direct_write() walks a list of write segments and, after each
> > vfs_iocb_iter_write(), tries to detect a short write so the loop can
> > stop before placing the next segment at a wrong file offset:
> >
> > 	host_err = vfs_iocb_iter_write(file, kiocb, &segments[i].iter);
> > 	if (host_err < 0)
> > 		return host_err;
> > 	*cnt += host_err;
> > 	if (host_err < segments[i].iter.count)
> > 		break;	/* partial write */
> >
> > vfs_iocb_iter_write() runs the iter through ->write_iter(), which
> > advances the iter by the number of bytes written. By the time the
> > check runs, segments[i].iter.count is the residual, not the original
> > request length:
> >
> > 	before write_iter: iter.count == original_len
> > 	after  write_iter: iter.count == original_len - host_err
> >
> > The condition then reduces to host_err < original_len - host_err, so
> > the break fires only when less than half of the segment was written.
> > Any short write completing between 50% and 99% of the segment slips
> > through; the loop advances to the next segment with kiocb->ki_pos
> > only bumped by the short amount, writing the next segment's payload
> > at the wrong offset and over-reporting *cnt to the NFS client.
> >
> > Snapshot the segment's byte count before the write and compare
> > host_err against that snapshot so any short write breaks the loop.
> >
> > Fixes: 06c5c97293e3 ("NFSD: Implement NFSD_IO_DIRECT for NFS WRITE")
> > Assisted-by: kres:claude-opus-4-7
> > Signed-off-by: Chris Mason <clm@xxxxxxxx>
> > ---
> > fs/nfsd/vfs.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 980217f755b7..619f252af4d1 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1380,6 +1380,7 @@ nfsd_direct_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	struct file *file = nf->nf_file;
> >  	unsigned int nsegs, i;
> >  	ssize_t host_err;
> > +	size_t expected;
> >
> >  	nsegs = nfsd_write_dio_iters_init(nf, rqstp->rq_bvec, nvecs,
> >  					  kiocb, *cnt, segments);
> > @@ -1401,11 +1402,13 @@ nfsd_direct_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  			kiocb->ki_flags |= IOCB_DONTCACHE;
> >  		}
> >
> > +		expected = iov_iter_count(&segments[i].iter);
> > +
> >  		host_err = vfs_iocb_iter_write(file, kiocb, &segments[i].iter);
> >  		if (host_err < 0)
> >  			return host_err;
> >  		*cnt += host_err;
> > -		if (host_err < segments[i].iter.count)
> > +		if (host_err < (ssize_t)expected)
> >  			break;	/* partial write */
> >  	}
> >
> >
> > --
> > 2.54.0
>
> How many filesystems can return a short write in this case?
> My impression was that only the NFS client can do that.
>
No idea offhand, but NFS is exportable, so at least that case can reach
this code. Since vfs_iocb_iter_write() is allowed to return a short
write, I think we have to handle that properly here.
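
To make the arithmetic from the patch description concrete, here is a
minimal userspace sketch. It is not kernel code: struct fake_iter,
fake_write() and the two *_detects_short() helpers are hypothetical
stand-ins. fake_write() only mimics the one property that matters here,
which is that the write advances the iter so its count becomes the
residual.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical stand-in for struct iov_iter: tracks remaining bytes only. */
struct fake_iter {
	size_t count;
};

/*
 * Hypothetical stand-in for vfs_iocb_iter_write(): writes at most
 * 'limit' bytes and advances the iter by the amount written, the way
 * ->write_iter() does.
 */
static ssize_t fake_write(struct fake_iter *iter, size_t limit)
{
	size_t done = iter->count < limit ? iter->count : limit;

	iter->count -= done;
	return (ssize_t)done;
}

/* Old check: compares against iter.count after the write (the residual). */
int old_check_detects_short(size_t len, size_t limit)
{
	struct fake_iter iter = { .count = len };
	ssize_t host_err = fake_write(&iter, limit);

	return (size_t)host_err < iter.count;
}

/* Patched check: compares against a snapshot taken before the write. */
int new_check_detects_short(size_t len, size_t limit)
{
	struct fake_iter iter = { .count = len };
	size_t expected = iter.count;
	ssize_t host_err = fake_write(&iter, limit);

	return host_err < (ssize_t)expected;
}

/* Walks the cases from the patch description for a 100-byte segment. */
void self_check(void)
{
	assert(old_check_detects_short(100, 30));	/* 30 < 70: caught */
	assert(!old_check_detects_short(100, 75));	/* 75 < 25: missed */
	assert(new_check_detects_short(100, 75));	/* 75 < 100: caught */
	assert(!new_check_detects_short(100, 100));	/* full write: no break */
}
```

With a 100-byte segment, the old check only fires when fewer than 50
bytes were written, while the snapshot comparison fires for any short
write and stays quiet for a full one.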
--
Jeff Layton <jlayton@xxxxxxxxxx>