Re: [PATCH 08/10] nfsd: fix partial-write detection in nfsd_direct_write
From: Chuck Lever
Date: Fri May 29 2026 - 13:34:02 EST
On Thu, May 28, 2026, at 5:55 PM, Jeff Layton wrote:
> From: Chris Mason <clm@xxxxxxxx>
>
> nfsd_direct_write() walks a list of write segments and, after each
> vfs_iocb_iter_write(), tries to detect a short write so the loop can
> stop before placing the next segment at a wrong file offset:
>
> host_err = vfs_iocb_iter_write(file, kiocb, &segments[i].iter);
> if (host_err < 0)
> return host_err;
> *cnt += host_err;
> if (host_err < segments[i].iter.count)
> break; /* partial write */
>
> vfs_iocb_iter_write() runs the iter through ->write_iter(), which
> advances the iter by the number of bytes written. By the time the
> check runs, segments[i].iter.count is the residual, not the original
> request length:
>
> before write_iter: iter.count == original_len
> after write_iter: iter.count == original_len - host_err
>
> The condition then reduces to host_err < original_len - host_err, so
> the break fires only when less than half of the segment was written.
> Any short write completing between 50% and 99% of the segment slips
> through; the loop advances to the next segment with kiocb->ki_pos
> only bumped by the short amount, writing the next segment's payload
> at the wrong offset and over-reporting *cnt to the NFS client.
>
> Snapshot the segment's byte count before the write and compare
> host_err against that snapshot so any short write breaks the loop.
>
> Fixes: 06c5c97293e3 ("NFSD: Implement NFSD_IO_DIRECT for NFS WRITE")
> Assisted-by: kres:claude-opus-4-7
> Signed-off-by: Chris Mason <clm@xxxxxxxx>
> ---
> fs/nfsd/vfs.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 980217f755b7..619f252af4d1 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1380,6 +1380,7 @@ nfsd_direct_write(struct svc_rqst *rqstp, struct
> svc_fh *fhp,
> struct file *file = nf->nf_file;
> unsigned int nsegs, i;
> ssize_t host_err;
> + size_t expected;
>
> nsegs = nfsd_write_dio_iters_init(nf, rqstp->rq_bvec, nvecs,
> kiocb, *cnt, segments);
> @@ -1401,11 +1402,13 @@ nfsd_direct_write(struct svc_rqst *rqstp,
> struct svc_fh *fhp,
> kiocb->ki_flags |= IOCB_DONTCACHE;
> }
>
> + expected = iov_iter_count(&segments[i].iter);
> +
> host_err = vfs_iocb_iter_write(file, kiocb, &segments[i].iter);
> if (host_err < 0)
> return host_err;
> *cnt += host_err;
> - if (host_err < segments[i].iter.count)
> + if (host_err < (ssize_t)expected)
> break; /* partial write */
> }
>
>
> --
> 2.54.0
How many filesystems can return a short write in this case?
My impression was that only the NFS client can do that.
--
Chuck Lever