Re: [PATCH 08/10] nfsd: fix partial-write detection in nfsd_direct_write
From: Chuck Lever
Date: Fri May 29 2026 - 13:54:55 EST
On 5/29/26 1:01 PM, Jeff Layton wrote:
> On Fri, 2026-05-29 at 12:57 -0400, Chuck Lever wrote:
>>
>> On Thu, May 28, 2026, at 5:55 PM, Jeff Layton wrote:
>>> From: Chris Mason <clm@xxxxxxxx>
>>>
>>> nfsd_direct_write() walks a list of write segments and, after each
>>> vfs_iocb_iter_write(), tries to detect a short write so the loop can
>>> stop before placing the next segment at a wrong file offset:
>>>
>>> host_err = vfs_iocb_iter_write(file, kiocb, &segments[i].iter);
>>> if (host_err < 0)
>>> return host_err;
>>> *cnt += host_err;
>>> if (host_err < segments[i].iter.count)
>>> break; /* partial write */
>>>
>>> vfs_iocb_iter_write() runs the iter through ->write_iter(), which
>>> advances the iter by the number of bytes written. By the time the
>>> check runs, segments[i].iter.count is the residual, not the original
>>> request length:
>>>
>>> before write_iter: iter.count == original_len
>>> after write_iter: iter.count == original_len - host_err
>>>
>>> The condition then reduces to host_err < original_len - host_err, so
>>> the break fires only when less than half of the segment was written.
>>> Any short write completing between 50% and 99% of the segment slips
>>> through; the loop advances to the next segment with kiocb->ki_pos
>>> only bumped by the short amount, writing the next segment's payload
>>> at the wrong offset and over-reporting *cnt to the NFS client.
>>>
>>> Snapshot the segment's byte count before the write and compare
>>> host_err against that snapshot so any short write breaks the loop.
>>>
>>> Fixes: 06c5c97293e3 ("NFSD: Implement NFSD_IO_DIRECT for NFS WRITE")
>>> Assisted-by: kres:claude-opus-4-7
>>> Signed-off-by: Chris Mason <clm@xxxxxxxx>
>>> ---
>>> fs/nfsd/vfs.c | 5 ++++-
>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>> index 980217f755b7..619f252af4d1 100644
>>> --- a/fs/nfsd/vfs.c
>>> +++ b/fs/nfsd/vfs.c
>>> @@ -1380,6 +1380,7 @@ nfsd_direct_write(struct svc_rqst *rqstp, struct
>>> svc_fh *fhp,
>>> struct file *file = nf->nf_file;
>>> unsigned int nsegs, i;
>>> ssize_t host_err;
>>> + size_t expected;
>>>
>>> nsegs = nfsd_write_dio_iters_init(nf, rqstp->rq_bvec, nvecs,
>>> kiocb, *cnt, segments);
>>> @@ -1401,11 +1402,13 @@ nfsd_direct_write(struct svc_rqst *rqstp,
>>> struct svc_fh *fhp,
>>> kiocb->ki_flags |= IOCB_DONTCACHE;
>>> }
>>>
>>> + expected = iov_iter_count(&segments[i].iter);
>>> +
>>> host_err = vfs_iocb_iter_write(file, kiocb, &segments[i].iter);
>>> if (host_err < 0)
>>> return host_err;
>>> *cnt += host_err;
>>> - if (host_err < segments[i].iter.count)
>>> + if (host_err < (ssize_t)expected)
>>> break; /* partial write */
>>> }
>>>
>>>
>>> --
>>> 2.54.0
>>
>> How many filesystems can return a short write in this case?
>> My impression was that only the NFS client can do that.
>>
>
> No idea right offhand, but NFS is exportable. Since
> vfs_iocb_iter_write() is allowed to return a short write, I think we
> have to deal with that properly here.
NFSD_IO_DIRECT is experimental, and doesn't make sense (to me)
to use with an NFS re-export.
If we can find another filesystem that might return a short write
with NFSD_IO_DIRECT, I might consider this a higher priority.
--
Chuck Lever