Re: [DISCUSSION] Preventing ENOSPC/EDQUOT writeback errors on network filesystems

From: Piyush Sachdeva

Date: Wed May 13 2026 - 09:45:02 EST


Jeff Layton <jlayton@xxxxxxxxxx> writes:

> On Tue, 2026-05-05 at 11:41 +0530, Piyush Sachdeva wrote:
>> Hi,
>> There have been plenty of discussions on how to handle writeback errors for
>> network filesystems, but most have focused on error reporting after the fact.
>> I'd like to start a discussion around preventing writeback errors, specifically
>> ENOSPC and EDQUOT, before they cause silent data loss.
>>
>> The problem:
>> With buffered writes on network filesystems (cifs, nfs, etc.), the write()
>> syscall copies data into the page cache and returns success immediately. The
>> actual upload to the server happens later during writeback. If the server is
>> out of space at that point, the write fails with ENOSPC. The netfs/writeback
>> layer records this error via mapping_set_error(), but critically the folio's
>> writeback flag is cleared and the page is now clean. Under memory pressure, the
>> VM can reclaim these clean pages, permanently losing data that the application
>> believes was successfully written. Meanwhile, i_size has already been updated
>> to reflect the new file size. So stat() shows a file size inclusive of the data
>> that was never persisted. Another inconsistency here is that total free space
>> hasn't been modified for the file system on the server, leading to incorrect
>> values in statfs() output from the client's pov (assuming statfs() calls go
>> to the server).
>> To illustrate with real-world scenarios:
>>
>> - A user or application can keep issuing writes to an fd well beyond the
>> available space, since buffered writes return success as soon as data is
>> copied to the page cache. A significant amount of data, exceeding the
>> available quota can accumulate before fsync() is called, at which point
>> critical data loss is nearly certain.
>>
>> - A malicious user can exploit this to keep resources pinned and memory
>> oversubscribed, impacting other applications.
>>
>> The error is technically observable: fsync() will return it, and close()
>> surfaces it through the flush callback. But in practice, many applications
>> check neither, and the POSIX "just call fsync()" answer isn't satisfying for
>> users who lose data silently.
>>
>
> Yet, it is the only real answer we have.
>
> This is just a fundamental issue with buffered writes and delayed
> writeback. Either you flush the data to stable storage now, or you have
> to do it later. If you do it later, then it can still fail for all
> sorts of reasons.
>
>> Local filesystems largely avoid this because they can check available space
>> synchronously in write_begin() and fail the write() syscall directly. Network
>> filesystems can't do this cheaply — a round-trip per write to check server
>> space would negate the benefits of buffered I/O.
>>
>> Through recent development, netfs is becoming a central layer for network
>> filesystem I/O. It already has retry logic for transient failures (EAGAIN,
>> ECONNABORTED), but ENOSPC/EDQUOT remain hard failures. This affects every
>> network filesystem using buffered writes.
>>
>> I am curious to know whether NFS has a solution to this, and how the NFS
>> community approaches this specific problem.
>>
>> This problem is worth solving for all network filesystems. I have a few
>> thoughts on approaches, combining cached statfs() output with
>> fallocate()-style pre-allocation on the write path:
>>
>> 1. Pre-allocate space on the server before writing to the page cache,
>> analogous to fallocate() on the write path. This guarantees server-side
>> space for page cache data.
>>
>> 2. Per-write fallocate() calls require a server round-trip, effectively
>> negating the benefit of buffered I/O, so use cached statfs() output to gate
>> when pre-allocation is triggered. For example, once free space drops below
>> 20% of total space, enable fallocate() on the write path. Otherwise, let
>> writes proceed as normal.
>>
>> 3. Handle refresh and synchronization of the cached statfs() data separately
>> to avoid staleness.
>>
>> I'd appreciate feedback from the community on viable approaches.
>
> NFSv4.2 does have an ALLOCATE operation:
>
> https://datatracker.ietf.org/doc/html/rfc7862#section-15.1
>
> ...and such an operation could (in principle) precede WRITE in a
> compound, but that doesn't really help you. By the time we're issuing
> RPCs to the server, the client application has already finished its
> writes and moved on.
>
> For applications that want to avoid ENOSPC/EDQUOT, the best thing they
> could do is call fallocate() themselves to ensure that the space
> exists. With a sufficiently recent NFS client and server, that should
> DTRT.

Hey Jeff,
Thanks for your email and for sharing the NFS spec. I noticed that the
ALLOCATE operation ends up checking for space during writeback as well,
so the initial concern of losing data still remains. But if we do the
operation before writing to the page cache, it would be a performance
issue.
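
For reference, the application-side pattern you describe (reserve the
space first, then write and check the flush) might look roughly like the
sketch below. This is only an illustration under assumptions:
write_reserved() is a hypothetical helper name, not an existing API, and
it uses posix_fallocate() for portability. Whether the reservation
reaches the server depends on the client and server supporting it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve len bytes at offset off, write buf there,
 * then flush. Returns 0 on success or a positive errno value on failure.
 */
int write_reserved(int fd, const void *buf, size_t len, off_t off)
{
    /*
     * posix_fallocate() returns the error number directly and does not
     * set errno. Note: glibc may emulate it when the filesystem lacks
     * native support; Linux fallocate(2) fails with EOPNOTSUPP instead.
     */
    int err = posix_fallocate(fd, off, (off_t)len);
    if (err != 0)
        return err; /* e.g. ENOSPC, seen before any data is buffered */

    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = pwrite(fd, p, left, off);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted, retry */
            return errno;
        }
        if (n == 0)
            return EIO; /* no progress; avoid looping forever */
        p += n;
        off += n;
        left -= (size_t)n;
    }

    /* fsync() reports any writeback error recorded for this file. */
    if (fsync(fd) < 0)
        return errno;
    return 0;
}
```

The point of the ordering is that an ENOSPC/EDQUOT failure shows up at
the reservation step, while the caller can still react, and not later
during writeback.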

I will try a few experiments and then post my findings here.
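
As a rough starting point, the free-space gating check from point 2 of
the original proposal can be approximated in userspace with statvfs().
This is a sketch under assumptions, not the proposed kernel change:
below_free_threshold() and should_preallocate() are hypothetical names,
the threshold is passed in by the caller, and the caching and refresh of
the statfs() data (point 3) is deliberately left out, so every call here
queries the filesystem.

```c
#include <stdbool.h>
#include <sys/statvfs.h>

/*
 * Hypothetical helper: true when free_blocks is below pct percent of
 * total_blocks. An unknown (zero) total is treated conservatively.
 */
bool below_free_threshold(unsigned long long free_blocks,
                          unsigned long long total_blocks,
                          unsigned int pct)
{
    if (total_blocks == 0)
        return true;
    /* Multiply instead of dividing to avoid rounding the threshold. */
    return free_blocks * 100ULL < total_blocks * (unsigned long long)pct;
}

/*
 * Hypothetical helper: returns 1 if pre-allocation should be enabled
 * for the filesystem containing path, 0 if not, or -1 on error with
 * errno set by statvfs().
 */
int should_preallocate(const char *path, unsigned int pct)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;

    /* f_bavail counts blocks available to unprivileged users. */
    return below_free_threshold(sv.f_bavail, sv.f_blocks, pct) ? 1 : 0;
}
```

With the 20% figure from the proposal, a caller would only take the
slower pre-allocation path when should_preallocate(path, 20) returns 1.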

--
Regards,
Piyush