Re: [RFC] xfs: fake fallocate success for always CoW inodes

From: Hans Holmberg
Date: Tue Nov 11 2025 - 03:35:19 EST


On 06/11/2025 15:46, Christoph Hellwig wrote:
> On Thu, Nov 06, 2025 at 02:42:30PM +0000, Matthew Wilcox wrote:
>> On Thu, Nov 06, 2025 at 02:52:12PM +0100, Christoph Hellwig wrote:
>>> On Thu, Nov 06, 2025 at 02:48:12PM +0100, Florian Weimer wrote:
>>>> * Hans Holmberg:
>>>>
>>>>> We don't support preallocations for CoW inodes and we currently fail
>>>>> with -EOPNOTSUPP, but this causes an issue for users of glibc's
>>>>> posix_fallocate[1]. If fallocate fails, posix_fallocate falls back on
>>>>> writing actual data into the range to try to allocate blocks that way.
>>>>> That does not actually guarantee anything for CoW inodes however as we
>>>>> write out of place.
>>>> Why doesn't fallocate trigger the copy instead? Isn't this what the
>>>> user is requesting?
>>> What copy?
>> I believe Florian is thinking of CoW in the sense of "share while read
>> only, then you have a mutable block allocation", rather than the
>> WAFL (or SMR) sense of "we always put writes in a new location".
> Note that the glibc posix_fallocate(3) fallback will never copy anyway.
> It does a racy and somewhat broken check if there is already
> data, and if it thinks there isn't it writes zeroes. Which is the
> wrong thing for just about every use case imaginable. And the only
> thing to stop it from doing that is to implement fallocate(2) and
> return success.

Instead of returning success in fallocate(2), could we instead return
a distinct error code that would tell the caller that:

Optimized allocation is not supported, AND there is no use trying to
preallocate data using writes?

EUSELESS would be nice to have, but that is not available.

Then posix_fallocate could fail with -EINVAL (which looks legit according
to the man page: "the underlying filesystem does not support the operation")
or skip the writes and return success (whichever is preferable).
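
To illustrate the caller-side behaviour being discussed, here is a minimal
sketch of an application calling fallocate(2) directly and treating
EOPNOTSUPP as "preallocation is not possible here, do not write zeroes".
The helper name try_preallocate() and its return convention are
hypothetical, chosen only for this example; this is not what glibc does
and not a proposed kernel or libc interface.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not a glibc or kernel API).
 *
 * Returns:
 *    0  blocks were preallocated by the filesystem
 *    1  the filesystem does not support preallocation; the caller
 *       should carry on without it rather than writing zeroes
 *   -1  any other failure, with errno set by fallocate(2)
 */
int try_preallocate(int fd, off_t offset, off_t len)
{
	/* mode 0 is the default allocate-range operation */
	if (fallocate(fd, 0, offset, len) == 0)
		return 0;

	/* No write fallback: writing data would not reserve anything
	 * on a filesystem that always writes out of place. */
	if (errno == EOPNOTSUPP)
		return 1;

	return -1;
}
```

A caller would check for a return value of 1 and simply continue,
accepting that later writes may fail with ENOSPC.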