Re: [RFC] xfs: fake fallocate success for always CoW inodes

From: Florian Weimer

Date: Thu Nov 06 2025 - 11:31:44 EST


* Matthew Wilcox:

> On Thu, Nov 06, 2025 at 02:52:12PM +0100, Christoph Hellwig wrote:
>> On Thu, Nov 06, 2025 at 02:48:12PM +0100, Florian Weimer wrote:
>> > * Hans Holmberg:
>> >
>> > > We don't support preallocations for CoW inodes and we currently fail
>> > > with -EOPNOTSUPP, but this causes an issue for users of glibc's
>> > > posix_fallocate[1]. If fallocate fails, posix_fallocate falls back on
>> > > writing actual data into the range to try to allocate blocks that way.
>> > > That does not actually guarantee anything for CoW inodes however as we
>> > > write out of place.
>> >
>> > Why doesn't fallocate trigger the copy instead? Isn't this what the
>> > user is requesting?
>>
>> What copy?
>
> I believe Florian is thinking of CoW in the sense of "share while read
> only, then you have a mutable block allocation", rather than the
> WAFL (or SMR) sense of "we always put writes in a new location".

Ahh. That's an aspect of the discussion that I had missed so far.
Previous discussions focused on cases where the kernel couldn't do the
pre-population operation safely even though it was beneficial from an
application perspective, not on cases where the operation was
meaningless because of the way the file system was implemented.

(Pre-allocating CoW space as part of fallocate appears to be difficult
because I don't see how to surface this space usage to applications and
administrators.)

It's been a few years, I think, and maybe we should drop the allocation
logic from posix_fallocate in glibc? Assuming that fallocate itself is
implemented everywhere it makes sense? There are more always-CoW,
compressing file systems these days, so applications just have to come
to terms with the fact that even after posix_fallocate, writes can still
fail, and not just because of media errors. So maybe posix_fallocate
isn't that meaningful anymore.
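
To make the application side concrete, here is a rough sketch of what a
caller might do if it cannot rely on the emulation: call the Linux
fallocate system call wrapper directly, treat EOPNOTSUPP as "no
reservation possible" instead of writing data into the range, and check
every later write for errors. The helper names try_reserve and write_all
are made up for illustration, and this is not how glibc implements
posix_fallocate.

```c
#define _GNU_SOURCE             /* fallocate() is a Linux/glibc extension */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to reserve [offset, offset + len)
 * without ever falling back to writing data into the range.
 * Returns 0 if space was reserved, 1 if the file system does not
 * support preallocation (EOPNOTSUPP), and -1 on any other error
 * (errno is left as set by fallocate).
 */
static int try_reserve(int fd, off_t offset, off_t len)
{
    int r;

    assert(len > 0);
    do {
        r = fallocate(fd, 0, offset, len);
    } while (r == -1 && errno == EINTR);

    if (r == 0)
        return 0;
    if (errno == EOPNOTSUPP)
        return 1;               /* no reservation; writes may still fail */
    return -1;
}

/*
 * Hypothetical helper: write the whole buffer at the given offset.
 * Errors such as ENOSPC are reported to the caller even if an earlier
 * try_reserve() call succeeded.
 * Returns 0 on success, -1 on error with errno set.
 */
static int write_all(int fd, const void *buf, size_t len, off_t offset)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {           /* no progress; give up instead of spinning */
            errno = EIO;
            return -1;
        }
        p += n;
        offset += n;
        len -= (size_t)n;
    }
    return 0;
}
```

With something like this, a return value of 1 from try_reserve is only
a hint that no space was set aside; the error handling around write_all
is needed in either case.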

Thanks,
Florian