Re: [PATCH v2 2/7] iomap: Add zero unwritten mappings dio support

From: Darrick J. Wong
Date: Fri Dec 13 2024 - 19:57:03 EST


On Fri, Dec 13, 2024 at 03:47:40PM +0100, Christoph Hellwig wrote:
> On Thu, Dec 12, 2024 at 12:40:07PM -0800, Darrick J. Wong wrote:
> > > However, I still think that we should be able to atomic write mixed extents,
> > > even though it is a pain to implement. To that end, I could be convinced
> > > again that we don't require it...
> >
> > Well... if you /did/ add a few entries to include/uapi/linux/fs.h for
> > ways that an untorn write can fail, then we could define the programming
> > interface as so:
> >
> > "If you receive -EBADMAP, then call fallocate(FALLOC_FL_MAKE_OVERWRITE)
> > to force all the mappings to pure overwrites."
>
> Ewwwwwwwwwwwwwwwwwwwww.
>
> That's not a sane API in any way.

Oh I know, I'd much rather stick to the view that block untorn writes
are a means for programs that only ever do IO in large(ish) blocks to
take advantage of a hardware feature that also wants those large
blocks. And only if the file mappings are in the correct state, and the
program is willing to *maintain* them in the correct state to get the
better performance. I don't want xfs to grow code to write zeroes to
mapped blocks just so it can then write-untorn to the same blocks.
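
(For illustration only: a minimal sketch of the IO-side half of "the
correct state", assuming the usual RWF_ATOMIC rules, i.e. the length is a
power of two between the advertised atomic write unit min and max, and
the file offset is naturally aligned to that length. The helper name and
its parameters are hypothetical and not part of this patch set.)

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: check whether a single write of @len bytes at
 * @offset has the geometry an untorn write is assumed to need.
 * @unit_min and @unit_max stand in for the limits a program would
 * query up front (e.g. via statx); they are assumptions here.
 */
static bool untorn_write_ok(uint64_t offset, uint64_t len,
			    uint32_t unit_min, uint32_t unit_max)
{
	/* The length must be a nonzero power of two. */
	if (len == 0 || (len & (len - 1)) != 0)
		return false;

	/* The length must fall within the advertised limits. */
	if (len < unit_min || len > unit_max)
		return false;

	/* Natural alignment: the offset is a multiple of the length. */
	return (offset & (len - 1)) == 0;
}
```

This only covers the geometry of the IO. It says nothing about whether
the blocks underneath are written, unwritten, or shared, which is the
part under discussion in this thread.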

The gross part is that I think if you want to do untorn multi-fsblock
writes, then you need forcealign. In turn, forcealign has to handle COW
of shared blocks. willy and I looked through the changes I made to
support dirtying and writing back gangs of pages for rtreflink when the
rtextsize > 1, and didn't find anything insane in there. Using that to
handle COWing forcealign file blocks should work, though things get
tricky once you add atomic untorn writes because we can't split bios.

Everything else I think should use exchange-range because it has so many
fewer limitations.

--D

> > ...since there have been a few people who have asked about that ability
> > so that they can write+fdatasync without so much overhead from file
> > metadata updates.
>
> And all of them fundamentally misunderstood file system semantics and/or
> used weird bypasses that are doomed to corrupt the file system sooner
> or later.