Re: [PATCH 1/4] iomap: Lift blocksize restriction on atomic writes
From: Darrick J. Wong
Date: Fri Jan 17 2025 - 13:52:41 EST
On Thu, Jan 16, 2025 at 07:52:25AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 14, 2025 at 03:57:26PM -0800, Darrick J. Wong wrote:
> > Ok, let's do that then. Just to be clear -- for any RWF_ATOMIC direct
> > write that's correctly aligned and targets a single mapping in the
> > correct state, we can build the untorn bio and submit it. For
> > everything else, prealloc some post EOF blocks, write them there, and
> > exchange-range them.
> >
> > Tricky questions: How do we avoid collisions between overlapping writes?
> > I guess we find a free file range at the top of the file that is long
> > enough to stage the write, and put it there? And purge it later?
> >
> > Also, does this imply that the maximum file size is less than the usual
> > 8EB?
>
> I think literally using the exchrange code for anything but an
> initial prototype is a bad idea for the above reasons. If we go
> beyond proving this is possible you'd want a version of exchrange
> where the exchange partner is not a file mapping, but a cow staging
> record.
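To make the quoted plan concrete, here is a rough toy model in C. It is not iomap or XFS code. Every name in it is hypothetical, and so are the 8EB limit, the 1MiB slot size, the 64-slot count, and the alignment rule (power-of-two length, offset aligned to that length). It takes the untorn-bio path only when the write is aligned and sits entirely inside one mapping in the written state. Otherwise it hands out a private staging range carved from the top of the file's offset space. That is one way to stop overlapping writes from colliding, and it shows why the maximum file size would drop below the usual limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; all names are hypothetical, not iomap/XFS APIs. */

/* One file mapping (extent), byte granularity for simplicity. */
struct toy_mapping {
	uint64_t startoff;	/* file offset where the mapping starts */
	uint64_t length;	/* bytes covered by the mapping */
	bool written;		/* assumed "correct state": written, not hole/unwritten */
};

enum toy_atomic_path {
	TOY_UNTORN_BIO,		/* build one untorn bio and submit it */
	TOY_STAGE_AND_EXCHANGE,	/* write to staging blocks, then exchange-range */
};

/* Assumed rule: power-of-two length, offset naturally aligned to the length. */
static bool toy_is_aligned(uint64_t pos, uint64_t len)
{
	return len != 0 && (len & (len - 1)) == 0 && (pos & (len - 1)) == 0;
}

static enum toy_atomic_path toy_choose_path(const struct toy_mapping *map,
					    uint64_t pos, uint64_t len)
{
	/* The whole write must fit inside this single mapping (overflow-safe). */
	bool single = pos >= map->startoff &&
		      pos - map->startoff <= map->length &&
		      len <= map->length - (pos - map->startoff);

	if (toy_is_aligned(pos, len) && single && map->written)
		return TOY_UNTORN_BIO;
	return TOY_STAGE_AND_EXCHANGE;
}

/* Staging slots reserved at the top of the offset space (assumed sizes). */
#define TOY_MAX_FILE_BYTES	(UINT64_C(1) << 63)	/* assumed 8EB limit */
#define TOY_SLOT_BYTES		(UINT64_C(1) << 20)	/* assumed 1MiB per staged write */
#define TOY_NR_SLOTS		64

static uint64_t toy_slot_busy;	/* bit i set => staging slot i in use */

/* Largest file size still available to users once staging is reserved. */
static uint64_t toy_usable_max_bytes(void)
{
	return TOY_MAX_FILE_BYTES - (uint64_t)TOY_NR_SLOTS * TOY_SLOT_BYTES;
}

/* Return a private staging offset for len bytes, or UINT64_MAX if none free. */
static uint64_t toy_alloc_staging(uint64_t len)
{
	if (len > TOY_SLOT_BYTES)
		return UINT64_MAX;
	for (int i = 0; i < TOY_NR_SLOTS; i++) {
		if (!(toy_slot_busy & (UINT64_C(1) << i))) {
			toy_slot_busy |= UINT64_C(1) << i;
			return toy_usable_max_bytes() + (uint64_t)i * TOY_SLOT_BYTES;
		}
	}
	return UINT64_MAX;
}

/* "Purge it later": release the slot once the exchange has completed. */
static void toy_free_staging(uint64_t off)
{
	uint64_t i = (off - toy_usable_max_bytes()) / TOY_SLOT_BYTES;

	assert(i < TOY_NR_SLOTS && (toy_slot_busy & (UINT64_C(1) << i)));
	toy_slot_busy &= ~(UINT64_C(1) << i);
}
```

With these made-up numbers, reserving 64 slots of 1MiB each shrinks the usable file size by 64MiB. A real design would also have to decide what happens when every slot is busy.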

The trouble is that the br_startoff attribute of cow staging mappings
isn't persisted on disk anywhere, which is why exchange-range can't
handle the cow fork. You could open an O_TMPFILE and swap between the
two files, though that gets expensive per-io unless you're willing to
stash that temp file somewhere.
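The O_TMPFILE variant might look like the toy model below. It assumes each file is nothing more than an array mapping file blocks to physical blocks, and all names are hypothetical. New data is first written into the temp file's blocks. The mappings of the two ranges are then swapped rather than copied, so the target file picks up the new blocks. The old blocks land in the temp file and are freed when it goes away. The real exchange-range is designed to be all-or-nothing across a crash; this sketch just loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NBLOCKS 8

/* Hypothetical per-file block map: file block index -> physical block. */
struct toy_file {
	uint64_t pblk[TOY_NBLOCKS];
};

/* Swap mappings of f1[off1, off1+count) with f2[off2, off2+count). */
static void toy_exchange_range(struct toy_file *f1, size_t off1,
			       struct toy_file *f2, size_t off2, size_t count)
{
	/* Both ranges must lie inside their files. */
	assert(off1 <= TOY_NBLOCKS && count <= TOY_NBLOCKS - off1);
	assert(off2 <= TOY_NBLOCKS && count <= TOY_NBLOCKS - off2);

	for (size_t i = 0; i < count; i++) {
		uint64_t tmp = f1->pblk[off1 + i];

		f1->pblk[off1 + i] = f2->pblk[off2 + i];
		f2->pblk[off2 + i] = tmp;
	}
}
```

Opening, allocating, and tearing down that temp file on every write is where the per-IO cost comes from. Keeping one around and reusing it would amortize that cost.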

At this point I think we should slap the usual EXPERIMENTAL warning on
atomic writes through xfs and let John land the simplest multi-fsblock
untorn write support, which only handles the corner case where all the
stars are <cough> aligned; and then make an exchange-range prototype
and/or all the other forcealign stuff.

(Lifting in smaller pieces sounds a lot better than having John carry
around an increasingly large patchset...)

--D