Re: [PATCH 1/4] iomap: Lift blocksize restriction on atomic writes

From: Darrick J. Wong
Date: Thu Dec 05 2024 - 01:30:38 EST


On Thu, Dec 05, 2024 at 07:35:45AM +1100, Dave Chinner wrote:
> On Wed, Dec 04, 2024 at 03:43:41PM +0000, John Garry wrote:
> > From: "Ritesh Harjani (IBM)" <ritesh.list@xxxxxxxxx>
> >
> > Filesystems like ext4 can submit writes in multiples of blocksizes.
> > But we still can't allow the writes to be split into multiple BIOs. Hence
> > let's check if the iomap_length() is same as iter->len or not.
> >
> > It is the responsibility of userspace to ensure that a write does not span
> > mixed unwritten and mapped extents (which would lead to multiple BIOs).
>
> How is "userspace" supposed to do this?
>
> No existing utility in userspace is aware of atomic write limits or
> rtextsize configs, so how does "userspace" ensure everything is
> laid out in a manner compatible with atomic writes?
>
> e.g. restoring a backup (or other disaster recovery procedures) is
> going to have to lay the files out correctly for atomic writes.
> backup tools often sparsify the data set and so what gets restored
> will not have the same layout as the original data set...
>
> Where's the documentation that outlines all the restrictions on
> userspace behaviour to prevent this sort of problem being triggered?
> Common operations such as truncate, hole punch, buffered writes,
> reflinks, etc will trip over this, so application developers, users
> and admins really need to know what they should be doing to avoid
> stepping on this landmine...

I'm kinda assuming that this requires forcealign to get the extent
alignments correct, and writing zeroes non-atomically if the extent
state gets mixed up before retrying the untorn write. John?

> Further to that, what is the correction process for users to get rid
> of this error? They'll need some help from an atomic write
> constraint aware utility that can resilver the file such that these
> failures do not occur again. Can xfs_fsr do this? Or maybe the new
> exchangr-range code? Or does none of this infrastructure yet exist?

TBH aside from databases doing pure overwrites to storage hardware, I
think everyone else would be better served by start_commit /
commit_range since it's /much/ more flexible.

> > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
> > jpg: tweak commit message
> > Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
> > ---
> > fs/iomap/direct-io.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index b521eb15759e..3dd883dd77d2 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -306,7 +306,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > size_t copied = 0;
> > size_t orig_count;
> >
> > - if (atomic && length != fs_block_size)
> > + if (atomic && length != iter->len)
> > return -EINVAL;
>
> Given this is now rejecting IOs that are otherwise well formed from
> userspace, this situation needs to have a different errno now. The
> userspace application has not provided an invalid set of
> IO parameters for this IO - the IO has been rejected because
> the previously set up persistent file layout was screwed up by
> something in the past.
>
> i.e. This is not an application IO submission error anymore, hence
> EINVAL is the wrong error to be returning to userspace here.

Admittedly it would be useful to add at least a couple of new errnos for
alignment issues and incorrect file storage mapping states. How
difficult is it to get a new errno code added to uapi and then plumbed
into glibc? Are new errno additions to libc still gate-kept by Ulrich
or is my knowlege 15 years out of date?

--D

> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>