Re: [PATCH RFC 06/10] xfs: iomap CoW-based atomic write support
From: Darrick J. Wong
Date: Thu Feb 06 2025 - 16:44:30 EST
On Thu, Feb 06, 2025 at 11:10:40AM +0000, John Garry wrote:
> On 05/02/2025 20:05, Darrick J. Wong wrote:
> > On Tue, Feb 04, 2025 at 12:01:23PM +0000, John Garry wrote:
> > > In cases of an atomic write occurs for misaligned or discontiguous disk
> > > blocks, we will use a CoW-based method to issue the atomic write.
> > >
> > > So, for that case, return -EAGAIN to request that the write be issued in
> > > CoW atomic write mode. The dio write path should detect this, similar to
> > > how misaligned regalar DIO writes are handled.
> > >
> > > Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
> > > ---
> > > fs/xfs/xfs_iomap.c | 68 ++++++++++++++++++++++++++++++++++++++++++++--
> > > 1 file changed, 66 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > > index ae3755ed00e6..2c2867d728e4 100644
> > > --- a/fs/xfs/xfs_iomap.c
> > > +++ b/fs/xfs/xfs_iomap.c
> > > @@ -809,9 +809,12 @@ xfs_direct_write_iomap_begin(
> > > struct xfs_bmbt_irec imap, cmap;
> > > xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
> > > xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, length);
> > > + bool atomic = flags & IOMAP_ATOMIC;
> > > int nimaps = 1, error = 0;
> > > bool shared = false;
> > > + bool found = false;
> > > u16 iomap_flags = 0;
> > > + bool need_alloc;
> > > unsigned int lockmode;
> > > u64 seq;
> > > @@ -832,7 +835,7 @@ xfs_direct_write_iomap_begin(
> > > * COW writes may allocate delalloc space or convert unwritten COW
> > > * extents, so we need to make sure to take the lock exclusively here.
> > > */
> > > - if (xfs_is_cow_inode(ip))
> > > + if (xfs_is_cow_inode(ip) || atomic)
> > > lockmode = XFS_ILOCK_EXCL;
> > > else
> > > lockmode = XFS_ILOCK_SHARED;
> > > @@ -857,12 +860,73 @@ xfs_direct_write_iomap_begin(
> > > if (error)
> > > goto out_unlock;
> > > +
> > > + if (flags & IOMAP_ATOMIC_COW) {
> > > + error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
> > > + &lockmode,
> > > + (flags & IOMAP_DIRECT) || IS_DAX(inode), true);
> >
> > Weird nit not relate to this patch: Is there ever a case where
> > IS_DAX(inode) and (flags & IOMAP_DAX) disagree? I wonder if this odd
> > construction could be simplified to:
> >
> > (flags & (IOMAP_DIRECT | IOMAP_DAX))
>
> I'm not sure. I assume that we always want to convert for DAX, and IOMAP_DAX
> may not be set always for DIO path - but I only see xfs_file_write_iter() ->
> xfs_file_dax_write() ->dax_iomap_rw(xfs_dax_write_iomap_ops), which sets
> IOMAP_DAX in iomap_iter.flags
>
> >
> > ?
> >
> > > + if (error)
> > > + goto out_unlock;
> > > +
> > > + end_fsb = imap.br_startoff + imap.br_blockcount;
> > > + length = XFS_FSB_TO_B(mp, end_fsb) - offset;
> > > +
> > > + if (imap.br_startblock != HOLESTARTBLOCK) {
> > > + seq = xfs_iomap_inode_sequence(ip, 0);
> > > +
> > > + error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags,
> > > + iomap_flags | IOMAP_F_ATOMIC_COW, seq);
> > > + if (error)
> > > + goto out_unlock;
> > > + }
> > > + seq = xfs_iomap_inode_sequence(ip, 0);
> > > + xfs_iunlock(ip, lockmode);
> > > + return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags,
> > > + iomap_flags | IOMAP_F_ATOMIC_COW, seq);
> > > + }
> >
> > /me wonders if this should be a separate helper so that the main
> > xfs_direct_write_iomap_begin doesn't get even longer... but otherwise
> > the logic in here looks sane.
>
> I can do that. Maybe some code can be factored out for regular "found cow
> path".
>
> >
> > > +
> > > + need_alloc = imap_needs_alloc(inode, flags, &imap, nimaps);
> > > +
> > > + if (atomic) {
> > > + /* Use CoW-based method if any of the following fail */
> > > + error = -EAGAIN;
> > > +
> > > + /*
> > > + * Lazily use CoW-based method for initial alloc of data.
> > > + * Check br_blockcount for FSes which do not support atomic
> > > + * writes > 1x block.
> > > + */
> > > + if (need_alloc && imap.br_blockcount > 1)
> > > + goto out_unlock;
> > > +
> > > + /* Misaligned start block wrt size */
> > > + if (!IS_ALIGNED(imap.br_startblock, imap.br_blockcount))
> > > + goto out_unlock;
> > > +
> > > + /* Discontiguous or mixed extents */
> > > + if (!imap_spans_range(&imap, offset_fsb, end_fsb))
> > > + goto out_unlock;
> > > + }
> >
> > (Same two comments here.)
>
> ok
>
> >
> > > +
> > > if (imap_needs_cow(ip, flags, &imap, nimaps)) {
> > > error = -EAGAIN;
> > > if (flags & IOMAP_NOWAIT)
> > > goto out_unlock;
> > > + if (atomic) {
> > > + /* Detect whether we're already covered in a cow fork */
> > > + error = xfs_find_trim_cow_extent(ip, &imap, &cmap, &shared, &found);
> > > + if (error)
> > > + goto out_unlock;
> > > +
> > > + if (shared) {
> > > + error = -EAGAIN;
> > > + goto out_unlock;
> >
> > What is this checking? That something else already created a mapping in
> > the COW fork, so we want to bail out to get rid of it?
>
> I want to check if some data is shared. In that case, we should unshare.
Why is it necessary to unshare? Userspace gave us a buffer of new
contents, and we're already prepared to write that out of place and
remap it.
> And I am not sure if that check is sufficient.
>
> On the buffered write path, we may have something in a CoW fork - in that
> case it should be flushed, right?
Flushed against what? Concurrent writeback or something? The directio
setup should have flushed dirty pagecache, so the only things left in
the COW fork are speculative preallocations. (IOWs, I don't understand
what needs to be flushed or why.)
--D
>
> Thanks,
> John
>