Re: [PATCH v6 10/13] xfs: iomap COW-based atomic write support

From: John Garry
Date: Tue Mar 18 2025 - 13:45:23 EST


On 18/03/2025 08:32, Christoph Hellwig wrote:
On Tue, Mar 18, 2025 at 08:22:53AM +0000, John Garry wrote:
Is xfs_reflink_allocate_cow even the right helper to use? We know we
absolutely want a a COW fork extent, we know there can't be delalloc
extent to convert as we flushed dirty data, so most of the logic in it
is pointless.

Well xfs_reflink_allocate_cow gives us what we want when we set some flag
(XFS_REFLINK_FORCE_COW).

Are you hinting at a dedicated helper? Note that
xfs_reflink_fill_cow_hole() also handles the XFS_REFLINK_FORCE_COW flag.

We might not even need much of a helper. This all comes from my
experience with the zoned code that always writes out of place as well.
I initially tried to reuse the existing iomap_begin methods and all
these helpers, but in the end it turned out to much, much simpler to
just open code the logic. Now the atomic cow code will be a little
more complex in some aspect (unwritten extents, speculative
preallocations), but also simpler in others (no need to support buffered
I/O including zeroing and sub-block writes that require the ugly
srcmap based read-modify-write), but I suspect the overall trade-off
will be similar.

After all the high-level logic for the atomic COW iomap_begin really
should just be:

- take the ilock
- check the COW fork if there is an extent for the start of the range.
- If yes:
- if the extent is unwritten, convert the part overlapping with
the range to written

I was wondering when it could be written, and then I figured it could be if we had this sort of scenario:
- initially we have a mixed of shared and unshared file, like:

mnt/file:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..7]: 192..199 0 (192..199) 8 100000
1: [8..15]: 32840..32847 0 (32840..32847) 8 000000
2: [16..20479]: 208..20671 0 (208..20671) 20464 100000
FLAG Values:
0100000 Shared extent
0010000 Unwritten preallocated extent
0001000 Doesn't begin on stripe unit
0000100 Doesn't end on stripe unit
0000010 Doesn't begin on stripe width
0000001 Doesn't end on stripe width

- try atomic write of size 32K @ offset 0
- we first try HW atomics method and call xfs_direct_write_iomap_begin() -> xfs_reflink_allocate_cow()
- CoW fork mapping is no good for atomics (too short), so try CoW atomic method
- in CoW atomic method, we find that pre-alloc'ed CoW fork extent (which was converted to written already)

- return the extent
- If no:
- allocate a new extent covering the range in the COW fork

Doing this without going down multiple levels of helpers designed for
an entirely different use case will probably simplify things
significantly.

Please suggest any further modifications to the following attempt. I have XFS_REFLINK_FORCE_COW still being passed to xfs_reflink_fill_cow_hole(), but xfs_reflink_fill_cow_hole() is quite a large function and I am not sure if I want to duplicate lots of it.

static int
xfs_atomic_write_cow_iomap_begin(
struct inode *inode,
loff_t offset,
loff_t length,
unsigned flags,
struct iomap *iomap,
struct iomap *srcmap)
{
ASSERT(flags & IOMAP_WRITE);
ASSERT(flags & IOMAP_DIRECT);

struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, length);
xfs_fileoff_t count_fsb = end_fsb - offset_fsb;
struct xfs_bmbt_irec imap = {
.br_startoff = offset_fsb,
.br_startblock = HOLESTARTBLOCK,
.br_blockcount = end_fsb - offset_fsb,
.br_state = XFS_EXT_UNWRITTEN,
};
struct xfs_bmbt_irec cmap;
bool shared = false;
unsigned int lockmode = XFS_ILOCK_EXCL;
u64 seq;
struct xfs_iext_cursor icur;

if (xfs_is_shutdown(mp))
return -EIO;

if (!xfs_has_reflink(mp))
return -EINVAL;

if (!ip->i_cowfp) {
ASSERT(!xfs_is_reflink_inode(ip));
xfs_ifork_init_cow(ip);
}

error = xfs_ilock_for_iomap(ip, flags, &lockmode);
if (error)
return error;

if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap) && cmap.br_startoff <= offset_fsb) {
if (cmap.br_state == XFS_EXT_UNWRITTEN) {
xfs_trim_extent(&cmap, offset_fsb, count_fsb);

error = xfs_reflink_convert_unwritten(ip, &imap, &cmap, XFS_REFLINK_CONVERT_UNWRITTEN);
if (error)
goto out_unlock;
}
} else {
error = xfs_reflink_fill_cow_hole(ip, &imap, &cmap, &shared,
&lockmode, XFS_REFLINK_CONVERT_UNWRITTEN |
XFS_REFLINK_FORCE_COW | XFS_REFLINK_ALLOC_EXTSZALIGN);
if (error)
goto out_unlock;
}

finish:
... // as before
}

xfs_reflink_convert_unwritten() and others are private to reflink.c, so this function should also prob live there.

thanks