Re: [PATCH v5 03/10] xfs: Refactor xfs_reflink_end_cow_extent()

From: John Garry
Date: Wed Mar 12 2025 - 18:06:38 EST


On 12/03/2025 15:46, Darrick J. Wong wrote:
On Wed, Mar 12, 2025 at 01:35:23AM -0700, Christoph Hellwig wrote:
On Wed, Mar 12, 2025 at 08:27:05AM +0000, John Garry wrote:
On 12/03/2025 07:24, Christoph Hellwig wrote:
On Mon, Mar 10, 2025 at 06:39:39PM +0000, John Garry wrote:
Refactor xfs_reflink_end_cow_extent() into separate parts which process
the CoW range and commit the transaction.

This refactoring will be used in future for when it is required to commit
a range of extents as a single transaction, similar to how it was done
pre-commit d6f215f359637.

Darrick pointed out that if you do more than just a tiny number
of extents per transactions you run out of log reservations very
quickly here:

https://urldefense.com/v3/__https://lore.kernel.org/all/20240329162936.GI6390@frogsfrogsfrogs/__;!!ACWV5N9M2RV99hQ!PWLcBof1tKimKUObvCj4vOhljWjFmjtzVHLx9apcU5Rah1xZnmp_3PIq6eSwx6TdEXzMLYYyBfmZLgvj$

how does your scheme deal with that?

The resblks calculation in xfs_reflink_end_atomic_cow() takes care of this,
right? Or does the log reservation have a hard size limit, regardless of
that calculation?

The resblks calculated there are the reserved disk blocks and have
nothing to do with the log reservations, which comes from the
tr_write field passed in. There is some kind of upper limited to it
obviously by the log size, although I'm not sure if we've formalized
that somewhere. Dave might be the right person to ask about that.

The (very very rough) upper limit for how many intent items you can
attach to a tr_write transaction is:

per_extent_cost = (cui_size + rui_size + bui_size + efi_size + ili_size)
max_blocks = tr_write::tr_logres / per_extent_cost

(ili_size is the inode log item size)

So will it be something like this:

static size_t
xfs_compute_awu_max_extents(
struct xfs_mount *mp)
{
struct xfs_trans_res *resp = &M_RES(mp)->tr_write;
size_t logtotal = xfs_bui_log_format_sizeof(1)+
xfs_cui_log_format_sizeof(1) +
xfs_efi_log_format_sizeof(1) +
xfs_rui_log_format_sizeof(1) +
sizeof(struct xfs_inode_log_format);

return rounddown_pow_of_two(resp->tr_logres / logtotal);
}

static inline void
xfs_compute_awu_max(
struct xfs_mount *mp, int jjcount)
{
....
mp->m_awu_max =
min_t(unsigned int, awu_max, xfs_compute_awu_max_extents(mp));
}


((I would halve that for the sake of paranoia))

since you have to commit all those intent items into the first
transaction in the chain. The difficulty we've always had is computing
the size of an intent item in the ondisk log, since that's a (somewhat
minor) layering violation -- it's xfs_cui_log_format_sizeof() for a CUI,
but then there' could be overhead for the ondisk log headers themselves.

Maybe we ought to formalize the computation of that since reap.c also
has a handwavy XREAP_MAX_DEFER_CHAIN that it uses to roll the scrub
transaction periodically... because I'd prefer we not add another
hardcoded limit. My guess is that the software fallback can probably
support any awu_max that a hardware wants to throw at us, but let's
actually figure out the min(sw, hw) that we can support and cap it at
that.

--D