Re: [PATCH v5 03/10] xfs: Refactor xfs_reflink_end_cow_extent()

From: John Garry
Date: Thu Mar 13 2025 - 02:11:56 EST


On 13/03/2025 04:51, Darrick J. Wong wrote:
Hence if we are walking a range of extents in the BMBT to unmap
them, then we should only be generating 2 intents per loop - a BUI
for the BMBT removal and a CUI for the shared refcount decrease.
That means we should be able to run at least a thousand iterations
of that loop per transaction without getting anywhere near the
transaction reservation limits.

*However!*

We have to relog every intent we haven't processed in the deferred
batch every so often to prevent the outstanding intents from pinning
the tail of the log. Hence the larger the number of intents in the
initial batch, the more work we have to do later on (and the more
overall log space and bandwidth they will consume) to relog them
over and over again until they pop to the head of the
processing queue.
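
To put rough numbers on the relogging cost, here is a toy model - not
kernel code, and the relog interval and batch sizes are invented - of a
queue where one intent chain completes per step and everything still
outstanding has to be relogged every RELOG_INTERVAL steps:

/*
 * Back-of-the-envelope model of the relogging cost described above.
 * Not kernel code: the relog interval and batch sizes are made up
 * purely to show how relog traffic grows with the initial batch.
 */
#include <stdio.h>

/* One intent chain completes per "tick"; every RELOG_INTERVAL ticks the
 * remaining unprocessed intents must all be relogged to keep them from
 * pinning the tail of the log. */
#define RELOG_INTERVAL	32

static unsigned long relog_records(unsigned long batch)
{
	unsigned long remaining = batch;
	unsigned long ticks = 0, relogs = 0;

	while (remaining > 0) {
		remaining--;			/* one intent chain processed */
		if (++ticks % RELOG_INTERVAL == 0)
			relogs += remaining;	/* relog everything still queued */
	}
	return relogs;
}

int main(void)
{
	unsigned long batches[] = { 4, 64, 1024, 16384 };

	for (unsigned int i = 0; i < sizeof(batches) / sizeof(batches[0]); i++)
		printf("batch %6lu -> %10lu relogged intent records\n",
		       batches[i], relog_records(batches[i]));
	return 0;
}

A batch of 4 never gets relogged at all, a batch of 1024 relogs about
16 thousand intent records, and a batch of 16384 relogs over four
million - the relog traffic grows roughly quadratically with the size
of the initial batch.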

Hence there is no real performance advantage to creating massive intent
batches because we end up doing more work later on to relog those
intents to prevent journal space deadlocks. It also doesn't speed up
processing, because we still process the intent chains one at a time
from start to completion before moving on to the next high level
intent chain that needs to be processed.

Further, after the first couple of intent chains have been
processed, the initial log space reservation will have run out, and
we are now asking for a new reservation on every transaction roll we
do, i.e. we are now doing a log space reservation on every
transaction roll in the processing chain instead of only doing it
once per high level intent chain.

Hence from a log space accounting perspective (the hottest code path
in the journal), it is far more efficient to perform a single high
level transaction per extent unmap operation than it is to batch
intents into a single high level transaction.
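
To make that accounting difference concrete, here is another toy model
- again not XFS code, and the extent count and per-chain roll count are
invented numbers - that just counts trips through the log grant
(reservation) path and the number of intents pinned in the log at once:

/*
 * Toy model, not XFS code: count how often each strategy hits the log
 * reservation path and how many intents are outstanding at once.
 */
#include <stdio.h>

#define NR_EXTENTS	1000	/* extents to unmap/remap (invented) */
#define CHAIN_ROLLS	4	/* rolls to finish one extent's chain (invented) */

struct stats {
	unsigned long reservations;	/* trips through the log grant path */
	unsigned long live_intents;	/* max intents outstanding at once */
};

/* One high level transaction per extent: each chain runs under its own
 * reservation and only that extent's BUI + CUI are ever outstanding. */
static struct stats per_extent(void)
{
	struct stats s = { 0, 2 };

	for (int i = 0; i < NR_EXTENTS; i++)
		s.reservations++;	/* one reservation covers this chain's rolls */
	return s;
}

/* Everything batched into one transaction: all 2 * NR_EXTENTS intents
 * are logged up front, and once the initial reservation is consumed
 * (after the first couple of chains) every further roll needs a new one. */
static struct stats batched(void)
{
	struct stats s = { 1, 2UL * NR_EXTENTS };

	s.reservations += (unsigned long)(NR_EXTENTS - 2) * CHAIN_ROLLS;
	return s;
}

int main(void)
{
	struct stats a = per_extent(), b = batched();

	printf("per-extent: %lu reservations, %lu intents live at once\n",
	       a.reservations, a.live_intents);
	printf("batched:    %lu reservations, %lu intents live at once\n",
	       b.reservations, b.live_intents);
	return 0;
}

The exact numbers don't matter; the point is that the batched version
both multiplies the reservation traffic and keeps every intent pinned
in the log until its chain finally gets processed.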

My advice is this: we should never batch high level iterative
intent-based operations into a single transaction because it's a
false optimisation. It might look like it is an efficiency
improvement from the high level, but it ends up hammering the hot,
performance critical paths in the transaction subsystem much, much
harder and so will end up being slower than the single transaction
per intent-based operation algorithm when it matters most....
How specifically do you propose remapping all the extents in a file
range after an untorn write? The regular cow ioend does a single
transaction per extent across the entire ioend range and cannot deliver
untorn writes. This latest proposal does, but now you've torn that idea
down too.

At this point I have run out of ideas and conclude that I can only
submit to your superior intellect.

--D

I'm hearing that we can fit thousands without getting anywhere near the limits - this is good.

But then batching is also not optimal in terms of performance, right? Performance is not so important here. This is for a software fallback, which we should not hit frequently. And even if we do, we're still typically not going to have many extents.

For our specific purpose, we want 16KB atomic writes - that is a max of 4 extents. So this does not really sound like something to be concerned about for these atomic write sizes.
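
Just to spell out the worst case - assuming a 4k block size, which is
my assumption rather than something stated above - that is at most 8
intents per completion:

/*
 * Worst-case intent count for a 16KB atomic write, assuming a 4k
 * filesystem block size (an assumption, not stated in the thread).
 */
#include <stdio.h>

int main(void)
{
	unsigned int awu_max = 16 * 1024;		/* atomic write unit, bytes */
	unsigned int blocksize = 4096;			/* assumed block size */
	unsigned int max_extents = awu_max / blocksize;	/* every block its own extent */
	unsigned int max_intents = 2 * max_extents;	/* a BUI + a CUI per extent */

	printf("%u extents -> %u intents per completion\n",
	       max_extents, max_intents);
	return 0;
}

Eight intents is nowhere near the thousand-plus loop iterations
mentioned above.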

We can add some arbitrary FS atomic write unit (awu) max, like 64KB, if that makes people feel more comfortable.

Thanks,
John