Re: [PATCH v3 1/2] ext4: fix fast commit inode enqueueing during a full journal commit

From: Luis Henriques
Date: Tue Jul 09 2024 - 10:40:31 EST


On Mon, Jul 08 2024, Theodore Ts'o wrote:

> On Wed, May 29, 2024 at 10:20:29AM +0100, Luis Henriques (SUSE) wrote:
>> When a full journal commit is on-going, any fast commit has to be enqueued
>> into a different queue: FC_Q_STAGING instead of FC_Q_MAIN. This enqueueing
>> is done only once, i.e. if an inode is already queued in a previous fast
>> commit entry it won't be enqueued again. However, if a full commit starts
>> _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to
>> be done into FC_Q_STAGING. And this is not being done in function
>> ext4_fc_track_template().
>>
>> This patch fixes the issue by re-enqueuing an inode into the STAGING queue
>> during the fast commit clean-up callback if it has a tid (i_sync_tid)
>> greater than the one being handled. The STAGING queue will then be spliced
>> back into MAIN.
>>
>> This bug was found using fstest generic/047. This test creates several 32k
>> bytes files, sync'ing each of them after it's creation, and then shutting
>> down the filesystem. Some data may be loss in this operation; for example a
>> file may have it's size truncated to zero.
>>
>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@xxxxxxxxx>
>
> This patch is causing a regression for the test generic/472
> generic/496 generic/643 if fast_commit is enabled. So using the
> ext4/adv or ext4/fast_commit configuration, e.g:
>
> % kvm-xfstests -c ext4/fast_commit generic/472 generic/496 generic/643
>
> For all of these test, the failures seem to involve the swapon command
> erroring out:
>
> --- tests/generic/496.out 2024-06-13 18:57:39.000000000 -0400
> +++ /results/ext4/results-fast_commit/generic/496.out.bad 2024-07-08 23:46:39.720
> @@ -1,3 +1,4 @@
> QA output created by 496
> fallocate swap
> mixed swap
> +swapon: Invalid argument
> ...
>
> but it's unclear why this patch would affect swapon.

OK, that's... embarrassing. I should have caught these failures :-(

> I've never been able to see generic/047 failure in any of my ext4/dev
> testing, nor in any of my daily fs-next CI testing. So for that
> reason, I'm going to drop this patch from my tree.

There's nothing special about my test environment. I can reproduce the
generic/047 failure (although not 100% of the times) by running it
manually in a virtme-ng test environment, using MKFS_OPTIONS="-O fast_commit".
Here's what I see when running it:

FSTYP -- ext4
PLATFORM -- Linux/x86_64 virtme-ng 6.10.0-rc7+ #269 SMP PREEMPT_DYNAMIC Tue Jul 9 14:24:22 WEST 2024
MKFS_OPTIONS -- -F -O fast_commit /dev/vdb1
MOUNT_OPTIONS -- -o acl,user_xattr /dev/vdb1 /tmp/mnt/scratch

generic/047 162s ... - output mismatch (see [...]/testing/xfstests-dev/results//generic/047.out.bad)
--- tests/generic/047.out 2021-01-11 12:08:14.972458324 +0000
+++ [...]/testing/xfstests-dev/results//generic/047.out.bad 2024-07-09 14:28:36.626435948 +0100
@@ -1 +1,2 @@
QA output created by 047
+file /tmp/mnt/scratch/944 has incorrect size - fsync failed
...
(Run 'diff -u [...]/testing/xfstests-dev/tests/generic/047.out [...]/testing/xfstests-dev/results//generic/047.out.bad' to see the entire diff)
Ran: generic/047
Failures: generic/047
Failed 1 of 1 tests

> The second patch in this series appears to be independent at least
> from a logical perspective --- although a minor change is needed to
> resolve a merge conflict after dropping this change.
>
> Luis, Harshad, could you look in this failure and then resubmit once
> it's been fixed? Thanks!! Also, Luis, can you give more details
> about the generic/047 failure that you had seen? Is it one of those
> flaky tests where you need to run the test dozens or hundreds of time
> to see the failure?


So, I've done some quick tests, but I'll need some more time to dig into
it. And this is what I _think_ it's happening:

When activating a swap file, the kernel forces an fsync, calling
ext4_sync_file() which will then call ext4_fc_commit() and, eventually,
the ext4_fc_cleanup().

With this patch an inode may be re-enqueued into the STAGING queue and
then spliced back into MAIN; and that's exactly what I see happening.

Later, still on the swap activation path, ext4_set_iomap() will be called
and will do this:

if (ext4_inode_datasync_dirty(inode) ||
offset + length > i_size_read(inode))
iomap->flags |= IOMAP_F_DIRTY;

'ext4_inode_datasync_dirty()' will be true because '->i_fc_list' is not
empty. And that's why the swapoff will fail.

Anyway, I'll try to figure out what's missing here (or what's wrong with
my patch).

Cheers,
--
Luís