Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree

From: Dave Chinner
Date: Sat Mar 31 2018 - 18:02:58 EST


On Fri, Mar 30, 2018 at 02:47:05AM +0000, Sasha Levin wrote:
> On Thu, Mar 29, 2018 at 10:05:35AM +1100, Dave Chinner wrote:
> >On Wed, Mar 28, 2018 at 07:30:06PM +0000, Sasha Levin wrote:
> > This commit has been processed by the -stable helper bot and determined
> > to be a high probability candidate for -stable trees. (score: 6.4845)
> >
> > The bot has tested the following trees: v4.15.12, v4.14.29, v4.9.89, v4.4.123, v4.1.50, v3.18.101.
> >
> > v4.15.12: Build OK!
> > v4.14.29: Build OK!
> > v4.9.89: Build OK!
> > v4.4.123: Build OK!
> > v4.1.50: Build OK!
> > v3.18.101: Build OK!
> >
> > XFS Specific tests:
> >
> > v4.15.12 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.15.12/tests/):
> > No tests completed!

Can you capture the actual check command output into its own file?
That tells us at a glance which tests succeeded or failed.

So I'm looking at the v5.log file:

....
echo 'export MKFS_OPTIONS='\''-m crc=0,reflink=0,rmapbt=0, -i sparse=0,'\'''
....


FSTYP -- xfs (debug)
PLATFORM -- Linux/x86_64 autosel 4.15.12+
MKFS_OPTIONS -- -f -m crc=0,reflink=0,rmapbt=0, -i sparse=0,
/dev/vdb
MOUNT_OPTIONS -- /dev/vdb /mnt2

That's not testing v5 filesystems. That's turned off CRCs, and
so is testing a v4 filesystem. You'll see this on filesystems that
don't support reflink:

[not run] Reflink not supported by test filesystem type: xfs

Also, you need to make the test filesystem match the options
the test run is configured with (i.e. v4, v5, reflink, etc),
otherwise half the tests don't exercise the expected config.
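
A rough sketch of a sanity check for the label-versus-options mismatch
described above. It is not part of fstests; the function names are
hypothetical, and the default of crc=1 when the option is absent is an
assumption that depends on the installed xfsprogs version:

```python
import shlex


def parse_mkfs_options(opts):
    """Parse suboption groups such as '-m crc=0,reflink=0' into dicts."""
    tokens = shlex.split(opts)
    parsed = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # Only -m and -i appear in the log above; other flags are skipped.
        if tok in ("-m", "-i") and i + 1 < len(tokens):
            group = parsed.setdefault(tok, {})
            for item in tokens[i + 1].split(","):
                if not item:
                    continue  # tolerate the trailing comma seen in the log
                key, _, value = item.partition("=")
                group[key] = value
            i += 2
        else:
            i += 1
    return parsed


def classify(opts, default_crc="1"):
    """Label a MKFS_OPTIONS string as v4, v5, v5+reflink, or invalid."""
    meta = parse_mkfs_options(opts).get("-m", {})
    crc = meta.get("crc", default_crc)  # assumed default, see lead-in
    reflink = meta.get("reflink", "0")
    if crc == "0" and reflink == "1":
        return "invalid: reflink requires crc=1"
    if crc == "0":
        return "v4"
    return "v5+reflink" if reflink == "1" else "v5"
```

Feeding it the MKFS_OPTIONS line from the v5.log run reports a v4
configuration, which is the mismatch in question.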

[not run] src/dbtest not built

[not run] chacl command not found

[not run] xfs_io set_encpolicy support is missing

You need to update your userspace.
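
To see at a glance what is being skipped and why, something like the
following sketch could tally the "[not run]" reasons from a captured log.
It assumes only the "[not run] <reason>" line shape shown above; the
function name is made up for illustration:

```python
import re
from collections import Counter

# Matches the "[not run] <reason>" lines as they appear in the log above.
NOT_RUN = re.compile(r"\[not run\]\s*(.+)")


def tally_not_run(log_text):
    """Count how often each '[not run]' reason occurs in a log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = NOT_RUN.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts
```

A reason that shows up many times (a missing binary, an unsupported
feature) usually points at the test environment rather than the kernel.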

And the test run has not completed. It's run to:

generic/430 [11172.480621] run fstests generic/430 at 2018-03-30 00:20:12
+ scp -i /home/sasha/ssh/id_rsa -P 10022 -r root@xxxxxxxxx:/root/xfstests-dev/results /home/sasha/data/results/test/v4.15.12/tests//v5/
+ az vm delete -y --resource-group sasha-auto-stable --name sasha-worker-629016242-vm

generic/430 and then stopped. There are still another ~50 tests in the
generic group to run, and then there's the shared and XFS subdirs to
run, too. So there's still something wrong in the way you are
setting up/installing fstests here....

> > v4.14.29 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.14.29/tests/):
> > No tests completed!
> > v4.9.89 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.9.89/tests/):
> > No tests completed!
> > v4.4.123 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.4.123/tests/):
> > v4:
> > Thu Mar 29 21:23:57 UTC 2018
> > Interrupted!
> > Passed all 0 tests
> > v4_reflink:

There's no such configuration as "v4 reflink". reflink is only
available on v5 (crc enabled) filesystems on kernels >=4.10 (IIRC).
Perhaps you've mislabelled them?

> Let me know if this would be good enough for now, and if there's
> anything else to add that'll be useful.
>
> This brings me to the sad part of this mail: not a single stable kernel
> survived a run. Most are panicked, some are hanging, and some were killed
> because of KASan.
>
> All have hit various warnings in fs/iomap.c,

Normal - the dmesg filter in the test harness catches those and
ignores them if they are known/expected to occur.

> and kernels across several
> versions hit the BUG at fs/xfs/xfs_message.c:113 (+-1 line)

That's an ASSERT() failure, indicating a fatal error. e.g:

Stuff like this (from
http://stable-bot.westus2.cloudapp.azure.com/test/v4.9.89/tests/v4_reflink.log)

.....
generic/083 [ 4443.536212] run fstests generic/083 at 2018-03-29 22:32:17
[ 4444.557989] XFS (vdb): Unmounting Filesystem
[ 4445.498461] XFS (vdb): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
[ 4445.505860] XFS (vdb): EXPERIMENTAL reflink feature enabled. Use at your own risk!
[ 4445.513090] XFS (vdb): Mounting V5 Filesystem
[ 4445.531284] XFS (vdb): Ending clean mount
[ 4458.087406] XFS: Assertion failed: xfs_is_reflink_inode(ip), file: fs/xfs/xfs_reflink.c, line: 509

[snip stack trace]

indicates a problem that should not be occurring. It's debug and
triage time - there's some problem that needs backports to fix. I
doubt anyone in XFS land has time to do this on top of everything
else we already have to do...
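
As a starting point for triage, the assertion failures can at least be
pulled out of the console logs automatically. This is a minimal sketch
that assumes the "XFS: Assertion failed: <expr>, file: <path>, line: <n>"
format shown in the log above; the function name is illustrative only:

```python
import re

# Matches the assertion line format seen in the v4.9.89 log above.
ASSERT_LINE = re.compile(
    r"XFS: Assertion failed: (?P<expr>.+), file: (?P<file>\S+), line: (?P<line>\d+)"
)


def find_assertions(dmesg_text):
    """Return (expression, source file, line number) for each failure."""
    return [
        (m.group("expr"), m.group("file"), int(m.group("line")))
        for m in ASSERT_LINE.finditer(dmesg_text)
    ]
```

Grouping the results by file and line across kernel versions shows
which failures are shared and which are specific to one tree.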

> 4.15.12 is hitting a use-after-free in xfs_efi_release().

Debug and triage time.

> 4.14.29 and 4.9.89 seems to end up with corrupted memory (KASAN
> warnings) at or before generic/027.

More debug and triage time.

> And finally, 3.18.101 is pretty unhappy with sleeping functions called
> from atomic context.

Needle in a haystack :/

So this is just basic XFS validation, and it's throwing problems up
all over the place. Now do you see why we've been saying maintaining
stable backports for XFS is pretty much a full time job for someone?

And keep in mind this is just one filesystem. You're going to end up
with the same issues on ext4 and btrfs - the regression tests are
going to show up all sorts of problems that have been fixed in the
upstream kernels but never backported....

Cheers,

Dave.

--
Dave Chinner
david@xxxxxxxxxxxxx