Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

From: Dave Chinner
Date: Tue Oct 04 2016 - 15:44:45 EST


On Tue, Oct 04, 2016 at 01:43:43PM +0200, Oleg Nesterov wrote:
> On 10/03, Oleg Nesterov wrote:
> >
> > On 10/03, Dave Chinner wrote:
> > >
> > > On Fri, Sep 30, 2016 at 07:14:34PM +0200, Oleg Nesterov wrote:
> > > > On 09/27, Oleg Nesterov wrote:
> > > > >
> > > > Jan, I gave up.
> > > >
> > > > Whatever I did, xfstests-dev/check reports a lot of failures, kernel
> > > > bugs, and finally either crashes the kernel or hangs.
> > >
> > > Did you run the auto group like I suggested? That's the set of tests
> > > that should complete successfully with minimal failures and without
> > > crashing the machine.
> >
> > OK, I'll reserve a testing machine again.
>
> Fedora 23 running in KVM guest
>
> kernel v4.8 clean, see .config below
>
> xfstests from git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> last commit 0bbd20b104765c75faaf8e28a54c32df505ce7fd ("btrfs: test
> free space tree mount options")
>
> script:
>
> dd if=/dev/zero of=TEST.img bs=1MiB count=8192
> dd if=/dev/zero of=SCRATCH.img bs=1MiB count=8192
>
> losetup --find --show TEST.img
> losetup --find --show SCRATCH.img
>
> mkfs.xfs -f /dev/loop0
> mkfs.xfs -f /dev/loop1
>
> mkdir -p TEST SCRATCH
>
> mount /dev/loop0 TEST
> mount /dev/loop1 SCRATCH
>
> TEST_DEV=/dev/loop0 TEST_DIR=TEST SCRATCH_DEV=/dev/loop1 SCRATCH_MNT=SCRATCH ./check -g auto

Looks sane.

> This time it hangs after generic/274:
>
> --- tests/generic/274.out 2016-10-04 04:23:24.209006171 -0400
> +++ /root/XFS/xfstests-dev/results//generic/274.out.bad 2016-10-04 05:17:49.291742498 -0400
> @@ -3,3 +3,4 @@
> preallocation test
> ------------------------------
> done
> +./common/rc: line 344: 4693 Segmentation fault exit
> ...
> (Run 'diff -u tests/generic/274.out /root/XFS/xfstests-dev/results//generic/274.out.bad' to see the entire diff)
>
> because of kernel bug:
>
> [ 2730.242537] run fstests generic/274 at 2016-10-04 05:17:34
> [ 2730.738352] XFS (loop1): Mounting V5 Filesystem
> [ 2730.741451] XFS (loop1): Ending clean mount
> [ 2744.508698] ------------[ cut here ]------------
> [ 2744.509190] kernel BUG at ./include/linux/swap.h:276!
>
> static inline void workingset_node_shadows_dec(struct radix_tree_node *node)
> {
> 	VM_BUG_ON(!workingset_node_shadows(node));
> 	node->count -= 1U << RADIX_TREE_COUNT_SHIFT;
> }

Linus had a massive rant about this yesterday. The regression was
introduced between v4.8-rc8 and v4.8, and it took down Linus's
machine. xfstests found a filesystem-related kernel bug, as it's
supposed to.

> plus the following warnings:
>
> [ 1894.500040] run fstests generic/070 at 2016-10-04 05:03:39
> [ 1895.076655] =================================
> [ 1895.077136] [ INFO: inconsistent lock state ]
> [ 1895.077574] 4.8.0 #1 Not tainted
> [ 1895.077900] ---------------------------------
> [ 1895.078330] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
> [ 1895.078993] fsstress/18239 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [ 1895.079522] (&xfs_nondir_ilock_class){++++?-}, at: [<ffffffffc049ad45>] xfs_ilock+0x165/0x210 [xfs]
> [ 1895.080529] {IN-RECLAIM_FS-W} state was registered at:

[snip kswapd doing XFS inode reclaim]

> [ 1895.102565] CPU: 0 PID: 18239 Comm: fsstress Not tainted 4.8.0 #1
> [ 1895.103160] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [ 1895.103721] 0000000000000086 000000008f20d9c8 ffff880038b73818 ffffffffb6449133
> [ 1895.104486] ffff880038a5cd00 ffffffffb78bcd70 ffff880038b73868 ffffffffb6106c75
> [ 1895.105257] 0000000000000000 0000000000000001 ffff880000000001 0000000000000008
> [ 1895.106034] Call Trace:
> [ 1895.106281] [<ffffffffb6449133>] dump_stack+0x85/0xc2
> [ 1895.106789] [<ffffffffb6106c75>] print_usage_bug+0x215/0x240
> [ 1895.107352] [<ffffffffb610722b>] mark_lock+0x58b/0x650
> [ 1895.107888] [<ffffffffb61061e0>] ? check_usage_forwards+0x160/0x160
> [ 1895.108506] [<ffffffffb610735f>] mark_held_locks+0x6f/0xa0
> [ 1895.109085] [<ffffffffb610a5ea>] lockdep_trace_alloc+0xca/0x110
> [ 1895.109695] [<ffffffffb6260b89>] kmem_cache_alloc_node_trace+0x39/0x2a0
> [ 1895.110414] [<ffffffffb623e2ce>] ? vm_map_ram+0x2de/0x490
> [ 1895.110995] [<ffffffffb623e2ce>] vm_map_ram+0x2de/0x490
> [ 1895.111536] [<ffffffffb623e03b>] ? vm_map_ram+0x4b/0x490
> [ 1895.112142] [<ffffffffc04823bc>] _xfs_buf_map_pages+0x6c/0x110 [xfs]
> [ 1895.112837] [<ffffffffc04847f7>] xfs_buf_get_map+0x2c7/0x470 [xfs]
> [ 1895.113500] [<ffffffffc043f5b9>] xfs_attr_rmtval_set+0x2c9/0x450 [xfs]
> [ 1895.114235] [<ffffffffc0438019>] xfs_attr_node_addname+0x4c9/0x5d0 [xfs]
> [ 1895.114948] [<ffffffffc043917c>] xfs_attr_set+0x3dc/0x460 [xfs]
> [ 1895.115593] [<ffffffffc04a8d4f>] xfs_xattr_set+0x4f/0x90 [xfs]
> [ 1895.116221] [<ffffffffb62bb2d0>] generic_setxattr+0x70/0x80
> [ 1895.116798] [<ffffffffb62bba9f>] __vfs_setxattr_noperm+0xaf/0x1a0
> [ 1895.117445] [<ffffffffb63bc525>] ? security_inode_setxattr+0x65/0xb0
> [ 1895.118121] [<ffffffffb62bbc37>] vfs_setxattr+0xa7/0xb0
> [ 1895.118651] [<ffffffffb62bbd69>] setxattr+0x129/0x190

And that is a bug in the lockdep annotations for memory allocation:
they fail to take into account the current task flags set via
memalloc_noio_save() to prevent vmalloc from doing GFP_KERNEL
allocations. i.e. in _xfs_buf_map_pages():

	/*
	 * vm_map_ram() will allocate auxillary structures (e.g.
	 * pagetables) with GFP_KERNEL, yet we are likely to be under
	 * GFP_NOFS context here. Hence we need to tell memory reclaim
	 * that we are in such a context via PF_MEMALLOC_NOIO to prevent
	 * memory reclaim re-entering the filesystem here and
	 * potentially deadlocking.
	 */
	noio_flag = memalloc_noio_save();
	do {
		bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count,
					-1, PAGE_KERNEL);
		if (bp->b_addr)
			break;
		vm_unmap_aliases();
	} while (retried++ <= 1);
	memalloc_noio_restore(noio_flag);

So: lockdep annotation bug.

> [ 2018.537291] run fstests generic/095 at 2016-10-04 05:05:43
> [ 2018.733488] XFS (loop1): Unmounting Filesystem
> [ 2019.341189] XFS (loop1): Mounting V5 Filesystem
> [ 2019.344046] XFS (loop1): Ending clean mount
> [ 2019.796930] ------------[ cut here ]------------
> [ 2019.797529] WARNING: CPU: 0 PID: 10347 at fs/xfs/xfs_file.c:311 xfs_file_dio_aio_read+0x2d2/0x370 [xfs]
[snip]
> [ 2019.814146] WARNING: CPU: 0 PID: 10363 at fs/xfs/xfs_file.c:665 xfs_file_dio_aio_write+0x391/0x460 [xfs]
[snip]

And generic/095 is expected to generate those warnings; the test
filters them out (the test harness checks dmesg for unexpected
failures). I explained the reason for these warnings yesterday.

In summary: xfstests found a real bug, hit a lockdep annotation
annoyance, and exercised a path that can cause data corruption and
is expected to trigger warnings. Looks to me like xfstests is doing
exactly what it is supposed to be doing....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx