Re: [RFC PATCH] vfs: add inode lockdep assertions

From: Darrick J. Wong
Date: Wed Sep 06 2023 - 17:26:20 EST


On Wed, Sep 06, 2023 at 10:10:25PM +0100, Matthew Wilcox wrote:
> On Wed, Sep 06, 2023 at 10:07:24AM -0700, Darrick J. Wong wrote:
> > You'd be better off converting this to:
> >
> > return __xfs_rwsem_islocked(&ip->i_lock.mr_lock,
> > (lock_flags & XFS_ILOCK_SHARED));
> >
> > And then fixing __xfs_rwsem_islocked to do:
> >
> > static inline bool
> > __xfs_rwsem_islocked(
> > struct rw_semaphore *rwsem,
> > bool shared)
> > {
> > if (!debug_locks) {
> > if (!shared)
> > return rwsem_is_write_locked(rwsem);
> > return rwsem_is_locked(rwsem);
> > }
> >
> > ...
> > }
>
> so ... I did that. And then many isilocked() calls started failing.
> generic/017 was the first one:
>
> 00030 XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_trans_inode.c, line: 93
> 00030 ------------[ cut here ]------------
> 00030 WARNING: CPU: 5 PID: 77 at fs/xfs/xfs_message.c:104 assfail+0x2c/0x40
> 00030 Modules linked in:
> 00030 CPU: 5 PID: 77 Comm: kworker/5:1 Kdump: loaded Not tainted 6.5.0-00006-g88a61c17df8f-dirty #14
> 00030 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> 00030 Workqueue: xfsalloc xfs_btree_split_worker
> 00030 RIP: 0010:assfail+0x2c/0x40
> 00030 Code: 89 d0 41 89 c9 48 c7 c2 80 70 0f 82 48 89 f1 48 89 e5 48 89 fe 48 c7
> c7 08 4f 07 82 e8 fd fd ff ff 80 3d 26 cc ed 00 00 75 04 <0f> 0b 5d c3 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48
> 00030 RSP: 0018:ffff888003f0bc28 EFLAGS: 00010246
> 00030 RAX: 00000000ffffffea RBX: ffff88800d9a0000 RCX: 000000007fffffff
> 00030 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffff82074f08
> 00030 RBP: ffff888003f0bc28 R08: 0000000000000000 R09: 000000000000000a
> 00030 R10: 000000000000000a R11: 0fffffffffffffff R12: ffff888009ff6000
> 00030 R13: ffff88800ac8a000 R14: 0000000000000001 R15: ffff88800af0a000
> 00030 FS: 0000000000000000(0000) GS:ffff88807d940000(0000) knlGS:0000000000000000
> 00030 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 00030 CR2: 00007ff177b05d58 CR3: 0000000007aa2002 CR4: 0000000000770ea0
> 00030 PKRU: 55555554
> 00030 Call Trace:
> 00030 <TASK>
> 00030 ? show_regs+0x5c/0x70
> 00030 ? __warn+0x84/0x180
> 00030 ? assfail+0x2c/0x40
> 00030 ? report_bug+0x18e/0x1c0
> 00030 ? handle_bug+0x3e/0x70
> 00030 ? exc_invalid_op+0x18/0x70
> 00030 ? asm_exc_invalid_op+0x1b/0x20
> 00030 ? assfail+0x2c/0x40
> 00030 ? assfail+0x23/0x40
> 00030 xfs_trans_log_inode+0xf9/0x120
> 00030 xfs_bmbt_alloc_block+0xf0/0x1c0
> 00030 __xfs_btree_split+0xf8/0x6c0
> 00030 ? __this_cpu_preempt_check+0x13/0x20
> 00030 ? lock_acquire+0xc8/0x280
> 00030 xfs_btree_split_worker+0x8a/0x110
> 00030 process_one_work+0x23f/0x510
> 00030 worker_thread+0x4d/0x3b0
> 00030 ? process_one_work+0x510/0x510
> 00030 kthread+0x106/0x140
> 00030 ? kthread_complete_and_exit+0x20/0x20
> 00030 ret_from_fork+0x31/0x50
> 00030 ? kthread_complete_and_exit+0x20/0x20
> 00030 ret_from_fork_asm+0x11/0x20
> 00030 </TASK>
> 00030 irq event stamp: 2901
> 00030 hardirqs last enabled at (2909): [<ffffffff810e4d83>] __up_console_sem+0x53/0x60
> 00030 hardirqs last disabled at (2916): [<ffffffff810e4d68>] __up_console_sem+0x38/0x60
> 00030 softirqs last enabled at (0): [<ffffffff81067bd0>] copy_process+0x830/0x1c10
> 00030 softirqs last disabled at (0): [<0000000000000000>] 0x0
> 00030 ---[ end trace 0000000000000000 ]---
>
> but here's the thing, I have lockdep enabled. So it's not testing my
> new rwsem_is_write_locked() code, it's testing the current lockdep
> stuff.
>
> So I have a feeling that we're not telling lockdep that we've acquired
> the i_lock. Or something? Seems unlikely that this is a real bug;
> surely we'd've noticed before now.
>
> Code here:
> https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/mrlock
>
> ie git clone git://git.infradead.org/users/willy/pagecache.git mrlock
>
> You don't need the top commit. 754fb6a68dae is enough to provoke it,
> as long as you have CONFIG_LOCKDEP enabled.

Yeah. The lockdep assertions in __xfs_rwsem_islocked are broken because
the fsstress thread takes the ILOCK_EXCL, decides to defer a bmap btree
split to a workqueue to avoid overflowing the kernel stack, and doesn't
tell lockdep that the workqueue is (effectively) the rwsem owner until
it completes.

The last time we tried to get rid of mrlock_t, dchinner suggested using
lockdep_assert_held_non_owner to fix the assertions, but (a) that
doesn't seem to exist in TOT 6.5 and (b) the inode rwsems protect the
inode data structure, not threads. Hence you could just get rid of the
lockdep assertions and use the rwsem_* versions unconditionally.

--D