[PATCH v2] xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim
From: Waiman Long
Date: Thu Jun 04 2020 - 17:01:55 EST
Depending on the workloads, the following circular locking dependency
warning between sb_internal (a percpu rwsem) and fs_reclaim (a pseudo
lock) may show up:
======================================================
WARNING: possible circular locking dependency detected
5.0.0-rc1+ #60 Tainted: G W
------------------------------------------------------
fsfreeze/4346 is trying to acquire lock:
0000000026f1d784 (fs_reclaim){+.+.}, at:
fs_reclaim_acquire.part.19+0x5/0x30
but task is already holding lock:
0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650
which lock already depends on the new lock.
:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_internal);
lock(fs_reclaim);
lock(sb_internal);
lock(fs_reclaim);
*** DEADLOCK ***
4 locks held by fsfreeze/4346:
#0: 00000000b478ef56 (sb_writers#8){++++}, at: percpu_down_write+0xb4/0x650
#1: 000000001ec487a9 (&type->s_umount_key#28){++++}, at: freeze_super+0xda/0x290
#2: 000000003edbd5a0 (sb_pagefaults){++++}, at: percpu_down_write+0xb4/0x650
#3: 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650
stack backtrace:
Call Trace:
dump_stack+0xe0/0x19a
print_circular_bug.isra.10.cold.34+0x2f4/0x435
check_prev_add.constprop.19+0xca1/0x15f0
validate_chain.isra.14+0x11af/0x3b50
__lock_acquire+0x728/0x1200
lock_acquire+0x269/0x5a0
fs_reclaim_acquire.part.19+0x29/0x30
fs_reclaim_acquire+0x19/0x20
kmem_cache_alloc+0x3e/0x3f0
kmem_zone_alloc+0x79/0x150
xfs_trans_alloc+0xfa/0x9d0
xfs_sync_sb+0x86/0x170
xfs_log_sbcount+0x10f/0x140
xfs_quiesce_attr+0x134/0x270
xfs_fs_freeze+0x4a/0x70
freeze_super+0x1af/0x290
do_vfs_ioctl+0xedc/0x16c0
ksys_ioctl+0x41/0x80
__x64_sys_ioctl+0x73/0xa9
do_syscall_64+0x18f/0xd23
entry_SYSCALL_64_after_hwframe+0x49/0xbe
According to Dave Chinner:
Freezing the filesystem, after all the data has been cleaned. IOWs
memory reclaim will never run the above writeback path when
the freeze process is trying to allocate a transaction here because
there are no dirty data pages in the filesystem at this point.
Indeed, this xfs_sync_sb() path sets XFS_TRANS_NO_WRITECOUNT so that
it /doesn't deadlock/ by taking freeze references for the
transaction. We've just drained all the transactions
in progress and written back all the dirty metadata, too, and so the
filesystem is completely clean and only needs the superblock to be
updated to complete the freeze process. And to do that, it does not
take a freeze reference because calling sb_start_intwrite() here
would deadlock.
IOWs, this is a false positive, caused by the fact that
xfs_trans_alloc() is called from both above and below memory reclaim
as well as within /every level/ of freeze processing. Lockdep is
unable to describe the staged flush logic in the freeze process that
prevents deadlocks from occurring, and hence we will pretty much
always see false positives in the freeze path....
Perhaps breaking the fs_reclaim pseudo lock into a per filesystem lock
may fix the issue. However, that will greatly complicate the logic and
may not be worth it.
Another way to fix it is to disable the taking of the fs_reclaim
pseudo lock when in the freezing code path as a reclaim on the freezed
filesystem is not possible as stated above. Given that a KM_NOLOCKDEP
flag has been introduced in commit 6dcde60efd94 ("xfs: more lockdep
whackamole with kmem_alloc*"), this flag can be used to disable lockdep
when the filesystem has been freezed.
Without this patch, the command sequence below will show that the lock
dependency chain sb_internal -> fs_reclaim exists.
# fsfreeze -f /home
# fsfreeze --unfreeze /home
# grep -i fs_reclaim -C 3 /proc/lockdep_chains | grep -C 5 sb_internal
After applying the patch, such sb_internal -> fs_reclaim lock dependency
chain can no longer be found. Because of that, the locking dependency
warning will not be shown.
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
fs/xfs/xfs_log.c | 3 ++-
fs/xfs/xfs_trans.c | 8 +++++++-
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 00fda2e8e738..d273d4e74ef8 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -433,7 +433,8 @@ xfs_log_reserve(
XFS_STATS_INC(mp, xs_try_logspace);
ASSERT(*ticp == NULL);
- tic = xlog_ticket_alloc(log, unit_bytes, cnt, client, permanent, 0);
+ tic = xlog_ticket_alloc(log, unit_bytes, cnt, client, permanent,
+ mp->m_super->s_writers.frozen ? KM_NOLOCKDEP : 0);
*ticp = tic;
xlog_grant_push_ail(log, tic->t_cnt ? tic->t_unit_res * tic->t_cnt
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 3c94e5ff4316..3a9f394a0f02 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -261,8 +261,14 @@ xfs_trans_alloc(
* Allocate the handle before we do our freeze accounting and setting up
* GFP_NOFS allocation context so that we avoid lockdep false positives
* by doing GFP_KERNEL allocations inside sb_start_intwrite().
+ *
+ * To prevent false positive lockdep warning of circular locking
+ * dependency between sb_internal and fs_reclaim, disable the
+ * acquisition of the fs_reclaim pseudo-lock when the superblock
+ * has been frozen or in the process of being frozen.
*/
- tp = kmem_zone_zalloc(xfs_trans_zone, 0);
+ tp = kmem_zone_zalloc(xfs_trans_zone,
+ mp->m_super->s_writers.frozen ? KM_NOLOCKDEP : 0);
if (!(flags & XFS_TRANS_NO_WRITECOUNT))
sb_start_intwrite(mp->m_super);
--
2.18.1