Re: [RFC PATCH] jffs2: fix recursive fs_reclaim deadlock

From: Zhihao Cheng
Date: Fri Mar 15 2024 - 07:20:09 EST


On 2024/3/15 15:59, Qingfang Deng wrote:
When testing jffs2 on a memory-constrained system, lockdep detected a
possible circular locking dependency.

kswapd0/266 is trying to acquire lock:
ffffff802865e508 (&f->sem){+.+.}-{3:3}, at: jffs2_do_clear_inode+0x44/0x200

but task is already holding lock:
ffffffd010e843c0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x0/0x40

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (fs_reclaim){+.+.}-{0:0}:
lock_acquire+0x6c/0x90
fs_reclaim_acquire+0x7c/0xa0
kmem_cache_alloc+0x5c/0x400
jffs2_alloc_inode_cache+0x18/0x20
jffs2_do_read_inode+0x1e0/0x310
jffs2_iget+0x154/0x540
jffs2_do_fill_super+0x214/0x3f0
jffs2_fill_super+0x138/0x180
mtd_get_sb+0xcc/0x120
get_tree_mtd+0x168/0x400
jffs2_get_tree+0x14/0x20
vfs_get_tree+0x48/0x130
path_mount+0xa64/0x12d0
__arm64_sys_mount+0x368/0x3e0
do_el0_svc+0xa0/0x140
el0_svc+0x1c/0x30
el0_sync_handler+0x9c/0x120
el0_sync+0x148/0x180

-> #0 (&f->sem){+.+.}-{3:3}:
__lock_acquire+0x18cc/0x2bb0
lock_acquire.part.0+0x170/0x2e0
lock_acquire+0x6c/0x90
__mutex_lock+0x10c/0xaa0
mutex_lock_nested+0x54/0x80
jffs2_do_clear_inode+0x44/0x200
jffs2_evict_inode+0x44/0x50
evict+0x120/0x290
dispose_list+0x88/0xd0
prune_icache_sb+0xa8/0xd0
super_cache_scan+0x1c4/0x240
shrink_slab.constprop.0+0x2a0/0x7f0
shrink_node+0x398/0x8e0
balance_pgdat+0x268/0x550
kswapd+0x154/0x7c0
kthread+0x1f0/0x200
ret_from_fork+0x10/0x20

I think this is a false positive warning. In chain '#1', jffs2 is
reading the root inode, which means the filesystem is not mounted yet:
d_make_root() is called only after jffs2_iget(sb, 1), so there is no way
to access any other inode at that point. It is therefore impossible for
a jffs2 inode to be under eviction, as in chain '#0', at the same time.

other info that might help us debug this:

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(&f->sem);
                               lock(fs_reclaim);
  lock(&f->sem);

*** DEADLOCK ***

3 locks held by kswapd0/266:
#0: ffffffd010e843c0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x0/0x40
#1: ffffffd010e62eb0 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab.constprop.0+0x78/0x7f0
#2: ffffff80225340e0 (&type->s_umount_key#40){.+.+}-{3:3}, at: super_cache_scan+0x3c/0x240

It turns out jffs2 uses GFP_KERNEL as the memory allocation flags
throughout the code, and commonly, inside the critical section of
jffs2_inode_info::sem. When running low on memory, any allocation within
the critical section may trigger a direct reclaim, which recurses back
to jffs2_do_clear_inode().

Replace GFP_KERNEL with GFP_NOFS to avoid the recursion.

Signed-off-by: Qingfang Deng <dqfext@xxxxxxxxx>
---
XXX: Posting this as RFC, as I don't know if all GFP_KERNEL occurrences
should be replaced, or if this is just a false positive.
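
To make the GFP_KERNEL versus GFP_NOFS reasoning in the patch description
concrete, here is a minimal userspace model of the recursion that lockdep
warns about. It is a sketch only, not kernel code: every model_* name and
the MODEL_GFP_* flags are hypothetical stand-ins invented for illustration,
and the model deliberately reclaims the same inode whose sem is held.
Lockdep reasons about lock classes rather than lock instances, so this
shows the reported scenario, not proof that it is reachable in jffs2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for GFP_KERNEL / GFP_NOFS; not the kernel's flags. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_NOFS };

struct model_inode {
	bool sem_held;       /* stands in for jffs2_inode_info::sem */
	int  recursive_hits; /* times reclaim met a sem we already hold */
};

/* Stands in for jffs2_do_clear_inode(), which takes the inode's sem. */
static void model_clear_inode(struct model_inode *f)
{
	if (f->sem_held) {
		/* A real mutex would block here forever; the model just counts. */
		f->recursive_hits++;
		return;
	}
	f->sem_held = true;
	/* ... eviction work would happen here ... */
	f->sem_held = false;
}

/*
 * Stands in for an allocation under memory pressure. In this model only
 * MODEL_GFP_KERNEL is allowed to re-enter the filesystem for reclaim.
 */
static void *model_alloc(struct model_inode *victim, size_t size,
			 enum model_gfp flags)
{
	if (flags == MODEL_GFP_KERNEL)
		model_clear_inode(victim);
	return malloc(size);
}

/*
 * Allocate inside the sem critical section and report how many times
 * reclaim tried to take the sem that the caller already holds.
 */
int model_alloc_under_sem(enum model_gfp flags)
{
	struct model_inode f = { false, 0 };
	void *p;

	f.sem_held = true;               /* mutex_lock(&f->sem) */
	p = model_alloc(&f, 16, flags);
	assert(p != NULL);               /* model only: treat failure as fatal */
	f.sem_held = false;              /* mutex_unlock(&f->sem) */

	free(p);
	return f.recursive_hits;
}
```

In this model, model_alloc_under_sem(MODEL_GFP_KERNEL) reports one
re-entry into the held sem, while model_alloc_under_sem(MODEL_GFP_NOFS)
reports none, which is the effect the proposed flag change aims for.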