Re: [4.15-rc9] fs_reclaim lockdep trace
From: Tetsuo Handa
Date: Sat Jan 27 2018 - 20:17:32 EST
Linus Torvalds wrote:
> On Sat, Jan 27, 2018 at 2:24 PM, Dave Jones <davej@xxxxxxxxxxxxxxxxx> wrote:
>> On Tue, Jan 23, 2018 at 08:36:51PM -0500, Dave Jones wrote:
>> > Just triggered this on a server I was rsync'ing to.
>>
>> Actually, I can trigger this really easily, even with an rsync from one
>> disk to another. Though that also smells a little like networking in
>> the traces. Maybe netdev has ideas.
>
> Is this new to 4.15? Or is it just that you're testing something new?
>
> If it's new and easy to repro, can you just bisect it? And if it isn't
> new, can you perhaps check whether it's new to 4.14 (ie 4.13 being
> ok)?
>
> Because that fs_reclaim_acquire/release() debugging isn't new to 4.15,
> but it was rewritten for 4.14.. I'm wondering if that remodeling ended
> up triggering something.
--- linux-4.14.15/mm/page_alloc.c
+++ linux-4.13.16/mm/page_alloc.c
@@ -3527,53 +3519,12 @@
 			return true;
 	}
 	return false;
 }
 #endif /* CONFIG_COMPACTION */

-#ifdef CONFIG_LOCKDEP
-struct lockdep_map __fs_reclaim_map =
-	STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
-
-static bool __need_fs_reclaim(gfp_t gfp_mask)
-{
-	gfp_mask = current_gfp_context(gfp_mask);
-
-	/* no reclaim without waiting on it */
-	if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
-		return false;
-
-	/* this guy won't enter reclaim */
-	if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
-		return false;
-
-	/* We're only interested __GFP_FS allocations for now */
-	if (!(gfp_mask & __GFP_FS))
-		return false;
-
-	if (gfp_mask & __GFP_NOLOCKDEP)
-		return false;
-
-	return true;
-}
-
-void fs_reclaim_acquire(gfp_t gfp_mask)
-{
-	if (__need_fs_reclaim(gfp_mask))
-		lock_map_acquire(&__fs_reclaim_map);
-}
-EXPORT_SYMBOL_GPL(fs_reclaim_acquire);
-
-void fs_reclaim_release(gfp_t gfp_mask)
-{
-	if (__need_fs_reclaim(gfp_mask))
-		lock_map_release(&__fs_reclaim_map);
-}
-EXPORT_SYMBOL_GPL(fs_reclaim_release);
-#endif
-
 /* Perform direct synchronous page reclaim */
 static int
 __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 					const struct alloc_context *ac)
 {
 	struct reclaim_state reclaim_state;
@@ -3582,21 +3533,21 @@

 	cond_resched();

 	/* We now go into synchronous reclaim */
 	cpuset_memory_pressure_bump();
 	noreclaim_flag = memalloc_noreclaim_save();
-	fs_reclaim_acquire(gfp_mask);
+	lockdep_set_current_reclaim_state(gfp_mask);
 	reclaim_state.reclaimed_slab = 0;
 	current->reclaim_state = &reclaim_state;

 	progress = try_to_free_pages(ac->zonelist, order, gfp_mask,
 								ac->nodemask);

 	current->reclaim_state = NULL;
-	fs_reclaim_release(gfp_mask);
+	lockdep_clear_current_reclaim_state();
 	memalloc_noreclaim_restore(noreclaim_flag);

 	cond_resched();

 	return progress;
 }
>
> Adding PeterZ to the participants list in case he has ideas. I'm not
> seeing what would be the problem in that call chain from hell.
>
> Linus
Dave Jones wrote:
> ============================================
> WARNING: possible recursive locking detected
> 4.15.0-rc9-backup-debug+ #1 Not tainted
> --------------------------------------------
> sshd/24800 is trying to acquire lock:
> (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
>
> but task is already holding lock:
> (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(fs_reclaim);
> lock(fs_reclaim);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 2 locks held by sshd/24800:
> #0: (sk_lock-AF_INET6){+.+.}, at: [<000000001a069652>] tcp_sendmsg+0x19/0x40
> #1: (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
>
> stack backtrace:
> CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1
> Call Trace:
> dump_stack+0xbc/0x13f
> __lock_acquire+0xa09/0x2040
> lock_acquire+0x12e/0x350
> fs_reclaim_acquire.part.102+0x29/0x30
> kmem_cache_alloc+0x3d/0x2c0
> alloc_extent_state+0xa7/0x410
> __clear_extent_bit+0x3ea/0x570
> try_release_extent_mapping+0x21a/0x260
> __btrfs_releasepage+0xb0/0x1c0
> btrfs_releasepage+0x161/0x170
> try_to_release_page+0x162/0x1c0
> shrink_page_list+0x1d5a/0x2fb0
> shrink_inactive_list+0x451/0x940
> shrink_node_memcg.constprop.88+0x4c9/0x5e0
> shrink_node+0x12d/0x260
> try_to_free_pages+0x418/0xaf0
> __alloc_pages_slowpath+0x976/0x1790
> __alloc_pages_nodemask+0x52c/0x5c0
> new_slab+0x374/0x3f0
> ___slab_alloc.constprop.81+0x47e/0x5a0
> __slab_alloc.constprop.80+0x32/0x60
> __kmalloc_track_caller+0x267/0x310
> __kmalloc_reserve.isra.40+0x29/0x80
> __alloc_skb+0xee/0x390
> sk_stream_alloc_skb+0xb8/0x340
> tcp_sendmsg_locked+0x8e6/0x1d30
> tcp_sendmsg+0x27/0x40
> inet_sendmsg+0xd0/0x310
> sock_write_iter+0x17a/0x240
> __vfs_write+0x2ab/0x380
> vfs_write+0xfb/0x260
> SyS_write+0xb6/0x140
> do_syscall_64+0x1e5/0xc05
> entry_SYSCALL64_slow_path+0x25/0x25
> ============================================
> WARNING: possible recursive locking detected
> 4.15.0-rc9-backup-debug+ #7 Not tainted
> --------------------------------------------
> snmpd/892 is trying to acquire lock:
> (fs_reclaim){+.+.}, at: [<0000000002e4c185>] fs_reclaim_acquire.part.101+0x5/0x30
>
> but task is already holding lock:
> (fs_reclaim){+.+.}, at: [<0000000002e4c185>] fs_reclaim_acquire.part.101+0x5/0x30
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(fs_reclaim);
> lock(fs_reclaim);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 2 locks held by snmpd/892:
> #0: (rtnl_mutex){+.+.}, at: [<00000000dcd3ba2f>] netlink_dump+0x89/0x520
> #1: (fs_reclaim){+.+.}, at: [<0000000002e4c185>] fs_reclaim_acquire.part.101+0x5/0x30
>
> stack backtrace:
> CPU: 5 PID: 892 Comm: snmpd Not tainted 4.15.0-rc9-backup-debug+ #7
> Call Trace:
> dump_stack+0xbc/0x13f
> __lock_acquire+0xa09/0x2040
> lock_acquire+0x12e/0x350
> fs_reclaim_acquire.part.101+0x29/0x30
> kmem_cache_alloc+0x3d/0x2c0
> alloc_extent_state+0xa7/0x410
> __clear_extent_bit+0x3ea/0x570
> try_release_extent_mapping+0x21a/0x260
> __btrfs_releasepage+0xb0/0x1c0
> btrfs_releasepage+0x161/0x170
> try_to_release_page+0x162/0x1c0
> shrink_page_list+0x1d5a/0x2fb0
> shrink_inactive_list+0x451/0x940
> shrink_node_memcg.constprop.84+0x4c9/0x5e0
> shrink_node+0x1c2/0x510
> try_to_free_pages+0x425/0xb90
> __alloc_pages_slowpath+0x955/0x1a00
> __alloc_pages_nodemask+0x52c/0x5c0
> new_slab+0x374/0x3f0
> ___slab_alloc.constprop.81+0x47e/0x5a0
> __slab_alloc.constprop.80+0x32/0x60
> __kmalloc_track_caller+0x267/0x310
> __kmalloc_reserve.isra.40+0x29/0x80
> __alloc_skb+0xee/0x390
> netlink_dump+0x2e1/0x520
> __netlink_dump_start+0x201/0x280
> rtnetlink_rcv_msg+0x6d6/0xa90
> netlink_rcv_skb+0xb6/0x1d0
> netlink_unicast+0x298/0x320
> netlink_sendmsg+0x57e/0x630
> SYSC_sendto+0x296/0x320
> do_syscall_64+0x1e5/0xc05
> entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x7f204299f54d
> RSP: 002b:00007ffc49024fd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f204299f54d
> RDX: 0000000000000018 RSI: 00007ffc49025010 RDI: 0000000000000012
> RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000012
> R13: 00007ffc49029550 R14: 000055e31307a250 R15: 00007ffc49029530
Both traces show the same reclaim path, and no fs locks are held, are
they? If so, doing a GFP_KERNEL allocation there should be safe (as long
as the PF_MEMALLOC safeguard prevents infinite recursion), shouldn't it?
In that case, I think "git bisect" would reach commit d92a8cfcb37ecd13
("locking/lockdep: Rework FS_RECLAIM annotation").