Re: lockdep warns: cpu_hotplug_lock.rw_sem --> slab_mutex --> kn->count#39

From: Qian Cai
Date: Mon Dec 09 2019 - 22:34:25 EST




> On Dec 9, 2019, at 10:21 PM, Qian Cai <cai@xxxxxx> wrote:
>
>
>
>> On Dec 9, 2019, at 3:33 PM, Christopher Lameter <cl@xxxxxxxxx> wrote:
>>
>> On Mon, 9 Dec 2019, Daniel Wagner wrote:
>>
>>> [ 5.038862]
>>> [ 5.038862] -> #2 (kn->count#39){++++}:
>>> [ 5.039329] __kernfs_remove+0x240/0x2e0
>>> [ 5.039717] kernfs_remove_by_name_ns+0x3c/0x80
>>> [ 5.040159] sysfs_slab_add+0x184/0x250
>>
>> sysfs_slab_add should not be called under any lock. But it happens during
>> an initcall (sysfs_slab_init) when the kmalloc slab array is being set up.
>>
>> And the problems results from a hotplug event? During system bringup when
>> the slab caches have not been setup yet?
>>
>> Is this really something that can happen?
>>
>>
>
>
> It happens to me too with a probably easier to trigger deadlock. Basically, we have,
>
> memcg_create_kmem_cache():
> cpu_hotplug_lock.rw_sem/mem_hotplug_lock.rw_sem â> kn->count

Correction â kmemcg_workfn() -> kmemcg_cache_shutdown_fn() does it NOT
memcg_create_kmem_cache(). How to fix it is beyond my imagination as
the relevant code looks all too old that probably need a redesign.

> Then, "slabinfo -dâ does the opposite locking order,
>
> kmem_cache_shrink_all+0x50/0x100 (cpu_hotplug_lock.rw_sem/mem_hotplug_lock.rw_sem)
> shrink_store+0x34/0x60
> slab_attr_store+0x6c/0x170
> sysfs_kf_write+0x70/0xb0
> kernfs_fop_write+0x11c/0x270 ((kn->count)
> __vfs_write+0x3c/0x70
> vfs_write+0xcc/0x200
> ksys_write+0x7c/0x140
> system_call+0x5c/0x6
>
>
> [ 1776.927152][ T9264] WARNING: possible circular locking dependency detected
> [ 1776.927164][ T9264] 5.5.0-rc1-next-20191209 #2 Not tainted
> [ 1776.927172][ T9264] ------------------------------------------------------
> [ 1776.927182][ T9264] slabinfo/9264 is trying to acquire lock:
> [ 1776.927204][ T9264] c000000001072c90 (cpu_hotplug_lock.rw_sem){++++}, at: kmem_cache_shrink_all+0x50/0x100
> [ 1776.927224][ T9264]
> [ 1776.927224][ T9264] but task is already holding lock:
> [ 1776.927246][ T9264] c0002013fae02e90 (kn->count#80){++++}, at: kernfs_fop_write+0xe0/0x270
> [ 1776.927274][ T9264]
> [ 1776.927274][ T9264] which lock already depends on the new lock.
> [ 1776.927274][ T9264]
> [ 1776.927298][ T9264]
> [ 1776.927298][ T9264] the existing dependency chain (in reverse order) is:
> [ 1776.927332][ T9264]
> [ 1776.927332][ T9264] -> #2 (kn->count#80){++++}:
> [ 1776.927355][ T9264] __kernfs_remove+0x3b0/0x440
> [ 1776.927374][ T9264] kernfs_remove+0x48/0x70
> [ 1776.927405][ T9264] sysfs_remove_dir+0x78/0xb0
> [ 1776.927437][ T9264] kobject_del+0x48/0xa0
> [ 1776.927458][ T9264] sysfs_slab_unlink+0x38/0x50
> [ 1776.927468][ T9264] shutdown_cache+0x208/0x2b0
> [ 1776.927477][ T9264] kmemcg_cache_shutdown_fn+0x20/0x40
> [ 1776.927509][ T9264] kmemcg_workfn+0x64/0xa0
> [ 1776.927530][ T9264] process_one_work+0x300/0x8e0
> [ 1776.927550][ T9264] worker_thread+0x78/0x530
> [ 1776.927580][ T9264] kthread+0x1a8/0x1b0
> [ 1776.927612][ T9264] ret_from_kernel_thread+0x5c/0x74
> [ 1776.927641][ T9264]
> [ 1776.927641][ T9264] -> #1 (slab_mutex){+.+.}:
> [ 1776.927663][ T9264] __mutex_lock+0xdc/0xb20
> [ 1776.927672][ T9264] memcg_create_kmem_cache+0x54/0x230
> [ 1776.927704][ T9264] memcg_kmem_cache_create_func+0x3c/0x280
> [ 1776.927725][ T9264] process_one_work+0x300/0x8e0
> [ 1776.927755][ T9264] worker_thread+0x78/0x530
> [ 1776.927774][ T9264] kthread+0x1a8/0x1b0
> [ 1776.927794][ T9264] ret_from_kernel_thread+0x5c/0x74
> [ 1776.927813][ T9264]
> [ 1776.927918][ T9264]
> [ 1776.927918][ T9264] -> #0 (cpu_hotplug_lock.rw_sem){++++}:
> [ 1776.927954][ T9264] __lock_acquire+0x1644/0x2360
> [ 1776.927973][ T9264] lock_acquire+0x130/0x360
> [ 1776.927994][ T9264] cpus_read_lock+0x64/0x170
> [ 1776.928014][ T9264] kmem_cache_shrink_all+0x50/0x100
> [ 1776.928034][ T9264] shrink_store+0x34/0x60
> [ 1776.928065][ T9264] slab_attr_store+0x6c/0x170
> [ 1776.928085][ T9264] sysfs_kf_write+0x70/0xb0
> [ 1776.928115][ T9264] kernfs_fop_write+0x11c/0x270
> [ 1776.928136][ T9264] __vfs_write+0x3c/0x70
> [ 1776.928156][ T9264] vfs_write+0xcc/0x200
> [ 1776.928176][ T9264] ksys_write+0x7c/0x140
> [ 1776.928195][ T9264] system_call+0x5c/0x68
> [ 1776.928214][ T9264]
> [ 1776.928214][ T9264] other info that might help us debug this:
> [ 1776.928214][ T9264]
> [ 1776.928238][ T9264] Chain exists of:
> [ 1776.928238][ T9264] cpu_hotplug_lock.rw_sem --> slab_mutex --> kn->count#80
> [ 1776.928238][ T9264]
> [ 1776.928289][ T9264] Possible unsafe locking scenario:
> [ 1776.928289][ T9264]
> [ 1776.928321][ T9264] CPU0 CPU1
> [ 1776.928340][ T9264] ---- ----
> [ 1776.928359][ T9264] lock(kn->count#80);
> [ 1776.928377][ T9264] lock(slab_mutex);
> [ 1776.928398][ T9264] lock(kn->count#80);
> [ 1776.928418][ T9264] lock(cpu_hotplug_lock.rw_sem);
> [ 1776.928438][ T9264]
> [ 1776.928438][ T9264] *** DEADLOCK ***
> [ 1776.928438][ T9264]
> [ 1776.928471][ T9264] 3 locks held by slabinfo/9264:
> [ 1776.928490][ T9264] #0: c000000042301408 (sb_writers#4){.+.+}, at: vfs_write+0x180/0x200
> [ 1776.928525][ T9264] #1: c00020063c2a5680 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd4/0x270
> [ 1776.928561][ T9264] #2: c0002013fae02e90 (kn->count#80){++++}, at: kernfs_fop_write+0xe0/0x270
> [ 1776.928607][ T9264]
> [ 1776.928607][ T9264] stack backtrace:
> [ 1776.928640][ T9264] CPU: 65 PID: 9264 Comm: slabinfo Not tainted 5.5.0-rc1-next-20191209 #2
> [ 1776.928684][ T9264] Call Trace:
> [ 1776.928705][ T9264] [c00000162edcf790] [c000000000944460] dump_stack+0xf4/0x164 (unreliable)
> [ 1776.928740][ T9264] [c00000162edcf7e0] [c0000000001be0e8] print_circular_bug+0x298/0x2b0
> [ 1776.928773][ T9264] [c00000162edcf880] [c0000000001be360] check_noncircular+0x260/0x320
> [ 1776.928807][ T9264] [c00000162edcf980] [c0000000001c38c4] __lock_acquire+0x1644/0x2360
> [ 1776.928840][ T9264] [c00000162edcfae0] [c0000000001c12a0] lock_acquire+0x130/0x360
> [ 1776.928874][ T9264] [c00000162edcfbc0] [c0000000001063b4] cpus_read_lock+0x64/0x170
> [ 1776.928907][ T9264] [c00000162edcfc00] [c000000000384620] kmem_cache_shrink_all+0x50/0x100
> [ 1776.928951][ T9264] [c00000162edcfc40] [c000000000425e14] shrink_store+0x34/0x60
> [ 1776.928973][ T9264] [c00000162edcfc70] [c000000000423f4c] slab_attr_store+0x6c/0x170
> [ 1776.929007][ T9264] [c00000162edcfcd0] [c0000000005b7340] sysfs_kf_write+0x70/0xb0
> [ 1776.929030][ T9264] [c00000162edcfd10] [c0000000005b650c] kernfs_fop_write+0x11c/0x270
> [ 1776.929063][ T9264] [c00000162edcfd60] [c0000000004a313c] __vfs_write+0x3c/0x70
> [ 1776.929127][ T9264] [c00000162edcfd80] [c0000000004a61ec] vfs_write+0xcc/0x200
> [ 1776.929209][ T9264] [c00000162edcfdd0] [c0000000004a658c] ksys_write+0x7c/0x140
> [ 1776.929299][ T9264] [c00000162edcfe20] [c00000000000b378] system_call+0x5c/0x6