Re: [PATCH v1] mm/memory_hotplug: Don't take the cpu_hotplug_lock

From: Qian Cai
Date: Tue Sep 24 2019 - 11:03:28 EST


On Tue, 2019-09-24 at 16:36 +0200, David Hildenbrand wrote:
> Since commit 3f906ba23689 ("mm/memory-hotplug: switch locking to a percpu
> rwsem") we do a cpus_read_lock() in mem_hotplug_begin(). This was
> introduced to fix a potential deadlock between get_online_mems() and
> get_online_cpus() - the memory and the cpu hotplug lock. The root issue
> was that build_all_zonelists() -> stop_machine() required the cpu hotplug
> lock: memory hotplug takes the memory hotplug lock and then calls
> stop_machine(), which calls get_online_cpus(). That's the reverse lock
> order to get_online_cpus(); get_online_mems(); in mm/slab_common.c.
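>
> To illustrate the inversion (a rough sketch of the two call paths
> described above, not actual kernel code):
>
>	/* memory hotplug side, old ordering: */
>	mem_hotplug_begin();		/* memory hotplug lock */
>	build_all_zonelists();
>		stop_machine();		/* -> get_online_cpus(), cpu hotplug lock */
>
>	/* slab side, mm/slab_common.c: */
>	get_online_cpus();		/* cpu hotplug lock */
>	get_online_mems();		/* memory hotplug lock - reverse order */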
>
> So memory hotplug itself never really required any cpu lock; only
> stop_machine() and lru_add_drain_all() required it. Back then,
> stop_machine_cpuslocked() and lru_add_drain_all_cpuslocked() were used,
> as the cpu hotplug lock was now obtained in the caller.
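>
> Roughly, the result back then looked like this (a sketch only;
> mem_hotplug_begin() as in the hunk removed below):
>
>	void mem_hotplug_begin(void)
>	{
>		cpus_read_lock();	/* cpu hotplug lock taken up front */
>		percpu_down_write(&mem_hotplug_lock);
>	}
>
>	/* callees could then assume the cpu hotplug lock is held: */
>	stop_machine_cpuslocked(...);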
>
> Since commit 11cd8638c37f ("mm, page_alloc: remove stop_machine from
> build_all_zonelists"), the stop_machine_cpuslocked() call is gone.
> build_all_zonelists() no longer requires the cpu lock and no longer
> makes use of stop_machine().
>
> Since commit 9852a7212324 ("mm: drop hotplug lock from
> lru_add_drain_all()"), lru_add_drain_all() "Doesn't need any cpu hotplug
> locking because we do rely on per-cpu kworkers being shut down before our
> page_alloc_cpu_dead callback is executed on the offlined cpu". The
> lru_add_drain_all_cpuslocked() variant was removed.
>
> So there is nothing left that requires the cpu hotplug lock. The memory
> hotplug lock and the device hotplug lock are sufficient.
>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Oscar Salvador <osalvador@xxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
> ---
>
> RFC -> v1:
> - Reword and add more details on why the cpu hotplug lock was needed here
> in the first place, and why we no longer require it.
>
> ---
> mm/memory_hotplug.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index c3e9aed6023f..5fa30f3010e1 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -88,14 +88,12 @@ __setup("memhp_default_state=", setup_memhp_default_state);
>
> void mem_hotplug_begin(void)
> {
> - cpus_read_lock();
> percpu_down_write(&mem_hotplug_lock);
> }
>
> void mem_hotplug_done(void)
> {
> percpu_up_write(&mem_hotplug_lock);
> - cpus_read_unlock();
> }
>
> u64 max_mem_size = U64_MAX;

While at it, it might be a good time to rethink the whole locking there:
right now, reading files under /sys/kernel/slab/ could trigger a possible
deadlock anyway.
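
Condensed from the lockdep report below, the reader side that closes the
cycle is roughly (a sketch of the call path, not actual code):

	cat /sys/kernel/slab/<cache>/total_objects
	  kernfs_seq_start()		/* takes kn->count */
	    total_objects_show()
	      show_slab_objects()
	        get_online_mems()	/* wants mem_hotplug_lock */

while the existing chain mem_hotplug_lock.rw_sem --> slab_mutex -->
kn->count already holds via the kmemcg cache create/shutdown paths shown
in the report.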

[  442.258806][ T5224] WARNING: possible circular locking dependency detected
[  442.265678][ T5224] 5.3.0-rc7-mm1+ #6 Tainted: G             L
[  442.271766][ T5224] ------------------------------------------------------
[  442.278635][ T5224] cat/5224 is trying to acquire lock:
[  442.283857][ T5224] ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at: show_slab_objects+0x94/0x3a8
[  442.293189][ T5224]
[  442.293189][ T5224] but task is already holding lock:
[  442.300404][ T5224] b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0
[  442.308587][ T5224]
[  442.308587][ T5224] which lock already depends on the new lock.
[  442.308587][ T5224]
[  442.318841][ T5224]
[  442.318841][ T5224] the existing dependency chain (in reverse order) is:
[  442.327705][ T5224]
[  442.327705][ T5224] -> #2 (kn->count#45){++++}:
[  442.334413][ T5224]        lock_acquire+0x31c/0x360
[  442.339286][ T5224]        __kernfs_remove+0x290/0x490
[  442.344428][ T5224]        kernfs_remove+0x30/0x44
[  442.349224][ T5224]        sysfs_remove_dir+0x70/0x88
[  442.354276][ T5224]        kobject_del+0x50/0xb0
[  442.358890][ T5224]        sysfs_slab_unlink+0x2c/0x38
[  442.364025][ T5224]        shutdown_cache+0xa0/0xf0
[  442.368898][ T5224]        kmemcg_cache_shutdown_fn+0x1c/0x34
[  442.374640][ T5224]        kmemcg_workfn+0x44/0x64
[  442.379428][ T5224]        process_one_work+0x4f4/0x950
[  442.384649][ T5224]        worker_thread+0x390/0x4bc
[  442.389610][ T5224]        kthread+0x1cc/0x1e8
[  442.394052][ T5224]        ret_from_fork+0x10/0x18
[  442.398835][ T5224]
[  442.398835][ T5224] -> #1 (slab_mutex){+.+.}:
[  442.405365][ T5224]        lock_acquire+0x31c/0x360
[  442.410240][ T5224]        __mutex_lock_common+0x16c/0xf78
[  442.415722][ T5224]        mutex_lock_nested+0x40/0x50
[  442.420855][ T5224]        memcg_create_kmem_cache+0x38/0x16c
[  442.426598][ T5224]        memcg_kmem_cache_create_func+0x3c/0x70
[  442.432687][ T5224]        process_one_work+0x4f4/0x950
[  442.437908][ T5224]        worker_thread+0x390/0x4bc
[  442.442868][ T5224]        kthread+0x1cc/0x1e8
[  442.447307][ T5224]        ret_from_fork+0x10/0x18
[  442.452090][ T5224]
[  442.452090][ T5224] -> #0 (mem_hotplug_lock.rw_sem){++++}:
[  442.459748][ T5224]        validate_chain+0xd10/0x2bcc
[  442.464883][ T5224]        __lock_acquire+0x7f4/0xb8c
[  442.469930][ T5224]        lock_acquire+0x31c/0x360
[  442.474803][ T5224]        get_online_mems+0x54/0x150
[  442.479850][ T5224]        show_slab_objects+0x94/0x3a8
[  442.485072][ T5224]        total_objects_show+0x28/0x34
[  442.490292][ T5224]        slab_attr_show+0x38/0x54
[  442.495166][ T5224]        sysfs_kf_seq_show+0x198/0x2d4
[  442.500473][ T5224]        kernfs_seq_show+0xa4/0xcc
[  442.505433][ T5224]        seq_read+0x30c/0x8a8
[  442.509958][ T5224]        kernfs_fop_read+0xa8/0x314
[  442.515007][ T5224]        __vfs_read+0x88/0x20c
[  442.519620][ T5224]        vfs_read+0xd8/0x10c
[  442.524060][ T5224]        ksys_read+0xb0/0x120
[  442.528586][ T5224]        __arm64_sys_read+0x54/0x88
[  442.533634][ T5224]        el0_svc_handler+0x170/0x240
[  442.538768][ T5224]        el0_svc+0x8/0xc
[  442.542858][ T5224]
[  442.542858][ T5224] other info that might help us debug this:
[  442.542858][ T5224]
[  442.552936][ T5224] Chain exists of:
[  442.552936][ T5224]   mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45
[  442.552936][ T5224]
[  442.565803][ T5224]  Possible unsafe locking scenario:
[  442.565803][ T5224]
[  442.573105][ T5224]        CPU0                    CPU1
[  442.578322][ T5224]        ----                    ----
[  442.583539][ T5224]   lock(kn->count#45);
[  442.587545][ T5224]                                lock(slab_mutex);
[  442.593898][ T5224]                                lock(kn->count#45);
[  442.600433][ T5224]   lock(mem_hotplug_lock.rw_sem);
[  442.605393][ T5224]
[  442.605393][ T5224]  *** DEADLOCK ***
[  442.605393][ T5224]
[  442.613390][ T5224] 3 locks held by cat/5224:
[  442.617740][ T5224]  #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8
[  442.625399][ T5224]  #1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0
[  442.633842][ T5224]  #2: b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0
[  442.642477][ T5224]
[  442.642477][ T5224] stack backtrace:
[  442.648221][ T5224] CPU: 117 PID: 5224 Comm: cat Tainted: G             L    5.3.0-rc7-mm1+ #6
[  442.656826][ T5224] Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS L50_5.13_1.11 06/18/2019
[  442.667253][ T5224] Call trace:
[  442.670391][ T5224]  dump_backtrace+0x0/0x248
[  442.674743][ T5224]  show_stack+0x20/0x2c
[  442.678750][ T5224]  dump_stack+0xd0/0x140
[  442.682841][ T5224]  print_circular_bug+0x368/0x380
[  442.687715][ T5224]  check_noncircular+0x248/0x250
[  442.692501][ T5224]  validate_chain+0xd10/0x2bcc
[  442.697115][ T5224]  __lock_acquire+0x7f4/0xb8c
[  442.701641][ T5224]  lock_acquire+0x31c/0x360
[  442.705993][ T5224]  get_online_mems+0x54/0x150
[  442.710519][ T5224]  show_slab_objects+0x94/0x3a8
[  442.715219][ T5224]  total_objects_show+0x28/0x34
[  442.719918][ T5224]  slab_attr_show+0x38/0x54
[  442.724271][ T5224]  sysfs_kf_seq_show+0x198/0x2d4
[  442.729056][ T5224]  kernfs_seq_show+0xa4/0xcc
[  442.733494][ T5224]  seq_read+0x30c/0x8a8
[  442.737498][ T5224]  kernfs_fop_read+0xa8/0x314
[  442.742025][ T5224]  __vfs_read+0x88/0x20c
[  442.746118][ T5224]  vfs_read+0xd8/0x10c
[  442.750036][ T5224]  ksys_read+0xb0/0x120
[  442.754042][ T5224]  __arm64_sys_read+0x54/0x88
[  442.758569][ T5224]  el0_svc_handler+0x170/0x240
[  442.763180][ T5224]  el0_svc+0x8/0xc