Re: [PATCH v2] block/mq: fix potential deadlock during cpu hotplug

From: Jens Axboe
Date: Sun May 07 2017 - 21:51:23 EST


On 05/07/2017 01:14 AM, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>
> This can be triggered by hot-unplug one cpu.
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 4.11.0+ #17 Not tainted
> -------------------------------------------------------
> step_after_susp/2640 is trying to acquire lock:
> (all_q_mutex){+.+...}, at: [<ffffffffb33f95b8>] blk_mq_queue_reinit_work+0x18/0x110
>
> but task is already holding lock:
> (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (cpu_hotplug.lock){+.+.+.}:
> lock_acquire+0x11c/0x230
> __mutex_lock+0x92/0x990
> mutex_lock_nested+0x1b/0x20
> get_online_cpus+0x64/0x80
> blk_mq_init_allocated_queue+0x3a0/0x4e0
> blk_mq_init_queue+0x3a/0x60
> loop_add+0xe5/0x280
> loop_init+0x124/0x177
> do_one_initcall+0x53/0x1c0
> kernel_init_freeable+0x1e3/0x27f
> kernel_init+0xe/0x100
> ret_from_fork+0x31/0x40
>
> -> #0 (all_q_mutex){+.+...}:
> __lock_acquire+0x189a/0x18a0
> lock_acquire+0x11c/0x230
> __mutex_lock+0x92/0x990
> mutex_lock_nested+0x1b/0x20
> blk_mq_queue_reinit_work+0x18/0x110
> blk_mq_queue_reinit_dead+0x1c/0x20
> cpuhp_invoke_callback+0x1f2/0x810
> cpuhp_down_callbacks+0x42/0x80
> _cpu_down+0xb2/0xe0
> freeze_secondary_cpus+0xb6/0x390
> suspend_devices_and_enter+0x3b3/0xa40
> pm_suspend+0x129/0x490
> state_store+0x82/0xf0
> kobj_attr_store+0xf/0x20
> sysfs_kf_write+0x45/0x60
> kernfs_fop_write+0x135/0x1c0
> __vfs_write+0x37/0x160
> vfs_write+0xcd/0x1d0
> SyS_write+0x58/0xc0
> do_syscall_64+0x8f/0x710
> return_from_SYSCALL_64+0x0/0x7a
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(cpu_hotplug.lock);
> lock(all_q_mutex);
> lock(cpu_hotplug.lock);
> lock(all_q_mutex);
>
> *** DEADLOCK ***
>
> 8 locks held by step_after_susp/2640:
> #0: (sb_writers#6){.+.+.+}, at: [<ffffffffb3244aed>] vfs_write+0x1ad/0x1d0
> #1: (&of->mutex){+.+.+.}, at: [<ffffffffb32d3a51>] kernfs_fop_write+0x101/0x1c0
> #2: (s_active#166){.+.+.+}, at: [<ffffffffb32d3a59>] kernfs_fop_write+0x109/0x1c0
> #3: (pm_mutex){+.+...}, at: [<ffffffffb30d2ecd>] pm_suspend+0x21d/0x490
> #4: (acpi_scan_lock){+.+.+.}, at: [<ffffffffb34dc3d7>] acpi_scan_lock_acquire+0x17/0x20
> #5: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffffb306d6d7>] freeze_secondary_cpus+0x27/0x390
> #6: (cpu_hotplug.dep_map){++++++}, at: [<ffffffffb306cfd5>] cpu_hotplug_begin+0x5/0xe0
> #7: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0
>
> stack backtrace:
> CPU: 3 PID: 2640 Comm: step_after_susp Not tainted 4.11.0+ #17
> Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS 1.4.9 09/12/2016
> Call Trace:
> dump_stack+0x99/0xce
> print_circular_bug+0x1fa/0x270
> __lock_acquire+0x189a/0x18a0
> lock_acquire+0x11c/0x230
> ? lock_acquire+0x11c/0x230
> ? blk_mq_queue_reinit_work+0x18/0x110
> ? blk_mq_queue_reinit_work+0x18/0x110
> __mutex_lock+0x92/0x990
> ? blk_mq_queue_reinit_work+0x18/0x110
> ? kmem_cache_free+0x2cb/0x330
> ? anon_transport_class_unregister+0x20/0x20
> ? blk_mq_queue_reinit_work+0x110/0x110
> mutex_lock_nested+0x1b/0x20
> ? mutex_lock_nested+0x1b/0x20
> blk_mq_queue_reinit_work+0x18/0x110
> blk_mq_queue_reinit_dead+0x1c/0x20
> cpuhp_invoke_callback+0x1f2/0x810
> ? __flow_cache_shrink+0x160/0x160
> cpuhp_down_callbacks+0x42/0x80
> _cpu_down+0xb2/0xe0
> freeze_secondary_cpus+0xb6/0x390
> suspend_devices_and_enter+0x3b3/0xa40
> ? rcu_read_lock_sched_held+0x79/0x80
> pm_suspend+0x129/0x490
> state_store+0x82/0xf0
> kobj_attr_store+0xf/0x20
> sysfs_kf_write+0x45/0x60
> kernfs_fop_write+0x135/0x1c0
> __vfs_write+0x37/0x160
> ? rcu_read_lock_sched_held+0x79/0x80
> ? rcu_sync_lockdep_assert+0x2f/0x60
> ? __sb_start_write+0xd9/0x1c0
> ? vfs_write+0x1ad/0x1d0
> vfs_write+0xcd/0x1d0
> SyS_write+0x58/0xc0
> ? rcu_read_lock_sched_held+0x79/0x80
> do_syscall_64+0x8f/0x710
> ? trace_hardirqs_on_thunk+0x1a/0x1c
> entry_SYSCALL64_slow_path+0x25/0x25
>
> The cpu hotplug path will hold cpu_hotplug.lock and then reinit all exiting
> queues for blk mq w/ all_q_mutex, however, blk_mq_init_allocated_queue() will
> contend these two locks in the inversion order. This is due to commit eabe06595d62
> (blk/mq: Cure cpu hotplug lock inversion), it fixes a cpu hotplug lock inversion
> issue because of hotplug rework, however the hotplug rework is still work-in-progress
> and lives in a -tip branch and mainline cannot yet trigger that splat. The commit
> breaks the linus's tree in the merge window, so this patch reverts the lock order
> and avoids to splat linus's tree.

Gah, so it was actively harmful before the hotplug rework got in... I've
added this for pre-rc1.

--
Jens Axboe