Re: [PATCH v2 4/5] clk: Get runtime PM before walking tree during disable_unused

From: Stephen Boyd
Date: Sun Apr 07 2024 - 22:37:01 EST


Quoting Stephen Boyd (2024-03-25 11:41:58)
> Doug reported [1] the following hung task:
>
> INFO: task swapper/0:1 blocked for more than 122 seconds.
> Not tainted 5.15.149-21875-gf795ebc40eb8 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00000008
> Call trace:
> __switch_to+0xf4/0x1f4
> __schedule+0x418/0xb80
> schedule+0x5c/0x10c
> rpm_resume+0xe0/0x52c
> rpm_resume+0x178/0x52c
> __pm_runtime_resume+0x58/0x98
> clk_pm_runtime_get+0x30/0xb0
> clk_disable_unused_subtree+0x58/0x208
> clk_disable_unused_subtree+0x38/0x208
> clk_disable_unused_subtree+0x38/0x208
> clk_disable_unused_subtree+0x38/0x208
> clk_disable_unused_subtree+0x38/0x208
> clk_disable_unused+0x4c/0xe4
> do_one_initcall+0xcc/0x2d8
> do_initcall_level+0xa4/0x148
> do_initcalls+0x5c/0x9c
> do_basic_setup+0x24/0x30
> kernel_init_freeable+0xec/0x164
> kernel_init+0x28/0x120
> ret_from_fork+0x10/0x20
> INFO: task kworker/u16:0:9 blocked for more than 122 seconds.
> Not tainted 5.15.149-21875-gf795ebc40eb8 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u16:0 state:D stack: 0 pid: 9 ppid: 2 flags:0x00000008
> Workqueue: events_unbound deferred_probe_work_func
> Call trace:
> __switch_to+0xf4/0x1f4
> __schedule+0x418/0xb80
> schedule+0x5c/0x10c
> schedule_preempt_disabled+0x2c/0x48
> __mutex_lock+0x238/0x488
> __mutex_lock_slowpath+0x1c/0x28
> mutex_lock+0x50/0x74
> clk_prepare_lock+0x7c/0x9c
> clk_core_prepare_lock+0x20/0x44
> clk_prepare+0x24/0x30
> clk_bulk_prepare+0x40/0xb0
> mdss_runtime_resume+0x54/0x1c8
> pm_generic_runtime_resume+0x30/0x44
> __genpd_runtime_resume+0x68/0x7c
> genpd_runtime_resume+0x108/0x1f4
> __rpm_callback+0x84/0x144
> rpm_callback+0x30/0x88
> rpm_resume+0x1f4/0x52c
> rpm_resume+0x178/0x52c
> __pm_runtime_resume+0x58/0x98
> __device_attach+0xe0/0x170
> device_initial_probe+0x1c/0x28
> bus_probe_device+0x3c/0x9c
> device_add+0x644/0x814
> mipi_dsi_device_register_full+0xe4/0x170
> devm_mipi_dsi_device_register_full+0x28/0x70
> ti_sn_bridge_probe+0x1dc/0x2c0
> auxiliary_bus_probe+0x4c/0x94
> really_probe+0xcc/0x2c8
> __driver_probe_device+0xa8/0x130
> driver_probe_device+0x48/0x110
> __device_attach_driver+0xa4/0xcc
> bus_for_each_drv+0x8c/0xd8
> __device_attach+0xf8/0x170
> device_initial_probe+0x1c/0x28
> bus_probe_device+0x3c/0x9c
> deferred_probe_work_func+0x9c/0xd8
> process_one_work+0x148/0x518
> worker_thread+0x138/0x350
> kthread+0x138/0x1e0
> ret_from_fork+0x10/0x20
>
> The first thread is walking the clk tree and calling
> clk_pm_runtime_get() to power on devices required to read the clk
> hardware via struct clk_ops::is_enabled(). This thread holds the clk
> prepare_lock, and is trying to runtime PM resume a device, when it finds
> that the device is in the process of resuming so the thread schedule()s
> away waiting for the device to finish resuming before continuing. The
> second thread is runtime PM resuming the same device, but the runtime
> resume callback is calling clk_prepare(), trying to grab the
> prepare_lock waiting on the first thread.
>
> This is a classic ABBA deadlock. To properly fix the deadlock, we must
> never runtime PM resume or suspend a device with the clk prepare_lock
> held. Actually doing that is near impossible today because the global
> prepare_lock would have to be dropped in the middle of the tree, the
> device runtime PM resumed/suspended, and then the prepare_lock grabbed
> again to ensure consistency of the clk tree topology. If anything
> changes with the clk tree in the meantime, we've lost and will need to
> start the operation all over again.
>
> Luckily, most of the time we're simply incrementing or decrementing the
> runtime PM count on an active device, so we don't have the chance to
> schedule away with the prepare_lock held. Let's fix this immediate
> problem that can be triggered more easily by simply booting on Qualcomm
> sc7180.
>
> Introduce a list of clk_core structures that have been registered, or
> are in the process of being registered, that require runtime PM to
> operate. Iterate this list and call clk_pm_runtime_get() on each of them
> without holding the prepare_lock during clk_disable_unused(). This way
> we can be certain that the runtime PM state of the devices will be
> active and resumed so we can't schedule away while walking the clk tree
> with the prepare_lock held. Similarly, call clk_pm_runtime_put() without
> the prepare_lock held to properly drop the runtime PM reference. We
> remove the calls to clk_pm_runtime_{get,put}() in this path because
> they're superfluous now that we know the devices are runtime resumed.
>
> Reported-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> Closes: https://lore.kernel.org/all/20220922084322.RFC.2.I375b6b9e0a0a5348962f004beb3dafee6a12dfbb@changeid/ [1]
> Closes: https://issuetracker.google.com/328070191
> Cc: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> Cc: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> Cc: Krzysztof Kozlowski <krzk@xxxxxxxxxx>
> Fixes: 9a34b45397e5 ("clk: Add support for runtime PM")
> Signed-off-by: Stephen Boyd <sboyd@xxxxxxxxxx>
> ---

Applied to clk-fixes