Re: [PATCH v2 0/5] Fix a deadlock with clk_pm_runtime_get()
From: Stephen Boyd
Date: Mon Mar 25 2024 - 14:52:38 EST
Quoting Stephen Boyd (2024-03-25 11:41:54)
> This patch series fixes a deadlock reported[1] on ChromeOS devices
> (Qualcomm sc7180 Trogdor). To get there, we allow __clk_release() to run
> without the prepare_lock held. Then we add runtime PM enabled clk_core
> structs to a list that we iterate and enable runtime PM for each entry
> before grabbing the prepare_lock to walk the clk tree. The details are
> in patch #4.
>
> The patch after that is based on the analysis in the disable unused
> patch. We similarly resume devices from runtime suspend when walking the
> clk tree for the debugfs clk_summary.
>
> Unfortunately this doesn't fix all problems with the usage of runtime PM
> in the clk framework. We still have a problem if preparing a clk happens
> in parallel to the device providing that clk runtime resuming or
> suspending. In that case, the task will go to sleep waiting for the
> runtime PM state to change, and we'll deadlock. This is primarily a
> problem with the global prepare_lock. I suspect we'll be able to fix
> this by implementing per-clk locking, because then we will be able to
> split up the big prepare_lock into smaller locks that don't deadlock on
> some device runtime PM transitions.
>
> I'll start working on that problem in earnest now because I'm worried
> we're going to run into that problem very soon.
>
> Changes from v1 (https://lore.kernel.org/r/):
Oops this is https://lore.kernel.org/r/20240325054403.592298-1-sboyd@kernelorg