Re: [PATCH v6 1/2] power: domain: handle genpd correctly when needing interrupts

From: Ulf Hansson
Date: Wed Aug 24 2022 - 09:31:37 EST


On Mon, 22 Aug 2022 at 10:38, Martin Kepplinger
<martin.kepplinger@xxxxxxx> wrote:
>
> On Friday, 19.08.2022 at 16:53 +0200, Ulf Hansson wrote:
> > On Fri, 19 Aug 2022 at 11:17, Martin Kepplinger
> > <martin.kepplinger@xxxxxxx> wrote:
> > >
> > > On Tuesday, 26.07.2022 at 17:07 +0200, Ulf Hansson wrote:
> > > > On Tue, 26 Jul 2022 at 10:33, Martin Kepplinger
> > > > <martin.kepplinger@xxxxxxx> wrote:
> > > > >
> > > > > If for example the power-domains' power-supply node (regulator)
> > > > > needs
> > > > > interrupts to work, the current setup with noirq callbacks
> > > > > cannot
> > > > > work; for example a pmic regulator on i2c, when suspending,
> > > > > usually
> > > > > already
> > > > > times out during suspend_noirq:
> > > > >
> > > > > [ 41.024193] buck4: failed to disable: -ETIMEDOUT
> > > > >
> > > > > So fix system suspend and resume for these power-domains by
> > > > > using
> > > > > the
> > > > > "outer" suspend/resume callbacks instead. Tested on the imx8mq-
> > > > > librem5 board,
> > > > > but by looking at the dts, this will fix imx8mq-evk and
> > > > > possibly
> > > > > many other
> > > > > boards too.
> > > > >
> > > > > This is designed so that genpd providers just say "this genpd
> > > > > needs
> > > > > interrupts" (by setting the flag) - without implying an
> > > > > implementation.
> > > > >
> > > > > Initially system suspend problems had been discussed at
> > > > > https://lore.kernel.org/linux-arm-kernel/20211002005954.1367653-8-l.stach@xxxxxxxxxxxxxx/
> > > > > which led to discussing the pmic that contains the regulators
> > > > > which
> > > > > serve as power-domain power-supplies:
> > > > > https://lore.kernel.org/linux-pm/573166b75e524517782471c2b7f96e03fd93d175.camel@xxxxxxx/T/
> > > > >
> > > > > Signed-off-by: Martin Kepplinger <martin.kepplinger@xxxxxxx>
> > > > > ---
> > > > > drivers/base/power/domain.c | 13 +++++++++++--
> > > > > include/linux/pm_domain.h | 5 +++++
> > > > > 2 files changed, 16 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/base/power/domain.c
> > > > > b/drivers/base/power/domain.c
> > > > > index 5a2e0232862e..58376752a4de 100644
> > > > > --- a/drivers/base/power/domain.c
> > > > > +++ b/drivers/base/power/domain.c
> > > > > @@ -130,6 +130,7 @@ static const struct genpd_lock_ops
> > > > > genpd_spin_ops = {
> > > > > #define genpd_is_active_wakeup(genpd) (genpd->flags &
> > > > > GENPD_FLAG_ACTIVE_WAKEUP)
> > > > > #define genpd_is_cpu_domain(genpd) (genpd->flags &
> > > > > GENPD_FLAG_CPU_DOMAIN)
> > > > > #define genpd_is_rpm_always_on(genpd) (genpd->flags &
> > > > > GENPD_FLAG_RPM_ALWAYS_ON)
> > > > > +#define genpd_irq_on(genpd) (genpd->flags &
> > > > > GENPD_FLAG_IRQ_ON)
> > > > >
> > > > > static inline bool irq_safe_dev_in_sleep_domain(struct device
> > > > > *dev,
> > > > > const struct generic_pm_domain *genpd)
> > > > > @@ -2065,8 +2066,15 @@ int pm_genpd_init(struct
> > > > > generic_pm_domain
> > > > > *genpd,
> > > > > genpd->domain.ops.runtime_suspend =
> > > > > genpd_runtime_suspend;
> > > > > genpd->domain.ops.runtime_resume =
> > > > > genpd_runtime_resume;
> > > > > genpd->domain.ops.prepare = genpd_prepare;
> > > > > - genpd->domain.ops.suspend_noirq = genpd_suspend_noirq;
> > > > > - genpd->domain.ops.resume_noirq = genpd_resume_noirq;
> > > > > +
> > > > > + if (genpd_irq_on(genpd)) {
> > > > > + genpd->domain.ops.suspend =
> > > > > genpd_suspend_noirq;
> > > > > + genpd->domain.ops.resume = genpd_resume_noirq;
> > > > > + } else {
> > > > > + genpd->domain.ops.suspend_noirq =
> > > > > genpd_suspend_noirq;
> > > > > + genpd->domain.ops.resume_noirq =
> > > > > genpd_resume_noirq;
> > > >
> > > > As we discussed previously, I am thinking that it may be better
> > > > to
> > > > move to using genpd->domain.ops.suspend_late and
> > > > genpd->domain.ops.resume_early instead.
> > >
> > > Wouldn't that better be a separate patch (on top)? Do you really
> > > want me to change the current behaviour (default case) from noirq
> > > to late? Then I'll resend this series with such a patch added.
> >
> > Sorry, I wasn't clear enough, the default behaviour should remain as
> > is.
> >
> > What I meant was, when genpd_irq_on() is true, we should use the
> > genpd->domain.ops.suspend_late and genpd->domain.ops.resume_early.
>
> Testing that shows that this isn't working. I can provide the logs
> later, but suspend fails and I think it makes sense: "suspend_late" is
> simply already too late when i2c (or any needed driver) uses "suspend".

Okay, I see.

The reason I suggested moving the callbacks to "suspend_late" was
that I was worried that some of the devices attached to the genpd
could use "suspend_late" themselves. This is the case for some drivers
for DMA/clock/gpio/pinctrl-controllers, for example. That said, I am
curious to look at the DT files for the platform you are running.
Would you mind giving me a pointer?

So, this made me think about this a bit more. In the end, using
different levels (suspend, suspend_late, suspend_noirq) of callbacks
is just papering over the real *dependency* problem.
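
To make that concrete, here is a toy userspace model, not kernel
code. All names in it are hypothetical, and it assumes only what has
been reported in this thread: the genpd must power off after its
consumers have suspended, while the bus behind its regulator is still
up, and, if that bus needs interrupts, before the noirq phase.

```c
#include <stdbool.h>

/* System suspend phases, in the order they run. */
enum phase { PHASE_SUSPEND, PHASE_SUSPEND_LATE, PHASE_SUSPEND_NOIRQ };

enum verdict { WORKS, ORDER_DEPENDENT, FAILS };

/*
 * Hypothetical helper (an illustration, not a genpd API): what happens if
 * the genpd powers off in genpd_phase, its last consumer suspends in
 * consumer_phase, and the bus behind its regulator suspends in bus_phase?
 */
enum verdict genpd_power_off_verdict(enum phase genpd_phase,
				     enum phase consumer_phase,
				     enum phase bus_phase,
				     bool bus_needs_irq)
{
	if (bus_needs_irq && genpd_phase == PHASE_SUSPEND_NOIRQ)
		return FAILS;		/* bus transfers would time out */
	if (genpd_phase > bus_phase)
		return FAILS;		/* bus has already been suspended */
	if (genpd_phase < consumer_phase)
		return FAILS;		/* a consumer is still active */
	if (genpd_phase == bus_phase || genpd_phase == consumer_phase)
		return ORDER_DEPENDENT;	/* depends on ordering within the phase */
	return WORKS;
}

/* Is there any phase in which powering off the genpd does not fail? */
bool any_phase_works(enum phase consumer_phase, enum phase bus_phase,
		     bool bus_needs_irq)
{
	for (int p = PHASE_SUSPEND; p <= PHASE_SUSPEND_NOIRQ; p++) {
		if (genpd_power_off_verdict((enum phase)p, consumer_phase,
					    bus_phase, bus_needs_irq) != FAILS)
			return true;
	}
	return false;
}
```

In this model, a consumer that uses "suspend_late" combined with an
interrupt-driven bus that uses "suspend" leaves no phase that works,
which is why picking another callback level does not solve the
underlying problem.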

What we need is for the genpd provider driver to be asked to suspend
only when the following conditions hold:
1. All consumer devices (and child-domains) of its corresponding PM
domain have been suspended.
2. All of its supplier devices remain resumed until the genpd
provider has been suspended.
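
The two conditions can be written down as predicates. This is only a
rough userspace sketch to pin down the ordering. The struct and
function names are made up for illustration and do not exist in genpd.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device or a child-domain. */
struct node {
	const char *name;
	bool suspended;
};

/* Hypothetical stand-in for a genpd provider and its dependencies. */
struct provider {
	struct node self;
	struct node **consumers;	/* consumer devices and child-domains */
	size_t n_consumers;
	struct node **suppliers;	/* e.g. the regulator and its bus */
	size_t n_suppliers;
};

/*
 * The provider may be suspended only if every consumer has been
 * suspended (condition 1) and every supplier is still resumed
 * (condition 2).
 */
bool provider_may_suspend(const struct provider *p)
{
	for (size_t i = 0; i < p->n_consumers; i++)
		if (!p->consumers[i]->suspended)
			return false;
	for (size_t i = 0; i < p->n_suppliers; i++)
		if (p->suppliers[i]->suspended)
			return false;
	return true;
}

/*
 * Condition 2 seen from the supplier side: a supplier may be suspended
 * only after the provider itself has been suspended.
 */
bool supplier_may_suspend(const struct provider *p)
{
	return p->self.suspended;
}
```

Neither predicate refers to a suspend phase. The ordering comes only
from who depends on whom.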

Please allow me a few more days to think in more detail about this.

In some way, it looks like we should be able to combine the
information genpd has about its devices and child-domains and use PM
callbacks for the genpd provider driver, so that we can rely on the
dependency path the fw_devlinks would give us for its supplier devices.

Kind regards
Uffe