Re: [PATCH v3 0/4] Allow genpd providers to power off domains on sync state

From: Ulf Hansson
Date: Wed Apr 05 2023 - 10:12:34 EST


Abel, Saravana,

On Fri, 31 Mar 2023 at 06:59, Abel Vesa <abel.vesa@xxxxxxxxxx> wrote:
>
> On 23-03-30 12:50:44, Saravana Kannan wrote:
> > On Thu, Mar 30, 2023 at 4:27 AM Abel Vesa <abel.vesa@xxxxxxxxxx> wrote:
> > >
> > > On 23-03-27 17:17:28, Saravana Kannan wrote:
> > > > On Mon, Mar 27, 2023 at 12:38 PM Abel Vesa <abel.vesa@xxxxxxxxxx> wrote:
> > > > >
> > > > > There have been already a couple of tries to make the genpd "disable
> > > > > unused" late initcall skip the powering off of domains that might be
> > > > > needed until later on (i.e. until some consumer probes). The conclusion
> > > > > was that the provider could return -EBUSY from the power_off callback
> > > > > until the provider's sync state has been reached. This patch series tries
> > > > > to provide a proof-of-concept that is working on Qualcomm platforms.
> > > >
> > > > I'm giving my thoughts in the cover letter instead of spreading it
> > > > around all the patches so that there's context between the comments.
> > > >
> > > > 1) Why can't all the logic in this patch series be implemented at the
> > > > framework level? And then allow the drivers to opt into this behavior
> > > > by setting the sync_state() callback.
> > > >
> > > > That way, you can land it only for QC drivers by setting up
> > > > sync_state() callback only for QC drivers, but actually have the same
> > > > code function correctly for non-QC drivers too. And then once we have
> > > > this functionality working properly for QC drivers for one kernel
> > > > version (or two), we'll just have the framework set the device's
> > > > driver's sync_state() if it doesn't have one already.
> > >
> > > I think Ulf has already NACK'ed that approach here:
> > > [1] https://lore.kernel.org/lkml/CAPDyKFon35wcQ+5kx3QZb-awN_S_q8y1Sir-G+GoxkCvpN=iiA@xxxxxxxxxxxxxx/
> >
> > I would have NACK'ed that too because that's an incomplete fix. As I
> > said further below, the fix needs to be at the aggregation level where
> > you aggregate all the current consumer requests. In there, you need to
> > add in the "state at boot" input that gets cleared out after a
> > sync_state() call is received for that power domain.
> >
>
> So, just to make sure I understand your point. You would rather have the
> genpd_power_off check if 'state at boot' is 'on' and return busy and
> then clear then, via a generic genpd sync state you would mark 'state at
> boot' as 'off' and queue up a power off request for each PD from there.
> And as for 'state at boot' it would check the enable bit through
> provider.
>
> Am I right so far?

I am not sure I completely follow what you are suggesting here.

However, let me point out that, from the genpd API point of view,
there is no requirement that the provider be a driver. This means
that the sync_state callback may not even be applicable for all genpd
providers.

In other words, it looks to me that we may need some new genpd helper
functions, no matter what. More importantly, it looks like we need an
opt-in behaviour, unless we can figure out a common way for genpd to
understand whether the sync_state thing is going to be applicable or
not. Maybe Saravana has some ideas around this?

Note that I don't object to extending genpd to be more clever and to
share common code, of course. We could, for example, make
genpd_power_off() bail out earlier, rather than calling the
->power_off() callback and waiting for it to return -EBUSY. Both of
you have pointed this out to me in some of the earlier
replies/discussions too.
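
To make the "bail out earlier" plus opt-in idea a bit more concrete,
here is a rough toy model in plain userspace C. It is only a sketch and
not genpd code: every name in it (toy_pd, stay_on_until_sync,
toy_pd_power_off(), toy_pd_sync_state()) is made up for illustration,
and it assumes that a provider opts in per domain by setting a flag for
domains it found powered on at boot.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel's genpd code. */
struct toy_pd {
	bool is_on;              /* current power state */
	bool stay_on_until_sync; /* hypothetical opt-in flag set by the provider */
	int power_off_calls;     /* how often the provider callback actually ran */
};

/* Stand-in for a provider's ->power_off() callback. */
static int toy_provider_power_off(struct toy_pd *pd)
{
	pd->power_off_calls++;
	pd->is_on = false;
	return 0;
}

/* Bail out before invoking the provider while the opt-in flag is set. */
static int toy_pd_power_off(struct toy_pd *pd)
{
	if (!pd->is_on)
		return 0;
	if (pd->stay_on_until_sync)
		return -EBUSY;
	return toy_provider_power_off(pd);
}

/* Hypothetical helper a provider calls once its sync state is reached. */
static int toy_pd_sync_state(struct toy_pd *pd)
{
	pd->stay_on_until_sync = false;
	return toy_pd_power_off(pd);
}

/* Sequence: late "disable unused" attempt first, then sync state. */
static void toy_demo(void)
{
	struct toy_pd pd = { .is_on = true, .stay_on_until_sync = true };

	assert(toy_pd_power_off(&pd) == -EBUSY); /* refused early */
	assert(pd.power_off_calls == 0);         /* callback never ran */
	assert(toy_pd_sync_state(&pd) == 0);     /* allowed after sync state */
	assert(!pd.is_on && pd.power_off_calls == 1);
}
```

The point of the flag being opt-in is that a provider that is not a
driver, and thus never sees a sync_state callback, simply leaves it
unset and keeps today's behaviour.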

>
> > > And suggested this new approach that this patch series proposes.
> > > (Unless I misunderstood his point)
> > >
> > > >
> > > > 2) sync_state() is not just about power on/off. It's also about the
> > > > power domain level. Can you handle that too please?
> > >
> > > Well, this patchset only tries to delay the disabling of unused power
> > > domains until all consumers have had a chance to probe. So we use sync
> > > state only to queue up a power-off request to make sure those unused
> > > ones get disabled.
> >
> > Sure, but the design is completely unusable for a more complete
> > sync_state() behavior. I'm okay if you want to improve the
> > sync_state() behavior in layers, but don't do it in a way where the
> > current design will definitely not work for what you want to add in
> > the future.
>
> But you would still be OK with the qcom_cc sync state wrapper, I guess,
> right? Your concern is only about the sync state callback not being a
> generic genpd one, AFAIU.
>
> >
> > > >
> > > > 3) In your GDSC drivers, it's not clear to me if you are preventing
> > > > power off until sync_state() only for GDSCs that were already on at
> > > > boot. So if an off-at-boot GDSC gets turned on, and then you attempt
> > > > to turn it off before all its consumers have probed, it'll fail to
> > > > power it off even though that wasn't necessary?
> > >
> > > I think we can circumvent looking at a GDSC by knowing it there was ever
> > > a power on request since boot. I'll try to come up with something in the
> > > new version.
> >
> > Please no. There's nothing wrong with reading the GDSC values. Please
> > read them and don't turn on GDSC's that weren't on at boot.
>
> Sorry for the typos above, I basically said that for this concern of
> yours, we can add the 'state at boot' thing you mentioned above by
> looking at the GDSC (as in reading reg).
>
> >
> > Otherwise you are making it a hassle for the case where there is a
> > consumer without a driver for a GDSC that was off at boot. You are now
> > forcing the use of timeouts or writing to state_synced file. Those
> > should be absolute last resorts, but you are making that a requirement
> > with your current implementation. If you implement it correctly by
> > reading the GDSC register, things will "just work". And it's not even
> > hard to do.
> >
> > NACK'ed until this is handled correctly.
> >
> > >
> > > >
> > > > 4) The returning -EBUSY when a power off is attempted seems to be
> > > > quite wasteful. The framework will go through the whole sequence of
> > > > trying to power down, send the notifications and then fail and then
> > > > send the undo notifications. Combined with point (2) I think this can
> > > > be handled better at the aggregation level in the framework to avoid
> > > > even going that far into the power off sequence.
> > >
> > > Again, have a look at [1] (above).
> >
> > See my reply above. If you do it properly at the framework level, this
> > can be done in a clean way and will work for all power domains.
> >
> > -Saravana
> >
> > >
> > > Ulf, any thoughts on this 4th point?

Please, see my reply above.

[...]

Kind regards
Uffe