Re: [PATCH v8 07/26] PM / Domains: Add genpd governor for CPUs
From: Lorenzo Pieralisi
Date: Thu Sep 13 2018 - 11:37:40 EST
On Thu, Aug 30, 2018 at 03:36:02PM +0200, Ulf Hansson wrote:
> On 24 August 2018 at 12:38, Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx> wrote:
> > On Fri, Aug 24, 2018 at 11:26:19AM +0200, Ulf Hansson wrote:
> >
> > [...]
> >
> >> > That's a good question and it maybe gives a path towards a solution.
> >> >
> >> > AFAICS the genPD governor only selects the idle state parameter that
> >> > determines the idle state at, say, GenPD cpumask level it does not touch
> >> > the CPUidle decision, that works on a subset of idle states (at cpu
> >> > level).
> >> >
> >> > That's my understanding, which can be wrong so please correct me
> >> > if that's the case because that's a bit confusing.
> >> >
> >> > Let's imagine that we flattened out the list of idle states and feed
> >> > CPUidle with it (all of them - cpu, cluster, package, system - as it is
> >> > in the mainline _now_). Then the GenPD governor can run-through the
> >> > CPUidle selection and _demote_ the idle state if necessary since it
> >> > understands that some CPUs in the GenPD will wake up shortly and break
> >> > the target residency hyphothesis the CPUidle governor is expecting.
> >> >
> >> > The whole idea about this series is improving CPUidle decision when
> >> > the target idle state is _shared_ among groups of cpus (again, please
> >> > do correct me if I am wrong).
> >>
> >> Absolutely, this is one of the main reason for the series!
> >>
> >> >
> >> > It is obvious that a GenPD governor must only demote - never promote a
> >> > CPU idle state selection given that hierarchy implies more power
> >> > savings and higher target residencies required.
> >>
> >> Absolutely. I apologize if I have been using the word "promote"
> >> wrongly, I realize it may be a bit confusing.
> >>
> >> >
> >> > This whole series would become more generic and won't depend on
> >> > PSCI OSI at all - actually that would become a hierarchical
> >> > CPUidle governor.
> >>
> >> Well, to me we need a first user of the new infrastructure code in
> >> genpd and PSCI is probably the easiest one to start with. An option
> >> would be to start with an old ARM32 platform, but it seems a bit silly
> >> to me.
> >
> > If the code can be structured as described above as a hierarchical
> > (possibly optional through a Kconfig entry or sysfs tuning) idle
> > decision you can apply it to _any_ PSCI based platform out there,
> > provided that the new governor improves power savings.
> >
> >> In regards to OS-initiated mode vs platform coordinated mode, let's
> >> discuss that in details in the other email thread instead.
> >
> > I think that's crystal clear by now that IMHO PSCI OS-initiated mode is
> > a red-herring, it has nothing to do with this series, it is there just
> > because QC firmware does not support PSCI platform coordinated suspend
> > mode.
>
> I fully agree that the series isn't specific to PSCI OSI mode. On the
> other hand, for PSCI OSI mode, that's where I see this series to fit
> naturally. And in particular for the QCOM 410c board.
>
> When it comes to the PSCI PC mode, it may under certain circumstances
> be useful to deploy this approach for that as well, and I agree that
> it seems reasonable to have that configurable as opt-in, somehow.
>
> Although, let's discuss that separately, in a next step. Or at least
> let's try to keep PSCI related technical discussions to the other
> thread, as that makes it easier to follow.
>
> >
> > You can apply the concept in this series to _any_ arch provided
> > the power domains representation is correct (and again, I would sound
> > like a broken record but the series must improve power savings over
> > vanilla CPUidle menu governor).
>
> I agree, but let me elaborate a bit, to hopefully add some clarity,
> which I may not have been able to communicate earlier.
>
> The goal with the series is to enable platforms to support all its
> available idlestates, which are shared among a group of CPUs. This is
> the case for QCOM 410c, for example.
>
> To my knowledge, we have other ARM32 based platforms that currently
> have disabled some of its cluster idle states. That's because they
> can't know when it's safe to power off the cluster "coherency domain",
> in cases when the platform also have other shared resources in it.
>
> The point is, to see improved power savings, additional platform
> deployment may be needed and that just takes time. For example runtime
> PM support is needed in those drivers that deals with the "shared
> resources", a correctly modeled PM domain topology using genpd, etc,
> etc.
>
> >
> >> > I still think that PSCI firmware and most certainly mwait() play the
> >> > role the GenPD governor does since they can detect in FW/HW whether
> >> > that's worthwhile to switch off a domain, the information is obviously
> >> > there and the kernel would just add latency to the idle path in that
> >> > case but let's gloss over this for the sake of this discussion.
> >>
> >> Yep, let's discuss that separately.
> >>
> >> That said, can I interpret your comments on the series up until this
> >> change, that you seems rather happy with where the series is going?
> >
> > It is something we have been discussing with Daniel since generic idle
> > was merged for Arm a long while back. I have nothing against describing
> > idle states with power domains but it must improve idle decisions
> > against the mainline. As I said before, runtime PM can also be used
> > to get rid of CPU PM notifiers (because with power domains we KNOW
> > what devices eg PMU are switched off on idle entry, we do not guess
> > any longer; replacing CPU PM notifiers is challenging and can be
> > tackled - if required - in a different series).
>
> Yes, we have be talking about the CPU PM and CPU_CLUSTER_PM notifiers
> and I fully agree. It's something that we should look into and in
> future steps.
>
> >
> > Bottom line (talk is cheap, I know and apologise about that): this
> > series (up until this change) adds complexity to the idle path and lots
> > of code; if its usage is made optional and can be switched on on systems
> > where it saves power that's fine by me as long as we keep PSCI
> > OS-initiated idle states out of the equation, that's an orthogonal
> > discussion as, I hope, I managed to convey.
> >
> > Thanks,
> > Lorenzo
>
> Lorenzo, thanks for your feedback!
>
> Please, when you have time, could you also reply to the other thread
> we started, I would like to understand how I should proceed with this
> series.
OK, thanks, I will, sorry for the delay in responding.
Lorenzo