Re: [PATCH v2] opp: Power on (virtual) power domains managed by the OPP core

From: Viresh Kumar
Date: Fri Sep 11 2020 - 05:26:07 EST


On 11-09-20, 10:34, Stephan Gerhold wrote:
> On Fri, Aug 28, 2020 at 11:57:28AM +0200, Stephan Gerhold wrote:
> > It seems to me that there is more work needed to make such a use case
> > really work, but it's hard to speculate without a real example.
> >
>
> So it seems like we have a real example now. :)

We told you :)

> As mentioned in my other mail [1] it turns out I actually have such a
> use case. I briefly explained it in [2], basically the clock that
> provides higher CPU frequencies has some voltage requirements that
> should be voted for using a power domain.
>
> The clock that provides the lower CPU frequencies has no such
> requirement, so I need to scale (and power on) a power domain only for
> some of the OPPs.
>
> [1]: https://lore.kernel.org/linux-pm/20200831154938.GA33622@xxxxxxxxxxx/
> [2]: https://lore.kernel.org/linux-arm-msm/20200910162610.GA7008@xxxxxxxxxxx/
>
> So I think it would be good to discuss this use case first before we
> decide on this patch (how to enable power domains managed by the OPP
> core). I think there are two problems that need to be solved:
>
> 1. How can we drop our performance state votes for some of the OPPs?
> I explained that problem earlier already:
>
> >
> > I was thinking about something like that, but can you actually drop
> > your performance state vote for one of the power domains using
> > "required-opps"?
> >
> > At the moment it does not seem possible. I tried adding a special OPP
> > using opp-level = <0> to reference that from required-opps, but the OPP
> > core does not allow this:
> >
> > vddcx: Not all nodes have performance state set (7: 6)
> > vddcx: Failed to add OPP table for index 0: -2

This should fix it.

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 214c1619b445..2483e765318a 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -2117,9 +2117,6 @@ int dev_pm_opp_xlate_performance_state(struct opp_table *src_table,
int dest_pstate = -EINVAL;
int i;

- if (!pstate)
- return 0;
-
/*
* Normally the src_table will have the "required_opps" property set to
* point to one of the OPPs in the dst_table, but in some cases the
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index e72753be7dc7..1a9cb96484bb 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -842,7 +842,7 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table)
{
struct device_node *np;
- int ret, count = 0, pstate_count = 0;
+ int ret, count = 0;
struct dev_pm_opp *opp;

/* OPP table is already initialized for the device */
@@ -876,20 +876,13 @@ static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table)
goto remove_static_opp;
}

- list_for_each_entry(opp, &opp_table->opp_list, node)
- pstate_count += !!opp->pstate;
-
- /* Either all or none of the nodes shall have performance state set */
- if (pstate_count && pstate_count != count) {
- dev_err(dev, "Not all nodes have performance state set (%d: %d)\n",
- count, pstate_count);
- ret = -ENOENT;
- goto remove_static_opp;
+ list_for_each_entry(opp, &opp_table->opp_list, node) {
+ if (opp->pstate) {
+ opp_table->genpd_performance_state = true;
+ break;
+ }
}

- if (pstate_count)
- opp_table->genpd_performance_state = true;
-
return 0;

remove_static_opp:


> Not sure if it makes sense but I think somehow allowing the additional
> opp-level = <0> would be a simple solution. For example:
>
> rpmpd: power-controller {
> rpmpd_opp_table: opp-table {
> compatible = "operating-points-v2";
>
> /*
> * This one can be referenced to drop the performance state
> * vote within required-opps.
> */
> rpmpd_opp_none: opp0 {
> opp-level = <0>;
> };
>
> rpmpd_opp_retention: opp1 {
> opp-level = <1>;
> };
>
> /* ... */
> };
> };
>
> cpu_opp_table: cpu-opp-table {
> compatible = "operating-points-v2";
> opp-shared;
>
> /* Power domain is only needed for frequencies >= 998 MHz */
> opp-200000000 {
> opp-hz = /bits/ 64 <200000000>;
> required-opps = <&rpmpd_opp_none>; /* = drop perf state */
> };
> opp-998400000 {
> opp-hz = /bits/ 64 <998400000>;
> required-opps = <&rpmpd_opp_svs_soc>;
> };
> opp-1209600000 {
> opp-hz = /bits/ 64 <1209600000>;
> required-opps = <&rpmpd_opp_nominal>;
> };
> };

Yes, makes sense.

> 2. Where/when to enable the power domains: I need to enable the power
> domain whenever I have a vote for a performance state. In the example
> above the power domain should get enabled for >= 998 MHz and disabled
> otherwise.

- Why do you need to enable these power domains like this ?
- What will happen if you don't enable them at all ?
- What will happen if you enable them for ever ?

AFAIU, these are kind of virtual domains which are there just to vote in behalf
of the OS. Only if the accumulated vote is greater than zero, the actual power
domain will start consuming power. Otherwise it should be disabled.

Or is that wrong ?

> At least for the CPUFreq case the "virt_devs" parameter does not
> really help in this case...
> dev_pm_opp_set_rate() is called within cpufreq-dt which is supposed
> to be generic. So I can't enable the power domains myself from there.
> Even if I made a custom cpufreq driver that has control over the
> dev_pm_opp_set_rate() call - I don't really know exactly in the
> driver for which frequencies a power domain is needed.
>
> On the other hand, the OPP core does have that information.
> This brings me back to my PATCH v1 (where I used runtime PM functions
> instead of device links). If I modify it to enable the power domain
> whenever we have a performance state vote > 0 when setting an OPP,
> it would do exactly what I need...
>
> I don't think it makes sense to do performance state votes without
> enabling a power domain, so this approach sounds good to me...

--
viresh