Re: [RFC][PATCH v0.1 3/6] PM: EM: Add special case to em_dev_register_perf_domain()

From: Rafael J. Wysocki
Date: Tue Nov 19 2024 - 09:04:15 EST


On Mon, Nov 18, 2024 at 4:25 PM Hongyan Xia <hongyan.xia2@xxxxxxx> wrote:
>
> On 08/11/2024 16:38, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >
> > Allow em_dev_register_perf_domain() to register a cost-only stub
> > perf domain with one-element states table if the .active_power()
> > callback is not provided.
> >
> > Subsequently, this will be used by the intel_pstate driver to register
> > stub perf domains for CPUs on hybrid systems.
> >
> > No intentional functional impact.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > ---
> > kernel/power/energy_model.c | 26 +++++++++++++++++++++++---
> > 1 file changed, 23 insertions(+), 3 deletions(-)
> >
> > Index: linux-pm/kernel/power/energy_model.c
> > ===================================================================
> > --- linux-pm.orig/kernel/power/energy_model.c
> > +++ linux-pm/kernel/power/energy_model.c
> > @@ -426,9 +426,11 @@ static int em_create_pd(struct device *d
> >  	if (!em_table)
> >  		goto free_pd;
> >
> > -	ret = em_create_perf_table(dev, pd, em_table->state, cb, flags);
> > -	if (ret)
> > -		goto free_pd_table;
> > +	if (cb->active_power) {
> > +		ret = em_create_perf_table(dev, pd, em_table->state, cb, flags);
> > +		if (ret)
> > +			goto free_pd_table;
> > +	}
> >
> >  	ret = em_compute_costs(dev, em_table->state, cb, nr_states, flags);
> >  	if (ret)
> > @@ -561,11 +563,20 @@ int em_dev_register_perf_domain(struct d
> >  {
> >  	unsigned long cap, prev_cap = 0;
> >  	unsigned long flags = 0;
> > +	bool stub_pd = false;
> >  	int cpu, ret;
> >
> >  	if (!dev || !nr_states || !cb)
> >  		return -EINVAL;
> >
> > +	if (!cb->active_power) {
> > +		if (!cb->get_cost || nr_states > 1 || microwatts)
> > +			return -EINVAL;
> > +
> > +		/* Special case: a stub perf domain. */
> > +		stub_pd = true;
> > +	}
> > +
>
> I wonder if the only purpose of stub_pd is to just skip the capacity
> check below, which doesn't look very nice.

It is.

I guess I could just skip it if nr_states == 1 because that case means
the same cost for all frequency values.
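
For illustration only, here is a rough userspace model of what that
alternative could look like. It is a sketch, not kernel code:
struct em_cb_model, em_check_args_model() and dummy_cb() are
hypothetical names made up for this example, and the checks simply
mirror the ones in the hunk above. Note that, unlike the posted patch,
it would also skip the capacity check for a one-state table that does
provide .active_power().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two relevant driver callbacks. */
struct em_cb_model {
	int (*active_power)(void);
	int (*get_cost)(void);
};

/*
 * Model of the argument checks. Returns 0 if registration may proceed
 * and -EINVAL otherwise. On success, *skip_cap_check says whether the
 * same-capacity check can be skipped.
 */
int em_check_args_model(const struct em_cb_model *cb,
			unsigned int nr_states, bool microwatts,
			bool *skip_cap_check)
{
	if (!cb || !nr_states || !skip_cap_check)
		return -EINVAL;

	/* Without .active_power(), only a one-state cost-only table is valid. */
	if (!cb->active_power &&
	    (!cb->get_cost || nr_states > 1 || microwatts))
		return -EINVAL;

	/* One state means one cost for all frequencies, so no flag is needed. */
	*skip_cap_check = (nr_states == 1);
	return 0;
}

/* Placeholder callback used only to make a pointer non-NULL. */
int dummy_cb(void)
{
	return 0;
}
```

With this shape, the separate stub_pd variable goes away and the
decision hinges on nr_states alone.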

>
> I may be echoing Dietmar's comments here. What's the problem of just
> having 3 domains?

The energy-efficiency of a CPU is not strictly related to its capacity.

It's about the cases when there are some special CPUs that can
turbo-up higher, but there's no other difference between them and the
other CPUs in the domain.
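
To make that concrete, below is a minimal userspace sketch of the
per-CPU capacity check with the stub case added. It is an assumption-laden
model rather than the kernel loop itself: em_check_caps_model() is a
hypothetical name, and capacities are passed in as a plain array
instead of being read per CPU.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model of the per-CPU capacity check: every CPU in a perf
 * domain must report the same capacity, unless the domain is a stub.
 */
int em_check_caps_model(const unsigned long *caps, size_t nr_cpus,
			bool stub_pd)
{
	unsigned long prev_cap = 0;
	size_t i;

	for (i = 0; i < nr_cpus; i++) {
		/* Stub perf domain: capacities are allowed to differ. */
		if (stub_pd)
			continue;

		if (prev_cap && prev_cap != caps[i])
			return -EINVAL;

		prev_cap = caps[i];
	}

	return 0;
}
```

For example, with made-up capacities { 1024, 1024, 1009, 1009 }, where
two CPUs can turbo slightly higher than the others, a regular perf
domain is rejected with -EINVAL, while a stub perf domain covering the
same CPUs is accepted.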

> Or, could you just specify the same capacities so that the same-capacity
> check won't fail, but just to use hardware load or CPU pressure to model
> the slight difference in real capacities? This way you'd re-use a lot of
> existing infrastructure.

That would have been confusing though, so thanks, but no thanks.

> >  	/*
> >  	 * Use a mutex to serialize the registration of performance domains and
> >  	 * let the driver-defined callback functions sleep.
> > @@ -590,6 +601,15 @@ int em_dev_register_perf_domain(struct d
> >  				ret = -EEXIST;
> >  				goto unlock;
> >  			}
> > +
> > +			/*
> > +			 * The capacity need not be the same for all CPUs in a
> > +			 * stub perf domain, so long as the average cost of
> > +			 * running on each of them is approximately the same.
> > +			 */
> > +			if (stub_pd)
> > +				continue;
> > +
> >  			/*
> >  			 * All CPUs of a domain must have the same
> >  			 * micro-architecture since they all share the same
> >