Re: [PATCH 1/4] PM / EM: and devices to Energy Model

From: Lukasz Luba
Date: Mon Jan 20 2020 - 09:52:22 EST


Hi Quentin,

On 1/17/20 10:54 AM, Quentin Perret wrote:
Hey Lukasz,

Still reading through this, but with small changes, this looks pretty
good to me.

On Thursday 16 Jan 2020 at 15:20:29 (+0000), lukasz.luba@xxxxxxx wrote:
+int em_register_perf_domain(struct device *dev, unsigned int nr_states,
+ struct em_data_callback *cb)
{
unsigned long cap, prev_cap = 0;
struct em_perf_domain *pd;
- int cpu, ret = 0;
+ struct em_device *em_dev;
+ cpumask_t *span = NULL;
+ int cpu, ret;
- if (!span || !nr_states || !cb)
+ if (!dev || !nr_states || !cb || !cb->active_power)

Nit: you check !cb->active_power in em_create_pd() too I think, so only
one of the two is needed.

good point, thanks


return -EINVAL;
- /*
- * Use a mutex to serialize the registration of performance domains and
- * let the driver-defined callback functions sleep.
- */
mutex_lock(&em_pd_mutex);
- for_each_cpu(cpu, span) {
- /* Make sure we don't register again an existing domain. */
- if (READ_ONCE(per_cpu(em_data, cpu))) {
+ if (_is_cpu_device(dev)) {
+ span = kzalloc(cpumask_size(), GFP_KERNEL);
+ if (!span) {
+ mutex_unlock(&em_pd_mutex);
+ return -ENOMEM;
+ }
+
+ ret = dev_pm_opp_get_sharing_cpus(dev, span);
+ if (ret)
+ goto free_cpumask;

That I think should be changed. This creates some dependency on PM_OPP
for the EM framework. And in fact, the reason we came up with PM_EM was
precisely to not depend on PM_OPP which was deemed too Arm-specific.

Suggested alternative: have two registration functions like so:

int em_register_dev_pd(struct device *dev, unsigned int nr_states,
struct em_data_callback *cb);
int em_register_cpu_pd(cpumask_t *span, unsigned int nr_states,
struct em_data_callback *cb);

Interesting, in the internal review Dietmar asked me to remove these two
functions. I had the same idea, which would simplify a bit the
registration and it does not need to check the dev->bus if it is CPU.

Unfortunately, we would need also two function in drivers/opp/of.c:
dev_pm_opp_of_register_cpu_em(policy->cpus);
and
dev_pm_opp_of_register_dev_em(dev);

Thus, I have created only one registration function, which you can see
in this patch set.

What do you think Dietmar?


where em_register_cpu_pd() does the CPU-specific work and then calls
em_register_dev_pd() (instead of having that big if (_is_cpu_device(dev))
as you currently have). Would that work ?

Yes, I think you made a good point with this OPP dependency, which we
could avoid when we implement these two registration functions.


Another possibility would be to query CPUFreq instead of PM_OPP to get
the mask, but I'd need to look again at the driver registration path in
CPUFreq to see if the policy masks have been populated when we enter
PM_EM ... I am not sure if this is the case, but it's worth having a
look too.

The policy mask is populated, our registration function is called at
the end of the init code of CPUfreq drivers. I will check this option.


Thanks,
Quentin


Thank you for your comments.

Regards,
Lukasz