[PATCH] PM / EM: consult something about cpumask in em_dev_register_perf_domain

From: zhuguangqing83
Date: Mon Oct 12 2020 - 08:42:24 EST


From: zhuguangqing <zhuguangqing@xxxxxxxxxx>

Hi, Lukasz, Quentin
I have three questions to consult about cpumask in energy_model.c.

1, The first one is about the meanings of the following two parameters,
[1] and [2].
[1]: "cpumask_t *cpus" in function em_dev_register_perf_domain(): Pointer
to cpumask_t, which in case of a CPU device is obligatory. It can be taken
from i.e. 'policy->cpus'.
[2]: "unsigned long cpus[]" in struct em_perf_domain: Cpumask covering the
CPUs of the domain. It's here for performance reasons to avoid potential
cache misses during energy calculations in the scheduler and simplifies
allocating/freeing that memory region.

>From the current code, we see [2] is copied from [1]. But from comments,
accorinding to my understanding, [2] and [1] have different meanings.
[1] can be taken from i.e. 'policy->cpus', according to the comment in the
defination of struct cpufreq_policy, it means Online CPUs only. Actually,
'policy->cpus' is not always Online CPUs.
[2] means each_possible_cpus in the same domain, including phycical
hotplug cpus(just possible), logically hotplug cpus(just present) and
online cpus.

So, the first question is, what are the meanings of [1] and [2]?
I guess maybe there are the following 4 possible choices.
A), for_each_possible_cpu in the same domain, maybe phycical hotplug
B), for_each_present_cpu in the same domain, maybe logically hotplug
C), for_each_online_cpu in the same domain, online
D), others

2, The second question is about the function em_dev_register_perf_domain().
If multiple clients register the same performance domain with different
*dev or *cpus, how should we handle?

int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
struct em_data_callback *cb, cpumask_t *cpus)

For example, say cpu0 and cpu1 are in the same performance domain,
cpu0 is registered first. As part of the init process,
em_dev_register_perf_domain() is called, then *dev = cpu0_dev,
*cpus = 01b(cpu1 is initially offline). It creates a em_pd for cpu0_dev.
After a while, cpu1 is online, em_dev_register_perf_domain() is called
again as part of init process for cpu1, then *dev =cpu1_dev,
*cpus = 11b(cpu1 is online). In this case, for the current code,
cpu1_dev can not get its em_pd.

3, The third question is, how can we ensure cpu_dev as follows is not
NULL? If we can't ensure that, maybe we should add a check before using
it.
/kernel/power/energy_model.c
174) static int em_create_pd(struct device *dev, int nr_states,
175) struct em_data_callback *cb, cpumask_t *cpus)
176) {
199) if (_is_cpu_device(dev))
200) for_each_cpu(cpu, cpus) {
201) cpu_dev = get_cpu_device(cpu);
202) cpu_dev->em_pd = pd;
203) }

Signed-off-by: zhuguangqing <zhuguangqing@xxxxxxxxxx>
---
kernel/power/energy_model.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index c1ff7fa030ab..addf2f400184 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -199,7 +199,13 @@ static int em_create_pd(struct device *dev, int nr_states,
if (_is_cpu_device(dev))
for_each_cpu(cpu, cpus) {
cpu_dev = get_cpu_device(cpu);
- cpu_dev->em_pd = pd;
+ if (cpu_dev)
+ cpu_dev->em_pd = pd;
+ else {
+ cpumask_clear_cpu(cpu, em_span_cpus(pd));
+ dev_dbg(dev, "EM: failed to get cpu%d device\n",
+ cpu);
+ }
}

dev->em_pd = pd;
--
2.17.1