Re: [PATCH v2] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend

From: Saravana Kannan
Date: Fri Jul 11 2014 - 22:45:21 EST


On 07/11/2014 03:52 AM, Viresh Kumar wrote:

Just responding to one comment. The one about policy->cpu.


diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c

static int cpufreq_add_dev_symlink(struct cpufreq_policy *policy)
{
- unsigned int j;
+ unsigned int j, first_cpu = cpumask_first(policy->related_cpus);
int ret = 0;

- for_each_cpu(j, policy->cpus) {
+ for_each_cpu(j, policy->related_cpus) {
struct device *cpu_dev;

- if (j == policy->cpu)
+ if (j == first_cpu)

why?

The first CPU is a cluster always own the real nodes.

What I meant was, why not use policy->cpu?

+static int cpufreq_add_dev_interface(struct cpufreq_policy *policy)
{
struct freq_attr **drv_attr;
+ struct device *dev;
int ret = 0;

+ dev = get_cpu_device(cpumask_first(policy->related_cpus));
+ if (!dev)
+ return -EINVAL;
+

Why?

I'm just always adding the real nodes to the first CPU in a cluster
independent of which CPU gets added first. Makes it easier to know which
ones to symlink. See comment next to policy->cpu for full context.

Yeah, and that is the order in which CPUs will boot and cpufreq_add_dev()
will be called. So, isn't policy->cpu the right CPU always?

No, the "first" cpu in a cluster doesn't need to be the first one to be added. An example is 2x2 cluster system where the system is booted with max cpus = 2 and then cpu3 could be onlined first by userspace.


- if (has_target()) {
+ cpus = cpumask_weight(policy->cpus);
+ policy->cpu = cpumask_first(policy->cpus);

why update it at all? Also, as per your logic what if cpus == 0?

Yeah, I didn't write it this way at first. But the governors are making
the assumption that policy->cpu is always an online CPU. So, they try to

Are you sure? I had a quick look and failed to see that..

queue work there and use data structs of that CPU (even if they free it in
the STOP event since it went offline).

So, it queues work on all policy->cpus, not policy->cpu.
And the data structures
are just allocated with a CPU number, its fine if its offline.

And where are we freeing that stuff in STOP ?

Sorry if I am really really tired and couldn't read it correctly.

Yeah, it is pretty convolution. But pretty much anywhere in the gov code where policy->cpu is used could cause this. The specific crash I hit was in this code:

static void od_dbs_timer(struct work_struct *work)
{
struct od_cpu_dbs_info_s *dbs_info =
container_of(work, struct od_cpu_dbs_info_s, cdbs.work.work);
unsigned int cpu = dbs_info->cdbs.cur_policy->cpu;

======= CPU is policy->cpu here.

struct od_cpu_dbs_info_s *core_dbs_info = &per_cpu(od_cpu_dbs_info,
cpu);

======= Picks the per CPU struct of an offline CPU

<snip>

mutex_lock(&core_dbs_info->cdbs.timer_mutex);

======= Dies trying to lock a destroyed mutex


Another option is to leave policy->cpu unchanged and then fix all the
governors. But this patch would get even more complicated. So, we can
leave this as is, or fix that up in a separate patch.

Since we are simplifying it here, I think we should NOT change policy->cpu
at all. It will make life simple (probably).

I agree, but then I would have to fix up the governors. In the interest of keeping this patch small. I'll continue with what I'm doing and fix it up in another patch.

-Saravana

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/