Re: [PATCH v3 2/2] cpufreq: Simplify and fix mutual exclusion with hotplug

From: Saravana Kannan
Date: Wed Jul 16 2014 - 15:34:31 EST


On 07/16/2014 01:48 AM, Viresh Kumar wrote:
On 16 July 2014 04:17, Saravana Kannan <skannan@xxxxxxxxxxxxxx> wrote:

Again, just too many things in a single patch. That's not acceptable.
A few of these might be bug fixes, which must go in before any other updates,
and so they should have been added as the first patch.

Even the other stuff you are trying to fix (by checking policy->cpus) should go
before 1/2; otherwise 1/2 will actually break things in between, i.e. show values
even when no CPUs of a cluster are online.

Well, it's no worse than what it does today. The existing code actually causes a crash when you try a show while hotplugging a CPU. I'm keeping 1/2 as small as possible; you clearly want it even smaller, so I don't want to fold this change into it.

Also, the current add/remove path is complicated, with many cases, so I'm not comfortable saying for sure that the policy->cpus check would be sufficient there. I'm willing to throw out this change if you think it's still wrong when it comes after 1/2.

Since we no longer allocate and destroy/freeze the policy and sysfs nodes during
hotplug and suspend, we don't need to take the hotplug lock in the sysfs
handlers. We can achieve the same effect by checking if policy->cpus is empty.

Are you talking about the changes in store()?

Yes.


Mutual exclusion with hotplug was only done for sysfs writes, but reads need the
same protection. So, this patch adds that as well.

How? How is checking for policy->cpus enough?

Because when all the CPUs in a policy are hotplugged off, policy->cpus becomes empty. So it's functionally the same, without having to take the hotplug lock. This way, CPUs of other policies can be hotplugged while you are doing a show/store on one policy.

But I'm sure you already understood this. So, not sure what you are really asking.
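
To make it concrete, modulo whitespace this is what show() amounts to after this patch (same content as the first hunk below, just pulled out of diff form):

static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf)
{
	struct cpufreq_policy *policy = to_policy(kobj);
	struct freq_attr *fattr = to_attr(attr);
	ssize_t ret = -EINVAL;

	if (!down_read_trylock(&cpufreq_rwsem))
		return ret;

	down_read(&policy->rwsem);

	/*
	 * All CPUs of this policy offline? Then there is nothing
	 * meaningful to show and we bail out, instead of taking
	 * get_online_cpus() around the whole thing.
	 */
	if (!cpumask_empty(policy->cpus)) {
		if (fattr->show)
			ret = fattr->show(policy, buf);
		else
			ret = -EIO;
	}

	up_read(&policy->rwsem);
	up_read(&cpufreq_rwsem);

	return ret;
}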


Also, cpufreq driver (un)register can race with hotplug since CPU online
state can change between adding/removing the currently online devices and
registering/unregistering for hotplug notifiers. So, fix that by
registering for hotplug notifiers first before adding devices and
unregistering from hotplug notifiers first before removing devices.

Couldn't get it; tell us an example race and what will go wrong due to it.
Also, this should have been a separate patch by itself.

I assumed we do a lot of down_write()s and that one of those would cause a down_read_trylock() to fail. But we really do that only for cpufreq driver register/unregister, so my previous statement isn't really a useful/common case.
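
Still, for completeness, the window the commit text refers to would look roughly like this (illustrative timeline, not something I have actually hit):

/*
 * Illustrative only:
 *
 *   cpufreq_register_driver()            hotplug
 *   -------------------------            -------
 *   subsys_interface_register()
 *     -> add_dev() runs based on which
 *        CPUs are online at that instant
 *                                        a CPU changes online state
 *                                        (no notifier registered yet,
 *                                         so the change is missed)
 *   register_hotcpu_notifier()
 *
 * Registering the notifier before adding the devices closes that
 * window, which is why the patch moves it up.
 */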

But I do hate that we do a "trylock". It always makes one wonder if it will silently fail (since we return NULL, which is the same as what we would return for an "offline" policy). Technically, we could do a down_read(), but lockdep throws warnings when it's really not an issue (doing a down_read() twice). So I'm guessing all these trylocks are just there to keep lockdep happy?
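
For reference, the pattern lockdep is worried about with a plain down_read() here is the usual one (again illustrative, not a splat from this code):

/*
 *   task A (sysfs read)                 task B (driver unregister)
 *   down_read(&cpufreq_rwsem);
 *                                       down_write(&cpufreq_rwsem);
 *                                       (blocks, queues as a writer)
 *   down_read(&cpufreq_rwsem);
 *   (second read blocks behind the
 *    queued writer -> deadlock)
 *
 * The nested read is harmless on its own, but not once a writer is
 * queued in between, which is presumably why the code sticks with
 * down_read_trylock() and keeps lockdep quiet.
 */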



Signed-off-by: Saravana Kannan <skannan@xxxxxxxxxxxxxx>
---
drivers/cpufreq/cpufreq.c | 44 ++++++++++++++++++++------------------------
1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index a0a2ec2..f72b2b7 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -748,17 +748,18 @@ static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf)
 {
 	struct cpufreq_policy *policy = to_policy(kobj);
 	struct freq_attr *fattr = to_attr(attr);
-	ssize_t ret;
+	ssize_t ret = -EINVAL;
 
 	if (!down_read_trylock(&cpufreq_rwsem))
-		return -EINVAL;
-
+		return ret;
 	down_read(&policy->rwsem);
 
-	if (fattr->show)
-		ret = fattr->show(policy, buf);
-	else
-		ret = -EIO;
+	if (!cpumask_empty(policy->cpus)) {
+		if (fattr->show)
+			ret = fattr->show(policy, buf);
+		else
+			ret = -EIO;
+	}

Makes sense up to this point.

 	up_read(&policy->rwsem);
 	up_read(&cpufreq_rwsem);
@@ -773,26 +774,19 @@ static ssize_t store(struct kobject *kobj, struct attribute *attr,
 	struct freq_attr *fattr = to_attr(attr);
 	ssize_t ret = -EINVAL;
 
-	get_online_cpus();
-
-	if (!cpu_online(policy->cpu))
-		goto unlock;
-

@Srivatsa: what do you say?

 	if (!down_read_trylock(&cpufreq_rwsem))
-		goto unlock;
-
+		return ret;
 	down_write(&policy->rwsem);
 
-	if (fattr->store)
-		ret = fattr->store(policy, buf, count);
-	else
-		ret = -EIO;
+	if (!cpumask_empty(policy->cpus)) {
+		if (fattr->store)
+			ret = fattr->store(policy, buf, count);
+		else
+			ret = -EIO;
+	}
 
 	up_write(&policy->rwsem);
-
 	up_read(&cpufreq_rwsem);
-unlock:
-	put_online_cpus();
 
 	return ret;
 }
@@ -2270,6 +2264,8 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
 		}
 	}
 
+	register_hotcpu_notifier(&cpufreq_cpu_notifier);
+
 	ret = subsys_interface_register(&cpufreq_interface);
 	if (ret)
 		goto err_boost_unreg;
@@ -2293,13 +2289,13 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
 		}
 	}
 
-	register_hotcpu_notifier(&cpufreq_cpu_notifier);
 	pr_debug("driver %s up and running\n", driver_data->name);
 
 	return 0;
 err_if_unreg:
 	subsys_interface_unregister(&cpufreq_interface);
 err_boost_unreg:
+	unregister_hotcpu_notifier(&cpufreq_cpu_notifier);
 	if (cpufreq_boost_supported())
 		cpufreq_sysfs_remove_file(&boost.attr);
 err_null_driver:
@@ -2327,12 +2323,12 @@ int cpufreq_unregister_driver(struct cpufreq_driver *driver)

 	pr_debug("unregistering driver %s\n", driver->name);
 
+	unregister_hotcpu_notifier(&cpufreq_cpu_notifier);
+
 	subsys_interface_unregister(&cpufreq_interface);
 	if (cpufreq_boost_supported())
 		cpufreq_sysfs_remove_file(&boost.attr);
 
-	unregister_hotcpu_notifier(&cpufreq_cpu_notifier);
-
 	down_write(&cpufreq_rwsem);
 	write_lock_irqsave(&cpufreq_driver_lock, flags);

Normally unregistration should happen in just the opposite order of registration.
Isn't that true here? Yeah, it was broken earlier as well...

Generally agreed, but as explained in the commit text, we need to keep it this way to avoid races with hotplug/unregister.
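
Sketching my reading of it, the problem with the usual reversed order on the unregister side would be:

/*
 * Illustrative only, based on my reading of the commit text:
 *
 *   cpufreq_unregister_driver()          hotplug
 *   ---------------------------          -------
 *   subsys_interface_unregister()
 *     -> remove_dev() for the CPU devices
 *                                        a CPU comes online
 *                                        (notifier still registered,
 *                                         so it acts on a driver that
 *                                         is in the middle of going away)
 *   unregister_hotcpu_notifier()
 *
 * Tearing the notifier down first closes that window, at the cost of
 * register/unregister not being mirror images of each other.
 */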

-Saravana

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation