Re: Locking issues with cpufreq and sysfs

From: Rafael J. Wysocki
Date: Mon Oct 13 2014 - 10:49:34 EST

Next message: Mark Brown: "Re: [PATCH v2 1/4] regulator: Add ena_gpio_initialized to regulator_config"
Previous message: Eric Dumazet: "Re: Regarding tx-nocache-copy in the Sheevaplug"
In reply to: Prarit Bhargava: "Re: Locking issues with cpufreq and sysfs"
Next in thread: Viresh Kumar: "Re: Locking issues with cpufreq and sysfs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Monday, October 13, 2014 09:22:49 AM Prarit Bhargava wrote:
>
> On 10/13/2014 09:11 AM, Prarit Bhargava wrote:
> > There are several issues with the current locking design of cpufreq. Most
> > notably is the panic reported here:
> >
> > http://marc.info/?l=linux-kernel&m=140622451625236&w=2
> >
> > which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a,
> > cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces
> > a race in the changing of the cpufreq policy. This change was introduced
> > because of a lockdep deadlock warning that can be reproduced (on x86 with
> > the acpi_cpufreq driver) via the following debug patch:
> >
> > iff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
> > index b0c18ed..366cfb7 100644
> > --- a/drivers/cpufreq/acpi-cpufreq.c
> > +++ b/drivers/cpufreq/acpi-cpufreq.c
> > @@ -885,6 +885,7 @@ static struct freq_attr *acpi_cpufreq_attr[] = {
> >
> > static struct cpufreq_driver acpi_cpufreq_driver = {
> > .verify = cpufreq_generic_frequency_table_verify,
> > + .flags = CPUFREQ_HAVE_GOVERNOR_PER_POLICY,
> > .target_index = acpi_cpufreq_target,
> > .bios_limit = acpi_processor_get_bios_limit,
> > .init = acpi_cpufreq_cpu_init,
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 61190f6..4cb488a 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *polic
> > /* end old governor */
> > if (old_gov) {
> > __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
> > - up_write(&policy->rwsem);
> > __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
> > - down_write(&policy->rwsem);
> > }
> >
> > /* start new governor */
> > @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *polic
> > if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
> > goto out;
> >
> > - up_write(&policy->rwsem);
> > __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
> > - down_write(&policy->rwsem);
> > }
> >
> > /* new governor failed, so re-start old one */
> >
> > (which causes the acpi-cpufreq driver to emulate the behaviour of the arm
> > cpufreq driver), and by doing
> >
> > echo ondemand > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
> > cat /sys/devices/system/cpu/cpu5/cpufreq/ondemand/*
> > echo conservative > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
> > exit 0
> >
> > [Question: was the original reported deadlock "real"? Did it really happen or
> > did lockdep only report it (I may have asked this question previously and
> > forgot the answer)? The reason I ask is that this situation is very similar to
> > USB's device removal in which the sysfs attributes are removed for a device but
> > not the device it was called for. I actually think that's part of the problem
> > here.]
> >
> > The above, obviously, is a complete hack of the code but in a sense does
> > mimic a proper locking fix. However, even with this fix we are still left
> > with a race in accessing the sysfs files. Consider the following example,
> >
> > CPU 1: accesses scaling_setspeed to set cpu speed
> >
> > simultaneously,
> >
> > CPU 2: accesses scaling_governor to set governor to ondemand
> >
> > CPU 1 & 2 race ... and this can result in different critical situations.
> > The first is that CPU 1 holds the scalling_setspeed open while CPU attempts
> > to change the governor. This results in a syfs warning about creating a
> > file with an existing file name which in some cases can lead to additional
> > corruption and a panic. The second case is that CPU 1's setting of the speed
> > is now done on the new governor -- which may or may not be correct. In any
> > case an argument could be made that the userspace program doing this type
> > of action should be "smart" enough to confirm simultaneous changes... but
> > in any case the kernel should not panic or corrupt data.
> >
> > The locking is insufficient here, Viresh. I no longer believe that fixes
> > to this locking scheme are the right way to move forward here. I'm wondering
> > if we can look at other alternatives such as maintaining a refcount or
> > perhaps using a queuing mechanism for governor and policy related changes.
> >
>
> Uh ... I meant this as "I'm willing to modify the code to do this but I'd like
> to know what everyone else thinks before I do anything" ;)

OK, that's constructive. :-)

Can we discuss the target design first, please? You certainly have something
in mind, so can you describe it?

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Mark Brown: "Re: [PATCH v2 1/4] regulator: Add ena_gpio_initialized to regulator_config"
Previous message: Eric Dumazet: "Re: Regarding tx-nocache-copy in the Sheevaplug"
In reply to: Prarit Bhargava: "Re: Locking issues with cpufreq and sysfs"
Next in thread: Viresh Kumar: "Re: Locking issues with cpufreq and sysfs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]