Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

From: Viresh Kumar
Date: Mon Jun 01 2015 - 03:19:52 EST


On 01-06-15, 01:40, Preeti U Murthy wrote:

I have to mention that this is somewhat inspired by:

https://git.linaro.org/people/viresh.kumar/linux.git/commit/1e37f1d6ae12f5896e4e216f986762c3050129a5

and I was waiting to finish some core changes first, to make all this simple.

I am fine with you trying to finish it, though :)

> The problem showed up when running hotplug operations and changing
> governors in parallel. The crash would be at:
>
> [ 174.319645] Unable to handle kernel paging request for data at address 0x00000000
> [ 174.319782] Faulting instruction address: 0xc00000000053b3e0
> cpu 0x1: Vector: 300 (Data Access) at [c000000003fdb870]
> pc: c00000000053b3e0: __bitmap_weight+0x70/0x100
> lr: c00000000085a338: need_load_eval+0x38/0xf0
> sp: c000000003fdbaf0
> msr: 9000000100009033
> dar: 0
> dsisr: 40000000
> current = 0xc000000003151a40
> paca = 0xc000000007da0980 softe: 0 irq_happened: 0x01
> pid = 842, comm = kworker/1:2
> enter ? for help
> [c000000003fdbb40] c00000000085a338 need_load_eval+0x38/0xf0
> [c000000003fdbb70] c000000000856a10 od_dbs_timer+0x90/0x1e0
> [c000000003fdbbe0] c0000000000f489c process_one_work+0x24c/0x910
> [c000000003fdbc90] c0000000000f50dc worker_thread+0x17c/0x540
> [c000000003fdbd20] c0000000000fed70 kthread+0x120/0x140
> [c000000003fdbe30] c000000000009678 ret_from_kernel_thread+0x5c/0x64
>
> While debugging the issue, other problems in this area were uncovered,
> all of them necessitating serialized calls to __cpufreq_governor(). One
> potential race condition that can happen today is the following:
>
> CPU0                                  CPU1
>
> cpufreq_set_policy()
>
> __cpufreq_governor
> (CPUFREQ_GOV_POLICY_EXIT)
>                                       __cpufreq_remove_dev_finish()
>
> free(dbs_data)                        __cpufreq_governor
>                                       (CPUFREQ_GOV_START)
>
>                                       dbs_data->mutex <= NULL dereference
>
> The issue here is that calls to cpufreq_governor_dbs() are not serialized,
> and they can conflict with each other in numerous ways. One way to sort
> this out would be to serialize all calls to cpufreq_governor_dbs()
> by marking the governor busy while a call is in progress and
> blocking all other calls. But this approach will not cover all
> loopholes. Take the above scenario: CPU1 will still hit a NULL dereference
> if care is not taken to check for a NULL dbs_data.
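
A minimal userspace sketch of why a per-call "governor busy" lock is not enough, assuming a pthread mutex stands in for that lock. struct dbs_data_model, governor_call() and the GOV_* events are hypothetical names, not kernel API. Each call is atomic, but nothing ties CPU0's EXIT to whatever CPU1 does next.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the governor's dbs_data. */
struct dbs_data_model {
	int sampling_rate;
};

enum gov_event { GOV_INIT, GOV_EXIT, GOV_START, GOV_STOP };

/* "Governor busy": one lock serializes each individual call only. */
static pthread_mutex_t gov_call_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dbs_data_model *dbs_data;

/* Returns -1 where the kernel would have dereferenced a NULL dbs_data. */
static int governor_call(enum gov_event ev)
{
	int ret = 0;

	assert(ev <= GOV_STOP);
	pthread_mutex_lock(&gov_call_lock);
	switch (ev) {
	case GOV_INIT:
		dbs_data = calloc(1, sizeof(*dbs_data));
		if (!dbs_data)
			ret = -1;
		break;
	case GOV_EXIT:
		free(dbs_data);
		dbs_data = NULL;
		break;
	case GOV_START:
		if (!dbs_data)
			ret = -1;	/* kernel: dbs_data->mutex on NULL */
		else
			dbs_data->sampling_rate = 10000;	/* arbitrary value */
		break;
	case GOV_STOP:
		break;
	}
	pthread_mutex_unlock(&gov_call_lock);
	return ret;
}
```

Replaying the order from the diagram (INIT, START, then CPU0's STOP and EXIT, then CPU1's START) makes the last call return -1, the point at which the kernel dereferenced dbs_data, even though no two calls ever overlapped.
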
>
> To sort out such scenarios, we could filter the sequence of events: a
> CPUFREQ_GOV_START cannot be called without an INIT if the previous
> event was an EXIT. However, this requires analysing all possible
> sequences of events and adding each of them as a filter. This results in
> unmanageable code, with a high probability of missing a race
> condition. Both of the above approaches were tried out earlier [1].

I agree.
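
The filtering approach can be pictured as a legal-transition table. This is a sketch under assumptions: the states, events and filter_event() are hypothetical, and the kernel did not track governor state this way. The table does reject START after EXIT, but every caller must then cope with a rejected event, and the table knows nothing about which caller's sequence an event belongs to, so it cannot by itself keep two sequences from interleaving.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical governor lifecycle states and events for this sketch. */
enum gov_state { ST_EXITED, ST_INITED, ST_STARTED, ST_NR };
enum gov_event { EV_INIT, EV_EXIT, EV_START, EV_STOP, EV_LIMITS, EV_NR };

/* allowed[state][event]: which events are legal in which state. */
static const bool allowed[ST_NR][EV_NR] = {
	[ST_EXITED]  = { [EV_INIT] = true },
	[ST_INITED]  = { [EV_START] = true, [EV_EXIT] = true },
	[ST_STARTED] = { [EV_STOP] = true, [EV_LIMITS] = true },
};

/* Apply ev if legal; otherwise return -1 and leave *st unchanged. */
static int filter_event(enum gov_state *st, enum gov_event ev)
{
	assert(*st < ST_NR && ev < EV_NR);
	if (!allowed[*st][ev])
		return -1;
	switch (ev) {
	case EV_INIT:  *st = ST_INITED;  break;
	case EV_EXIT:  *st = ST_EXITED;  break;
	case EV_START: *st = ST_STARTED; break;
	case EV_STOP:  *st = ST_INITED;  break;
	default:       break;	/* EV_LIMITS keeps the state */
	}
	return 0;
}
```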

> Let us therefore look at the heart of the issue.

Yeah, we should :)

> It is not really about
> serializing calls to cpufreq_governor_dbs(); it seems to be about
> serializing the entire sequence of CPUFREQ_GOV* operations. For instance,
> in cpufreq_set_policy(), we STOP and EXIT the old policy and INIT and
> START the new policy. Between the EXIT and INIT, nobody else must be
> starting the policy, and between INIT and START, nobody must be
> stopping it.

Hmm..

> A similar argument holds for the CPUFREQ_GOV* operations in
> __cpufreq_policy_dev_{prepare|finish} and cpufreq_add_policy(). Hence,
> until each of these functions completes in its entirety, none of the
> others should run in parallel. The interleaving of the individual calls
> to cpufreq_governor_dbs() results in invalid operations. This patch
> therefore tries to serialize, with respect to each other, entire cpufreq
> functions that issue CPUFREQ_GOV* operations.
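
The proposal, one lock held across each caller's whole CPUFREQ_GOV* sequence rather than around individual cpufreq_governor_dbs() calls, can be modelled in userspace. This is not the patch itself: struct policy_model, seq_lock, set_policy_switch(), hotplug_remove() and hotplug_add() are hypothetical names, and pthreads stand in for CPUs. Each function checks the state it finds on entry and then runs its whole sequence without any other caller interleaving.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define STRESS_ITERS 10000

/* Hypothetical userspace model of one policy and its governor data. */
struct policy_model {
	pthread_mutex_t seq_lock;	/* held across a caller's whole sequence */
	int *dbs_data;			/* stands in for the governor's dbs_data */
	int started;
	int bad_events;			/* START calls that found no dbs_data */
};

static void policy_model_init(struct policy_model *p)
{
	pthread_mutex_init(&p->seq_lock, NULL);
	p->dbs_data = NULL;
	p->started = 0;
	p->bad_events = 0;
}

/* Individual events: only ever called with seq_lock held. */
static void gov_stop(struct policy_model *p) { p->started = 0; }
static void gov_exit(struct policy_model *p) { free(p->dbs_data); p->dbs_data = NULL; }
static void gov_init(struct policy_model *p) { p->dbs_data = calloc(1, sizeof(*p->dbs_data)); }
static void gov_start(struct policy_model *p)
{
	if (!p->dbs_data) {
		p->bad_events++;
		return;
	}
	*p->dbs_data += 1;
	p->started = 1;
}

/* cpufreq_set_policy()-like: STOP, EXIT old, INIT, START new, as one unit. */
static void set_policy_switch(struct policy_model *p)
{
	pthread_mutex_lock(&p->seq_lock);
	if (p->dbs_data) {
		gov_stop(p);
		gov_exit(p);
		gov_init(p);
		gov_start(p);
	}
	pthread_mutex_unlock(&p->seq_lock);
}

/* Last CPU of the policy going offline: STOP, EXIT as one unit. */
static void hotplug_remove(struct policy_model *p)
{
	pthread_mutex_lock(&p->seq_lock);
	if (p->dbs_data) {
		gov_stop(p);
		gov_exit(p);
	}
	pthread_mutex_unlock(&p->seq_lock);
}

/* CPU coming back online: INIT, START as one unit. */
static void hotplug_add(struct policy_model *p)
{
	pthread_mutex_lock(&p->seq_lock);
	if (!p->dbs_data) {
		gov_init(p);
		gov_start(p);
	}
	pthread_mutex_unlock(&p->seq_lock);
}

static void *hotplug_thread(void *arg)
{
	for (int i = 0; i < STRESS_ITERS; i++) {
		hotplug_remove(arg);
		hotplug_add(arg);
	}
	return NULL;
}

static void *governor_thread(void *arg)
{
	for (int i = 0; i < STRESS_ITERS; i++)
		set_policy_switch(arg);
	return NULL;
}

/* Run both callers concurrently; returns the number of bad events seen. */
static int run_stress(struct policy_model *p)
{
	pthread_t a, b;
	int rc;

	rc = pthread_create(&a, NULL, hotplug_thread, p);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, governor_thread, p);
	assert(rc == 0);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return p->bad_events;
}
```

Because the entry check and the whole sequence run under the same lock, START never observes the NULL left behind by another caller's EXIT. The model deliberately ignores what such a lock would interact with in the kernel, such as sysfs attribute removal during EXIT.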

Until now we have been forced to apply band-aids, and I really want
to get this fixed at the root.

The problem is that we drop policy locks before calling
__cpufreq_governor(), and that is the root cause of all the problems
we are facing. We did that because we were getting warnings about
circular locking (955ef4833574 ("cpufreq: Drop rwsem lock around
CPUFREQ_GOV_POLICY_EXIT")).

I have explained that problem here (I never sent this upstream, as I was
waiting for some other patches to get included first):
https://git.linaro.org/people/viresh.kumar/linux.git/commit/57714d5b1778f2f610bcc5c74d85b29ba1cc1995

The actual problem was:
If, while removing a sysfs attribute, we hold any lock that the
attribute's operations also grab, the result can be an ABBA deadlock.

show()/store() hold the policy->rwsem lock while accessing any sysfs
attribute under the cpu/cpuX/cpufreq/ directory.
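
The ABBA pattern can be illustrated with a tiny lock-order check in the spirit of lockdep (this is not lockdep). In this sketch KN_ACTIVE is a hypothetical stand-in for the sysfs (kernfs) active reference held around show()/store(), which attribute removal waits to drain, and POLICY_RWSEM stands for policy->rwsem; record_order() is likewise hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes for this two-lock model. */
enum lock_class { KN_ACTIVE, POLICY_RWSEM, NR_LOCK_CLASSES };

/* order_seen[a][b]: some path acquired (or waited on) b while holding a. */
static bool order_seen[NR_LOCK_CLASSES][NR_LOCK_CLASSES];

/* Record "take next while holding held"; return true if the opposite
 * order was already seen, i.e. a potential ABBA deadlock. */
static bool record_order(enum lock_class held, enum lock_class next)
{
	assert(held < NR_LOCK_CLASSES && next < NR_LOCK_CLASSES);
	order_seen[held][next] = true;
	return order_seen[next][held];
}
```

The first path is store(): the active reference is held and store() takes policy->rwsem. The second is EXIT with policy->rwsem still held, removing the attribute and therefore waiting on active references; at that point the check reports the inversion.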

But something like what I have done is the real way to tackle all
these problems.

These band-aids won't take us anywhere.

--
viresh