Re: [patch 08/13] ACPI/processor: Replace racy task affinity logic.
From: Peter Zijlstra
Date: Thu Apr 13 2017 - 07:40:07 EST
On Wed, Apr 12, 2017 at 10:07:34PM +0200, Thomas Gleixner wrote:
> acpi_processor_get_throttling() requires to invoke the getter function on
> the target CPU. This is achieved by temporarily setting the affinity of the
> calling user space thread to the requested CPU and reset it to the original
> affinity afterwards.
>
> That's racy vs. CPU hotplug and concurrent affinity settings for that
> thread resulting in code executing on the wrong CPU and overwriting the
> new affinity setting.
>
> acpi_processor_get_throttling() is invoked in two ways:
>
> 1) The CPU online callback, which is already running on the target CPU and
> obviously protected against hotplug and not affected by affinity
> settings.
>
> 2) The ACPI driver probe function, which is not protected against hotplug
> during modprobe.
>
> Switch it over to work_on_cpu() and protect the probe function against CPU
> hotplug.
>
> +static int acpi_processor_get_throttling(struct acpi_processor *pr)
> +{
> if (!pr)
> return -EINVAL;
>
> if (!pr->flags.throttling)
> return -ENODEV;
>
> + * This is either called from the CPU hotplug callback of
> + * processor_driver or via the ACPI probe function. In the latter
> + * case the CPU is not guaranteed to be online. Both call sites are
> + * protected against CPU hotplug.
> */
> + if (!cpu_online(pr->id))
> return -ENODEV;
>
> + return work_on_cpu(pr->id, __acpi_processor_get_throttling, pr);
> }
That makes my machine sad...
[ 9.583030] =============================================
[ 9.589053] [ INFO: possible recursive locking detected ]
[ 9.595079] 4.11.0-rc6-00385-g5aee78a-dirty #678 Not tainted
[ 9.601393] ---------------------------------------------
[ 9.607418] kworker/0:0/3 is trying to acquire lock:
[ 9.612954] ((&wfc.work)){+.+.+.}, at: [<ffffffff8110c172>] flush_work+0x12/0x2a0
[ 9.621406]
[ 9.621406] but task is already holding lock:
[ 9.627915] ((&wfc.work)){+.+.+.}, at: [<ffffffff8110df17>] process_one_work+0x1e7/0x670
[ 9.637044]
[ 9.637044] other info that might help us debug this:
[ 9.644330] Possible unsafe locking scenario:
[ 9.644330]
[ 9.650934] CPU0
[ 9.653660] ----
[ 9.656386] lock((&wfc.work));
[ 9.659987] lock((&wfc.work));
[ 9.663586]
[ 9.663586] *** DEADLOCK ***
[ 9.663586]
[ 9.670189] May be due to missing lock nesting notation
[ 9.670189]
[ 9.677765] 2 locks held by kworker/0:0/3:
[ 9.682332] #0: ("events"){.+.+.+}, at: [<ffffffff8110df17>] process_one_work+0x1e7/0x670
[ 9.691654] #1: ((&wfc.work)){+.+.+.}, at: [<ffffffff8110df17>] process_one_work+0x1e7/0x670
[ 9.701267]
[ 9.701267] stack backtrace:
[ 9.706127] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.11.0-rc6-00385-g5aee78a-dirty #678
[ 9.715545] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[ 9.726999] Workqueue: events work_for_cpu_fn
[ 9.731860] Call Trace:
[ 9.734591] dump_stack+0x86/0xcf
[ 9.738290] __lock_acquire+0x790/0x1620
[ 9.742667] ? __lock_acquire+0x4a5/0x1620
[ 9.747237] lock_acquire+0x100/0x210
[ 9.751319] ? lock_acquire+0x100/0x210
[ 9.755596] ? flush_work+0x12/0x2a0
[ 9.759583] flush_work+0x47/0x2a0
[ 9.763375] ? flush_work+0x12/0x2a0
[ 9.767362] ? queue_work_on+0x47/0xa0
[ 9.771545] ? __this_cpu_preempt_check+0x13/0x20
[ 9.776792] ? trace_hardirqs_on_caller+0xfb/0x1d0
[ 9.782139] ? trace_hardirqs_on+0xd/0x10
[ 9.786610] work_on_cpu+0x82/0x90
[ 9.790404] ? __usermodehelper_disable+0x110/0x110
[ 9.795846] ? __acpi_processor_get_throttling+0x20/0x20
[ 9.801773] acpi_processor_set_throttling+0x199/0x220
[ 9.807506] ? trace_hardirqs_on_caller+0xfb/0x1d0
[ 9.812851] acpi_processor_get_throttling_ptc+0xec/0x180
[ 9.818876] __acpi_processor_get_throttling+0xf/0x20
[ 9.824511] work_for_cpu_fn+0x14/0x20
[ 9.828692] process_one_work+0x261/0x670
[ 9.833165] worker_thread+0x21b/0x3f0
[ 9.837348] kthread+0x108/0x140
[ 9.840947] ? process_one_work+0x670/0x670
[ 9.845611] ? kthread_create_on_node+0x40/0x40
[ 9.850667] ret_from_fork+0x31/0x40