Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock

From: Srivatsa S. Bhat
Date: Wed Jan 22 2014 - 04:21:33 EST

Next message: Mika Westerberg: "Re: [PATCH] pwm: add support for Intel Low Power Subsystem PWM"
Previous message: David Laight: "RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur withina USB payload burst [NEW HARDWARE]"
In reply to: Srivatsa S. Bhat: "Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock"
Next in thread: Oleg Nesterov: "Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 01/22/2014 02:00 PM, Srivatsa S. Bhat wrote:
> Hi Paul,
>
> On 01/22/2014 11:22 AM, Paul Mackerras wrote:
>> This arises out of a report from a tester that offlining a CPU never
>> finished on a system they were testing. This was on a POWER8 running
>> a 3.10.x kernel, but the issue is still present in mainline AFAICS.
>>
>> What I found when I looked at the system was this:
>>
>> * There was a ppc64_cpu process stuck inside cpu_hotplug_begin(),
>> called from _cpu_down(), from cpu_down(). This process was holding
>> the cpu_add_remove_lock mutex, since cpu_down() calls
>> cpu_maps_update_begin() before calling _cpu_down(). It was stuck
>> there because cpu_hotplug.refcount == 1.
>>
>> * There was a mdadm process trying to acquire the cpu_add_remove_lock
>> mutex inside register_cpu_notifier(), called from
>> raid5_alloc_percpu() in drivers/md/raid5.c. That process had
>> previously called get_online_cpus, which is why cpu_hotplug.refcount
>> was 1.
>>
>> Result: deadlock.
>>
>> Thus it seems that the following code is not safe:
>>
>> get_online_cpus();
>> register_cpu_notifier(&...);
>> put_online_cpus();
>>
>
> Yes, this is a known problem, and I had proposed an elaborate solution
> some time ago: https://lkml.org/lkml/2012/3/1/39
> But that won't work for all cases, so that solution is a no-go.
>

Wait a min, that _will_ actually work for all cases because I have provided
an option to invoke _any_ arbitrary function as the "setup" routine.

So, taking the example of raid5 that you mentioned below, instead of doing
this:

static int raid5_alloc_percpu()
{
...
get_online_cpus();
for_each_present_cpu(cpu) {
...
}
...
register_cpu_notifier();
put_online_cpus();
}

We can do this:

void func()
{
for_each_present_cpu(cpu) {
...
}
...
}

static int raid5_alloc_percpu()
{
...
register_allcpu_notifier(..., true, func);
}

The other, simpler alternative fix is to use cpu_hotplug_disable/enable()
in place of get/put_online_cpus() around the callback registration code.
Something like this:

cpu_hotplug_disable();
register_cpu_notifier();
cpu_hotplug_enable();

But the problem with this is that a hotplug operation that tries to run
concurrently with this will get a -EBUSY, which is kinda undesirable.
Also, this will only synchronize with hotplug operations initiated via
calls to cpu_up/down() (such as those that are initiated by writing to the
online file in sysfs). It won't synchronize with the hotplug operations
invoked by disable/enable_nonboot_cpus(), which by-pass cpu_up/down() and
directly call _cpu_up/down() by ignoring the cpu_hotplug_disabled flag.

The latter is a more controlled environment though, since its mostly used
by the suspend/hibernate code, in a state where the entire userspace is
frozen. So it might not be that bad.

Regards,
Srivatsa S. Bhat

> If we forget the CPU_POST_DEAD stage for a moment, we can just replace the
> calls to cpu_maps_update_begin/done() with get/put_online_cpus() in both
> register_cpu_notifier() as well as unregister_cpu_notifier(). After all,
> the callback registration code needs to synchronize only with the actual
> hotplug operations, and not the update of cpu-maps. So they don't really
> need to acquire the cpu_add_remove_lock.
>
> However, CPU_POST_DEAD notifications run with the hotplug lock dropped.
> So we can't simply replace cpu_add_remove_lock with hotplug lock in the
> registration routines, because notifier invocations and notifier registration
> needs to be synchronized.
>
> Hmm...
>
>> There are a few different places that do that sort of thing; besides
>> drivers/md/raid5.c, there are instances in arch/x86/kernel/cpu,
>> arch/x86/oprofile, drivers/cpufreq/acpi-cpufreq.c,
>> drivers/oprofile/nmi_timer_int.c and kernel/trace/ring_buffer.c.
>>
>> My question is this: is it reasonable to call register_cpu_notifier
>> inside a get/put_online_cpus block?
>
> Ideally, we would want that to work. Because there is no other race-free
> way of registering a notifier.
>
>> If so, the deadlock needs to be
>> fixed; if not, the callers need to be fixed, and the restriction
>> should be documented.
>
> Fixing the callers is a last resort. I'm thinking of ways to fix the
> deadlock itself, and allow the callers to call register_cpu_notifier
> within a get/put_online_cpus() block...
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Mika Westerberg: "Re: [PATCH] pwm: add support for Intel Low Power Subsystem PWM"
Previous message: David Laight: "RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur withina USB payload burst [NEW HARDWARE]"
In reply to: Srivatsa S. Bhat: "Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock"
Next in thread: Oleg Nesterov: "Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]