Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline
From: Srivatsa S. Bhat
Date: Thu Dec 06 2012 - 14:18:57 EST
On 12/07/2012 12:18 AM, Srivatsa S. Bhat wrote:
> On 12/06/2012 09:48 PM, Oleg Nesterov wrote:
>> On 12/06, Srivatsa S. Bhat wrote:
>>>
>>> +void get_online_cpus_atomic(void)
>>> +{
>>> +	int c, old;
>>> +
>>> +	preempt_disable();
>>> +	read_lock(&hotplug_rwlock);
>>
>> Confused... Why does it also take hotplug_rwlock?
>
> To avoid ABBA deadlocks.
>
> hotplug_rwlock was meant for the "light" readers.
> The atomic counters were meant for the "heavy/full" readers.
> I wanted them to be able to nest in any manner they wanted,
> such as:
>
> Full inside light:
>
> get_online_cpus_atomic_light()
> ...
> get_online_cpus_atomic_full()
> ...
> put_online_cpus_atomic_full()
> ...
> put_online_cpus_atomic_light()
>
> Or, light inside full:
>
> get_online_cpus_atomic_full()
> ...
> get_online_cpus_atomic_light()
> ...
> put_online_cpus_atomic_light()
> ...
> put_online_cpus_atomic_full()
>
> To allow this, I made the two sets of APIs take the locks
> in the same order internally.
>
> (I had some more description of this logic in the changelog
> of 2/10; the only difference there is that instead of atomic
> counters, I used rwlocks for the full-readers as well.
> https://lkml.org/lkml/2012/12/5/320)
>
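To make that ordering concrete, here is a minimal sketch of what the
two v1 reader-side entry points could look like. This is illustrative
only: the per-CPU counter (reader_refcnt) and the way it is
manipulated are my stand-ins, not the actual patch.

static DEFINE_RWLOCK(hotplug_rwlock);
static DEFINE_PER_CPU(atomic_t, reader_refcnt);	/* hypothetical */

void get_online_cpus_atomic_light(void)
{
	preempt_disable();
	read_lock(&hotplug_rwlock);			/* lock 1 */
}

void get_online_cpus_atomic_full(void)
{
	preempt_disable();
	read_lock(&hotplug_rwlock);			/* lock 1, same order */
	atomic_inc(this_cpu_ptr(&reader_refcnt));	/* "lock" 2 */
}

Since both variants take hotplug_rwlock before touching the per-CPU
counter, nesting a full reader inside a light one (or vice versa)
never inverts the lock order, so no ABBA deadlock is possible
between the two flavours.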
One of the reasons I switched all readers to global rwlocks
instead of per-cpu atomic counters was to avoid the lock-ordering
deadlocks associated with per-cpu locking.
E.g.:

        CPU 0                           CPU 1
        ------                          ------
1.  Acquire lock A                  Increment CPU1's atomic counter

2.  Increment CPU0's                Try to acquire lock A
    atomic counter
Now consider what happens if a hotplug writer (cpu_down) begins,
and starts looking at CPU0, trying to decrement its atomic counter,
in between steps 1 and 2.

The hotplug writer succeeds on CPU0, because CPU0 hasn't yet
incremented its counter. So now CPU0 spins, waiting for the hotplug
writer to reset the atomic counter again.

When the hotplug writer looks at CPU1, it can't decrement the
atomic counter because CPU1 has already incremented it. So the
hotplug writer waits. Meanwhile, CPU1 goes ahead, only to start
spinning on lock A, which is held by CPU0. So we end up in a
deadlock due to a circular locking dependency between the 3
entities: CPU0 waits on the hotplug writer, the writer waits on
CPU1, and CPU1 waits on CPU0.
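In code, the problematic per-CPU scheme might have looked roughly
like the sketch below. The convention of the writer "claiming" a
counter by driving it negative is my illustration, not the actual
patch:

static DEFINE_PER_CPU(atomic_t, reader_refcnt);	/* hypothetical */

/* Reader: spin while the hotplug writer holds this CPU's counter. */
void get_online_cpus_atomic(void)
{
	preempt_disable();
	while (!atomic_inc_unless_negative(this_cpu_ptr(&reader_refcnt)))
		cpu_relax();
}

/* Writer: claim every CPU's counter by driving it negative. */
static void claim_all_reader_counters(void)
{
	unsigned int cpu;

	for_each_online_cpu(cpu) {
		/* Spins here if a reader on 'cpu' is still active. */
		while (atomic_cmpxchg(&per_cpu(reader_refcnt, cpu), 0, -1) != 0)
			cpu_relax();
	}
}

With the interleaving above, the writer claims CPU0's counter first,
so CPU0 spins in get_online_cpus_atomic(); the writer then spins on
CPU1's counter, and CPU1 spins on lock A held by CPU0: a three-way
cycle.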
One way to deal with this would be for the writer to abort its
loop of trying to atomic_dec the per-cpu counters whenever it has
to wait. But that might prove too wasteful (and overly paranoid)
in practice.
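A sketch of that abort-and-retry writer, continuing the hypothetical
counter convention from above:

/* If any CPU's counter can't be claimed immediately, roll back all
 * the counters claimed so far and start over from the first CPU. */
static void claim_all_reader_counters_restartable(void)
{
	unsigned int cpu, tmp;

retry:
	for_each_online_cpu(cpu) {
		if (atomic_cmpxchg(&per_cpu(reader_refcnt, cpu), 0, -1) != 0) {
			/* Undo the claims made earlier in this pass. */
			for_each_online_cpu(tmp) {
				if (tmp == cpu)
					break;
				atomic_set(&per_cpu(reader_refcnt, tmp), 0);
			}
			cpu_relax();
			goto retry;
		}
	}
}

Every retry re-walks (and re-dirties) all the per-CPU cachelines,
which is what makes this approach so wasteful.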
So, instead, if we use global locks (like the global rwlock that I
finally used when posting this v2), we don't end up in such messy
issues.
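That is the scheme the hunk quoted at the top implements. The reader
side reduces to the following (put_online_cpus_atomic() is my
inferred counterpart, mirroring the get side):

static DEFINE_RWLOCK(hotplug_rwlock);

void get_online_cpus_atomic(void)
{
	preempt_disable();
	read_lock(&hotplug_rwlock);
}

void put_online_cpus_atomic(void)
{
	read_unlock(&hotplug_rwlock);
	preempt_enable();
}

With a single global lock, the writer's acquisition is one atomic
event instead of a per-CPU claiming loop, so it can never be caught
halfway through with some counters claimed and others not.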
And why exactly was I concerned about all this?

Because the current code uses the extremely flexible
preempt_disable()/preempt_enable() pair, which imposes absolutely
no ordering restrictions. And the existing code probably _depends_
on that. So if our new APIs that replace preempt_disable/enable
start imposing ordering restrictions, I feared they might become
unusable. Hence I used global rwlocks, thereby trading cache-line
bouncing (due to the global rwlock) for lock-safety and
flexibility.
Regards,
Srivatsa S. Bhat