Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

From: Srivatsa S. Bhat
Date: Thu Dec 13 2012 - 10:27:31 EST


On 12/13/2012 12:42 AM, Srivatsa S. Bhat wrote:
> On 12/13/2012 12:18 AM, Oleg Nesterov wrote:
>> On 12/13, Srivatsa S. Bhat wrote:
>>>
>>> On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
>>>> And _perhaps_ get_ can avoid it too?
>>>>
>>>> I didn't really try to think, probably this is not right, but can't
>>>> something like this work?
>>>>
>>>> #define XXXX    (1 << 16)
>>>> #define MASK    (XXXX - 1)
>>>>
>>>> void get_online_cpus_atomic(void)
>>>> {
>>>>         preempt_disable();
>>>>
>>>>         // only for writer
>>>>         __this_cpu_add(reader_percpu_refcnt, XXXX);
>>>>
>>>>         if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
>>>>                 __this_cpu_inc(reader_percpu_refcnt);
>>>>         } else {
>>>>                 smp_wmb();
>>>>                 if (writer_active()) {
>>>>                         ...
>>>>                 }
>>>>         }
>>>>
>>>>         __this_cpu_sub(reader_percpu_refcnt, XXXX);
>>>> }
>>>>
>>>
>>> Sorry, maybe I'm too blind to see, but I didn't understand the logic
>>> of how the mask helps us avoid disabling interrupts..
>>
>> Why do we need cli/sti at all? We should prevent the following race:
>>
>> - the writer already holds hotplug_rwlock, so get_ must not
>> succeed.
>>
>> - the new reader comes, it increments reader_percpu_refcnt,
>> but before it checks writer_active() ...
>>
>> - irq handler does get_online_cpus_atomic() and sees
>> reader_nested_percpu() == T, so it simply increments
>> reader_percpu_refcnt and succeeds.
>>
>> OTOH, why do we need to increment the reader_percpu_refcnt counter
>> in advance? To ensure that either we see writer_active() or the
>> writer should see reader_percpu_refcnt != 0 (and that is why they
>> should write/read in reverse order).
>>
>> The code above tries to avoid this race using the lower 16 bits
>> as a "nested-counter", and the upper bits to avoid the race with
>> the writer.
>>
>> // only for writer
>> __this_cpu_add(reader_percpu_refcnt, XXXX);
>>
>> If irq comes and does get_online_cpus_atomic(), it won't be confused
>> by __this_cpu_add(XXXX), it will check the lower bits and switch to
>> the "slow path".
>>
>
> This is a very clever scheme indeed! :-) Thanks a lot for explaining
> it in detail.
>
>>
>> But once again, so far I didn't really try to think. It is quite
>> possible I missed something.
>>
>
> Even I don't spot anything wrong with it. But I'll give it some more
> thought..

Since an interrupt handler can also run get_online_cpus_atomic(), we
cannot use the __this_cpu_* versions for modifying reader_percpu_refcnt,
right?

To maintain the integrity of the update itself, we will have to use the
this_cpu_* variant, which basically plays spoil-sport on this whole
scheme... :-(
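
To make the concern concrete, here is a minimal user-space sketch (not
kernel code) of how an update that is not safe against interrupts can lose
an increment. Everything in it is made up for illustration: refcnt stands
in for one CPU's reader_percpu_refcnt, and the "interrupt" is simulated by
calling fake_irq() between the load and the store. It only models the
failure mode; it says nothing about how a given architecture actually
implements __this_cpu_add().

```c
#include <assert.h>

/* Hypothetical stand-in for one CPU's reader_percpu_refcnt. */
static unsigned int refcnt;

/* Simulated interrupt handler: takes one nested reference. */
void fake_irq(void)
{
	refcnt += 1;
}

/*
 * Models an add that is NOT interrupt-safe: load, interrupt, store.
 * The store writes back a stale value and wipes out the handler's update.
 */
void unsafe_add(unsigned int val, void (*irq)(void))
{
	unsigned int tmp = refcnt;	/* load */

	if (irq)
		irq();			/* interrupt lands mid-update */
	refcnt = tmp + val;		/* stale store clobbers the handler */
}

/*
 * Models an interrupt-safe add: the update is indivisible with respect
 * to the handler, which can therefore only run before or after it.
 */
void safe_add(unsigned int val, void (*irq)(void))
{
	refcnt += val;
	if (irq)
		irq();
}

/* Runs one add of val racing with one interrupt; returns the result. */
unsigned int run(void (*add)(unsigned int, void (*)(void)),
		 unsigned int val)
{
	refcnt = 0;
	add(val, fake_irq);
	return refcnt;
}
```

With val = 1 << 16, run(unsafe_add, val) returns 65536 (the handler's
reference is lost), while run(safe_add, val) returns 65537.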

But still, this scheme is better, because the reader doesn't have to spin
on the read_lock() with interrupts disabled. That way, interrupt handlers
that are not hotplug readers can continue to run on this CPU while taking
another CPU offline.
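
For reference, here is a rough user-space sketch of the split-counter idea
as I understand it from Oleg's explanation. It is single-threaded and only
simulates an interrupt arriving between the add of XXXX and the check of
the lower bits. The names get_sketch(), put_sketch(), irq_reader() and
irq_saw_nested are hypothetical, and the part Oleg elided with "..." (what
the reader does when writer_active() is true) is not modelled: the
non-nested path simply takes its reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define XXXX	(1u << 16)	/* upper bits: "reader is in get_" marker */
#define MASK	(XXXX - 1)	/* lower 16 bits: nested-reader count */

/* Hypothetical stand-in for one CPU's reader_percpu_refcnt. */
static unsigned int refcnt;

/* What the simulated interrupt observed: 1 = nested fast path, 0 = not. */
static int irq_saw_nested = -1;

/*
 * Sketch of the reader side. Returns true if the nested fast path was
 * taken. The optional irq callback simulates an interrupt that arrives
 * after the marker is added but before the lower bits are checked.
 */
bool get_sketch(void (*irq)(void))
{
	bool nested;

	refcnt += XXXX;		/* non-zero for the writer, not for MASK */
	if (irq)
		irq();

	nested = (refcnt & MASK) != 0;
	/*
	 * Assumption: a non-nested reader would check writer_active() here
	 * and possibly wait; that part is elided in the thread and omitted.
	 */
	refcnt += 1;		/* take the reference in the lower bits */
	refcnt -= XXXX;		/* drop the marker */
	return nested;
}

void put_sketch(void)
{
	refcnt -= 1;
}

/* Simulated interrupt handler that is itself a hotplug reader. */
void irq_reader(void)
{
	irq_saw_nested = get_sketch(NULL);
	put_sketch();
}
```

Calling get_sketch(irq_reader) on an idle counter leaves irq_saw_nested at
0: the handler sees only the upper bits set, so it is not fooled into the
nested fast path. A second get_sketch(NULL) made while the first reference
is still held does take the fast path.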

Regards,
Srivatsa S. Bhat
