Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

From: Srivatsa S. Bhat
Date: Wed Dec 12 2012 - 14:14:20 EST


On 12/13/2012 12:18 AM, Oleg Nesterov wrote:
> On 12/13, Srivatsa S. Bhat wrote:
>>
>> On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
>>> And _perhaps_ get_ can avoid it too?
>>>
>>> I didn't really try to think, probably this is not right, but can't
>>> something like this work?
>>>
>>> #define XXXX	(1 << 16)
>>> #define MASK	(XXXX - 1)
>>>
>>> void get_online_cpus_atomic(void)
>>> {
>>> 	preempt_disable();
>>>
>>> 	// only for writer
>>> 	__this_cpu_add(reader_percpu_refcnt, XXXX);
>>>
>>> 	if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
>>> 		__this_cpu_inc(reader_percpu_refcnt);
>>> 	} else {
>>> 		smp_wmb();
>>> 		if (writer_active()) {
>>> 			...
>>> 		}
>>> 	}
>>>
>>> 	__this_cpu_sub(reader_percpu_refcnt, XXXX);
>>> }
>>>
>>
>> Sorry, maybe I'm too blind to see, but I didn't understand the logic
>> of how the mask helps us avoid disabling interrupts.
>
> Why do we need cli/sti at all? We should prevent the following race:
>
> - the writer already holds hotplug_rwlock, so get_ must not
> succeed.
>
> - the new reader comes, it increments reader_percpu_refcnt,
> but before it checks writer_active() ...
>
> - irq handler does get_online_cpus_atomic() and sees
> reader_nested_percpu() == T, so it simply increments
> reader_percpu_refcnt and succeeds.
>
> OTOH, why do we need to increment the reader_percpu_refcnt counter
> in advance? To ensure that either we see writer_active() or the
> writer should see reader_percpu_refcnt != 0 (and that is why they
> should write/read in reverse order).
>
> The code above tries to avoid this race using the lower 16 bits
> as a "nested-counter", and the upper bits to avoid the race with
> the writer.
>
> // only for writer
> __this_cpu_add(reader_percpu_refcnt, XXXX);
>
> If irq comes and does get_online_cpus_atomic(), it won't be confused
> by __this_cpu_add(XXXX), it will check the lower bits and switch to
> the "slow path".
>

This is a very clever scheme indeed! :-) Thanks a lot for explaining
it in detail.
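
For reference, the race described above and the way the upper bits close it
can be modelled in a small single-threaded user-space C sketch. This is only
a toy model, not kernel code: refcnt stands in for one CPU's
reader_percpu_refcnt, writer_holds_lock for writer_active(), and
irq_window() is a hypothetical hook that runs a nested get/put pair at the
point where a real interrupt could land. WRITER_BIAS and NEST_MASK are
illustrative names for XXXX and MASK. What the reader does on the slow path
(the "..." in the quoted code) is not modelled, and the plain increment for a
first-level reader with no writer present is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define WRITER_BIAS (1u << 16)          /* "XXXX" in the quoted code */
#define NEST_MASK   (WRITER_BIAS - 1)   /* "MASK": low 16 bits = nesting depth */

enum path { FAST, SLOW };

static unsigned int refcnt;      /* models one CPU's reader_percpu_refcnt */
static bool writer_holds_lock;   /* models writer_active() */
static bool irq_pending;         /* fire one simulated irq in the race window */
static enum path irq_path;       /* which path that irq's get took */

/* Drop a reference; slow-path readers never touched the nesting count. */
void put_model(enum path p)
{
    if (p == FAST) {
        assert(refcnt & NEST_MASK);
        refcnt--;
    }
}

/* Simulated interrupt: runs a nested get/put pair using the given get. */
void irq_window(enum path (*get)(void))
{
    if (!irq_pending)
        return;
    irq_pending = false;
    irq_path = get();
    put_model(irq_path);
}

/* Naive reader: bump the count first, then check for a writer. */
enum path get_naive(void)
{
    bool nested = refcnt != 0;

    refcnt++;
    irq_window(get_naive);           /* irq sees refcnt != 0: looks nested */
    if (nested || !writer_holds_lock)
        return FAST;
    refcnt--;                        /* back off to the slow path */
    return SLOW;
}

/* Reader using the split counter from the quoted code. */
enum path get_biased(void)
{
    enum path p = FAST;

    refcnt += WRITER_BIAS;           /* only meant for the writer to see */
    irq_window(get_biased);          /* irq sees low bits == 0: not nested */
    if (refcnt & NEST_MASK)
        refcnt++;                    /* genuinely nested reader */
    else if (writer_holds_lock)
        p = SLOW;                    /* the elided "..." branch */
    else
        refcnt++;                    /* assumption: first reader, no writer */
    refcnt -= WRITER_BIAS;
    return p;
}

/*
 * The writer already holds the lock and an interrupt lands in the window
 * between the reader's counter update and its writer check. Returns the
 * path taken by the interrupt's get.
 */
enum path irq_path_under_writer(enum path (*get)(void))
{
    refcnt = 0;
    writer_holds_lock = true;
    irq_pending = true;
    put_model(get());
    assert(refcnt == 0);
    return irq_path;
}
```

In this model irq_path_under_writer(get_naive) returns FAST, i.e. the
interrupt wrongly succeeds while the writer holds the lock, whereas
irq_path_under_writer(get_biased) returns SLOW. The model says nothing about
cross-CPU memory ordering between the reader and the writer.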

>
> But once again, so far I didn't really try to think. It is quite
> possible I missed something.
>

I don't spot anything wrong with it either. But I'll give it some more
thought.

Regards,
Srivatsa S. Bhat
