Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPUoffline from atomic context

From: Oleg Nesterov
Date: Mon Dec 10 2012 - 12:24:05 EST


On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 01:52 AM, Oleg Nesterov wrote:
> > On 12/10, Srivatsa S. Bhat wrote:
> >>
> >> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> >>
> >>> But yes, it is easy to blame somebody else's code ;) And I can't suggest
> >>> something better at least right now. If I understand correctly, we can not
> >>> use, say, synchronize_sched() in _cpu_down() path
> >>
> >> We can't sleep in that code.. so that's a no-go.
> >
> > But we can?
> >
> > Note that I meant _cpu_down(), not get_online_cpus_atomic() or take_cpu_down().
> >
>
> Maybe I'm missing something, but how would it help if we did a
> synchronize_sched() so early (in _cpu_down())? Another bunch of preempt_disable()
> sections could start immediately after our call to synchronize_sched() no?
> How would we deal with that?

Sorry for confusion. Of course synchronize_sched() alone is not enough.
But we can use it to synchronize with preempt-disabled section and avoid
the barriers/atomic in the fast-path.

For example,

bool writer_pending;
DEFINE_RWLOCK(writer_rwlock);
DEFINE_PER_CPU(int, reader_ctr);

void get_online_cpus_atomic(void)
{
preempt_disable();

if (likely(!writer_pending) || __this_cpu_read(reader_ctr)) {
__this_cpu_inc(reader_ctr);
return;
}

read_lock(&writer_rwlock);
__this_cpu_inc(reader_ctr);
read_unlock(&writer_rwlock);
}

// lacks release semantics, but we don't care
void put_online_cpus_atomic(void)
{
__this_cpu_dec(reader_ctr);
preempt_enable();
}

Now, _cpu_down() does

writer_pending = true;
synchronize_sched();

before stop_one_cpu(). When synchronize_sched() returns, we know that
every get_online_cpus_atomic() must see writer_pending == T. And, if
any CPU incremented its reader_ctr we must see it is not zero.

take_cpu_down() does

write_lock(&writer_rwlock);

for_each_online_cpu(cpu) {
while (per_cpu(reader_ctr, cpu))
cpu_relax();
}

and takes the lock.

However. This can lead to the deadlock we already discussed. So
take_cpu_down() should do

retry:
write_lock(&writer_rwlock);

for_each_online_cpu(cpu) {
if (per_cpu(reader_ctr, cpu)) {
write_unlock(&writer_rwlock);
goto retry;
}
}

to take the lock. But this is livelockable. However, I do not think it
is possible to avoid the livelock.

Just in case, the code above is only for illustration, perhaps it is not
100% correct and perhaps we can do it better. cpu_hotplug.active_writer
is ignored for simplicity, get/put should check current == active_writer.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/