Re: [RFC][PATCH] sched: Start stopper early

From: Oleg Nesterov
Date: Wed Oct 07 2015 - 08:34:12 EST


On 10/07, Peter Zijlstra wrote:
>
> So Heiko reported some 'interesting' fail where stop_two_cpus() got
> stuck in multi_cpu_stop() with one cpu waiting for another that never
> happens.
>
> It _looks_ like the 'other' cpu isn't running and the current best
> theory is that we race on cpu-up and get the stop_two_cpus() call in
> before the stopper task is running.
>
> This _is_ possible because we set 'online && active'

Argh. Can't really comment on this change right now, but this reminds me
that stop_two_cpus() path should not rely on cpu_active() at all. I mean
we should not use this check to avoid the deadlock, migrate_swap_stop()
can check it itself. And cpu_stop_park()->cpu_stop_signal_done() should
be replaced by BUG_ON().
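
To illustrate the first point, here is a rough userspace sketch, not kernel
code. Every name ending in _model is hypothetical and only stands in for the
real function it is named after; the -EAGAIN return value is likewise an
assumption. The idea shown is that the stop_two_cpus() path runs the callback
unconditionally, and the callback (migrate_swap_stop() in the real code)
decides by itself whether both CPUs are still active.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: none of this is the kernel's actual code. */
#define NR_CPUS_MODEL 4

/* Stand-in for the cpu_active() mask. */
static bool cpu_active_model[NR_CPUS_MODEL];

struct swap_arg_model {
	int src_cpu;
	int dst_cpu;
};

/*
 * Stand-in for migrate_swap_stop(): the callback itself refuses to
 * proceed if either CPU is no longer active.
 */
static int swap_stop_model(void *data)
{
	struct swap_arg_model *arg = data;

	if (!cpu_active_model[arg->src_cpu] || !cpu_active_model[arg->dst_cpu])
		return -EAGAIN;	/* assumed error code, for illustration */

	return 0;		/* the real callback would swap the tasks here */
}

/*
 * Stand-in for stop_two_cpus(): note there is no cpu_active() check
 * here at all, it only runs the callback.
 */
static int stop_two_cpus_model(int cpu1, int cpu2,
			       int (*fn)(void *), void *arg)
{
	(void)cpu1;
	(void)cpu2;
	assert(fn != NULL);
	return fn(arg);
}
```

With both CPUs marked active the model returns 0; once either one is marked
inactive, the callback returns -EAGAIN without the caller having looked at
cpu_active_model[] at all.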

Probably slightly off-topic, but what do you finally think about the old
"[PATCH v2 6/6] stop_machine: kill stop_cpus_lock and lg_double_lock/unlock()"
we discussed in http://marc.info/?t=143750670300014 ?

I won't really insist if you still dislike it, but it seems we both
agree that "lg_lock stop_cpus_lock" must die in any case, and after that
we can do the cleanups mentioned above.


And, Peter, I see a lot of interesting emails from you, but currently
can't even read them. I hope very much I will read them later and perhaps
even reply ;)

Oleg.
