Re: [RFC][PATCH] sched: Start stopper early

From: Peter Zijlstra
Date: Wed Oct 07 2015 - 08:39:03 EST


On Wed, Oct 07, 2015 at 02:30:46PM +0200, Oleg Nesterov wrote:
> On 10/07, Peter Zijlstra wrote:
> >
> > So Heiko reported some 'interesting' fail where stop_two_cpus() got
> > stuck in multi_cpu_stop() with one cpu waiting for another that never
> > happens.
> >
> > It _looks_ like the 'other' cpu isn't running and the current best
> > theory is that we race on cpu-up and get the stop_two_cpus() call in
> > before the stopper task is running.
> >
> > This _is_ possible because we set 'online && active'
>
> Argh. Can't really comment on this change right now, but this reminds me
> that stop_two_cpus() path should not rely on cpu_active() at all. I mean
> we should not use this check to avoid the deadlock, migrate_swap_stop()
> can check it itself. And cpu_stop_park()->cpu_stop_signal_done() should
> be replaced by BUG_ON().
>
> Probably slightly off-topic, but what do you finally think about the old
> "[PATCH v2 6/6] stop_machine: kill stop_cpus_lock and lg_double_lock/unlock()"
> we discussed in http://marc.info/?t=143750670300014 ?
>
> I won't really insist if you still dislike it, but it seems we both
> agree that "lg_lock stop_cpus_lock" must die in any case, and after that
> we can do the cleanups mentioned above.

Yes, I was looking at that; this bug reminded me we still had that
issue open.

> And, Peter, I see a lot of interesting emails from you, but currently
> can't even read them. I hope very much I will read them later and perhaps
> even reply ;)

Sure, take your time.