Re: softlockups in multi_cpu_stop

From: Davidlohr Bueso
Date: Fri Mar 06 2015 - 22:41:49 EST


On Sat, 2015-03-07 at 11:19 +0800, Ming Lei wrote:
> On Sat, Mar 7, 2015 at 11:10 AM, Davidlohr Bueso <dave@xxxxxxxxxxxx> wrote:
> > On Sat, 2015-03-07 at 10:55 +0800, Ming Lei wrote:
> >> On Sat, Mar 7, 2015 at 10:29 AM, Davidlohr Bueso <dave@xxxxxxxxxxxx> wrote:
> >> > On Fri, 2015-03-06 at 18:26 -0800, Davidlohr Bueso wrote:
> >> >> That's not what this is about. New lock _owners_ need to worry about
> >> > ^^^ make that "need not"
> >>
> >> Sorry, could you explain a bit why the new owner can't be scheduled
> >> out (on_cpu becomes zero)? If that is possible, it can still cause
> >> a soft lockup like the current problem.
> >
> > Oh, it's not that it can't be scheduled out. The point is we don't care
> > what happens with the lock owner itself (new or not). What we care about,
> > and the point of this discussion, is how _other_ threads handle themselves
> > when trying to take that lock (a lock having an owner implies the lock is
> > not free, of course). So if a lock owner gets scheduled out... so what?
> > That's already taken into account by spinners.
>
> Not exactly, the current problem is in the spinner itself because it
> ignores a scheduled-out owner and continues to spin, which then
> causes the lockup, isn't it?

Exactly my point, Ming. It's the _spinner_ that has the problem, hence
the fix in the part of the code that must decide just that. By the time
we're doing this:

	if (READ_ONCE(sem->owner))
		return true; /* new owner, continue spinning */

We need to have already taken into account the owner->on_cpu situation.
We fix spinners, not lock owners.
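
To make the ordering concrete, here is a rough userspace sketch of the decision a spinner has to make. It is not the patch under discussion and not the kernel's code: struct task, struct sem and spinner_should_continue() are hypothetical stand-ins, there is no real concurrency, and the no-owner case is deliberately simplified. It only shows the on_cpu test happening before the "new owner" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's task and semaphore structures. */
struct task {
	int on_cpu;		/* nonzero while the task is running on a CPU */
};

struct sem {
	struct task *owner;	/* NULL when no writer owns the lock */
};

/*
 * Decide whether a spinner should keep spinning, given the owner it
 * sampled earlier. Illustrative only; single-threaded model.
 */
bool spinner_should_continue(const struct sem *sem, const struct task *owner)
{
	assert(sem != NULL);

	/* Same owner, but it got scheduled out: stop spinning and block. */
	if (owner && sem->owner == owner && !owner->on_cpu)
		return false;

	/* on_cpu is already accounted for, so an owner here means keep going. */
	if (sem->owner)
		return true;	/* new owner, continue spinning */

	/* No owner: simplified in this sketch, let the caller retry the lock. */
	return true;
}
```

With this shape, a scheduled-out owner makes the spinner give up, while a freshly installed owner simply gets sampled again on the next pass, where its own on_cpu state is checked in turn.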

I'm really running out of ways to explain this, and you are going in
circles, which is getting annoying given that you haven't even tried the
other patch.
