Re: sched: softlockups in multi_cpu_stop

From: Davidlohr Bueso
Date: Fri Mar 06 2015 - 14:20:56 EST


On Fri, 2015-03-06 at 11:05 -0800, Linus Torvalds wrote:
> On Fri, Mar 6, 2015 at 10:57 AM, Jason Low <jason.low2@xxxxxx> wrote:
> >
> > Right, the can_spin_on_owner() was originally added to the mutex
> > spinning code for optimization purposes, particularly so that we can
> > avoid adding the spinner to the OSQ only to find that it doesn't need to
> > spin. This function returning a correct value should really only
> > affect performance, so yes, lockups due to this seem surprising.
>
> Well, softlockups aren't about "correct behavior". They are about
> certain things not happening in a timely manner.
>
> Clearly the mutex code now tries to hold on to the CPU too aggressively.

This patch was a performance "fix" for rwsems, based on what works well
for mutexes.
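The sketch below is a rough user-space model, in C11, of the owner-aware optimistic spinning heuristic Jason describes. It is not the kernel's code. struct task, struct sleeplock, the on_cpu flag, and the max_iters spin budget are hypothetical stand-ins, and the real mutex/rwsem paths (OSQ queuing, need_resched() checks, memory-ordering details) are left out. It illustrates two decisions: a can_spin_on_owner() pre-check made before queuing as a spinner, and a spin loop that stops once the owner is no longer running, so the waiter can sleep instead of occupying the CPU.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task descriptor: on_cpu models "owner is currently running".
 * This is NOT the kernel's struct task_struct. */
struct task {
    atomic_bool on_cpu;
};

/* Hypothetical sleeping lock that records its current owner (NULL if free). */
struct sleeplock {
    _Atomic(struct task *) owner;
};

/* Pre-check before joining the spinner queue: spinning only pays off if the
 * lock is free or its owner is running and may release it soon. */
static bool can_spin_on_owner(struct sleeplock *lock)
{
    struct task *owner = atomic_load(&lock->owner);

    if (owner == NULL)
        return true;
    return atomic_load(&owner->on_cpu);
}

/* Bounded spin on a specific owner.
 * Returns true if the owner changed (lock released or handed off), meaning
 * the caller should retry the acquire. Returns false if the owner stopped
 * running or the spin budget ran out, meaning the caller should sleep. */
static bool spin_on_owner(struct sleeplock *lock, struct task *owner,
                          unsigned max_iters)
{
    for (unsigned i = 0; i < max_iters; i++) {
        if (atomic_load(&lock->owner) != owner)
            return true;   /* owner changed: worth retrying the acquire */
        if (!atomic_load(&owner->on_cpu))
            return false;  /* owner preempted/blocked: stop burning the CPU */
    }
    return false;          /* budget exhausted: fall back to sleeping */
}
```

The spin budget is the knob under debate. The longer a waiter may spin, the longer it holds the CPU instead of letting the core idle or run other work.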

>
> At some point people need to admit that busy-looping isn't always a
> good idea. Especially if
>
> (a) we could idle the core instead
>
> (b) the tuning has been done based on some special-purpose benchmark
> that is likely not realistic
>
> (c) we get reports from people that it causes problems.
>
> In other words: Let's just undo that excessive busy-looping. The
> performance numbers were dubious to begin with. Real scalability comes
> from fixing the locking, not from trying to play games with the locks
> themselves. Particularly games that then cause problems.

I obviously agree with all those points. FYI, however, most of the rwsem
testing I do involves scaling address-space operations that stress
mmap_sem, which is a real-world concern. So while it does include
microbenchmarks, it is not guided by them.

Thanks,
Davidlohr
