Re: [PATCH RFC/TEST] sched: make sync affine wakeups work

From: Rik van Riel
Date: Fri May 02 2014 - 02:08:45 EST


On 05/02/2014 01:58 AM, Mike Galbraith wrote:
> On Fri, 2014-05-02 at 07:32 +0200, Mike Galbraith wrote:
>> On Fri, 2014-05-02 at 00:42 -0400, Rik van Riel wrote:
>>> Currently sync wakeups from the wake_affine code cannot work as
>>> designed, because the task doing the sync wakeup from the target
>>> cpu will block its wakee from selecting that cpu.
>>>
>>> This is despite the fact that whether or not the wakeup is sync
>>> determines whether or not we want to do an affine wakeup...
>>
>> If the sync hint really did mean we ARE going to schedule RSN, waking
>> local would be a good thing. It is all too often a big fat lie.
>
> One example of that is say pgbench. The mother of all work (server
> thread) for that load wakes with sync hint. Let the server wake the
> first of a small herd CPU affine, and that first wakee then preempt the
> server (mother of all work) that drives the entire load.
>
> Byebye throughput.
>
> When there's only one wakee, and there's really not enough overlap to at
> least break even, waking CPU affine is a great idea. Even when your
> wakees only run for a short time, if you wake/get_preempted repeat, the
> load will serialize.

I see a similar issue with specjbb2013, with 4 backend and
4 frontend JVMs on a 4 node NUMA system.

The NUMA balancing code nicely places the memory of each JVM
on one NUMA node, but then the wake_affine code will happily
run all of the threads anywhere on the system, totally ruining
memory locality.

The front end and back end only exchange a few hundred messages
a second, over loopback tcp, so the switching rate between
threads is quite low...

I wonder if it would make sense for wake_affine to be off by
default, and only switch on when the right conditions are
detected, instead of having it on by default like we have now?

I have some ideas on that, but I should probably catch some
sleep before trying to code them up :)

Meanwhile, the test patch that I posted may help us figure out
whether the "sync" option in the current wake_affine code does
anything useful.

--
All rights reversed
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/