Re: [patch v2 1/2] sched: check for prev_cpu == this_cpu before calling wake_affine()

From: Mike Galbraith
Date: Fri Apr 02 2010 - 02:20:53 EST


On Thu, 2010-04-01 at 14:04 -0700, Suresh Siddha wrote:

> Consider this scenario. Today we do balance on fork() and exec(). This
> will cause the tasks to start far away. On systems like NHM-EP, tasks
> will start on two different sockets/nodes (as each socket is a NUMA node)
> and allocate their memory locally etc., e.g. Task A starting on Node-0 and
> Task B starting on Node-1. Once task B sleeps, if Task A or something
> else wakes up task B on Node-0, (with the recent change) just because
> there is an idle HT sibling on node-0 we end up waking the task on
> node-0. This is wrong. We should at least first go through wake_affine()
> and if wake_affine() says ok to move the task to node-0, then we can
> look at the cache siblings for node-0 and select an appropriate cpu.

Yes, if task A and task B are more or less unrelated, you'd want them to
stay in separate domains; you'd not want some random wakeup to pull one
of them across. The other side of the coin is tasks which fork off
partners that they will talk to at high frequency. They land just as far
away, and desperately need to move into a shared cache domain. There's
currently no discriminator, so while always asking wake_affine() may
reduce the risk of moving a task with a large footprint, it also
increases the risk of leaving buddies jabbering across caches. You can
tweak it in either direction, and neither can be called "wrong"; it's
all compromise.

Do you have a compute load bouncing painfully which this patch cures?

I have no strong objections, and the result is certainly easier on the
eye. If I were making the decision, I'd want to see some numbers.

-Mike
