Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

From: Nick Piggin
Date: Thu Sep 11 2003 - 08:54:30 EST

Next message: Linus Torvalds: "Re: [PATCH] 2.6 workaround for Athlon/Opteron prefetch errata"
Previous message: viro: "Re: [RFC][PATCH] kmalloc + memset(foo, 0, bar) = kmalloc0"
In reply to: Andrew Theurer: "Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms"
Next in thread: Andrew Theurer: "Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Andrew Theurer wrote:

On Thursday 11 September 2003 06:04, Nick Piggin wrote:

Andrew Theurer wrote:

Robert Love <rml@xxxxxxxxx> wrote:

There are a _lot_ of scheduler changes in 2.6-mm, and who knows which
ones are an improvement, a detriment, and a noop?

We know that sched-2.6.0-test2-mm2-A3.patch caused the regression, and
we know that sched-CAN_MIGRATE_TASK-fix.patch mostly fixed it up.

What we don't know is whether the thing which sched-CAN_MIGRATE_TASK-fix.patch
fixed was the thing which sched-2.6.0-test2-mm2-A3.patch broke.

Sorry for jumping into this late. I didn't even know the can_migrate
patch was being discussed, let alone in -mm :). And to be fair, this
really is Ingo's aggressive idle steal patch.

Anyway, these patches are somewhat related. It would seem that A3's
shortening of the tasks' run time would not only slow performance because of
cache thrashing, but could possibly break CAN_MIGRATE's cache warmth check,
right? That in turn would stop load balancing from working well, leading
to more idle time, which the CAN_MIGRATE patch sort of bypassed for idle
cpus.

Yeah, that's probably right. Good thinking.

I see Nick's balance patch as somewhat harmless, at least when combined with
the A3 patch. However, one concern is that the "ping-pong" steal interval is
not really 200ms, but 200ms/(nr_cpus-1), which, without A3, could show up as a
problem, especially on an 8-way box. In addition, I do think there's a
problem with the number of tasks we steal. It should not be imbalance/2; it
should be max_load - (node_nr_running / num_cpus_node). If we steal any more
than this, which is quite possible with imbalance/2, then it's likely
this_cpu now has too many tasks, and some other cpu will steal again. Using
imbalance/2 works fine on 2-way SMP, but I'm pretty sure we "over steal"
tasks on 4-way and up. Anyway, I'm getting off topic here...

IIRC max_load is supposed to be the number of tasks on the runqueue
being stolen from, isn't it?

Yes, but I think I still got this wrong. Ideally, once we finish stealing, the busiest runqueue should not have more than node_nr_running/nr_cpus_node, but more importantly, this_cpu should not have more than node_nr_running/nr_cpus_node, so maybe it should be:

min(a,b), where

  a = max_load - load_average      (how much the busiest queue is over load_average)
  b = load_average - this_load     (how much this_cpu is under load_average)
  load_average = node_nr_running / nr_cpus_node

node_nr_running can be summed as we look for the busiest queue, so it should not be too costly.
If min(a,b) is negative (this_cpu's runqueue length was greater than load_average), we don't steal at all.

Oh OK, you're thinking about balancing across the entire NUMA node. I was just
thinking it will eventually settle down, but you're right: it's probably
better to overdampen the balancing than to underdampen it.

